jan/llamacpp-extension at 489c5a3d9c3e588db18fb4c3d311b68bbf283fe5 - jan

Akarshan Biswas 489c5a3d9c

fix: KVCache size calculation and refactor (#6438)

- Removed the unused `getKVCachePerToken` helper and replaced it with a unified `estimateKVCache` that returns both total size and per‑token size.
- Fixed the KV cache size calculation to account for all layers, correcting previous under‑estimation.
- Added proper clamping of user‑requested context lengths to the model’s maximum.
- Refactored VRAM budgeting: introduced explicit reserves, fixed engine overhead, and separate multipliers for VRAM and system RAM based on memory mode.
- Implemented a more robust planning flow with clear GPU, Hybrid, and CPU pathways, including fallback configurations when resources are insufficient.
- Updated default context length handling and safety buffers to prevent OOM situations.
- Adjusted usable memory percentage to 90% and refined logging for easier debugging.
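
The KV cache estimate and context clamping described above can be illustrated with a minimal sketch. This is not the code from the commit: the `ModelShape` and `KVCacheEstimate` types, the function name `estimateKVCacheSketch`, and the default of 2 bytes per element (f16 cache) are all assumptions made for illustration. It uses the commonly cited formula of one K and one V vector per layer per token, and returns both the per-token and total sizes.

```typescript
// Hypothetical model description; field names are illustrative and are
// not the extension's actual types.
interface ModelShape {
  nLayers: number;    // number of transformer layers
  nKvHeads: number;   // number of key/value heads (smaller than query heads under GQA)
  headDim: number;    // dimension of each head
  maxContext: number; // maximum context length the model supports
}

interface KVCacheEstimate {
  ctxLen: number;        // context length after clamping
  perTokenBytes: number; // KV cache bytes needed per token
  totalBytes: number;    // KV cache bytes for the whole context
}

// Sketch only: assumes an f16 cache (2 bytes per element) unless told otherwise.
function estimateKVCacheSketch(
  model: ModelShape,
  requestedCtx: number,
  bytesPerElement = 2
): KVCacheEstimate {
  // Clamp the user-requested context length to the model's maximum.
  const ctxLen = Math.max(1, Math.min(requestedCtx, model.maxContext));

  // One K and one V vector per layer per token; summing over all layers
  // avoids the under-estimate that comes from counting a single layer.
  const perTokenBytes =
    2 * model.nLayers * model.nKvHeads * model.headDim * bytesPerElement;

  return { ctxLen, perTokenBytes, totalBytes: perTokenBytes * ctxLen };
}
```

For a hypothetical model with 32 layers, 8 KV heads, a head dimension of 128, and a maximum context of 8192, a request for 16384 tokens is clamped to 8192, and the estimate comes to 131072 bytes per token, or 1 GiB in total.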

2025-09-15 10:16:13 +05:30

src

fix: KVCache size calculation and refactor (#6438)

2025-09-15 10:16:13 +05:30

package.json

fix: timeout for completion request

2025-08-19 22:06:26 +07:00

rolldown.config.mjs

fix: import

2025-08-19 22:16:24 +07:00

settings.json

feat: Smart model management (#6390)