jan/extensions at e02be47aae4f5ed7b0e9b3efee4ea69dabef1c65


Akarshan Biswas 489c5a3d9c

fix: KVCache size calculation and refactor (#6438)

- Removed the unused `getKVCachePerToken` helper and replaced it with a unified `estimateKVCache` that returns both the total size and the per-token size.
- Fixed the KV cache size calculation to account for all layers, correcting the previous underestimation.
- Added proper clamping of user-requested context lengths to the model's maximum.
- Refactored VRAM budgeting: introduced explicit reserves, a fixed engine overhead, and separate VRAM and system-RAM multipliers based on the memory mode.
- Implemented a more robust planning flow with distinct GPU, Hybrid, and CPU pathways, including fallback configurations when resources are insufficient.
- Updated default context-length handling and safety buffers to prevent out-of-memory (OOM) errors.
- Set the usable memory percentage to 90% and refined logging for easier debugging.

2025-09-15 10:16:13 +05:30
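The commit above describes summing the KV cache over all layers, clamping the requested context to the model's maximum, and treating 90% of memory as usable. A minimal sketch of that arithmetic, assuming the standard transformer estimate (K and V tensors, times layers, times KV heads, times head dimension, times bytes per element, times context length), might look like this. The names `ModelDims`, `estimateKVCacheSketch`, `clampCtx`, `fitsInMemory`, and `USABLE_FRACTION` are hypothetical and are not the extension's actual `estimateKVCache` API; the sketch also leaves out the reserves, engine overhead, and memory-mode multipliers mentioned in the commit.

```typescript
// Hypothetical model dimensions. In a real loader these would be read from
// model metadata (e.g. GGUF keys); that source is an assumption here.
interface ModelDims {
  nLayers: number;      // number of transformer blocks
  nKvHeads: number;     // key/value heads (fewer than query heads under GQA)
  headDim: number;      // dimension of each attention head
  bytesPerElem: number; // e.g. 2 for an f16 cache
  maxCtx: number;       // model's maximum supported context length
}

interface KVCacheEstimate {
  ctx: number;           // context length after clamping
  perTokenBytes: number; // KV bytes added per token of context
  totalBytes: number;    // perTokenBytes * ctx
}

// Clamp a user-requested context length into [1, maxCtx].
function clampCtx(requested: number, maxCtx: number): number {
  return Math.max(1, Math.min(Math.floor(requested), maxCtx));
}

function estimateKVCacheSketch(dims: ModelDims, requestedCtx: number): KVCacheEstimate {
  const ctx = clampCtx(requestedCtx, dims.maxCtx);
  // Factor 2 covers the K and V tensors; multiplying by nLayers counts every layer.
  const perTokenBytes =
    2 * dims.nLayers * dims.nKvHeads * dims.headDim * dims.bytesPerElem;
  return { ctx, perTokenBytes, totalBytes: perTokenBytes * ctx };
}

// Fraction of available memory treated as usable (the 90% figure above).
const USABLE_FRACTION = 0.9;

function fitsInMemory(totalBytes: number, availableBytes: number): boolean {
  return totalBytes <= availableBytes * USABLE_FRACTION;
}

// Example: a 32-layer model with 8 KV heads of dimension 128 and an f16 cache.
const example = estimateKVCacheSketch(
  { nLayers: 32, nKvHeads: 8, headDim: 128, bytesPerElem: 2, maxCtx: 131072 },
  200000, // requested context exceeds the model maximum and gets clamped
);
console.log(example.ctx, example.perTokenBytes, example.totalBytes); // 131072 131072 17179869184
```

Because the per-token cost grows linearly with the layer count, dropping the `nLayers` factor would understate the cache by a factor equal to the number of layers, which is the kind of underestimation the commit says it corrects.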

assistant-extension

feat: support inserting current date into assistant prompt

2025-08-17 00:24:00 -07:00

conversational-extension

refactor: clean up cortex (#6003)

2025-07-31 21:58:12 +07:00

download-extension

feat: gguf file size + hash validation (#5266) (#6259)

2025-08-21 16:17:58 +07:00

llamacpp-extension

fix: KVCache size calculation and refactor (#6438)