jan/extensions
Akarshan Biswas 81d6ed3785
feat: support per-model overrides in llama.cpp load() (#5820)
* feat: support per-model overrides in llama.cpp load()

Extend the `load()` method in the llama.cpp extension to accept optional
`overrideSettings`, allowing fine-grained per-model configuration.

This enables users to override provider-level settings such as `ctx_size`,
`chat_template`, `n_gpu_layers`, etc., when loading a specific model.

Fixes: #5818 (Feature Request - Jan v0.6.6)

Use cases enabled:
- Different context sizes per model (e.g., 4K vs 32K)
- Model-specific chat templates (ChatML, Alpaca, etc.)
- Performance tuning (threads, GPU layers)
- Better memory management per deployment

Maintains full backward compatibility with existing provider config.
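
A minimal sketch of how the override merge could work, to illustrate the
backward-compatibility claim above. This is not the extension's actual code:
`LlamacppSettings`, `resolveSettings`, and the `threads` field are hypothetical
names; only `ctx_size`, `chat_template`, `n_gpu_layers`, and `overrideSettings`
appear in this commit message. The real `load()` signature is not shown here.

```typescript
// Hypothetical subset of the provider-level llama.cpp settings.
// `threads` is an assumed field name used only for illustration.
interface LlamacppSettings {
  ctx_size: number;
  chat_template: string;
  n_gpu_layers: number;
  threads: number;
}

// Merge optional per-model overrides on top of the provider-level settings.
// Keys that are absent or undefined in `overrideSettings` fall back to the
// provider value, so callers that pass no overrides see unchanged behaviour.
function resolveSettings(
  providerSettings: LlamacppSettings,
  overrideSettings?: Partial<LlamacppSettings>
): LlamacppSettings {
  // Copy so the provider-level object is never mutated.
  const resolved: LlamacppSettings = { ...providerSettings };
  if (!overrideSettings) {
    return resolved;
  }
  if (overrideSettings.ctx_size !== undefined) {
    resolved.ctx_size = overrideSettings.ctx_size;
  }
  if (overrideSettings.chat_template !== undefined) {
    resolved.chat_template = overrideSettings.chat_template;
  }
  if (overrideSettings.n_gpu_layers !== undefined) {
    resolved.n_gpu_layers = overrideSettings.n_gpu_layers;
  }
  if (overrideSettings.threads !== undefined) {
    resolved.threads = overrideSettings.threads;
  }
  return resolved;
}

// Example: a 32K context for one model, provider defaults for everything else.
const providerDefaults: LlamacppSettings = {
  ctx_size: 4096,
  chat_template: "chatml",
  n_gpu_layers: 0,
  threads: 4,
};
const longContextModel = resolveSettings(providerDefaults, { ctx_size: 32768 });
const unchangedModel = resolveSettings(providerDefaults);
```

Checking each key against `undefined` rather than spreading the override
object directly keeps an explicitly undefined override from erasing a
provider-level value.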

* swap overrideSettings and isEmbedding arguments
2025-07-21 08:59:50 +05:30