feat: support per-model overrides in llama.cpp load() (#5820)

* feat: support per-model overrides in llama.cpp load()

Extend the `load()` method in the llama.cpp extension to accept optional
`overrideSettings`, allowing fine-grained per-model configuration.

This enables users to override provider-level settings such as `ctx_size`,
`chat_template`, `n_gpu_layers`, etc., when loading a specific model.

Fixes: #5818 (Feature Request - Jan v0.6.6)

Use cases enabled:
- Different context sizes per model (e.g., 4K vs 32K)
- Model-specific chat templates (ChatML, Alpaca, etc.)
- Performance tuning (threads, GPU layers)
- Better memory management per deployment

Maintains full backward compatibility with existing provider config.
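
For example, a caller could load the same provider with different per-model settings. A rough sketch (the model IDs and the `extension` instance are hypothetical; the config keys mirror the provider-level settings named above, with illustrative values):

    // Large-context chat model: override context size and GPU offload.
    const chatSession = await extension.load('qwen2.5-7b-instruct', {
      ctx_size: 32768,
      n_gpu_layers: 99,
    })

    // Small helper model: tighter context and an explicit chat template.
    const draftSession = await extension.load('llama-3.2-1b', {
      ctx_size: 4096,
      chat_template: 'chatml',
    })

    // Omitting overrideSettings keeps the provider config, so existing
    // callers continue to work unchanged.
    const defaultSession = await extension.load('some-model')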

* swap the overrideSettings and isEmbedding arguments (overrideSettings now comes before isEmbedding)
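
After the swap, the signature reads load(modelId, overrideSettings?, isEmbedding = false), so positional call sites that load an embedding model pass the overrides (or undefined) as the second argument. A minimal sketch with a hypothetical model ID:

    // New parameter order: (modelId, overrideSettings?, isEmbedding)
    await extension.load('nomic-embed-v1.5', undefined, true)           // embedding, provider defaults
    await extension.load('nomic-embed-v1.5', { ctx_size: 2048 }, true)  // embedding with an override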
Author: Akarshan Biswas, 2025-07-21 08:59:50 +05:30 (committed by GitHub)
Parent: bc4fe52f8d
Commit: 81d6ed3785

@@ -764,6 +764,7 @@ export default class llamacpp_extension extends AIEngine {
   override async load(
     modelId: string,
+    overrideSettings?: Partial<LlamacppConfig>,
     isEmbedding: boolean = false
   ): Promise<SessionInfo> {
     const sInfo = this.findSessionByModel(modelId)
@@ -778,7 +779,7 @@ export default class llamacpp_extension extends AIEngine {
       )
     }
     const args: string[] = []
-    const cfg = this.config
+    const cfg = { ...this.config, ...(overrideSettings ?? {}) }
     const [version, backend] = cfg.version_backend.split('/')
     if (!version || !backend) {
       throw new Error(
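
The spread in the new cfg line gives per-model overrides precedence over the provider config while leaving unspecified keys at their provider defaults. A quick illustration of the merge semantics (the values are made up):

    const providerConfig   = { ctx_size: 8192, n_gpu_layers: 35, threads: 8 }
    const overrideSettings = { ctx_size: 32768 }

    const cfg = { ...providerConfig, ...(overrideSettings ?? {}) }
    // -> { ctx_size: 32768, n_gpu_layers: 35, threads: 8 }
    // overrideSettings wins where it sets a key; everything else falls back.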