feat: support per-model overrides in llama.cpp load() (#5820)
* feat: support per-model overrides in llama.cpp load()

  Extend the `load()` method in the llama.cpp extension to accept an optional
  `overrideSettings` argument, allowing fine-grained per-model configuration.
  This enables users to override provider-level settings such as `ctx_size`,
  `chat_template`, `n_gpu_layers`, etc., when loading a specific model.

  Fixes: #5818 (Feature Request - Jan v0.6.6)

  Use cases enabled:
  - Different context sizes per model (e.g., 4K vs 32K)
  - Model-specific chat templates (ChatML, Alpaca, etc.)
  - Performance tuning (threads, GPU layers)
  - Better memory management per deployment

  Maintains full backward compatibility with the existing provider config.

* swap overrideSettings and isEmbedding arguments
parent bc4fe52f8d
commit 81d6ed3785
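For context, a minimal caller-side sketch (not part of this commit) of how per-model overrides can now be passed to load(). The field names (ctx_size, n_gpu_layers, chat_template) come from the commit message; the `extension` instance and the model id are hypothetical placeholders.

// Assumed usage sketch: override provider-level llama.cpp settings for one model.
// `extension` and the model id below are placeholders, not names from this diff.
const overrides: Partial<LlamacppConfig> = {
  ctx_size: 32768,         // larger context window for this model only
  n_gpu_layers: 99,        // offload more layers to the GPU
  chat_template: 'chatml', // model-specific prompt format
}

// New signature: load(modelId, overrideSettings?, isEmbedding = false)
const session: SessionInfo = await extension.load('some-long-context-model', overrides)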
@@ -764,6 +764,7 @@ export default class llamacpp_extension extends AIEngine {
   override async load(
     modelId: string,
+    overrideSettings?: Partial<LlamacppConfig>,
     isEmbedding: boolean = false
   ): Promise<SessionInfo> {
     const sInfo = this.findSessionByModel(modelId)
@@ -778,7 +779,7 @@ export default class llamacpp_extension extends AIEngine {
       )
     }
     const args: string[] = []
-    const cfg = this.config
+    const cfg = { ...this.config, ...(overrideSettings ?? {}) }
     const [version, backend] = cfg.version_backend.split('/')
     if (!version || !backend) {
       throw new Error(
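The behavioral core of the change is the shallow spread merge: keys present in overrideSettings win over the provider-level config, everything left unspecified keeps its provider default, and passing no overrides leaves cfg identical to the previous `this.config` behavior, which preserves backward compatibility. A small illustration with invented values:

// Shallow merge as used in load(); the values below are illustrative only.
const providerConfig = { ctx_size: 4096, n_gpu_layers: 0, chat_template: '' }
const overrideSettings = { ctx_size: 32768 }

const cfg = { ...providerConfig, ...(overrideSettings ?? {}) }
// -> { ctx_size: 32768, n_gpu_layers: 0, chat_template: '' }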