133 Commits

Author SHA1 Message Date
Faisal Amir
be851ebcf1 chore: validate gguf file base metadata architecture 2025-09-08 20:16:20 +07:00
Faisal Amir
836990b7d9 chore: update fn check mmproj file 2025-09-08 11:10:00 +07:00
Dinh Long Nguyen
a30eb7f968
feat: Jan Web (reusing Jan Desktop UI) (#6298)
* add platform guards

* add service management

* fix types

* move to zustand for servicehub

* update App Updater

* update tauri missing move

* update app updater

* refactor: move PlatformFeatures to separate const file

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

* change tauri fetch name

* update implementation

* update extension fetch

* make web version run properly

* disabled unused web settings

* fix all tests

* fix lint

* fix tests

* add mock for extension

* fix build

* update make and mise

* fix tsconfig for web-extensions

* fix loader type

* cleanup

* fix test

* update error handling + mcp should be working

* Update mcp init

* use separate is_web_app build property

* Remove fixed model catalog url

* fix additional tests

* fix download issue (event emitter not implemented correctly)

* Update Title html

* fix app logs

* update root tsx render timing

---------

Co-authored-by: Claude <noreply@anthropic.com>
2025-09-05 01:47:46 +07:00
hiento09
1b74772d07
feat: download llamacpp backend falls back to CDN (#6361)
* feat: download llamacpp backend falls back to CDN in case GitHub API encounters errors
2025-09-04 09:39:16 +07:00
Akarshan Biswas
5fae954ac5
fix: Use 80% total memory for compatibility check (#6321)
* fix: Use 80% total memory for compatibility check

* refactor: extract usable memory percentage to named constant

Extract the hardcoded 0.8 multiplier into a named constant
USABLE_MEMORY_PERCENTAGE for better readability and maintainability.
2025-08-28 14:50:00 +05:30
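As a rough illustration of the headroom rule in the commit above: only the `USABLE_MEMORY_PERCENTAGE` constant comes from the commit; the surrounding types and helper are hypothetical.

```typescript
// Minimal sketch of the 80% compatibility check, assuming memory sizes in bytes.
const USABLE_MEMORY_PERCENTAGE = 0.8

interface DeviceMemory {
  totalMem: number // total device memory in bytes
  freeMem: number  // free device memory in bytes
}

function fitsInMemory(requiredBytes: number, device: DeviceMemory): boolean {
  // Compare against 80% of total memory rather than the raw total,
  // leaving headroom for the OS and other processes.
  return requiredBytes <= device.totalMem * USABLE_MEMORY_PERCENTAGE
}
```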
Akarshan Biswas
64a608039b
fix: check for env value before setting (#6266)
* fix: check for env value before setting

* Use empty instead of none
2025-08-21 22:55:49 +05:30
Akarshan Biswas
510c70bdf7
feat: Add model compatibility check and memory estimation (#6243)
* feat: Add model compatibility check and memory estimation

This commit introduces a new feature to check if a given model is supported based on available device memory.

The change includes:
- A new `estimateKVCache` method that calculates the required memory for the model's KV cache. It uses GGUF metadata such as `block_count`, `head_count`, `key_length`, and `value_length` to perform the calculation.
- An `isModelSupported` method that combines the model file size and the estimated KV cache size to determine the total memory required. It then checks if any available device has sufficient free memory to load the model.
- An updated error message for the `version_backend` check to be more user-friendly, suggesting a stable internet connection as a potential solution for backend setup failures.

This functionality helps prevent the application from attempting to load models that would exceed the device's memory capacity, leading to more stable and predictable behavior.

fixes: #5505

* Update extensions/llamacpp-extension/src/index.ts

Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>

* Update extensions/llamacpp-extension/src/index.ts

Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>

* Extend this to available system RAM if GGML device is not available

* fix: Improve model metadata and memory checks

This commit refactors the logic for checking if a model is supported by a system's available memory.

**Key changes:**
- **Remote model support**: The `read_gguf_metadata` function can now fetch metadata from a remote URL by reading the file in chunks.
- **Improved KV cache size calculation**: The KV cache size is now estimated more accurately by using `attention.key_length` and `attention.value_length` from the GGUF metadata, with a fallback to `embedding_length`.
- **Granular memory check statuses**: The `isModelSupported` function now returns a more specific status (`'RED'`, `'YELLOW'`, `'GREEN'`) to indicate whether the model weights or the KV cache are too large for the available memory.
- **Consolidated logic**: The logic for checking local and remote models has been consolidated into a single `isModelSupported` function, improving code clarity and maintainability.

These changes provide more robust and informative model compatibility checks, especially for models hosted on remote servers.

* Update extensions/llamacpp-extension/src/index.ts

Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>

* Make ctx_size optional and use sum free memory across ggml devices

* feat: hub and dropdown model selection handle model compatibility

* feat: update badge model info color

* chore: enable detail page to get model compatibility

* chore: update copy

* chore: update shrink indicator UI

---------

Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>
Co-authored-by: Faisal Amir <urmauur@gmail.com>
2025-08-21 16:13:50 +05:30
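A sketch of the KV-cache estimate and the RED/YELLOW/GREEN classification described in the commit above. The GGUF keys (`block_count`, `key_length`, `value_length`, `embedding_length`) follow the commit text; the KV-head field name, defaults, and exact thresholds are assumptions.

```typescript
// Illustrative only: estimate KV-cache size from GGUF metadata and classify support.
interface GgufMetadata {
  block_count: number        // number of transformer layers
  head_count: number         // attention heads
  head_count_kv: number      // KV heads (assumed field; the commit mentions head_count)
  key_length?: number        // attention.key_length
  value_length?: number      // attention.value_length
  embedding_length?: number  // fallback when key/value lengths are absent
}

type SupportStatus = 'RED' | 'YELLOW' | 'GREEN'

function estimateKVCacheBytes(meta: GgufMetadata, ctxSize: number, bytesPerElement = 2): number {
  // Fall back to embedding_length / head_count when explicit head dims are missing.
  const keyLen = meta.key_length ?? meta.embedding_length! / meta.head_count
  const valueLen = meta.value_length ?? meta.embedding_length! / meta.head_count
  // K and V caches, per layer, per KV head, per context position.
  return meta.block_count * meta.head_count_kv * (keyLen + valueLen) * ctxSize * bytesPerElement
}

function classifySupport(modelBytes: number, kvBytes: number, freeBytes: number): SupportStatus {
  if (modelBytes > freeBytes) return 'RED'               // weights alone don't fit
  if (modelBytes + kvBytes > freeBytes) return 'YELLOW'  // weights fit, KV cache may not
  return 'GREEN'
}
```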
Akarshan Biswas
9c25480c7b
fix: Update placeholder text and error message (#6263)
This commit improves the clarity of the llama.cpp extension.

- Corrected a placeholder example from `GGML_VK_VISIBLE_DEVICES='0,1'` to `GGML_VK_VISIBLE_DEVICES=0,1` for better accuracy.
- Changed an ambiguous error message from `"Failed to load llama-server: ${error}"` to the more specific `"Failed to load llamacpp backend"`.
2025-08-21 16:01:31 +05:30
Akarshan Biswas
5c3a6fec32
feat: Add support for custom environmental variables to llama.cpp (#6256)
This commit adds a new setting `llamacpp_env` to the llama.cpp extension, allowing users to specify custom environment variables. These variables are passed to the backend process when it starts.

A new function `parseEnvFromString` is introduced to handle the parsing of the semicolon-separated key-value pairs from the user input. The environment variables are then used in the `load` function and when listing available devices. This enables more flexible configuration of the llama.cpp backend, such as specifying visible GPUs for Vulkan.

This change also updates the Tauri command `get_devices` to accept environment variables, ensuring that device discovery respects the user's settings.
2025-08-21 15:50:37 +05:30
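A rough sketch of the semicolon-separated parsing described in the commit above. The function name matches the commit; the exact rules for trimming, `=` inside values, and malformed entries are assumptions.

```typescript
// Hypothetical parseEnvFromString: "KEY1=value1; KEY2=value2" -> { KEY1, KEY2 }.
function parseEnvFromString(input: string): Record<string, string> {
  const envs: Record<string, string> = {}
  for (const pair of input.split(';')) {
    const trimmed = pair.trim()
    if (!trimmed) continue
    const eq = trimmed.indexOf('=')
    if (eq <= 0) continue // skip malformed entries without a key
    const key = trimmed.slice(0, eq).trim()
    const value = trimmed.slice(eq + 1).trim()
    envs[key] = value
  }
  return envs
}

// Example: specifying visible GPUs for the Vulkan backend.
// parseEnvFromString('GGML_VK_VISIBLE_DEVICES=0,1; LLAMA_LOG_LEVEL=debug')
//   -> { GGML_VK_VISIBLE_DEVICES: '0,1', LLAMA_LOG_LEVEL: 'debug' }
```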
Dinh Long Nguyen
32a2ca95b6
feat: gguf file size + hash validation (#5266) (#6259)
* feat: gguf file size + hash validation

* fix tests fe

* update cargo tests

* handle async download for both models and mmproj

* move progress tracker to models

* handle file download cancelled

* add cancellation mid hash run
2025-08-21 16:17:58 +07:00
Louis
3a36353b02
fix: backend variant selection 2025-08-21 10:54:35 +07:00
Akarshan Biswas
906b87022d
chore: re-enable reasoning_content in backend (#6228)
* chore: re-enable reasoning_content in backend

* chore: handle reasoning_content

* chore: refactor get reasoning content

* chore: update PR review

---------

Co-authored-by: Faisal Amir <urmauur@gmail.com>
2025-08-20 13:06:21 +05:30
Akarshan Biswas
0fc3dc6841
Fix: Validate GGUF files before loading (#6238)
This commit adds a GGUF validation check for both the main model file and the `mmproj` file (if present) before they are loaded. This prevents the extension from crashing if an invalid GGUF file is provided.

The `GgufMetadata` interface and `loadMetadata` function were removed as the `readGgufMetadata` is now invoked directly. The code has also been refactored to be more readable, with clearer variable names and more descriptive comments.
2025-08-20 10:31:19 +05:30
Faisal Amir
5481ee9e35
Merge pull request #6134 from menloresearch/feat/attachment-ui
feat: attachment UI
2025-08-20 10:04:32 +07:00
Akarshan Biswas
e761c439d7
feat: Pass API key via environment variable instead of command line argument (#6225)
This change modifies how the API key is passed to the llama-server process. Previously, it was sent as a command line argument (--api-key). This approach has been updated to pass the key via an environment variable (LLAMA_API_KEY).

This improves security by preventing the API key from being visible in the process list (ps aux on Linux, Task Manager on Windows, etc.), where it could potentially be exposed to other users or processes on the same system.

The commit also updates the Rust backend to read the API key from the environment variable instead of parsing it from the command line arguments.
2025-08-19 20:57:06 +05:30
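A minimal sketch of the change described in the commit above: pass the key through the child-process environment instead of as a `--api-key` argument. Only the `LLAMA_API_KEY` variable name comes from the commit; the spawn-options shape is hypothetical.

```typescript
// Hypothetical launch-options builder for llama-server.
interface SpawnOptions {
  args: string[]
  envs: Record<string, string>
}

function buildServerLaunch(apiKey: string, baseArgs: string[]): SpawnOptions {
  return {
    // No --api-key here, so the key never shows up in `ps aux` / Task Manager.
    args: baseArgs,
    envs: { LLAMA_API_KEY: apiKey },
  }
}
```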
Louis
df41bad465
fix: import 2025-08-19 22:16:24 +07:00
Louis
a210e2f13a
fix: timeout for completion request 2025-08-19 22:06:26 +07:00
Akarshan
cffbe1a77b Increase timeout for fetch in llamacpp 2025-08-19 20:00:10 +07:00
Faisal Amir
fdc8e07f86 chore: update model setting to include offload-mmproj 2025-08-19 20:00:08 +07:00
Akarshan
9afeb5e514 feat: Add offload_mmproj option and validation
This commit introduces a new configuration option offload_mmproj to the llamacpp extension.

The offload_mmproj setting allows users to control whether the multimodal projector model is offloaded to the GPU. By default, it's offloaded for better performance. If set to false, the projector model will remain on the CPU, which can be useful in low GPU memory scenarios, though image processing might take longer.

Additionally, this commit adds validate_mmproj_path to ensure the provided --mmproj path is valid and accessible, preventing issues during model loading.

This change also refactors some invoke calls for improved readability.
2025-08-19 19:51:29 +07:00
Louis
55390de070
Merge pull request #6222 from menloresearch/feat/model-tool-use-detection
feat: #5917 - model tool use capability should be auto detected
2025-08-19 13:55:08 +07:00
Louis
bfe671d7b4
feat: #5917 - model tool use capability should be auto detected 2025-08-19 09:51:36 +07:00
Akarshan Biswas
5ad3d282af
fix: re-enable Vulkan backend in integrated GPUs with enough memory (#6215) 2025-08-18 17:31:01 +05:30
Dinh Long Nguyen
e1c8d98bf2
Backend Architecture Refactoring (#6094) (#6162)
* add llamacpp plugin

* Refactor llamacpp plugin

* add utils plugin

* remove utils folder

* add hardware implementation

* add utils folder + move utils function

* organize cargo files

* refactor utils src

* refactor util

* apply fmt

* fmt

* Update gguf + reformat

* add permission for gguf commands

* fix cargo test windows

* revert yarn lock

* remove cargo.lock for hardware plugin

* ignore cargo.lock file

* Fix hardware invoke + refactor hardware + refactor tests, constants

* use api wrapper in extension to invoke hardware call + api wrapper build integration

* add newline at EOF (per Akarshan)

* add vi mock for getSystemInfo
2025-08-15 08:59:01 +07:00
Faisal Amir
a66d83c598
Merge pull request #6172 from menloresearch/fix/model-id-special-char
fix: handle modelId special char
2025-08-14 12:33:58 +07:00
Akarshan Biswas
f4661912b0
feat: Add GGUF metadata reading functionality (#6120)
* feat: Add GGUF metadata reading functionality

This commit introduces a new Tauri command and a corresponding function to read metadata from GGUF model files.

The new read_gguf_metadata command in the Rust backend uses the byteorder crate to parse the GGUF file format and extract key metadata. This information, including the file's version, tensor count, and a key-value map of other metadata, is then made available to the TypeScript frontend.

This functionality is a foundational step toward providing users with more detailed information about their loaded models directly within the application.

This will be refactored later.

fixes: #6001

* loadMetadata() should return

* Properly throw error to FE

* Use BufReader to improve performance
2025-08-13 22:54:20 +05:30
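A possible frontend view of the `read_gguf_metadata` command introduced in the commit above, assuming Tauri v2's `invoke`. The result shape mirrors the commit text (version, tensor count, key-value metadata map), but the exact field names are guesses.

```typescript
import { invoke } from '@tauri-apps/api/core'

// Hypothetical TypeScript wrapper around the Rust command.
interface GgufFileMetadata {
  version: number
  tensor_count: number
  metadata: Record<string, unknown>
}

async function readGgufMetadata(path: string): Promise<GgufFileMetadata> {
  // Errors from the Rust side are thrown to the frontend ("Properly throw error to FE").
  return invoke<GgufFileMetadata>('read_gguf_metadata', { path })
}
```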
Akarshan Biswas
02ded9b545
fix: Improve error message for invalid version/backend format (#6149)
* fix: Improve error message for invalid version/backend format

This commit changes the error message displayed when the `version_backend` configuration is invalid. The new message is more user-friendly and suggests a simple solution, such as restarting the application, which is more helpful to the user than the previous technical error message.

* fix typo
2025-08-12 21:38:22 +05:30
Akarshan Biswas
0cfc745954
feat: Introduce structured error handling for llamacpp extension (#6087)
* feat: Introduce structured error handling for llamacpp extension

This commit introduces a structured error handling system for the `llamacpp` extension. Instead of returning simple string errors, we now use a custom `LlamacppError` struct with a specific `ErrorCode` enum. This allows the frontend to display more user-friendly and actionable error messages based on the code, rather than raw debug logs.

The changes include:
- A new `ErrorCode` enum to categorize errors (e.g., `OutOfMemory`, `ModelArchNotSupported`, `BinaryNotFound`).
- A `LlamacppError` struct to encapsulate the code, a user-facing message, and optional detailed logs.
- A static method `from_stderr` that intelligently parses llama.cpp's standard error output to identify and map common issues like Out of Memory errors to a specific error code.
- Refactored `ServerError` enum to wrap the new `LlamacppError` and provide a consistent serialization format for the Tauri frontend.
- Updated all relevant functions (`load_llama_model`, `get_devices`) to return the new structured error type, ensuring a more robust and predictable error flow.
- A reduced timeout for model loading from 300 to 180 seconds.

This work lays the groundwork for a more intuitive and helpful user experience, as the application can now provide clear guidance to users when a model fails to load.

* Update src-tauri/src/core/utils/extensions/inference_llamacpp_extension/server.rs

Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>

* Update src-tauri/src/core/utils/extensions/inference_llamacpp_extension/server.rs

Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>

* chore: update FE handle error object from extension

* chore: fix property type

---------

Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>
Co-authored-by: Faisal Amir <urmauur@gmail.com>
2025-08-07 23:28:25 +05:30
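An illustrative frontend handler for the structured error described in the commit above. The error codes are the ones the commit lists; the serialized shape and messages are assumptions.

```typescript
// Hypothetical mapping from structured llamacpp errors to user-facing text.
type ErrorCode = 'OutOfMemory' | 'ModelArchNotSupported' | 'BinaryNotFound' | string

interface LlamacppError {
  code: ErrorCode
  message: string   // user-facing message from the backend
  details?: string  // optional raw stderr logs
}

function toUserMessage(err: LlamacppError): string {
  switch (err.code) {
    case 'OutOfMemory':
      return 'Not enough memory to load this model. Try a smaller model or free up memory.'
    case 'BinaryNotFound':
      return 'The llama.cpp backend binary is missing. Try reinstalling the backend.'
    case 'ModelArchNotSupported':
      return 'This model architecture is not supported by the selected backend.'
    default:
      return err.message
  }
}
```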
Akarshan Biswas
6a699d8004
refactor: move session management & port allocation to backend (#6083)
* refactor: move session management & port allocation to backend

- Remove the in‑process `activeSessions` map and its cleanup logic from the TypeScript side.
- Introduce new Tauri commands in Rust:
  - `get_random_port` – picks an unused port using a seeded RNG and checks availability.
  - `find_session_by_model` – returns the `SessionInfo` for a given model ID.
  - `get_loaded_models` – returns a list of currently loaded model IDs.
- Update the extension’s TypeScript code to use these commands via `invoke`:
  - `findSessionByModel`, `load`, `unload`, `chat`, `getLoadedModels`, and `embed` now operate asynchronously and query the backend.
  - Remove the old `is_port_available` command and the custom port‑checking loop.
  - Simplify `onUnload` – session termination is now handled by the backend.
- Drop unused helpers (`sleep`, `waitForModelLoad`) and related port‑availability code.
- Add missing Rust imports (`rand::{StdRng,Rng,SeedableRng}`, `HashSet`) and improve error handling.
- Register the new commands in `src-tauri/src/lib.rs` (replace `is_port_available` with the three new commands).

This refactor centralises session state and port allocation in the Rust backend, eliminates duplicated logic, and resolves race conditions around model loading and session cleanup.

* Use String(e) for error

Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>

---------

Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>
2025-08-07 13:06:21 +05:30
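A sketch of the TypeScript side after the refactor above: session lookup and port allocation are delegated to the Rust commands named in the commit. The `SessionInfo` shape and argument casing are assumptions.

```typescript
import { invoke } from '@tauri-apps/api/core'

// Hypothetical wrappers around the new backend commands.
interface SessionInfo {
  pid: number
  port: number
  model_id: string
}

const getRandomPort = () => invoke<number>('get_random_port')

const findSessionByModel = (modelId: string) =>
  invoke<SessionInfo | null>('find_session_by_model', { modelId })

const getLoadedModels = () => invoke<string[]>('get_loaded_models')
```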
Akarshan Biswas
1f1605bdf9
feat: Add support for overriding tensor buffer type (#6062)
* feat: Add support for overriding tensor buffer type

This commit introduces a new configuration option, `override_tensor_buffer_t`, which allows users to specify a regex for matching tensor names to override their buffer type. This is an advanced setting primarily useful for optimizing the performance of large models, particularly Mixture of Experts (MoE) models.

By overriding the tensor buffer type, users can keep critical parts of the model, like the attention layers, on the GPU while offloading other parts, such as the expert feed-forward networks, to the CPU. This can lead to significant speed improvements for massive models.

Additionally, this change refines the error message to be more specific when a model fails to load. The previous message "Failed to load llama-server" has been updated to "Failed to load model" to be more accurate.

* chore: update FE to support override-tensor

---------

Co-authored-by: Faisal Amir <urmauur@gmail.com>
2025-08-07 10:31:34 +05:30
Akarshan Biswas
8d147c1774
fix: Add conditional Vulkan support check for better GPU compatibility (#6066)
Changes:
- Introduce conditional Vulkan support check for discrete GPUs with 6GB+ VRAM

fixes: #6009
2025-08-06 07:20:44 +05:30
Faisal Amir
5d001dfd5a
feat: customize jinja template per model instead of provider level (#6053) 2025-08-05 21:21:41 +07:00
Faisal Amir
99567a1102
feat: recommended label llamacpp setting (#6052)
* feat: recommended label llamacpp

* chore: remove log
2025-08-05 13:55:33 +07:00
Louis
813c911487
Merge pull request #6046 from menloresearch/fix/support-missing-llamacpp-cuda-backends
fix: support missing llamacpp cuda backends
2025-08-05 12:37:31 +07:00
Louis
4a4bc35cce fix: should check for invalid backend to cover previous missing backend case 2025-08-05 11:41:02 +07:00
Akarshan Biswas
5e533bdedc
feat: Improve llama.cpp argument handling and add device parsing tests (#6041)
* feat: Improve llama.cpp argument handling and add device parsing tests

This commit refactors how arguments are passed to llama.cpp,
specifically by only adding arguments when their values differ from
their defaults. This reduces the verbosity of the command and prevents
potential conflicts or errors when llama.cpp's default behavior aligns
with the desired setting.

Additionally, new tests have been added for parsing device output from
llama.cpp, ensuring the accurate extraction of GPU information (ID,
name, total memory, and free memory). This improves the robustness of
device detection.

The following changes were made:

* **Remove redundant `--ctx-size` argument:** The `--ctx-size`
    argument is now only explicitly added if `cfg.ctx_size` is greater
    than 0.
* **Conditional argument adding for default values:**
    * `--split-mode` is only added if `cfg.split_mode` is not empty
        and not 'layer'.
    * `--main-gpu` is only added if `cfg.main_gpu` is not undefined
        and not 0.
    * `--cache-type-k` is only added if `cfg.cache_type_k` is not 'f16'.
    * `--cache-type-v` is only added if `cfg.cache_type_v` is not 'f16'
        (when `flash_attn` is enabled) or not 'f32' (otherwise). This
        also corrects the `flash_attn` condition.
    * `--defrag-thold` is only added if `cfg.defrag_thold` is not 0.1.
    * `--rope-scaling` is only added if `cfg.rope_scaling` is not
        'none'.
    * `--rope-scale` is only added if `cfg.rope_scale` is not 1.
    * `--rope-freq-base` is only added if `cfg.rope_freq_base` is not 0.
    * `--rope-freq-scale` is only added if `cfg.rope_freq_scale` is
        not 1.
* **Add `parse_device_output` tests:** Comprehensive unit tests were
    added to `src-tauri/src/core/utils/extensions/inference_llamacpp_extension/server.rs`
    to validate the parsing of llama.cpp device output under various
    scenarios, including multiple devices, single devices, different
    backends (CUDA, Vulkan, SYCL), complex GPU names, and error
    conditions.

* fixup cache_type_v comparison
2025-08-04 19:47:04 +05:30
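A condensed sketch of the "only pass non-default values" rule laid out in the commit above. The config field names and defaults follow the commit's bullet list; everything else is illustrative.

```typescript
// Hypothetical argument builder: skip flags whose values match llama.cpp defaults.
interface LlamacppConfig {
  ctx_size: number
  split_mode: string
  main_gpu?: number
  cache_type_k: string
  cache_type_v: string
  flash_attn: boolean
  defrag_thold: number
  rope_scaling: string
  rope_scale: number
  rope_freq_base: number
  rope_freq_scale: number
}

function buildArgs(cfg: LlamacppConfig): string[] {
  const args: string[] = []
  if (cfg.ctx_size > 0) args.push('--ctx-size', String(cfg.ctx_size))
  if (cfg.split_mode && cfg.split_mode !== 'layer') args.push('--split-mode', cfg.split_mode)
  if (cfg.main_gpu !== undefined && cfg.main_gpu !== 0) args.push('--main-gpu', String(cfg.main_gpu))
  if (cfg.cache_type_k !== 'f16') args.push('--cache-type-k', cfg.cache_type_k)
  // Default for cache_type_v depends on whether flash attention is enabled.
  const vDefault = cfg.flash_attn ? 'f16' : 'f32'
  if (cfg.cache_type_v !== vDefault) args.push('--cache-type-v', cfg.cache_type_v)
  if (cfg.defrag_thold !== 0.1) args.push('--defrag-thold', String(cfg.defrag_thold))
  if (cfg.rope_scaling !== 'none') args.push('--rope-scaling', cfg.rope_scaling)
  if (cfg.rope_scale !== 1) args.push('--rope-scale', String(cfg.rope_scale))
  if (cfg.rope_freq_base !== 0) args.push('--rope-freq-base', String(cfg.rope_freq_base))
  if (cfg.rope_freq_scale !== 1) args.push('--rope-freq-scale', String(cfg.rope_freq_scale))
  return args
}
```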
Louis
45c2b02842 test: add tests for new changes 2025-08-04 16:01:04 +07:00
Louis
bf9315dbbe fix: add missing cuda backend support 2025-08-04 15:54:21 +07:00
Faisal Amir
787c4ee073
fix: wrong description for cont_batching setting (#6034) 2025-08-02 21:48:43 +07:00
Louis
7a3d9d765c
fix: failed provider models list due to broken cortex import (#5983) 2025-07-30 17:37:44 +07:00
Akarshan Biswas
f61ce886a0
feat: Enhance port selection with availability check (#5966)
This change improves the robustness of the llama.cpp extension's server port selection.

Previously, the `getRandomPort()` method only checked for ports already in use by active sessions, which could lead to model load failures if the chosen port was occupied by another external process.

This change introduces a new Tauri command, `is_port_available`, which performs a system-level check to ensure the randomly selected port is truly free before attempting to start the llama-server. It also adds a retry mechanism with a maximum number of attempts (20,000) to find an available port, throwing an error if no suitable port is found within the specified range after all attempts.

This enhancement prevents port conflicts and improves the reliability and user experience of the llama.cpp extension within Jan.

Closes #5965
2025-07-29 18:01:52 +05:30
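A sketch of the retry loop described in the commit above: pick random ports, ask the backend's `is_port_available` command, and give up after a maximum number of attempts. The 20,000 cap and command name come from the commit; the port range and helper name are assumptions.

```typescript
import { invoke } from '@tauri-apps/api/core'

// Hypothetical port picker; assumes Tauri v2's invoke and an arbitrary port range.
const MAX_ATTEMPTS = 20000

async function getAvailablePort(min = 3000, max = 4000): Promise<number> {
  for (let attempt = 0; attempt < MAX_ATTEMPTS; attempt++) {
    const port = Math.floor(Math.random() * (max - min + 1)) + min
    const free = await invoke<boolean>('is_port_available', { port })
    if (free) return port
  }
  throw new Error('No available port found within the configured range')
}
```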
Akarshan Biswas
07421d7f53
fix: set autoUnload in onLoad() (#5956)
The variable was not initialized, which resulted in it always being set to true on startup.

This change fixes it.
2025-07-28 20:54:21 +05:30
Akarshan Biswas
fa896b3bf3
fix: correctly apply auto_unload setting from config (#5953)
Previously, the `autoUnload` flag was not being updated when set via config,
causing models to be auto-unloaded regardless of the intended behavior.
This patch ensures the setting is respected at runtime.
2025-07-28 19:17:29 +05:30
Akarshan Biswas
432c942330
fix: Prevent race condition with auto-unload during rapid model loading (#5947)
This commit addresses a race condition where, with "Auto-Unload Old Models" enabled, rapidly attempting to load multiple models could result in more than one model being loaded simultaneously.

Previously, the unloading logic did not account for models that were still in the process of loading when a new load operation was initiated. This allowed new models to start loading before the previous ones had fully completed their unload cycle.

To resolve this:
- A `loadingModels` map has been introduced to track promises for models currently in the loading state.
- The `load` method now checks if a model is already being loaded and, if so, returns the existing promise, preventing duplicate load operations for the same model.
- The `performLoad` method (which encapsulates the actual loading logic) now ensures that when `autoUnload` is active, it waits for any *other* models that are concurrently loading to finish before proceeding to unload all currently loaded models. This guarantees that the auto-unload mechanism properly unloads all models, including those initiated in quick succession, thereby preventing the race condition.

This fixes the issue where clicking the start button very fast on multiple models would bypass the auto-unload functionality.
2025-07-28 12:59:48 +05:30
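A rough sketch of the race-condition fix described in the commit above: deduplicate concurrent loads per model and, when auto-unload is on, wait for other in-flight loads before unloading. The `loadingModels` and `performLoad` names follow the commit; the class wrapper and `unloadAll` helper are hypothetical.

```typescript
// Illustrative loader that prevents duplicate loads and serializes auto-unload.
class Loader {
  private loadingModels = new Map<string, Promise<void>>()
  constructor(private autoUnload: boolean) {}

  async load(modelId: string): Promise<void> {
    // Reuse the in-flight promise so rapid clicks don't start duplicate loads.
    const existing = this.loadingModels.get(modelId)
    if (existing) return existing

    const promise = this.performLoad(modelId).finally(() => {
      this.loadingModels.delete(modelId)
    })
    this.loadingModels.set(modelId, promise)
    return promise
  }

  private async performLoad(modelId: string): Promise<void> {
    if (this.autoUnload) {
      // Wait for every *other* model still loading, then unload everything loaded.
      const others = [...this.loadingModels.entries()]
        .filter(([id]) => id !== modelId)
        .map(([, p]) => p.catch(() => {}))
      await Promise.all(others)
      await this.unloadAll()
    }
    // ...actual llama-server startup would happen here...
  }

  private async unloadAll(): Promise<void> {
    /* hypothetical: unload every currently loaded model */
  }
}
```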
Louis
1fc37a9349
fix: migrate app settings to the new version (#5936)
* fix: migrate app settings to the new version

* fix: edge cases

* fix: migrate HF import model on Windows

* fix hardware page broken after downgraded

* test: correct test

* fix: backward compatible hardware info
2025-07-27 21:13:05 +07:00
Akarshan Biswas
c9b44eec52
fix: Remove sInfo from activeSessions before unloading (#5938)
This commit addresses a potential race condition that could lead to "connection errors" when unloading a llamacpp model.

The issue arose because the `activeSessions` map still held the session info of the model during unload. This could lead to "connection errors" when the backend takes time to unload while there is an ongoing request to the model.

The fix involves:

1. **Deleting the `pid` from `activeSessions` before calling backend's unload:** This ensures that the model is cleared from the map before we start unloading.
2. **Failure handling**: If somehow the backend fails to unload, the session info for that model is added back to prevent any race conditions.

This commit improves the robustness and reliability of the unloading process by preventing potential conflicts.
2025-07-27 14:37:34 +05:30
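A sketch of the ordering change described in the commit above: remove the session from the map before asking the backend to unload, and restore it if unload fails. The `activeSessions` and `sInfo` names follow the commit; the `SessionInfo` shape and unload callback are assumptions.

```typescript
// Hypothetical unload path keyed by pid, as in the commit description.
interface SessionInfo {
  pid: number
  port: number
  model_id: string
}

const activeSessions = new Map<number, SessionInfo>()

async function unloadModel(
  sInfo: SessionInfo,
  backendUnload: (pid: number) => Promise<void>
): Promise<void> {
  // Remove first so new requests can no longer be routed to a dying server.
  activeSessions.delete(sInfo.pid)
  try {
    await backendUnload(sInfo.pid)
  } catch (e) {
    // Put the session back if the backend failed to unload, keeping state consistent.
    activeSessions.set(sInfo.pid, sInfo)
    throw e
  }
}
```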
Akarshan Biswas
8ec4a36826
fix: Frontend updates when llama.cpp backend auto-downloads (#5926) 2025-07-26 08:48:29 +07:00
Akarshan Biswas
3982ed4c6f
fix: Allow N-GPU Layers (NGL) to be set to 0 in llama.cpp (#5907)
* fix: Allow N-GPU Layers (NGL) to be set to 0 in llama.cpp

The `n_gpu_layers` (NGL) setting in the llama.cpp extension was incorrectly preventing users from disabling GPU layers by automatically defaulting to 100 when set to 0.

This was caused by a condition that only pushed `cfg.n_gpu_layers` if it was greater than 0 (`cfg.n_gpu_layers > 0`).

This commit updates the condition to `cfg.n_gpu_layers >= 0`, allowing 0 to be a valid and accepted value for NGL. This ensures that users can effectively disable GPU offloading when desired.

* fix: default ngl

---------

Co-authored-by: Louis <louis@jan.ai>
2025-07-25 16:24:53 +05:30
Akarshan Biswas
4d4cf896af
fix: Persist 'Auto-Unload Old Models' setting in llama.cpp (#5906)
The 'Auto-Unload Old Models' setting in the llama.cpp extension failed to persist due to a typo in its key name within `settings.json`. The key was incorrectly `auto_unload_models` instead of `auto_unload`.

This commit corrects the key name to `auto_unload`, ensuring that user-configured changes to this setting are properly saved, retrieved, and persist across application restarts.

This resolves the issue where the setting would revert to its previous value after being changed.
2025-07-25 11:03:15 +05:30
Akarshan Biswas
a1af70f7a9
feat: Enhance Llama.cpp backend management with persistence (#5886)
* feat: Enhance Llama.cpp backend management with persistence

This commit introduces significant improvements to how the Llama.cpp extension manages and updates its backend installations, focusing on user preference persistence and smarter auto-updates.

Key changes include:

* **Persistent Backend Type Preference:** The extension now stores the user's preferred backend type (e.g., `cuda`, `cpu`, `metal`) in `localStorage`. This ensures that even after updates or restarts, the system attempts to use the user's previously selected backend type, if available.
* **Intelligent Auto-Update:** The auto-update mechanism has been refined to prioritize updating to the **latest version of the *currently selected backend type*** rather than always defaulting to the "best available" backend (which might change). This respects user choice while keeping the chosen backend type up-to-date.
* **Improved Initial Installation/Configuration:** For fresh installations or cases where the `version_backend` setting is invalid, the system now intelligently determines and installs the best available backend, then persists its type.
* **Refined Old Backend Cleanup:** The `removeOldBackends` function has been renamed to `removeOldBackend` and modified to specifically clean up *older versions of the currently selected backend type*, preventing the accumulation of unnecessary files while preserving other backend types the user might switch to.
* **Robust Local Storage Handling:** New private methods (`getStoredBackendType`, `setStoredBackendType`, `clearStoredBackendType`) are introduced to safely interact with `localStorage`, including error handling for potential `localStorage` access issues.
* **Version Filtering Utility:** A new utility `findLatestVersionForBackend` helps in identifying the latest available version for a specific backend type from a list of supported backends.

These changes provide a more stable, user-friendly, and maintainable backend management experience for the Llama.cpp extension.

Fixes: #5883

* fix: cortex models migration should be done once

* feat: Optimize Llama.cpp backend preference storage and UI updates

This commit refines the Llama.cpp extension's backend management by:

* **Optimizing `localStorage` Writes:** The system now only writes the backend type preference to `localStorage` if the new value is different from the currently stored one. This reduces unnecessary `localStorage` operations.
* **Ensuring UI Consistency on Initial Setup:** When a fresh installation or an invalid backend configuration is detected, the UI settings are now explicitly updated to reflect the newly determined `effectiveBackendString`, ensuring the displayed setting matches the active configuration.

These changes improve performance by reducing redundant storage operations and enhance user experience by maintaining UI synchronization with the backend state.

* Revert "fix: provider settings should be refreshed on page load (#5887)"

This reverts commit ce6af62c7df4a7e7ea8c0896f307309d6bf38771.

* fix: add loader version backend llamacpp

* fix: wrong key name

* fix: model setting issues

* fix: virtual dom hub

* chore: cleanup

* chore: hide device offload setting

---------

Co-authored-by: Louis <louis@jan.ai>
Co-authored-by: Faisal Amir <urmauur@gmail.com>
2025-07-24 18:33:35 +07:00
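An illustrative version of the `localStorage` helpers named in the commit above (`getStoredBackendType` / `setStoredBackendType` / `clearStoredBackendType`), including the "only write on change" optimization. The storage key is an assumption.

```typescript
// Hypothetical persistence helpers for the preferred backend type (e.g. 'cuda', 'cpu', 'metal').
const BACKEND_TYPE_KEY = 'llamacpp_backend_type'

function getStoredBackendType(): string | null {
  try {
    return localStorage.getItem(BACKEND_TYPE_KEY)
  } catch {
    return null // localStorage can be unavailable or blocked
  }
}

function setStoredBackendType(backendType: string): void {
  try {
    // Only write when the value actually changes, to avoid redundant storage operations.
    if (getStoredBackendType() !== backendType) {
      localStorage.setItem(BACKEND_TYPE_KEY, backendType)
    }
  } catch {
    // Ignore storage failures; preference persistence is best-effort.
  }
}

function clearStoredBackendType(): void {
  try {
    localStorage.removeItem(BACKEND_TYPE_KEY)
  } catch {
    // ignore
  }
}
```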