- Add `src-tauri/resources/` to `.gitignore`.
- Introduced utilities to read locally installed backends (`getLocalInstalledBackends`) and fetch remote supported backends (`fetchRemoteSupportedBackends`).
- Refactored `listSupportedBackends` to merge remote and local entries with deduplication and proper sorting.
- Exported `getBackendDir` and integrated it into the extension.
- Added helper `parseBackendVersion` and new method `checkBackendForUpdates` to detect newer backend versions.
- Implemented `installBackend` for manual backend archive installation, including platform‑specific binary path handling.
- Updated command‑line argument logic for `--flash-attn` to respect version‑specific defaults.
- Modified Tauri filesystem `decompress` command to remove overly strict path validation.
* feat: Add model compatibility check and memory estimation
This commit introduces a new feature to check if a given model is supported based on available device memory.
The change includes:
- A new `estimateKVCache` method that calculates the required memory for the model's KV cache. It uses GGUF metadata such as `block_count`, `head_count`, `key_length`, and `value_length` to perform the calculation.
- An `isModelSupported` method that combines the model file size and the estimated KV cache size to determine the total memory required. It then checks if any available device has sufficient free memory to load the model.
- An updated error message for the `version_backend` check to be more user-friendly, suggesting a stable internet connection as a potential solution for backend setup failures.
This functionality helps prevent the application from attempting to load models that would exceed the device's memory capacity, leading to more stable and predictable behavior.
fixes: #5505
* Update extensions/llamacpp-extension/src/index.ts
Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>
* Update extensions/llamacpp-extension/src/index.ts
Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>
* Extend this to available system RAM if GGML device is not available
* fix: Improve model metadata and memory checks
This commit refactors the logic for checking if a model is supported by a system's available memory.
**Key changes:**
- **Remote model support**: The `read_gguf_metadata` function can now fetch metadata from a remote URL by reading the file in chunks.
- **Improved KV cache size calculation**: The KV cache size is now estimated more accurately by using `attention.key_length` and `attention.value_length` from the GGUF metadata, with a fallback to `embedding_length`.
- **Granular memory check statuses**: The `isModelSupported` function now returns a more specific status (`'RED'`, `'YELLOW'`, `'GREEN'`) to indicate whether the model weights or the KV cache are too large for the available memory.
- **Consolidated logic**: The logic for checking local and remote models has been consolidated into a single `isModelSupported` function, improving code clarity and maintainability.
These changes provide more robust and informative model compatibility checks, especially for models hosted on remote servers.
* Update extensions/llamacpp-extension/src/index.ts
Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>
* Make ctx_size optional and use sum free memory across ggml devices
* feat: hub and dropdown model selection handle model compatibility
* feat: update bage model info color
* chore: enable detail page to get compatibility model
* chore: update copy
* chore: update shrink indicator UI
---------
Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>
Co-authored-by: Faisal Amir <urmauur@gmail.com>
This commit adds a new setting `llamacpp_env` to the llama.cpp extension, allowing users to specify custom environment variables. These variables are passed to the backend process when it starts.
A new function `parseEnvFromString` is introduced to handle the parsing of the semicolon-separated key-value pairs from the user input. The environment variables are then used in the `load` function and when listing available devices. This enables more flexible configuration of the llama.cpp backend, such as specifying visible GPUs for Vulkan.
This change also updates the Tauri command `get_devices` to accept environment variables, ensuring that device discovery respects the user's settings.
This change modifies how the API key is passed to the llama-server process. Previously, it was sent as a command line argument (--api-key). This approach has been updated to pass the key via an environment variable (LLAMA_API_KEY).
This improves security by preventing the API key from being visible in the process list (ps aux on Linux, Task Manager on Windows, etc.), where it could potentially be exposed to other users or processes on the same system.
The commit also updates the Rust backend to read the API key from the environment variable instead of parsing it from the command line arguments.
This commit introduces a new configuration option offload_mmproj to the llamacpp extension.
The offload_mmproj setting allows users to control whether the multimodal projector model is offloaded to the GPU. By default, it's offloaded for better performance. If set to false, the projector model will remain on the CPU, which can be useful in low GPU memory scenarios, though image processing might take longer.
Additionally, this commit adds validate_mmproj_path to ensure the provided --mmproj path is valid and accessible, preventing issues during model loading.
This change also refactors some invoke calls for improved readability.
* feat: Add GGUF metadata reading functionality
This commit introduces a new Tauri command and a corresponding function to read metadata from GGUF model files.
The new read_gguf_metadata command in the Rust backend uses the byteorder crate to parse the GGUF file format and extract key metadata. This information, including the file's version, tensor count, and a key-value map of other metadata, is then made available to the TypeScript frontend.
This functionality is a foundational step toward providing users with more detailed information about their loaded models directly within the application.
This will be refactored later.
fixes: #6001
* loadMetadata() should return
* Properly throw eror to FE
* Use BufReader to improve performance
* feat: Introduce structured error handling for llamacpp extension
This commit introduces a structured error handling system for the `llamacpp` extension. Instead of returning simple string errors, we now use a custom `LlamacppError` struct with a specific `ErrorCode` enum. This allows the frontend to display more user-friendly and actionable error messages based on the code, rather than raw debug logs.
The changes include:
- A new `ErrorCode` enum to categorize errors (e.g., `OutOfMemory`, `ModelArchNotSupported`, `BinaryNotFound`).
- A `LlamacppError` struct to encapsulate the code, a user-facing message, and optional detailed logs.
- A static method `from_stderr` that intelligently parses llama.cpp's standard error output to identify and map common issues like Out of Memory errors to a specific error code.
- Refactored `ServerError` enum to wrap the new `LlamacppError` and provide a consistent serialization format for the Tauri frontend.
- Updated all relevant functions (`load_llama_model`, `get_devices`) to return the new structured error type, ensuring a more robust and predictable error flow.
- A reduced timeout for model loading from 300 to 180 seconds.
This work lays the groundwork for a more intuitive and helpful user experience, as the application can now provide clear guidance to users when a model fails to load.
* Update src-tauri/src/core/utils/extensions/inference_llamacpp_extension/server.rs
Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>
* Update src-tauri/src/core/utils/extensions/inference_llamacpp_extension/server.rs
Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>
* chore: update FE handle error object from extension
* chore: fix property type
---------
Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>
Co-authored-by: Faisal Amir <urmauur@gmail.com>