* feat: Smart model management
* **New UI option** – `memory_util` added to `settings.json` with a dropdown (high / medium / low) to let users control how aggressively the engine uses system memory.
* **Configuration updates** – `LlamacppConfig` now includes `memory_util`; the extension class stores it in a new `memoryMode` property and handles updates through `updateConfig`.
* **System memory handling**
* Introduced `SystemMemory` interface and `getTotalSystemMemory()` to report combined VRAM + RAM.
* Added helper methods `getKVCachePerToken`, `getLayerSize`, and a new `ModelPlan` type.
* **Smart model‑load planner** – `planModelLoad()` computes:
* Number of GPU layers that can fit in usable VRAM.
* Maximum context length based on KV‑cache size and the selected memory utilization mode (high/medium/low).
* Whether KV‑cache must be off‑loaded to CPU and the overall loading mode (GPU, Hybrid, CPU, Unsupported).
* Detailed logging of the planning decision.
* **Improved support check** – `isModelSupported()` now:
* Uses the combined VRAM/RAM totals from `getTotalSystemMemory()`.
* Applies an 80% usable‑memory heuristic.
* Returns **GREEN** only when both weights and KV‑cache fit in VRAM, **YELLOW** when they fit only in total memory or require CPU off‑load, and **RED** when the model cannot fit at all.
* **Cleanup** – Removed unused `GgufMetadata` import; updated imports and type definitions accordingly.
* **Documentation/comments** – Added explanatory JSDoc comments for the new methods and clarified the return semantics of `isModelSupported`.
* chore: migrate no_kv_offload from llamacpp setting to model setting
* chore: add UI auto optimize model setting
* feat: improve model loading planner with mmproj support and smarter memory budgeting
* Extend `ModelPlan` with optional `noOffloadMmproj` flag to indicate when a multimodal projector can stay in VRAM.
* Add `mmprojPath` parameter to `planModelLoad` and calculate its size, attempting to keep it on GPU when possible.
* Refactor system memory detection:
* Use `used_memory` (actual free RAM) instead of total RAM for budgeting.
* Introduced `usableRAM` placeholder for future use.
* Rewrite KV‑cache size calculation:
* Properly handle GQA models via `attention.head_count_kv`.
* Compute bytes per token as `nHeadKV * headDim * 2 * 2 * nLayer`.
* Replace the old 70 % VRAM heuristic with a more flexible budget:
* Reserve a fixed VRAM amount and apply an overhead factor.
* Derive usable system RAM from total memory minus VRAM.
* Implement a robust allocation algorithm:
* Prioritize placing the mmproj in VRAM.
* Search for the best balance of GPU layers and context length.
* Fallback strategies for hybrid and pure‑CPU modes with detailed safety checks.
* Add extensive validation of model size, KV‑cache size, layer size, and memory mode.
* Improve logging throughout the planning process for easier debugging.
* Adjust final plan return shape to include the new `noOffloadMmproj` field.
* remove unused variable
---------
Co-authored-by: Faisal Amir <urmauur@gmail.com>
* call jan api
* fix lint
* ci: add jan server web
* chore: add Dockerfile
* clean up ui ux and support for reasoning fields, make app spa
* add logo
* chore: update tag for preview image
* chore: update k8s service name
* chore: update image tag and image name
* fixed test
---------
Co-authored-by: Minh141120 <minh.itptit@gmail.com>
Co-authored-by: Nguyen Ngoc Minh <91668012+Minh141120@users.noreply.github.com>
* fix: check for env value before setting (#6266)
* fix: check for env value before setting
* Use empty instead of none
* fix: update linux build script to be consistent with CI (#6269)
The local build script for Linux was failing due to a bundling error. This commit updates the `build:tauri:linux` script in `package.json` to be consistent with the CI build pipeline, which resolves the issue.
The updated script now includes:
- **`NO_STRIP=1`**: This environment variable prevents the `linuxdeploy` utility from stripping debugging symbols, which was a potential cause of the bundling failure.
- **`--verbose`**: This flag provides more detailed output during the build, which can be useful for debugging similar issues in the future.
* fix: compatibility imported model
* fix: update copy mmproj setting desc
* fix: toggle vision for remote model
* chore: add tooltip visions
* chore: show model setting only for local provider
* fix/update-ui-info
* chore: update filter hub while searching
* fix: system monitor window permission
* chore: update credit description
---------
Co-authored-by: Akarshan Biswas <akarshan.biswas@gmail.com>
Co-authored-by: Faisal Amir <urmauur@gmail.com>
Co-authored-by: Minh141120 <minh.itptit@gmail.com>
Co-authored-by: Nguyen Ngoc Minh <91668012+Minh141120@users.noreply.github.com>
* feat: Introduce structured error handling for llamacpp extension
This commit introduces a structured error handling system for the `llamacpp` extension. Instead of returning simple string errors, we now use a custom `LlamacppError` struct with a specific `ErrorCode` enum. This allows the frontend to display more user-friendly and actionable error messages based on the code, rather than raw debug logs.
The changes include:
- A new `ErrorCode` enum to categorize errors (e.g., `OutOfMemory`, `ModelArchNotSupported`, `BinaryNotFound`).
- A `LlamacppError` struct to encapsulate the code, a user-facing message, and optional detailed logs.
- A static method `from_stderr` that intelligently parses llama.cpp's standard error output to identify and map common issues like Out of Memory errors to a specific error code.
- Refactored `ServerError` enum to wrap the new `LlamacppError` and provide a consistent serialization format for the Tauri frontend.
- Updated all relevant functions (`load_llama_model`, `get_devices`) to return the new structured error type, ensuring a more robust and predictable error flow.
- A reduced timeout for model loading from 300 to 180 seconds.
This work lays the groundwork for a more intuitive and helpful user experience, as the application can now provide clear guidance to users when a model fails to load.
* Update src-tauri/src/core/utils/extensions/inference_llamacpp_extension/server.rs
Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>
* Update src-tauri/src/core/utils/extensions/inference_llamacpp_extension/server.rs
Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>
* chore: update FE handle error object from extension
* chore: fix property type
---------
Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>
Co-authored-by: Faisal Amir <urmauur@gmail.com>
* feat: Enhance Llama.cpp backend management with persistence
This commit introduces significant improvements to how the Llama.cpp extension manages and updates its backend installations, focusing on user preference persistence and smarter auto-updates.
Key changes include:
* **Persistent Backend Type Preference:** The extension now stores the user's preferred backend type (e.g., `cuda`, `cpu`, `metal`) in `localStorage`. This ensures that even after updates or restarts, the system attempts to use the user's previously selected backend type, if available.
* **Intelligent Auto-Update:** The auto-update mechanism has been refined to prioritize updating to the **latest version of the *currently selected backend type*** rather than always defaulting to the "best available" backend (which might change). This respects user choice while keeping the chosen backend type up-to-date.
* **Improved Initial Installation/Configuration:** For fresh installations or cases where the `version_backend` setting is invalid, the system now intelligently determines and installs the best available backend, then persists its type.
* **Refined Old Backend Cleanup:** The `removeOldBackends` function has been renamed to `removeOldBackend` and modified to specifically clean up *older versions of the currently selected backend type*, preventing the accumulation of unnecessary files while preserving other backend types the user might switch to.
* **Robust Local Storage Handling:** New private methods (`getStoredBackendType`, `setStoredBackendType`, `clearStoredBackendType`) are introduced to safely interact with `localStorage`, including error handling for potential `localStorage` access issues.
* **Version Filtering Utility:** A new utility `findLatestVersionForBackend` helps in identifying the latest available version for a specific backend type from a list of supported backends.
These changes provide a more stable, user-friendly, and maintainable backend management experience for the Llama.cpp extension.
Fixes: #5883
* fix: cortex models migration should be done once
* feat: Optimize Llama.cpp backend preference storage and UI updates
This commit refines the Llama.cpp extension's backend management by:
* **Optimizing `localStorage` Writes:** The system now only writes the backend type preference to `localStorage` if the new value is different from the currently stored one. This reduces unnecessary `localStorage` operations.
* **Ensuring UI Consistency on Initial Setup:** When a fresh installation or an invalid backend configuration is detected, the UI settings are now explicitly updated to reflect the newly determined `effectiveBackendString`, ensuring the displayed setting matches the active configuration.
These changes improve performance by reducing redundant storage operations and enhance user experience by maintaining UI synchronization with the backend state.
* Revert "fix: provider settings should be refreshed on page load (#5887)"
This reverts commit ce6af62c7df4a7e7ea8c0896f307309d6bf38771.
* fix: add loader version backend llamacpp
* fix: wrong key name
* fix: model setting issues
* fix: virtual dom hub
* chore: cleanup
* chore: hide device ofload setting
---------
Co-authored-by: Louis <louis@jan.ai>
Co-authored-by: Faisal Amir <urmauur@gmail.com>
* Refactor translation imports and update text for localization across settings and system monitor routes
- Changed translation import from 'react-i18next' to '@/i18n/react-i18next-compat' in multiple files.
- Updated various text strings to use translation keys for better localization support in:
- Local API Server settings
- MCP Servers settings
- Privacy settings
- Provider settings
- Shortcuts settings
- System Monitor
- Thread details
- Ensured consistent use of translation keys for all user-facing text.
Update web-app/src/routes/settings/appearance.tsx
Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>
Update web-app/src/routes/settings/appearance.tsx
Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>
Update web-app/src/locales/vn/settings.json
Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>
Update web-app/src/containers/dialogs/DeleteMCPServerConfirm.tsx
Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>
Update web-app/src/locales/id/common.json
Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>
* Add Chinese (Simplified and Traditional) localization files for various components
- Created `tools.json`, `updater.json`, `assistants.json`, `chat.json`, `common.json`, `hub.json`, `logs.json`, `mcp-servers.json`, `provider.json`, `providers.json`, `settings.json`, `setup.json`, `system-monitor.json`, `tool-approval.json` in both `zh-CN` and `zh-TW` locales.
- Added translations for tool approval, updater notifications, assistant management, chat interface, common UI elements, hub interactions, logging messages, MCP server configurations, provider management, settings options, setup instructions, and system monitoring.
* Refactor localization strings for improved clarity and consistency in English, Indonesian, and Vietnamese settings files
* Fix missing key and reword
* fix pr comment
---------
Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>