94 Commits

Author SHA1 Message Date
Louis
7a3d9d765c
fix: failed provider models list due to broken cortex import (#5983) 2025-07-30 17:37:44 +07:00
Akarshan Biswas
f61ce886a0
feat: Enhance port selection with availability check (#5966)
This change improves the robustness of the llama.cpp extension's server port selection.

Previously, the `getRandomPort()` method only checked for ports already in use by active sessions, which could lead to model load failures if the chosen port was occupied by another external process.

This change introduces a new Tauri command, `is_port_available`, which performs a system-level check to ensure the randomly selected port is truly free before attempting to start the llama-server. It also adds a retry mechanism with a maximum number of attempts (20,000) to find an available port, throwing an error if no suitable port is found within the specified range after all attempts.

This enhancement prevents port conflicts and improves the reliability and user experience of the llama.cpp extension within Jan.

Closes #5965
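
A minimal TypeScript sketch of the retry loop described above. Only the `is_port_available` command name and the 20,000-attempt cap come from the commit message; the port range and the command's argument shape are assumptions.

```typescript
import { invoke } from '@tauri-apps/api/core'

const PORT_MIN = 3000      // illustrative range, not the extension's actual bounds
const PORT_MAX = 4000
const MAX_ATTEMPTS = 20000 // cap mentioned in the commit message

async function getRandomPort(portsInUse: Set<number>): Promise<number> {
  for (let attempt = 0; attempt < MAX_ATTEMPTS; attempt++) {
    const port = PORT_MIN + Math.floor(Math.random() * (PORT_MAX - PORT_MIN + 1))
    // Skip ports already claimed by active llama-server sessions.
    if (portsInUse.has(port)) continue
    // System-level check via the new Tauri command; the argument shape is assumed.
    const available = await invoke<boolean>('is_port_available', { port })
    if (available) return port
  }
  throw new Error('No available port found within the configured range')
}
```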
2025-07-29 18:01:52 +05:30
Akarshan Biswas
07421d7f53
fix: set autoUnload in onLoad() (#5956)
The variable was not initialized, which resulted in it always being set to
true when starting.

This change fixes it.
2025-07-28 20:54:21 +05:30
Akarshan Biswas
fa896b3bf3
fix: correctly apply auto_unload setting from config (#5953)
Previously, the `autoUnload` flag was not being updated when set via config,
causing models to be auto-unloaded regardless of the intended behavior.
This patch ensures the setting is respected at runtime.
2025-07-28 19:17:29 +05:30
Akarshan Biswas
432c942330
fix: Prevent race condition with auto-unload during rapid model loading (#5947)
This commit addresses a race condition where, with "Auto-Unload Old Models" enabled, rapidly attempting to load multiple models could result in more than one model being loaded simultaneously.

Previously, the unloading logic did not account for models that were still in the process of loading when a new load operation was initiated. This allowed new models to start loading before the previous ones had fully completed their unload cycle.

To resolve this:
- A `loadingModels` map has been introduced to track promises for models currently in the loading state.
- The `load` method now checks if a model is already being loaded and, if so, returns the existing promise, preventing duplicate load operations for the same model.
- The `performLoad` method (which encapsulates the actual loading logic) now ensures that when `autoUnload` is active, it waits for any *other* models that are concurrently loading to finish before proceeding to unload all currently loaded models. This guarantees that the auto-unload mechanism properly unloads all models, including those initiated in quick succession, thereby preventing the race condition.

This fixes the issue where clicking the start button very fast on multiple models would bypass the auto-unload functionality.
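
A condensed TypeScript sketch of the promise-tracking pattern described above. The `loadingModels` map and the `load`/`performLoad` split come from the commit message; the class shape, `SessionInfo` fields, and `unloadAll` helper are illustrative assumptions.

```typescript
interface SessionInfo {
  pid: number
  port: number
  modelId: string
}

class LlamacppLoaderSketch {
  private autoUnload = true
  private loadingModels = new Map<string, Promise<SessionInfo>>()

  async load(modelId: string): Promise<SessionInfo> {
    // If this model is already loading, return the existing promise
    // instead of starting a duplicate load.
    const inFlight = this.loadingModels.get(modelId)
    if (inFlight) return inFlight

    const loading = this.performLoad(modelId).finally(() => {
      this.loadingModels.delete(modelId)
    })
    this.loadingModels.set(modelId, loading)
    return loading
  }

  private async performLoad(modelId: string): Promise<SessionInfo> {
    if (this.autoUnload) {
      // Wait for any *other* models still loading so the unload pass
      // below sees them as loaded and can unload them too.
      const others = [...this.loadingModels.entries()]
        .filter(([id]) => id !== modelId)
        .map(([, p]) => p.catch(() => undefined))
      await Promise.all(others)
      await this.unloadAll()
    }
    // ...spawn llama-server and register the session (elided)...
    return { pid: -1, port: 0, modelId }
  }

  private async unloadAll(): Promise<void> {
    // Unload every currently loaded model (elided).
  }
}
```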
2025-07-28 12:59:48 +05:30
Louis
1fc37a9349
fix: migrate app settings to the new version (#5936)
* fix: migrate app settings to the new version

* fix: edge cases

* fix: migrate HF import model on Windows

* fix hardware page broken after downgraded

* test: correct test

* fix: backward compatible hardware info
2025-07-27 21:13:05 +07:00
Akarshan Biswas
c9b44eec52
fix: Remove sInfo from activeSessions before unloading (#5938)
This commit addresses a potential race condition that could lead to "connection errors" when unloading a llamacpp model.

The issue arose because the `activeSessions` map still held the session info for the model during unload. This could lead to "connection errors" when the backend takes time to unload while there is an ongoing request to the model.

The fix involves:

1. **Deleting the `pid` from `activeSessions` before calling backend's unload:** This ensures that the model is cleared from the map before we start unloading.
2. **Failure handling**: If somehow the backend fails to unload, the session info for that model is added back to prevent any race conditions.

This commit improves the robustness and reliability of the unloading process by preventing potential conflicts.
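
A small TypeScript sketch of the ordering this commit describes. The `unload_llama_model` command name appears elsewhere in this history, but its argument shape and the session type here are assumptions.

```typescript
import { invoke } from '@tauri-apps/api/core'

interface SessionInfo {
  pid: number
  modelId: string
}

const activeSessions = new Map<number, SessionInfo>()

async function unload(pid: number): Promise<void> {
  const sInfo = activeSessions.get(pid)
  if (!sInfo) return

  // 1. Remove the session first so in-flight requests stop routing to a
  //    backend that is about to go away.
  activeSessions.delete(pid)

  try {
    await invoke('unload_llama_model', { pid }) // argument shape assumed
  } catch (err) {
    // 2. If the backend failed to unload, restore the session info so the
    //    extension's state stays consistent with the still-running process.
    activeSessions.set(pid, sInfo)
    throw err
  }
}
```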
2025-07-27 14:37:34 +05:30
Akarshan Biswas
8ec4a36826
fix: Frontend updates when llama.cpp backend auto-downloads (#5926) 2025-07-26 08:48:29 +07:00
Akarshan Biswas
3982ed4c6f
fix: Allow N-GPU Layers (NGL) to be set to 0 in llama.cpp (#5907)
* fix: Allow N-GPU Layers (NGL) to be set to 0 in llama.cpp

The `n_gpu_layers` (NGL) setting in the llama.cpp extension was incorrectly preventing users from disabling GPU layers by automatically defaulting to 100 when set to 0.

This was caused by a condition that only pushed `cfg.n_gpu_layers` if it was greater than 0 (`cfg.n_gpu_layers > 0`).

This commit updates the condition to `cfg.n_gpu_layers >= 0`, allowing 0 to be a valid and accepted value for NGL. This ensures that users can effectively disable GPU offloading when desired.

* fix: default ngl

---------

Co-authored-by: Louis <louis@jan.ai>
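
A tiny sketch of the corrected guard described in this commit; the `cfg` and `args` shapes are assumptions, while `--n-gpu-layers` is a real llama-server flag.

```typescript
// Sketch only: cfg/args shapes are assumptions based on the commit text.
function pushNglArg(cfg: { n_gpu_layers: number }, args: string[]): void {
  // 0 is now accepted, so users can fully disable GPU offloading.
  if (cfg.n_gpu_layers >= 0) {
    args.push('--n-gpu-layers', String(cfg.n_gpu_layers))
  }
}
```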
2025-07-25 16:24:53 +05:30
Akarshan Biswas
4d4cf896af
fix: Persist 'Auto-Unload Old Models' setting in llama.cpp (#5906)
The 'Auto-Unload Old Models' setting in the llama.cpp extension failed to persist due to a typo in its key name within `settings.json`. The key was incorrectly `auto_unload_models` instead of `auto_unload`.

This commit corrects the key name to `auto_unload`, ensuring that user-configured changes to this setting are properly saved, retrieved, and persist across application restarts.

This resolves the issue where the setting would revert to its previous value after being changed.
2025-07-25 11:03:15 +05:30
Akarshan Biswas
a1af70f7a9
feat: Enhance Llama.cpp backend management with persistence (#5886)
* feat: Enhance Llama.cpp backend management with persistence

This commit introduces significant improvements to how the Llama.cpp extension manages and updates its backend installations, focusing on user preference persistence and smarter auto-updates.

Key changes include:

* **Persistent Backend Type Preference:** The extension now stores the user's preferred backend type (e.g., `cuda`, `cpu`, `metal`) in `localStorage`. This ensures that even after updates or restarts, the system attempts to use the user's previously selected backend type, if available.
* **Intelligent Auto-Update:** The auto-update mechanism has been refined to prioritize updating to the **latest version of the *currently selected backend type*** rather than always defaulting to the "best available" backend (which might change). This respects user choice while keeping the chosen backend type up-to-date.
* **Improved Initial Installation/Configuration:** For fresh installations or cases where the `version_backend` setting is invalid, the system now intelligently determines and installs the best available backend, then persists its type.
* **Refined Old Backend Cleanup:** The `removeOldBackends` function has been renamed to `removeOldBackend` and modified to specifically clean up *older versions of the currently selected backend type*, preventing the accumulation of unnecessary files while preserving other backend types the user might switch to.
* **Robust Local Storage Handling:** New private methods (`getStoredBackendType`, `setStoredBackendType`, `clearStoredBackendType`) are introduced to safely interact with `localStorage`, including error handling for potential `localStorage` access issues.
* **Version Filtering Utility:** A new utility `findLatestVersionForBackend` helps in identifying the latest available version for a specific backend type from a list of supported backends.

These changes provide a more stable, user-friendly, and maintainable backend management experience for the Llama.cpp extension.

Fixes: #5883

* fix: cortex models migration should be done once

* feat: Optimize Llama.cpp backend preference storage and UI updates

This commit refines the Llama.cpp extension's backend management by:

* **Optimizing `localStorage` Writes:** The system now only writes the backend type preference to `localStorage` if the new value is different from the currently stored one. This reduces unnecessary `localStorage` operations.
* **Ensuring UI Consistency on Initial Setup:** When a fresh installation or an invalid backend configuration is detected, the UI settings are now explicitly updated to reflect the newly determined `effectiveBackendString`, ensuring the displayed setting matches the active configuration.

These changes improve performance by reducing redundant storage operations and enhance user experience by maintaining UI synchronization with the backend state.

* Revert "fix: provider settings should be refreshed on page load (#5887)"

This reverts commit ce6af62c7df4a7e7ea8c0896f307309d6bf38771.

* fix: add loader version backend llamacpp

* fix: wrong key name

* fix: model setting issues

* fix: virtual dom hub

* chore: cleanup

* chore: hide device offload setting

---------

Co-authored-by: Louis <louis@jan.ai>
Co-authored-by: Faisal Amir <urmauur@gmail.com>
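
A minimal sketch of the persistence helpers named in this commit's message; the storage key, the error handling details, and the version-sorting heuristic are assumptions.

```typescript
const BACKEND_TYPE_KEY = 'llamacpp_backend_type' // hypothetical key name

function getStoredBackendType(): string | null {
  try {
    return localStorage.getItem(BACKEND_TYPE_KEY)
  } catch {
    return null // localStorage may be unavailable or blocked
  }
}

function setStoredBackendType(backendType: string): void {
  try {
    // Only write when the value actually changes, to avoid redundant writes.
    if (localStorage.getItem(BACKEND_TYPE_KEY) !== backendType) {
      localStorage.setItem(BACKEND_TYPE_KEY, backendType)
    }
  } catch (err) {
    console.warn('Failed to persist backend type preference', err)
  }
}

function findLatestVersionForBackend(
  supported: Array<{ version: string; backend: string }>,
  backendType: string
): string | undefined {
  // Naive "latest" pick by lexicographic sort; a real implementation would
  // compare semantic versions.
  return supported
    .filter((entry) => entry.backend === backendType)
    .map((entry) => entry.version)
    .sort()
    .pop()
}
```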
2025-07-24 18:33:35 +07:00
Faisal Amir
399671488c
fix: gpu detected from backend version (#5882)
* fix: gpu detected from backend version

* chore: remove readonly props from dynamic field
2025-07-24 10:45:48 +07:00
Akarshan Biswas
1d0bb53f2a
feat: add support for querying available backend devices (#5877)
* feat: add support for querying available backend devices

This change introduces a new `get_devices` method to the `llamacpp_extension` engine that allows the frontend to query and display a list of available devices (e.g., Vulkan, CUDA, SYCL) from the compiled `llama-server` binary.

* Added `DeviceList` interface to represent GPU/device metadata.
* Implemented `getDevices(): Promise<DeviceList[]>` method.

  * Splits `version/backend`, ensures backend is ready.
  * Invokes the new Tauri command `get_devices`.

* Introduced a new `get_devices` Tauri command.
* Parses `llama-server --list-devices` output to extract available devices with memory info.
* Introduced `DeviceInfo` struct (`id`, `name`, `mem`, `free`) and exposed it via serialization.
* Robust parsing logic using string processing (non-regex) to locate memory stats.
* Registered the new command in the `tauri::Builder` in `lib.rs`.

* Fixed logic to correctly parse multiple devices from the llama-server output.
* Handles common failure modes: binary not found, malformed memory info, etc.

This sets the foundation for device selection, memory-aware model loading, and improved diagnostics in Jan AI engine setup flows.

* Update extensions/llamacpp-extension/src/index.ts

Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>

---------

Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>
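
A sketch of the frontend half described above; the `DeviceList` field names (`id`, `name`, `mem`, `free`) come from the commit message, while the Tauri command's argument shape is an assumption.

```typescript
import { invoke } from '@tauri-apps/api/core'

interface DeviceList {
  id: string
  name: string
  mem: number  // total memory reported by llama-server --list-devices
  free: number // free memory reported by llama-server --list-devices
}

async function getDevices(versionBackend: string): Promise<DeviceList[]> {
  // The real method splits "version/backend" and ensures the backend is
  // installed before invoking the command; that bookkeeping is elided here.
  const [version, backend] = versionBackend.split('/')
  return invoke<DeviceList[]>('get_devices', { version, backend })
}
```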
2025-07-23 19:20:12 +05:30
Faisal Amir
43b7eb6e18
🐛fix: remove sampling parameters from llamacpp extension (#5871) 2025-07-23 12:13:42 +07:00
Louis
fe95031c6e
feat: migrate cortex models to llamacpp extension (#5838)
* feat: migrate cortex models to new llama.cpp extension

* test: add tests

* clean: remove duplicated import
2025-07-22 23:35:08 +07:00
Akarshan Biswas
1eaec5e4f6
Fix: engine unable to find DLLs when running on Windows (#5863)
* Fix: Windows llamacpp not picking up dlls from lib repo

* Fix lib path on Windows

* Add debug info about lib_path

* Normalize lib_path for Windows

* fix Windows lib path normalization

* fix: missing cuda dll files on windows

* throw backend setup errors to UI

* Fix format

* Update extensions/llamacpp-extension/src/index.ts

Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>

* feat: add logger to llamacpp-extension

* fix: platform check

---------

Co-authored-by: Louis <louis@jan.ai>
Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>
2025-07-22 20:05:24 +05:30
Akarshan Biswas
f59739d2b0
refactor: Improve Llama.cpp backend management and auto-update (#5845)
* refactor: Improve Llama.cpp backend management and auto-update

This commit refactors the Llama.cpp extension to enhance backend management and streamline the auto-update process.

Key changes include:

Refactored configureBackends: The logic for determining the best available backend and populating settings is now more modular, preventing duplicate executions.

Dedicated Auto-update Handling: Introduced a handleAutoUpdate method to encapsulate the auto-update logic, including downloading the latest available backend and updating the internal configuration and settings.

Robust Old Backend Cleanup: The removeOldBackends method is improved to ensure only the currently used backend version and type are kept, effectively managing disk space. A delay is added for Windows to prevent file conflicts during cleanup.

Final Installation Check: A ensureFinalBackendInstallation method is added to guarantee the selected backend is installed, acting as a final safeguard after auto-update or if auto-update is disabled.

Minor Fixes:

Added console.log for save_path during decompression for better debugging.

Ensured the output directory exists before decompression in the Rust backend.

Removed extraneous console log for session info.

Updated Cargo.toml and tauri.conf.json versions.

These changes lead to a more reliable and efficient Llama.cpp backend experience within the application, particularly for users with auto-update enabled.

* fix isBackendInstalled parameters

* Address bot's comments

* Address bot comments of using try finally block
2025-07-22 14:35:34 +05:30
Akarshan Biswas
81d6ed3785
feat: support per-model overrides in llama.cpp load() (#5820)
* feat: support per-model overrides in llama.cpp load()

Extend the `load()` method in the llama.cpp extension to accept optional
`overrideSettings`, allowing fine-grained per-model configuration.

This enables users to override provider-level settings such as `ctx_size`,
`chat_template`, `n_gpu_layers`, etc., when loading a specific model.

Fixes: #5818 (Feature Request - Jan v0.6.6)

Use cases enabled:
- Different context sizes per model (e.g., 4K vs 32K)
- Model-specific chat templates (ChatML, Alpaca, etc.)
- Performance tuning (threads, GPU layers)
- Better memory management per deployment

Maintains full backward compatibility with existing provider config.

* swap overrideSettings and isEmbedding argument
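
A sketch of the override merge described above; the config fields and the exact `load()` signature are assumptions beyond the parameter names mentioned in the commit.

```typescript
interface LlamacppConfig {
  ctx_size?: number
  chat_template?: string
  n_gpu_layers?: number
  [key: string]: unknown
}

// Provider-level defaults stand in for the extension's real settings store.
const providerConfig: LlamacppConfig = { ctx_size: 4096, n_gpu_layers: 100 }

async function load(
  modelId: string,
  overrideSettings?: Partial<LlamacppConfig>,
  isEmbedding: boolean = false
): Promise<void> {
  // Per-model overrides win over provider-level settings.
  const cfg: LlamacppConfig = { ...providerConfig, ...overrideSettings }
  console.log(`loading ${modelId} (embedding=${isEmbedding}) with`, cfg)
  // ...build llama-server arguments from cfg and spawn the process (elided)...
}
```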
2025-07-21 08:59:50 +05:30
Louis
bc4fe52f8d
fix: llama.cpp integration model load and chat experience (#5823)
* fix: stop generating should not stop running models

* fix: ensure backend ready before loading model

* fix: backend setting should not block onLoad
2025-07-21 09:29:26 +07:00
Louis
19cb1c96e0
fix: llama.cpp backend download on windows (#5813)
* fix: llama.cpp backend download on windows

* test: add missing cases

* clean: linter

* fix: build
2025-07-20 16:58:09 +07:00
Akarshan Biswas
8f1a36c8e3
fix: Improve stream error handling and parsing (#5807)
* fix: Enhance stream error handling and parsing

This commit improves the robustness of stream processing in the llamacpp-extension.

- Adds explicit handling for 'error:' prefixed lines in the stream, parsing the contained JSON error and throwing an appropriate JavaScript Error.
- Centralizes JSON parsing of 'data:' and 'error:' lines, ensuring consistent error propagation by re-throwing parsing exceptions.
- Ensures the async iterator terminates correctly upon encountering stream errors or malformed JSON.

* Address bot comments and cleanup
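
A sketch of the line handling described above; only the `data:` / `error:` prefixes come from the commit message, and the rest (including the `[DONE]` sentinel of OpenAI-style streams) is an assumption.

```typescript
// Returns a parsed chunk, or null for lines that carry no payload.
function parseStreamLine(line: string): unknown {
  const trimmed = line.trim()
  if (trimmed.length === 0) return null

  if (trimmed.startsWith('error:')) {
    // Parse the embedded JSON error and surface it as a JavaScript Error.
    const payload = trimmed.slice('error:'.length).trim()
    let message = payload
    try {
      const parsed = JSON.parse(payload)
      message = parsed?.message ?? payload
    } catch {
      // Fall through with the raw payload as the message.
    }
    throw new Error(`llama.cpp stream error: ${message}`)
  }

  if (trimmed.startsWith('data:')) {
    const payload = trimmed.slice('data:'.length).trim()
    if (payload === '[DONE]') return null
    // Malformed JSON throws here, which terminates the async iterator.
    return JSON.parse(payload)
  }

  return null
}
```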
2025-07-18 18:36:33 +05:30
Akarshan Biswas
bcb60378c0
fix: Add --reasoning-format none to support rendering of reasoning content (#5803) 2025-07-18 08:22:37 +05:30
Louis
8ca507c01c
feat: proxy support for the new downloader (#5795)
* feat: proxy support for the new downloader

* test: remove outdated test

* ci: clean up
2025-07-17 23:10:21 +07:00
Akarshan Biswas
92703bceb2
refactor: move thinking toggle to runtime settings for dynamic control (#5800)
* refactor: move thinking toggle to runtime settings for per-message control

Replaces the static `reasoning_budget` config with a dynamic `enable_thinking` flag under `chat_template_kwargs`, allowing models like Jan-nano and Qwen3 to enable/disable thinking behavior at runtime, even mid-conversation.
Requires UI update

* remove engine argument
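
A sketch of the per-request flag described above; `chat_template_kwargs.enable_thinking` comes from the commit message, while the surrounding request shape is an assumed OpenAI-style chat completion body.

```typescript
async function sendChat(baseUrl: string): Promise<Response> {
  // Hypothetical request body for the local llama-server chat endpoint.
  const requestBody = {
    model: 'jan-nano', // illustrative model id
    messages: [{ role: 'user', content: 'Hello' }],
    stream: true,
    chat_template_kwargs: {
      enable_thinking: false, // toggled per request, even mid-conversation
    },
  }
  return fetch(`${baseUrl}/v1/chat/completions`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(requestBody),
  })
}
```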
2025-07-17 20:18:24 +05:30
Akarshan Biswas
b736d09168
fix: Prevent spamming /health endpoint and improve startup and resolve compiler warnings (#5784)
* fix: Prevent spamming /health endpoint and improve startup and resolve compiler warnings

This commit introduces a delay and improved logic to the /health endpoint checks in the llamacpp extension, preventing excessive requests during model loading.

Additionally, it addresses several Rust compiler warnings by:
- Commenting out an unused `handle_app_quit` function in `src/core/mcp.rs`.
- Explicitly declaring `target_port`, `session_api_key`, and `buffered_body` as mutable in `src/core/server.rs`.
- Commenting out unused `tokio` imports in `src/core/setup.rs`.
- Enhancing the `load_llama_model` function in `src/core/utils/extensions/inference_llamacpp_extension/server.rs` to better monitor stdout/stderr for readiness and errors, and handle timeouts.
- Commenting out an unused `std::path::Prefix` import and adjusting `normalize_path` in `src/core/utils/mod.rs`.
- Updating the application version to 0.6.904 in `tauri.conf.json`.

* fix grammar!

Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>

* fix grammar 2

Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>

* reimport prefix but only on Windows

* remove instead of commenting

* remove redundant check

* sync app version in cargo.toml with tauri.conf

---------

Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>
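
A sketch of a throttled readiness check in the spirit of this fix; the `/health` endpoint comes from the commit message, the 500 ms interval is an assumption, and the 240-second ceiling mirrors the timeout mentioned in the commit below.

```typescript
async function waitForServerReady(baseUrl: string, timeoutMs = 240_000): Promise<void> {
  const start = Date.now()
  while (Date.now() - start < timeoutMs) {
    try {
      const res = await fetch(`${baseUrl}/health`)
      if (res.ok) return // llama-server reports ready
    } catch {
      // Server not accepting connections yet; keep waiting.
    }
    // Sleep between checks so /health is not spammed while the model loads.
    await new Promise((resolve) => setTimeout(resolve, 500))
  }
  throw new Error('Timed out waiting for llama-server to become ready')
}
```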
2025-07-16 18:18:11 +05:30
Akarshan Biswas
dee98f41d1
Feat: Improved llamacpp Server Stability and Diagnostics (#5761)
* feat: Improve llamacpp server error reporting and model load stability

This commit introduces significant improvements to how the llamacpp server
process is managed and how its errors are reported.

Key changes:
- **Enhanced Error Reporting:** The llamacpp server's stdout and stderr
  are now piped and captured. If the llamacpp process exits prematurely
  or fails to start, its stderr output is captured and returned as a
  `LlamacppError`. This provides much more specific and actionable
  diagnostic information for users and developers.
- **Increased Model Load Timeout:** The `waitForModelLoad` timeout has
  been increased from 30 seconds to 240 seconds (4 minutes). This
  addresses issues where larger models or slower systems would
  prematurely time out during the model loading phase.
- **API Secret Update:** The internal API secret for the llamacpp
  extension has been updated from 'Jan' to 'JustAskNow'.
- **Version Bump:** The application version in `tauri.conf.json` has
  been incremented to `0.6.901`.

* fix: should not spam load requests

* test: add test to cover the fix

* refactor: clean up

* test: add more test case

---------

Co-authored-by: Louis <louis@jan.ai>
2025-07-14 11:55:44 +05:30
Akarshan Biswas
96ba42e411
feat: Add missing ctx-shift toggle (#5765)
* feat: Add missing ctx_shift

* fix typo

Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>

* refine description

---------

Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>
2025-07-14 11:51:34 +05:30
Louis
b2ce138ea0
test: add tests 2025-07-11 09:21:11 +07:00
Akarshan
d4a3d6a0d6
Refactor session PID types from string to number across backend and extension
- Changed `pid` field in `SessionInfo` from `string` to `number`/`i32` in TypeScript and Rust.
- Updated `activeSessions` map key from `string` to `number` to align with new PID type.
- Adjusted process monitoring logic to correctly handle numeric PIDs.
- Removed fallback UUID-based PID generation in favor of numeric fallback (-1).
- Added PID cleanup logic in `is_process_running` when the process is no longer alive.
- Bumped application version from 0.5.16 to 0.6.900 in `tauri.conf.json`.
2025-07-04 21:40:54 +05:30
Akarshan
dbdc031583
chore: store session_info in backend as well for API server (WIP) 2025-07-04 20:31:30 +05:30
Akarshan
ffef7b9cab
enhancement: Add custom Jinja chat template option
Adds a new configuration option `chat_template` to the Llama.cpp extension, allowing users to define a custom Jinja chat template for the model.

The template can be provided via a new input field in the settings, and if set, it will be passed to the Llama.cpp backend using the `--chat-template` argument. This enhances flexibility for users who require specific chat formatting beyond the GGUF default.

The `chat_template` is added to the `LlamacppConfig` type and conditionally pushed to the command arguments if it's provided. The placeholder text provides an example of a Jinja template structure.
2025-07-03 23:38:16 +07:00
Akarshan
40f1fd4ffd
feat: Auto update backend implementation 2025-07-03 19:32:12 +05:30
Akarshan
c2493fc535
Fix camelCase 2025-07-03 09:13:33 +05:30
Akarshan
396573055f
Address bot's review comment and minor refactoring 2025-07-03 09:13:33 +05:30
Akarshan
37151ba926
Feat: Auto load and download default backend during first launch 2025-07-03 09:13:32 +05:30
Akarshan
449bf17692
Add process aliveness check 2025-07-02 12:29:03 +07:00
Louis
c6ac9f1d2a
feat: sync hub with model catalog 2025-07-02 12:29:01 +07:00
Louis
b538d57207
feat: auto unload models on model start 2025-07-02 12:28:25 +07:00
Akarshan
0cbf35dc77
Add auto unload setting to llamacpp-extension 2025-07-02 12:28:25 +07:00
Akarshan
54691044d4
Add missing --jinja flag 2025-07-02 12:28:25 +07:00
Louis
8bd4a3389f
refactor: frontend uses new engine extension
# Conflicts:
#	extensions/model-extension/resources/default.json
#	web-app/src/containers/dialogs/DeleteProvider.tsx
#	web-app/src/routes/hub.tsx
2025-07-02 12:28:24 +07:00
Akarshan
48d1164858
feat: add embedding support to llamacpp extension
This commit introduces embedding functionality to the llamacpp extension. It allows users to generate embeddings for text inputs using the 'sentence-transformer-mini' model.  The changes include:

- Adding a new `embed` method to the `llamacpp_extension` class.
- Implementing model loading and API interaction for embeddings.
- Handling potential errors during API requests.
- Adding necessary types for embedding responses and data.
- The load method now accepts a boolean parameter to determine whether it should load an embedding model.
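
A sketch of the embedding call described above; llama-server exposes an OpenAI-compatible `/v1/embeddings` endpoint, but the base URL, auth header, and response typing here are assumptions.

```typescript
interface EmbeddingData {
  embedding: number[]
  index: number
}

interface EmbeddingResponse {
  data: EmbeddingData[]
}

async function embed(baseUrl: string, apiKey: string, inputs: string[]): Promise<number[][]> {
  // Assumes the embedding model (e.g. 'sentence-transformer-mini') was loaded
  // beforehand via load(modelId, /* isEmbedding */ true).
  const res = await fetch(`${baseUrl}/v1/embeddings`, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify({ input: inputs }),
  })
  if (!res.ok) {
    throw new Error(`Embedding request failed with status ${res.status}`)
  }
  const json = (await res.json()) as EmbeddingResponse
  return json.data.map((d) => d.embedding)
}
```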
2025-07-02 12:27:36 +07:00
Akarshan
f463008362
feat: add model load wait to ensure model is ready before use 2025-07-02 12:27:35 +07:00
Akarshan
9d4e7cb2b8
fix: correct model_id to modelId in console error message
This change ensures that the error message includes the correct model ID, since the field is camelCased as `modelId` in the `sInfo` object.
2025-07-02 12:27:35 +07:00
Akarshan
dbcce86bb8
refactor: rename interfaces and add getLoadedModels
The changes include:
- Renaming interfaces (sessionInfo -> SessionInfo, unloadResult -> UnloadResult) for consistency
- Adding getLoadedModels() method to retrieve active model IDs
- Updating variable names from modelId to model_id for alignment
- Updating cleanup paths to use XDG-standard locations
- Improving type consistency across extension implementation
2025-07-02 12:27:35 +07:00
Akarshan
4ffc504150
style: Rename camelCase to snake_case in llamacpp extension code
Rename variable, struct, and enum names from camelCase to snake_case throughout the llamacpp extension codebase to align with Rust naming conventions. This change improves readability and consistency without altering functionality.
2025-07-02 12:27:34 +07:00
Akarshan
6c769c5db9
feat: refactor llama server process storage to use HashMap
Change the llama_server_process state from an Option<Child> to a HashMap<String, Child> to support managing multiple server instances by PID. This allows precise process tracking and termination, replacing the previous single-process limitation.

Previously, only one server process could be tracked at a time. Now, each process is stored with its PID as the key, enabling:
- Accurate session matching during unloading
- Proper termination of specific processes
- Better error handling for mismatched PIDs

The load_llama_model function now inserts processes into the map, and unload_llama_model removes them by PID.
2025-07-02 12:27:34 +07:00
Thien Tran
525cc93d4a
fix system cudart detection on linux 2025-07-02 12:27:34 +07:00
Thien Tran
95944fa081
add Jan's library path to path 2025-07-02 12:27:17 +07:00
Thien Tran
65d6f34878
check for system libraries 2025-07-02 12:27:17 +07:00