211 Commits

Author SHA1 Message Date
Sherzod Mutalov
5f06a35f4e fix: use attributes to check for feature existence 2025-08-06 12:24:21 +10:00
Sherzod Mutalov
280ea1aa9f chore: extracted macOS AVX2 check code into a utility function 2025-08-06 12:23:18 +10:00
Sherzod Mutalov
ad9c4854a9 chore: added comments 2025-08-06 12:20:30 +10:00
Sherzod Mutalov
49c8334e40 chore: replaced with macro call to remove warning 2025-08-06 12:20:30 +10:00
Sherzod Mutalov
f1dd42de9e fix: use system npx on old Macs 2025-08-06 12:20:30 +10:00
Akarshan Biswas
5e533bdedc
feat: Improve llama.cpp argument handling and add device parsing tests (#6041)
* feat: Improve llama.cpp argument handling and add device parsing tests

This commit refactors how arguments are passed to llama.cpp,
specifically by only adding arguments when their values differ from
their defaults. This reduces the verbosity of the command and prevents
potential conflicts or errors when llama.cpp's default behavior aligns
with the desired setting.

Additionally, new tests have been added for parsing device output from
llama.cpp, ensuring the accurate extraction of GPU information (ID,
name, total memory, and free memory). This improves the robustness of
device detection.

The following changes were made:

* **Remove redundant `--ctx-size` argument:** The `--ctx-size`
    argument is now only explicitly added if `cfg.ctx_size` is greater
    than 0.
* **Conditional argument adding for default values:**
    * `--split-mode` is only added if `cfg.split_mode` is not empty
        and not 'layer'.
    * `--main-gpu` is only added if `cfg.main_gpu` is not undefined
        and not 0.
    * `--cache-type-k` is only added if `cfg.cache_type_k` is not 'f16'.
    * `--cache-type-v` is only added if `cfg.cache_type_v` is not 'f16'
        (when `flash_attn` is enabled) or not 'f32' (otherwise). This
        also corrects the `flash_attn` condition.
    * `--defrag-thold` is only added if `cfg.defrag_thold` is not 0.1.
    * `--rope-scaling` is only added if `cfg.rope_scaling` is not
        'none'.
    * `--rope-scale` is only added if `cfg.rope_scale` is not 1.
    * `--rope-freq-base` is only added if `cfg.rope_freq_base` is not 0.
    * `--rope-freq-scale` is only added if `cfg.rope_freq_scale` is
        not 1.
* **Add `parse_device_output` tests:** Comprehensive unit tests were
    added to `src-tauri/src/core/utils/extensions/inference_llamacpp_extension/server.rs`
    to validate the parsing of llama.cpp device output under various
    scenarios, including multiple devices, single devices, different
    backends (CUDA, Vulkan, SYCL), complex GPU names, and error
    conditions.

* fixup cache_type_v comparison
2025-08-04 19:47:04 +05:30
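A minimal sketch of the skip-defaults pattern described in the commit above. The real logic lives in the TypeScript extension; this Rust-flavored illustration assumes a hypothetical `LlamacppConfig` struct and covers only a few of the listed flags:

```rust
/// Hypothetical subset of the extension's settings; field names mirror the
/// cfg.* values referenced in the commit message.
struct LlamacppConfig {
    ctx_size: i64,
    split_mode: String,
    main_gpu: i32,
    cache_type_k: String,
    defrag_thold: f64,
}

/// Only push a flag when the value differs from llama.cpp's own default,
/// keeping the generated command line short and conflict-free.
fn build_args(cfg: &LlamacppConfig) -> Vec<String> {
    let mut args: Vec<String> = Vec::new();
    if cfg.ctx_size > 0 {
        args.extend(["--ctx-size".into(), cfg.ctx_size.to_string()]);
    }
    if !cfg.split_mode.is_empty() && cfg.split_mode != "layer" {
        args.extend(["--split-mode".into(), cfg.split_mode.clone()]);
    }
    if cfg.main_gpu != 0 {
        args.extend(["--main-gpu".into(), cfg.main_gpu.to_string()]);
    }
    if cfg.cache_type_k != "f16" {
        args.extend(["--cache-type-k".into(), cfg.cache_type_k.clone()]);
    }
    if (cfg.defrag_thold - 0.1).abs() > f64::EPSILON {
        args.extend(["--defrag-thold".into(), cfg.defrag_thold.to_string()]);
    }
    args
}
```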
Akarshan Biswas
b1984a452e
Fix: Llama.cpp server hangs on model load (#6030)
* Fix: Llama.cpp server hangs on model load

Resolves an issue where the llama.cpp server would hang indefinitely when loading certain models, as described in the attached ticket. The server's readiness message was not being correctly detected, causing the application to stall.

The previous implementation used a line-buffered reader (BufReader::lines()) to process the stderr stream. This method proved to be unreliable for the specific output of the llama.cpp server.

This commit refactors the stderr handling logic to use a more robust, chunk-based approach (read_until(b'\n', ...)). This ensures that the output is processed as it arrives, reliably capturing critical status messages and preventing the application from hanging during model initialization.

Fixes: #6021

* Handle error gracefully with ServerError

Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>

* Revert "Handle error gracefully with ServerError"

This reverts commit 267a8a8a3262fbe36a445a30b8b3ba9a39697643.

* Revert "Fix: Llama.cpp server hangs on model load"

This reverts commit 44e5447f82f0ae32b6db7ffb213025f130d655c4.

* Add more guards, refactor and fix error sending to FE

---------

Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>
2025-08-02 21:50:07 +05:30
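A sketch of the chunk-based stderr handling described above, assuming tokio's async process and I/O types; the readiness marker string is a placeholder, not the exact text the real code matches:

```rust
use tokio::io::{AsyncBufReadExt, BufReader};
use tokio::process::Child;

/// Read stderr in raw '\n'-delimited chunks instead of lines(), so partial or
/// non-UTF-8 output cannot stall readiness detection during model load.
async fn wait_for_ready(child: &mut Child) -> std::io::Result<bool> {
    let stderr = child.stderr.take().expect("stderr must be piped");
    let mut reader = BufReader::new(stderr);
    let mut buf: Vec<u8> = Vec::new();

    loop {
        buf.clear();
        let n = reader.read_until(b'\n', &mut buf).await?;
        if n == 0 {
            // EOF: the server exited before it ever became ready.
            return Ok(false);
        }
        let chunk = String::from_utf8_lossy(&buf);
        // Placeholder readiness marker; match whatever llama-server prints.
        if chunk.contains("server is listening") {
            return Ok(true);
        }
    }
}
```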
Louis
9c0d09c487
refactor: clean up cortex (#6003)
* refactor: clean up cortex

* chore: clean up

* refactor: clean up
2025-07-31 21:58:12 +07:00
Akarshan
e76d207718
Fixup: tauri::WindowEvent 2025-07-31 15:41:43 +07:00
Akarshan
b3e8201481
Add RunEvent::Exit event to tauri to handle macOS context menu exit 2025-07-31 15:41:43 +07:00
Akarshan Biswas
0aaaca05a4
fix: use direct process termination instead of console events on Windows (#5972)
* fix: remove CREATE_NEW_PROCESS_GROUP flag for proper Ctrl-C handling

CREATE_NEW_PROCESS_GROUP prevented GenerateConsoleCtrlEvent from working,
causing graceful shutdown failures. Removed to enable proper signal handling.

* Revert "fix: remove CREATE_NEW_PROCESS_GROUP flag for proper Ctrl-C handling"

This reverts commit 82ace3e72e4bf7338f422d5c79bdd6a0f8a2440e.

* fix: use direct process termination instead of console events

Simplified Windows process cleanup by removing console attachment logic
and using direct child.kill() method. More reliable for headless processes.

* Fix missing imports

* switch to tokio::time

* Don't wait when forcefully terminating the process using the kill API on Windows

Disabled use of windows-sys crate as graceful shutdown on Windows is unreliable in this context.

Updated cleanup.rs and server.rs to directly call child.kill().await for terminating processes on Windows.

Improved logging for process termination and error handling during kill and wait.

Removed timeout-based graceful shutdown attempt on Windows since TerminateProcess is inherently forceful and immediate.

This ensures more predictable process cleanup behavior on Windows platforms.

* final cleanups
2025-07-30 10:09:20 +05:30
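A sketch of the direct-termination approach, assuming tokio's async `Child`; on Windows `kill()` maps to `TerminateProcess`, which is immediate and forceful, so no graceful-shutdown wait is attempted:

```rust
use tokio::process::Child;

/// Terminate a llama-server child directly. tokio's kill() sends the
/// termination signal and awaits the child's exit, so the process is also
/// reaped here rather than left as a zombie / open handle.
async fn terminate_llama_server(pid: i32, child: &mut Child) {
    match child.kill().await {
        Ok(()) => eprintln!("llama-server (pid {pid}) terminated"),
        Err(e) => eprintln!("failed to kill llama-server (pid {pid}): {e}"),
    }
}
```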
Akarshan Biswas
f61ce886a0
feat: Enhance port selection with availability check (#5966)
This change improves the robustness of the llama.cpp extension's server port selection.

Previously, the `getRandomPort()` method only checked for ports already in use by active sessions, which could lead to model load failures if the chosen port was occupied by another external process.

This change introduces a new Tauri command, `is_port_available`, which performs a system-level check to ensure the randomly selected port is truly free before attempting to start the llama-server. It also adds a retry mechanism with a maximum number of attempts (20,000) to find an available port, throwing an error if no suitable port is found within the specified range after all attempts.

This enhancement prevents port conflicts and improves the reliability and user experience of the llama.cpp extension within Jan.

Closes #5965
2025-07-29 18:01:52 +05:30
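A minimal sketch of what a system-level check like `is_port_available` could look like on the Rust side, assuming a plain loopback bind test; the retry loop over candidate ports stays in the TypeScript extension:

```rust
use std::net::TcpListener;

/// True if nothing else is listening on the port; binding and immediately
/// dropping the listener is enough for a pre-flight availability check.
#[tauri::command]
fn is_port_available(port: u16) -> bool {
    TcpListener::bind(("127.0.0.1", port)).is_ok()
}
```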
Louis
812a8082b8
fix: factory reset fail with access denied error (#5952)
* fix: factory reset fail due to access denied error

* fix: unused import

* fix: tests
2025-07-28 23:20:45 +07:00
Akarshan Biswas
1d0bb53f2a
feat: add support for querying available backend devices (#5877)
* feat: add support for querying available backend devices

This change introduces a new `get_devices` method to the `llamacpp_extension` engine that allows the frontend to query and display a list of available devices (e.g., Vulkan, CUDA, SYCL) from the compiled `llama-server` binary.

* Added `DeviceList` interface to represent GPU/device metadata.
* Implemented `getDevices(): Promise<DeviceList[]>` method.

  * Splits `version/backend`, ensures backend is ready.
  * Invokes the new Tauri command `get_devices`.

* Introduced a new `get_devices` Tauri command.
* Parses `llama-server --list-devices` output to extract available devices with memory info.
* Introduced `DeviceInfo` struct (`id`, `name`, `mem`, `free`) and exposed it via serialization.
* Robust parsing logic using string processing (non-regex) to locate memory stats.
* Registered the new command in the `tauri::Builder` in `lib.rs`.

* Fixed logic to correctly parse multiple devices from the llama-server output.
* Handles common failure modes: binary not found, malformed memory info, etc.

This sets the foundation for device selection, memory-aware model loading, and improved diagnostics in Jan AI engine setup flows.

* Update extensions/llamacpp-extension/src/index.ts

Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>

---------

Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>
2025-07-23 19:20:12 +05:30
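A sketch of the device metadata shape and command registration implied by the commit; field names follow the description (`id`, `name`, `mem`, `free`), while the actual string parsing of `llama-server --list-devices` output is elided:

```rust
use serde::Serialize;

/// One entry of the list returned to the frontend's getDevices() call.
#[derive(Serialize, Debug, Clone)]
struct DeviceInfo {
    id: String,   // backend device identifier
    name: String, // GPU name as reported by llama-server
    mem: u64,     // total memory (unit as reported by llama-server)
    free: u64,    // free memory
}

#[tauri::command]
async fn get_devices() -> Result<Vec<DeviceInfo>, String> {
    // Run `llama-server --list-devices`, capture its output, and parse each
    // device line into a DeviceInfo; the parsing itself is elided here.
    Err("parsing elided in this sketch".to_string())
}

// Registered alongside the other commands in lib.rs:
// tauri::Builder::default()
//     .invoke_handler(tauri::generate_handler![get_devices /* , ... */])
```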
Louis
3afdd0fa1d
fix: tmp download file should be removed on cancel (#5849) 2025-07-23 12:52:34 +07:00
Akarshan Biswas
1eaec5e4f6
Fix: engine unable to find DLLs when running on Windows (#5863)
* Fix: Windows llamacpp not picking up dlls from lib repo

* Fix lib path on Windows

* Add debug info about lib_path

* Normalize lib_path for Windows

* fix Windows lib path normalization

* fix: missing cuda dll files on windows

* throw backend setup errors to UI

* Fix format

* Update extensions/llamacpp-extension/src/index.ts

Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>

* feat: add logger to llamacpp-extension

* fix: platform check

---------

Co-authored-by: Louis <louis@jan.ai>
Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>
2025-07-22 20:05:24 +05:30
Akarshan Biswas
f59739d2b0
refactor: Improve Llama.cpp backend management and auto-update (#5845)
* refactor: Improve Llama.cpp backend management and auto-update

This commit refactors the Llama.cpp extension to enhance backend management and streamline the auto-update process.

Key changes include:

Refactored configureBackends: The logic for determining the best available backend and populating settings is now more modular, preventing duplicate executions.

Dedicated Auto-update Handling: Introduced a handleAutoUpdate method to encapsulate the auto-update logic, including downloading the latest available backend and updating the internal configuration and settings.

Robust Old Backend Cleanup: The removeOldBackends method is improved to ensure only the currently used backend version and type are kept, effectively managing disk space. A delay is added for Windows to prevent file conflicts during cleanup.

Final Installation Check: An ensureFinalBackendInstallation method is added to guarantee the selected backend is installed, acting as a final safeguard after auto-update or if auto-update is disabled.

Minor Fixes:

Added console.log for save_path during decompression for better debugging.

Ensured the output directory exists before decompression in the Rust backend.

Removed extraneous console log for session info.

Updated Cargo.toml and tauri.conf.json versions.

These changes lead to a more reliable and efficient Llama.cpp backend experience within the application, particularly for users with auto-update enabled.

* fix isBackendInstalled parameters

* Address bot's comments

* Address bot comments of using try finally block
2025-07-22 14:35:34 +05:30
Akarshan Biswas
08de0fa42d
fix: prevent terminal window from opening on model load on Windows (#5837)
On Windows, spawning the llamacpp server was causing an unwanted terminal window
to appear. This is now fixed by combining `CREATE_NO_WINDOW` with
`CREATE_NEW_PROCESS_GROUP` using `.creation_flags(...)`, ensuring that the
process runs in the background without a console window.

This change only applies to 64-bit Windows builds.
2025-07-21 13:24:31 +05:30
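A sketch of the spawn configuration described above, using std's `CommandExt::creation_flags`; the constants below carry the standard Win32 values for the flags of the same names:

```rust
use std::process::{Child, Command};

/// Spawn llama-server without popping a console window on 64-bit Windows,
/// while still giving it its own process group.
fn spawn_hidden(binary: &str, args: &[String]) -> std::io::Result<Child> {
    let mut command = Command::new(binary);
    command.args(args);

    #[cfg(all(windows, target_arch = "x86_64"))]
    {
        use std::os::windows::process::CommandExt;
        const CREATE_NO_WINDOW: u32 = 0x0800_0000;
        const CREATE_NEW_PROCESS_GROUP: u32 = 0x0000_0200;
        command.creation_flags(CREATE_NO_WINDOW | CREATE_NEW_PROCESS_GROUP);
    }

    command.spawn()
}
```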
Louis
19cb1c96e0
fix: llama.cpp backend download on windows (#5813)
* fix: llama.cpp backend download on windows

* test: add missing cases

* clean: linter

* fix: build
2025-07-20 16:58:09 +07:00
Louis
c550f6cf0d
Merge pull request #5809 from menloresearch/refactor/simplify-proxy-settings
refactor: simplify proxy settings by removing unused SSL verification options
2025-07-19 16:34:37 +07:00
Louis
8d84c3b884
feat: add model load error handling to improve UX (#5802)
* feat: model load error handling

* chore: clean up

* test: add tests

* fix: provider name
2025-07-18 08:25:54 +05:30
Louis
8ca507c01c
feat: proxy support for the new downloader (#5795)
* feat: proxy support for the new downloader

* test: remove outdated test

* ci: clean up
2025-07-17 23:10:21 +07:00
Akarshan Biswas
b736d09168
fix: Prevent spamming /health endpoint and improve startup and resolve compiler warnings (#5784)
* fix: Prevent spamming /health endpoint and improve startup and resolve compiler warnings

This commit introduces a delay and improved logic to the /health endpoint checks in the llamacpp extension, preventing excessive requests during model loading.

Additionally, it addresses several Rust compiler warnings by:
- Commenting out an unused `handle_app_quit` function in `src/core/mcp.rs`.
- Explicitly declaring `target_port`, `session_api_key`, and `buffered_body` as mutable in `src/core/server.rs`.
- Commenting out unused `tokio` imports in `src/core/setup.rs`.
- Enhancing the `load_llama_model` function in `src/core/utils/extensions/inference_llamacpp_extension/server.rs` to better monitor stdout/stderr for readiness and errors, and handle timeouts.
- Commenting out an unused `std::path::Prefix` import and adjusting `normalize_path` in `src/core/utils/mod.rs`.
- Updating the application version to 0.6.904 in `tauri.conf.json`.

* fix grammar!

Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>

* fix grammar 2

Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>

* reimport prefix but only on Windows

* remove instead of commenting

* remove redundant check

* sync app version in cargo.toml with tauri.conf

---------

Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>
2025-07-16 18:18:11 +05:30
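A rough sketch of a throttled health poll; the real check lives in the llamacpp extension, and the endpoint path, interval, and use of `reqwest` here are assumptions. The point is the fixed delay between probes that stops the endpoint from being spammed while the model loads:

```rust
use std::time::Duration;

/// Poll /health at a fixed interval instead of in a tight loop, giving the
/// server room to finish loading the model between probes.
async fn wait_until_healthy(port: u16, attempts: u32) -> bool {
    let url = format!("http://127.0.0.1:{port}/health");
    for _ in 0..attempts {
        if let Ok(resp) = reqwest::get(&url).await {
            if resp.status().is_success() {
                return true;
            }
        }
        tokio::time::sleep(Duration::from_millis(500)).await;
    }
    false
}
```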
Sam Hoang Van
9a76c94e22
update rmcp to fix issues (#5290) 2025-07-14 16:49:27 +07:00
Akarshan Biswas
dee98f41d1
Feat: Improved llamacpp Server Stability and Diagnostics (#5761)
* feat: Improve llamacpp server error reporting and model load stability

This commit introduces significant improvements to how the llamacpp server
process is managed and how its errors are reported.

Key changes:
- **Enhanced Error Reporting:** The llamacpp server's stdout and stderr
  are now piped and captured. If the llamacpp process exits prematurely
  or fails to start, its stderr output is captured and returned as a
  `LlamacppError`. This provides much more specific and actionable
  diagnostic information for users and developers.
- **Increased Model Load Timeout:** The `waitForModelLoad` timeout has
  been increased from 30 seconds to 240 seconds (4 minutes). This
  addresses issues where larger models or slower systems would
  prematurely time out during the model loading phase.
- **API Secret Update:** The internal API secret for the llamacpp
  extension has been updated from 'Jan' to 'JustAskNow'.
- **Version Bump:** The application version in `tauri.conf.json` has
  been incremented to `0.6.901`.

* fix: should not spam load requests

* test: add test to cover the fix

* refactor: clean up

* test: add more test case

---------

Co-authored-by: Louis <louis@jan.ai>
2025-07-14 11:55:44 +05:30
Louis
a8ed759a06 fix: model download - windows path issue 2025-07-10 09:42:36 +07:00
Louis
2f02a228cc
fix: download on windows 2025-07-08 15:41:17 +07:00
Akarshan
d5ffc6a476
feat: Migrate Jan's API server to llamacpp-extension
Things to ponder:
- Now, the v1/models endpoint of the API server will return an empty
  list if no models are loaded
- Streaming v1/chat/completion routing works as well as v1/models; needs
  further testing
2025-07-07 20:52:00 +05:30
Akarshan
d4a3d6a0d6
Refactor session PID types from string to number across backend and extension
- Changed `pid` field in `SessionInfo` from `string` to `number`/`i32` in TypeScript and Rust.
- Updated `activeSessions` map key from `string` to `number` to align with new PID type.
- Adjusted process monitoring logic to correctly handle numeric PIDs.
- Removed fallback UUID-based PID generation in favor of numeric fallback (-1).
- Added PID cleanup logic in `is_process_running` when the process is no longer alive.
- Bumped application version from 0.5.16 to 0.6.900 in `tauri.conf.json`.
2025-07-04 21:40:54 +05:30
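A sketch of the session shape after this type change; only the `pid` field (now `i32`, with `-1` as the numeric fallback) is confirmed by the commit, the remaining fields are illustrative:

```rust
use serde::{Deserialize, Serialize};

/// Session metadata exchanged between the Rust backend and the extension.
#[derive(Serialize, Deserialize, Debug, Clone)]
struct SessionInfo {
    pid: i32,          // was String; -1 replaces the old UUID-based fallback
    model_id: String,  // illustrative
    port: u16,         // illustrative
    api_key: String,   // illustrative
}
```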
Akarshan
dbdc031583
chore: store session_info in backend as well for API server(WIP) 2025-07-04 20:31:30 +05:30
Akarshan
03f0c5aad6
fix: remove unsupported BOOL for windows_sys in cleanup to fix windows build(attempt 3) 2025-07-03 18:35:13 +05:30
Akarshan
11db1ecaed
fix: server-side Ctrl-C handling for Windows x86_64 targets (attempt 2)
The current implementation of Ctrl-C handling was not properly tested on Windows x86_64 architectures. To address this, the code has been modified to use `i32` instead of `BOOL` to handle the result of the `GenerateConsoleCtrlEvent` function, ensuring that the return value is correctly checked across different platforms.
2025-07-03 14:13:56 +05:30
Akarshan
6ab7d37a08
fix: Update Cargo.toml dependencies on Windows & fix Ctrl+C handling on Windows
This change updates the Cargo.toml dependencies on Windows to include additional features from the `windows-sys` crate. CreateProcess flags such as `CREATE_NEW_PROCESS_GROUP` are now enabled to allow for proper process management.
The code now properly sends Ctrl+C to the llama process on Windows and includes error handling for when the Ctrl+C command fails. Additionally, it now uses the Windows API to kill the process when it times out, and properly handles waiting for the process to exit.
2025-07-03 13:51:59 +05:30
Louis
e123d22b8d
fix: deprecate sidecar run 2025-07-02 12:48:50 +07:00
Akarshan
449bf17692
Add process aliveness check 2025-07-02 12:29:03 +07:00
Louis
9b730058b4
feat: use hardware information api 2025-07-02 12:29:02 +07:00
Louis
d264220245
fix: restrict Windows-specific code to x86_64 and update scripts
Updated Rust code to apply Windows-specific logic only on x86_64 targets using #[cfg(all(windows, target_arch = "x86_64"))]. Modified dev:tauri script in package.json to remove CLEAN=true and added CLEAN=true to beforeDevCommand in tauri.conf.json for consistency. Minor formatting changes in tauri.conf.json.
2025-07-02 12:29:02 +07:00
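The gating pattern itself is small; a sketch with placeholder bodies:

```rust
/// Compiled only for 64-bit Windows builds; other targets never see this code.
#[cfg(all(windows, target_arch = "x86_64"))]
fn terminate_on_windows(pid: u32) {
    // Windows-specific termination logic goes here.
    let _ = pid;
}

#[cfg(not(all(windows, target_arch = "x86_64")))]
fn terminate_on_windows(_pid: u32) {
    // No-op on every other target.
}
```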
Akarshan
ad06b2a903
Move llama-server cleanup code to a separate file 2025-07-02 12:27:42 +07:00
Akarshan
7de694c0cd
add missing import during rebase 2025-07-02 12:27:42 +07:00
Akarshan
62ba503b86
chore: cleanup llama-server processes upon app exit 2025-07-02 12:27:42 +07:00
Akarshan
01d49a4b28
fix: Update server process handling for Windows and Unix systems 2025-07-02 12:27:42 +07:00
Akarshan
2eeabf8ae6
fix: ensure server process is properly terminated and reaped 2025-07-02 12:27:35 +07:00
Akarshan
4ffc504150
style: Rename camelCase to snake_case in llamacpp extension code
Rename variable, struct, and enum names from camelCase to snake_case throughout the llamacpp extension codebase to align with Rust naming conventions. This change improves readability and consistency without altering functionality.
2025-07-02 12:27:34 +07:00
Akarshan
6c769c5db9
feat: refactor llama server process storage to use HashMap
Change the llama_server_process state from an Option<Child> to a HashMap<String, Child> to support managing multiple server instances by PID. This allows precise process tracking and termination, replacing the previous single-process limitation.

Previously, only one server process could be tracked at a time. Now, each process is stored with its PID as the key, enabling:
- Accurate session matching during unloading
- Proper termination of specific processes
- Better error handling for mismatched PIDs

The load_llama_model function now inserts processes into the map, and unload_llama_model removes them by PID.
2025-07-02 12:27:34 +07:00
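A sketch of the per-PID process map, assuming tokio's async `Child` held in Tauri-managed state (the key is a `String` here, as in this commit; later commits switch PIDs to numbers):

```rust
use std::collections::HashMap;
use tokio::process::Child;
use tokio::sync::Mutex;

/// Tauri-managed state: every running llama-server is tracked by PID so a
/// specific instance can be unloaded without disturbing the others.
#[derive(Default)]
struct LlamaServerProcesses(Mutex<HashMap<String, Child>>);

impl LlamaServerProcesses {
    /// Called from load_llama_model after a successful spawn.
    async fn insert(&self, pid: String, child: Child) {
        self.0.lock().await.insert(pid, child);
    }

    /// Called from unload_llama_model; returns the child so it can be killed.
    async fn remove(&self, pid: &str) -> Option<Child> {
        self.0.lock().await.remove(pid)
    }
}
```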
Thien Tran
8bf4a5eb7d
remove migration 2025-07-02 12:27:34 +07:00
Thien Tran
95944fa081
add Jan's library path to path 2025-07-02 12:27:17 +07:00
Thien Tran
1eb49350e9
add is_library_available command 2025-07-02 12:27:17 +07:00
Akarshan Biswas
4dfdcd68d5
refactor: rename session identifiers to pid and modelId
The changes standardize identifier names across the codebase for clarity:
- Replaced `sessionId` with `pid` to reflect process ID usage
- Changed `modelName` to `modelId` for consistency with identifier naming
- Renamed `api_key` to `apiKey` for camelCase consistency
- Updated corresponding methods to use these new identifiers
- Improved type safety and readability by aligning variable names with their semantic meaning
2025-07-02 12:27:16 +07:00
Akarshan Biswas
f9d3935269
feat: allow specifying port via command line argument
This change allows the port to be specified via command line arguments, providing flexibility. The port is parsed from the arguments, defaulting to 8080 if not provided.
2025-07-02 12:27:16 +07:00
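A sketch of the argument handling, assuming a `--port <N>` style flag (the exact flag name is not given in the commit) with 8080 as the fallback:

```rust
/// Return the port passed on the command line, or 8080 if none was provided
/// or the value did not parse as a valid port number.
fn port_from_args() -> u16 {
    let mut args = std::env::args();
    while let Some(arg) = args.next() {
        if arg == "--port" {
            if let Some(value) = args.next() {
                if let Ok(port) = value.parse::<u16>() {
                    return port;
                }
            }
        }
    }
    8080
}
```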
Akarshan Biswas
5d61062b0e
feat: enhance argument parsing and add API key generation
The changes improve the robustness of command-line argument parsing in the Llama model server by replacing direct index access with safe iteration methods. A new generate_api_key function was added to handle API key generation securely. The sessionId parameter was standardized to match the renamed property in the client code.
2025-07-02 12:27:15 +07:00