211 Commits

Author SHA1 Message Date
Dinh Long Nguyen
84f46dc997
Merge branch 'dev' into feat/sync-release=to-dev 2025-09-30 22:31:20 +07:00
Louis
3c7eb64353
fix: mcp bin path (#6667)
* fix: mcp bin path

* chore: clean up unused structs

* fix: bin name

* fix: tests
2025-09-30 22:29:15 +07:00
Dinh Long Nguyen
e6bc1182a6
Merge branch 'dev' into feat/sync-release=to-dev 2025-09-30 22:04:27 +07:00
Nguyen Ngoc Minh
d315522c5a
Merge pull request #6618 from github-roushan/show-supported-files
Show supported files
2025-09-30 12:19:22 +07:00
Louis
54d17c9c72
fix: migrate new mcp server config (#6651) 2025-09-30 00:07:57 +07:00
Louis
5fd249c72d
refactor: deprecate Vulkan external binaries (#6638)
* refactor: deprecate vulkan binary

refactor: clean up vulkan lib

chore: cleanup

chore: clean up

chore: clean up

fix: build

* fix: skip binaries download env

* Update src-tauri/utils/src/system.rs

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update src-tauri/utils/src/system.rs

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-09-29 17:47:59 +07:00
Roushan Singh
7d6e0c22ac chore: fix Encoded logging 2025-09-26 15:02:23 +05:30
Faisal Amir
b7dae19756
feat: custom downloaded model name (#6588)
* feat: add field edit model name

* fix: update model

* chore: update UI form with save button; editing and renaming the folder now require the save button

* fix: relocate model

* chore: update and refresh list model provider also update test case

* chore: state loader

* fix: model path

* fix: model config update

* chore: remove provider dependencies from edit model dialog

* chore: avoid shifted model name or id

---------

Co-authored-by: Louis <louis@jan.ai>
2025-09-26 15:25:44 +07:00
Akarshan Biswas
11b3a60675
fix: refactor, fix and move gguf support utilities to backend (#6584)
* feat: move estimateKVCacheSize to BE

* feat: Migrate model planning to backend

This commit migrates the model load planning logic from the frontend to the Tauri backend. This refactors the `planModelLoad` and `isModelSupported` methods into the `tauri-plugin-llamacpp` plugin, making them directly callable from the Rust core.

The model planning now incorporates a more robust and accurate memory estimation, considering both VRAM and system RAM, and introduces a `batch_size` parameter to the model plan.

**Key changes:**

- **Moved `planModelLoad` to `tauri-plugin-llamacpp`:** The core logic for determining GPU layers, context length, and memory offloading is now in Rust for better performance and accuracy.
- **Moved `isModelSupported` to `tauri-plugin-llamacpp`:** The model support check is also now handled by the backend.
- **Removed `getChatClient` from `AIEngine`:** This optional method was not implemented and has been removed from the abstract class.
- **Improved KV Cache estimation:** The `estimate_kv_cache_internal` function in Rust now accounts for `attention.key_length` and `attention.value_length` if available, and considers sliding window attention for more precise estimates.
- **Introduced `batch_size` in ModelPlan:** The model plan now includes a `batch_size` property, which will be automatically adjusted based on the determined `ModelMode` (e.g., lower for CPU/Hybrid modes).
- **Updated `llamacpp-extension`:** The frontend extension now calls the new Tauri commands for model planning and support checks.
- **Removed `batch_size` from `llamacpp-extension/settings.json`:** The batch size is now dynamically determined by the planning logic and will be set as a model setting directly.
- **Updated `ModelSetting` and `useModelProvider` hooks:** These now handle the new `batch_size` property in model settings.
- **Added new Tauri commands and permissions:** `get_model_size`, `is_model_supported`, and `plan_model_load` are new commands with corresponding permissions.
- **Consolidated `ModelSupportStatus` and `KVCacheEstimate`:** These types are now defined in `src-tauri/plugins/tauri-plugin-llamacpp/src/gguf/types.rs`.

This refactoring centralizes critical model resource management logic, improving consistency and maintainability, and lays the groundwork for more sophisticated model loading strategies.
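
A minimal sketch of the KV cache estimate described in the list above, for illustration only. `AttentionMeta`, its field names, and `estimate_kv_cache_bytes` are hypothetical and are not the plugin's actual `estimate_kv_cache_internal`. The sketch assumes that the head dimension falls back to `embedding_length / n_head` when `attention.key_length` or `attention.value_length` is absent, and that every layer uses the sliding window when one is present (real models may mix layer types).

```rust
/// Hypothetical subset of GGUF metadata needed for the estimate.
#[derive(Debug, Clone, Copy)]
pub struct AttentionMeta {
    pub n_layer: u64,
    pub n_head: u64,
    pub n_head_kv: u64,
    pub embedding_length: u64,
    /// `attention.key_length`, if the model file provides it.
    pub key_length: Option<u64>,
    /// `attention.value_length`, if the model file provides it.
    pub value_length: Option<u64>,
    /// Sliding window size in tokens, if the model uses one.
    pub sliding_window: Option<u64>,
}

/// Estimate KV cache size in bytes for a context of `ctx_len` tokens.
/// `bytes_per_elem` is the cache element size (2 for f16).
pub fn estimate_kv_cache_bytes(meta: &AttentionMeta, ctx_len: u64, bytes_per_elem: u64) -> u64 {
    // Fallback head dimension when explicit lengths are missing.
    let default_head_dim = meta.embedding_length / meta.n_head.max(1);
    let key_len = meta.key_length.unwrap_or(default_head_dim);
    let value_len = meta.value_length.unwrap_or(default_head_dim);

    // With sliding window attention, only the last `w` tokens are cached.
    let effective_ctx = match meta.sliding_window {
        Some(w) if w > 0 => ctx_len.min(w),
        _ => ctx_len,
    };

    // layers * kv heads * (K + V elements per head) * tokens * element size
    meta.n_layer
        .saturating_mul(meta.n_head_kv)
        .saturating_mul(key_len.saturating_add(value_len))
        .saturating_mul(effective_ctx)
        .saturating_mul(bytes_per_elem)
}
```

Saturating arithmetic is used so that corrupt or extreme metadata yields a very large estimate rather than a wrapped, misleadingly small one.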

* feat: refine model planner to handle more memory scenarios

This commit introduces several improvements to the `plan_model_load` function, enhancing its ability to determine a suitable model loading strategy based on system memory constraints. Specifically, it includes:

-   **VRAM calculation improvements:**  Corrects the calculation of total VRAM by iterating over GPUs and multiplying by 1024*1024, improving accuracy.
-   **Hybrid plan optimization:**  Implements a more robust hybrid plan strategy, iterating through GPU layer configurations to find the highest possible GPU usage while remaining within VRAM limits.
-   **Minimum context length enforcement:** Enforces a minimum context length for the model, ensuring that the model can be loaded and used effectively.
-   **Fallback to CPU mode:** If a hybrid plan isn't feasible, it now correctly falls back to a CPU-only mode.
-   **Improved logging:** Enhanced logging to provide more detailed information about the memory planning process, including VRAM, RAM, and GPU layers.
-   **Batch size adjustment:** Updated batch size based on the selected mode, ensuring efficient utilization of available resources.
-   **Error handling and edge cases:**  Improved error handling and edge case management to prevent unexpected failures.
-   **Constants:** Added constants for easier maintenance and understanding.
-   **Power-of-2 adjustment:** Added power of 2 adjustment for max context length to ensure correct sizing for the LLM.

These changes improve the reliability and robustness of the model planning process, allowing it to handle a wider range of hardware configurations and model sizes.
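
The three numeric steps named above (MiB-to-bytes VRAM total, hybrid layer search, power-of-2 context adjustment) can be sketched as follows. All function names and parameters here are hypothetical, not the actual `plan_model_load` internals. The sketch assumes a uniform per-layer size, a KV cache that lives entirely in VRAM, and rounding the context length down (the commit message does not state the rounding direction).

```rust
/// Sum per-GPU VRAM reported in MiB and convert to bytes.
pub fn total_vram_bytes(gpu_vram_mib: &[u64]) -> u64 {
    gpu_vram_mib
        .iter()
        .map(|mib| mib.saturating_mul(1024 * 1024))
        .sum()
}

/// Find the highest number of layers that fits in `usable_vram`
/// together with the KV cache. Returns `None` when even zero layers
/// do not fit, in which case a caller would fall back to CPU-only mode.
pub fn max_gpu_layers(
    total_layers: u32,
    bytes_per_layer: u64,
    kv_bytes: u64,
    usable_vram: u64,
) -> Option<u32> {
    // Walk from the most GPU-heavy configuration downwards.
    (0..=total_layers).rev().find(|&n| {
        (n as u64)
            .saturating_mul(bytes_per_layer)
            .saturating_add(kv_bytes)
            <= usable_vram
    })
}

/// Round a context length down to the nearest power of two (0 stays 0).
pub fn floor_power_of_two(n: u64) -> u64 {
    if n == 0 {
        0
    } else {
        1u64 << (63 - n.leading_zeros())
    }
}
```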

* Add log for raw GPU info from tauri-plugin-hardware

* chore: update linux runner for tauri build

* feat: Improve GPU memory calculation for unified memory

This commit improves the logic for calculating usable VRAM, particularly for systems with **unified memory** like Apple Silicon. Previously, the application would report 0 total VRAM if no dedicated GPUs were found, leading to incorrect calculations and failed model loads.

This change modifies the VRAM calculation to fall back to the total system RAM if no discrete GPUs are detected. This is a common and correct approach for unified memory architectures, where the CPU and GPU share the same memory pool.

Additionally, this commit refactors the logic for calculating usable VRAM and RAM to prevent potential underflow by checking if the total memory is greater than the reserved bytes before subtracting. This ensures the calculation remains safe and correct.
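
A short sketch of the fallback and underflow guard described above. `usable_vram_bytes` and its parameters are hypothetical names; the actual code may structure this differently. `saturating_sub` is used here as an equivalent of checking that the total exceeds the reserve before subtracting.

```rust
/// Compute usable VRAM in bytes.
/// `discrete_vram` holds per-GPU VRAM in bytes; an empty slice (or all
/// zeros) is treated as a unified memory system, where total system RAM
/// stands in for VRAM.
pub fn usable_vram_bytes(discrete_vram: &[u64], total_system_ram: u64, reserve: u64) -> u64 {
    let discrete_total: u64 = discrete_vram.iter().sum();
    let total = if discrete_total == 0 {
        total_system_ram
    } else {
        discrete_total
    };
    // Clamp at zero instead of underflowing when reserve > total.
    total.saturating_sub(reserve)
}
```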

* chore: fix update migration version

* fix: enable unified memory support on model support indicator

* Use total_system_memory in bytes

---------

Co-authored-by: Minh141120 <minh.itptit@gmail.com>
Co-authored-by: Faisal Amir <urmauur@gmail.com>
2025-09-25 12:17:57 +05:30
Roushan Kumar Singh
3f51c35229
feat: support .zip archives for manual backend install (#6534)
* feat(llamacpp): support .zip archives for manual backend install

* Update Lock Files
2025-09-23 18:02:06 +05:30
Nghia Doan
6f827872fb
fix: Catch local API server various errors (#6548)
* fix: Catch local API server various errors

* chore: Add tests to cover error catches
2025-09-23 17:40:16 +07:00
Akarshan Biswas
d1a8bdc4e3
feat: Add Jan API server Swagger UI (#6502)
* feat: Add Jan API server Swagger UI

- Serve OpenAPI spec (`static/openapi.json`) directly from the proxy server.
- Implement Swagger UI assets (`swagger-ui.css`, `swagger-ui-bundle.js`, `favicon.ico`) and a simple HTML wrapper under `/docs`.
- Extend the proxy whitelist to include Swagger UI routes.
- Add routing logic for `/openapi.json`, `/docs`, and Swagger UI static files.
- Update whitelisted paths and integrate CORS handling for the new endpoints.

* feat: serve Swagger UI at root path

The Swagger UI endpoint previously lived under `/docs`. The route handling and
exclusion list have been updated so the UI is now served directly at `/`.
This simplifies access, aligns with the expected root URL in the Tauri
frontend, and removes the now‑unused `/docs` path handling.

* feat: add model loading state and translations for local API server

Implemented a loading indicator for model startup, updated the start/stop button to reflect model loading and server starting states, and disabled interactions while pending. Added new translation keys (`loadingModel`, `startingServer`) across all supported locales (en, de, id, pl, vn, zh-CN, zh-TW) and integrated them into the UI. Included a small delay after model start to ensure backend state consistency. This improves user feedback and prevents race conditions during server initialization.
2025-09-19 09:11:55 +05:30
Louis
86a92ead85
Merge pull request #6469 from menloresearch/fix/deeplink-not-work-on-windows
fix: deeplink issue on Windows
2025-09-18 17:47:00 +07:00
Louis
0ab6417caa
fix: left click should not show menu 2025-09-17 19:58:59 +07:00
Louis
4a4423ed6b
fix: failed tests 2025-09-17 19:55:57 +07:00
Louis
15164fc0be
feat: system tray icon build flag 2025-09-17 15:54:20 +07:00
theishangoswami
c02d8200ac added exa mcp 2025-09-16 04:38:03 +05:30
Maksym Krasovakyi
71e2e24112 Add model response timeout for local api server as configurable value via UI 2025-09-15 14:25:09 +03:00
dinhlongviolin1
e2e572ccab
refactor: moved get_short_path to utils and use it in decompress 2025-09-11 09:52:10 +05:30
Akarshan
7ac927ff02
feat: enhance llamacpp backend management and installation
- Add `src-tauri/resources/` to `.gitignore`.
- Introduced utilities to read locally installed backends (`getLocalInstalledBackends`) and fetch remote supported backends (`fetchRemoteSupportedBackends`).
- Refactored `listSupportedBackends` to merge remote and local entries with deduplication and proper sorting.
- Exported `getBackendDir` and integrated it into the extension.
- Added helper `parseBackendVersion` and new method `checkBackendForUpdates` to detect newer backend versions.
- Implemented `installBackend` for manual backend archive installation, including platform‑specific binary path handling.
- Updated command‑line argument logic for `--flash-attn` to respect version‑specific defaults.
- Modified Tauri filesystem `decompress` command to remove overly strict path validation.
2025-09-11 09:52:09 +05:30
Dinh Long Nguyen
32a2ca95b6
feat: gguf file size + hash validation (#5266) (#6259)
* feat: gguf file size + hash validation

* fix tests fe

* update cargo tests

* handle async download for both models and mmproj

* move progress tracker to models

* handle file download cancelled

* add cancellation mid hash run
2025-08-21 16:17:58 +07:00
Louis
6850dda108
feat: MCP server error handling 2025-08-20 23:42:12 +07:00
Louis
91f05b8f32
feat: add tool call cancellation 2025-08-19 23:27:12 +07:00
Louis
2492d6f9d0
fix: http mcp with headers 2025-08-18 09:29:46 +07:00
Louis
54e0f9b595
feat: add connection timeout setting 2025-08-15 12:45:02 +07:00
Louis
c8d9592ab8
chore: mcp group server, action and import json 2025-08-15 11:37:21 +07:00
Louis
25043dda7b
feat: MCP streamable http and sse transports 2025-08-15 10:12:41 +07:00
Louis
13a1969150
feat: MCP - State update 2025-08-15 10:02:06 +07:00
Dinh Long Nguyen
e1c8d98bf2
Backend Architecture Refactoring (#6094) (#6162)
* add llamacpp plugin

* Refactor llamacpp plugin

* add utils plugin

* remove utils folder

* add hardware implementation

* add utils folder + move utils function

* organize cargo files

* refactor utils src

* refactor util

* apply fmt

* fmt

* Update gguf + reformat

* add permission for gguf commands

* fix cargo test windows

* revert yarn lock

* remove cargo.lock for hardware plugin

* ignore cargo.lock file

* Fix hardware invoke + refactor hardware + refactor tests, constants

* use api wrapper in extension to invoke hardware call + api wrapper build integration

* add newline at EOF (per Akarshan)

* add vi mock for getSystemInfo
2025-08-15 08:59:01 +07:00
Akarshan Biswas
f4661912b0
feat: Add GGUF metadata reading functionality (#6120)
* feat: Add GGUF metadata reading functionality

This commit introduces a new Tauri command and a corresponding function to read metadata from GGUF model files.

The new read_gguf_metadata command in the Rust backend uses the byteorder crate to parse the GGUF file format and extract key metadata. This information, including the file's version, tensor count, and a key-value map of other metadata, is then made available to the TypeScript frontend.

This functionality is a foundational step toward providing users with more detailed information about their loaded models directly within the application.

This will be refactored later.
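
For illustration, a minimal sketch of reading the fixed-size GGUF header (magic, version, tensor count, metadata key-value count). It is not the actual `read_gguf_metadata` implementation: it uses only the standard library instead of the byteorder crate, the names `GgufHeader` and `read_gguf_header` are hypothetical, and it assumes a little-endian file of GGUF version 2 or later, where both counts are 64-bit.

```rust
use std::io::{self, Read};

/// Fixed-size fields at the start of a GGUF file.
#[derive(Debug, PartialEq, Eq)]
pub struct GgufHeader {
    pub version: u32,
    pub tensor_count: u64,
    pub metadata_kv_count: u64,
}

/// Read and validate the GGUF header from any reader.
pub fn read_gguf_header<R: Read>(reader: &mut R) -> io::Result<GgufHeader> {
    // The file must start with the ASCII magic "GGUF".
    let mut magic = [0u8; 4];
    reader.read_exact(&mut magic)?;
    if &magic != b"GGUF" {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "not a GGUF file"));
    }

    let mut buf4 = [0u8; 4];
    reader.read_exact(&mut buf4)?;
    let version = u32::from_le_bytes(buf4);

    let mut buf8 = [0u8; 8];
    reader.read_exact(&mut buf8)?;
    let tensor_count = u64::from_le_bytes(buf8);

    reader.read_exact(&mut buf8)?;
    let metadata_kv_count = u64::from_le_bytes(buf8);

    Ok(GgufHeader {
        version,
        tensor_count,
        metadata_kv_count,
    })
}
```

When reading from disk, wrapping the `File` in a `std::io::BufReader` before passing it in avoids a system call per small read.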

fixes: #6001

* loadMetadata() should return

* Properly throw error to FE

* Use BufReader to improve performance
2025-08-13 22:54:20 +05:30
Louis
9ed98614fe fix: factory reset process got blocked 2025-08-11 19:42:59 +07:00
Louis
f3dd26e499
fix: uvx and npx dirs should not be relocated 2025-08-11 14:33:58 +07:00
Louis
b924156a15
fix: bring back GPU detection 2025-08-11 13:52:20 +07:00
Louis
4f5d9b8222
Merge pull request #6089 from menloresearch/fix/clean-up-unused-apis
refactor: clean up unused hardware apis
2025-08-11 00:02:31 +07:00
Louis
59afafba0e fix: test command 2025-08-10 23:36:14 +07:00
Louis
f0a9080ef7 fix: cargo test on windows 2025-08-10 22:46:44 +07:00
Akarshan Biswas
0cfc745954
feat: Introduce structured error handling for llamacpp extension (#6087)
* feat: Introduce structured error handling for llamacpp extension

This commit introduces a structured error handling system for the `llamacpp` extension. Instead of returning simple string errors, we now use a custom `LlamacppError` struct with a specific `ErrorCode` enum. This allows the frontend to display more user-friendly and actionable error messages based on the code, rather than raw debug logs.

The changes include:
- A new `ErrorCode` enum to categorize errors (e.g., `OutOfMemory`, `ModelArchNotSupported`, `BinaryNotFound`).
- A `LlamacppError` struct to encapsulate the code, a user-facing message, and optional detailed logs.
- A static method `from_stderr` that intelligently parses llama.cpp's standard error output to identify and map common issues like Out of Memory errors to a specific error code.
- Refactored `ServerError` enum to wrap the new `LlamacppError` and provide a consistent serialization format for the Tauri frontend.
- Updated all relevant functions (`load_llama_model`, `get_devices`) to return the new structured error type, ensuring a more robust and predictable error flow.
- A reduced timeout for model loading from 300 to 180 seconds.

This work lays the groundwork for a more intuitive and helpful user experience, as the application can now provide clear guidance to users when a model fails to load.
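
A rough sketch of the shape described above. The names `ErrorCode`, `LlamacppError`, `from_stderr`, and the three listed variants come from this commit message; the `Unknown` variant, the field names, the user-facing messages, and the matched substrings are assumptions made for illustration and may not match the real implementation.

```rust
/// Error categories; `Unknown` is a hypothetical catch-all.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ErrorCode {
    OutOfMemory,
    ModelArchNotSupported,
    BinaryNotFound,
    Unknown,
}

/// Structured error: a code, a user-facing message, and optional raw logs.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct LlamacppError {
    pub code: ErrorCode,
    pub message: String,
    pub details: Option<String>,
}

impl LlamacppError {
    /// Map raw stderr text to a structured error.
    /// The substrings below are illustrative guesses, not a verified list
    /// of llama.cpp messages.
    pub fn from_stderr(stderr: &str) -> Self {
        let lower = stderr.to_lowercase();
        let (code, message) = if lower.contains("out of memory") {
            (ErrorCode::OutOfMemory, "Not enough memory to load the model.")
        } else if lower.contains("unknown model architecture") {
            (
                ErrorCode::ModelArchNotSupported,
                "This model architecture is not supported.",
            )
        } else {
            (ErrorCode::Unknown, "The model failed to load.")
        };
        LlamacppError {
            code,
            message: message.to_string(),
            // Keep the raw output so the UI can offer it as details.
            details: Some(stderr.to_string()),
        }
    }
}
```

Keeping the raw stderr in `details` lets the frontend show a short message by default while still exposing the full log on request.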

* Update src-tauri/src/core/utils/extensions/inference_llamacpp_extension/server.rs

Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>

* Update src-tauri/src/core/utils/extensions/inference_llamacpp_extension/server.rs

Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>

* chore: update FE handle error object from extension

* chore: fix property type

---------

Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>
Co-authored-by: Faisal Amir <urmauur@gmail.com>
2025-08-07 23:28:25 +05:30
Louis
fc7d8a7a9c
fix: test 2025-08-07 23:47:51 +07:00
Louis
9285714345
fix: tests 2025-08-07 22:38:28 +07:00
Akarshan
bdec0af791
fix windows test 2025-08-07 20:37:33 +05:30
Akarshan
9482c0a6b9
Revert "fix import on Windows"
This reverts commit b0e7030939a82baec5f12c44639d0eb6c3c1cf43.
2025-08-07 20:35:13 +05:30
Akarshan
b0e7030939
fix import on Windows 2025-08-07 20:29:05 +05:30
Akarshan
dc82fd6051
fix windows test for short path 2025-08-07 20:16:43 +05:30
Louis
b8f5fd510a
test: fix failed tests 2025-08-07 20:54:00 +07:00
Louis
c1668a4e4a
refactor: clean up unused hardware apis 2025-08-07 20:04:23 +07:00
Akarshan Biswas
469d787888
refactor: Use more precise terminology in API server logs (#6085)
* refactor: Use more precise terminology in API server logs and error messages

This commit refactors several log and error messages to use more accurate and consistent terminology.

-   Replaced "backend servers" and "backend model servers" with "models" or "sessions" to better reflect the service's internal structure.
-   Changed "Proxy server" to "Jan API server" to more accurately describe the server's function.
-   Removed a redundant debug log message.

These changes are cosmetic and improve the readability and consistency of the logging output.

* Update src-tauri/src/core/server.rs

Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>

---------

Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>
2025-08-07 17:48:33 +05:30
Akarshan Biswas
6a699d8004
refactor: move session management & port allocation to backend (#6083)
* refactor: move session management & port allocation to backend

- Remove the in‑process `activeSessions` map and its cleanup logic from the TypeScript side.
- Introduce new Tauri commands in Rust:
  - `get_random_port` – picks an unused port using a seeded RNG and checks availability.
  - `find_session_by_model` – returns the `SessionInfo` for a given model ID.
  - `get_loaded_models` – returns a list of currently loaded model IDs.
- Update the extension’s TypeScript code to use these commands via `invoke`:
  - `findSessionByModel`, `load`, `unload`, `chat`, `getLoadedModels`, and `embed` now operate asynchronously and query the backend.
  - Remove the old `is_port_available` command and the custom port‑checking loop.
  - Simplify `onUnload` – session termination is now handled by the backend.
- Drop unused helpers (`sleep`, `waitForModelLoad`) and related port‑availability code.
- Add missing Rust imports (`rand::{StdRng,Rng,SeedableRng}`, `HashSet`) and improve error handling.
- Register the new commands in `src-tauri/src/lib.rs` (replace `is_port_available` with the three new commands).

This refactor centralises session state and port allocation in the Rust backend, eliminates duplicated logic, and resolves race conditions around model loading and session cleanup.
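
As an illustration of the port selection idea behind `get_random_port`, here is a standard-library-only sketch. It is not the real command: the commit uses the rand crate's `StdRng`, whereas this sketch substitutes a small xorshift generator, and `XorShift64`, `pick_port`, `os_port_is_free`, and all parameters are hypothetical. The availability check is passed in as a closure so the selection logic can be exercised without touching the network.

```rust
use std::collections::HashSet;
use std::net::TcpListener;

/// Tiny seeded generator standing in for `StdRng` (assumption).
pub struct XorShift64(u64);

impl XorShift64 {
    pub fn new(seed: u64) -> Self {
        // Xorshift must not start from zero.
        XorShift64(if seed == 0 { 0x9E37_79B9_7F4A_7C15 } else { seed })
    }

    pub fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Try up to `max_attempts` random ports in `lo..=hi`, skipping ports
/// already assigned to sessions and ports the checker reports as busy.
pub fn pick_port<F>(
    rng: &mut XorShift64,
    lo: u16,
    hi: u16,
    used: &HashSet<u16>,
    is_free: F,
    max_attempts: u32,
) -> Option<u16>
where
    F: Fn(u16) -> bool,
{
    if lo > hi {
        return None;
    }
    let span = (hi - lo) as u64 + 1;
    for _ in 0..max_attempts {
        let port = lo + (rng.next_u64() % span) as u16;
        if !used.contains(&port) && is_free(port) {
            return Some(port);
        }
    }
    None
}

/// Availability check that tries to bind the port on localhost.
pub fn os_port_is_free(port: u16) -> bool {
    TcpListener::bind(("127.0.0.1", port)).is_ok()
}
```

A caller would pass `os_port_is_free` as the checker in production and a stub closure in tests.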

* Use String(e) for error

Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>

---------

Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>
2025-08-07 13:06:21 +05:30
Louis
e74601443f chore: add deep_link register_all 2025-08-06 12:24:21 +10:00
Akarshan Biswas
dcffa4fa0a Fix: Improve Llama.cpp model path handling and error handling (#6045)
* Improve Llama.cpp model path handling and validation

This commit refactors the load_llama_model function to improve how it handles and validates the model path.

Previously, the function extracted the model path but did not perform any validation. This change adds the following improvements:

- It now checks for the presence of the `-m` flag.
- It verifies that a path is provided after the `-m` flag.
- It validates that the specified model path actually exists on the filesystem.
- It ensures that the SessionInfo struct stores the canonical display path of the model, which is a more robust approach.

These changes make the model loading process more reliable and provide better error handling for invalid or missing model paths.
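
The validation steps above can be sketched roughly as follows. `resolve_model_path` and its error strings are hypothetical and are not taken from `load_llama_model`; the sketch returns the canonical path rather than storing it in `SessionInfo`.

```rust
use std::path::{Path, PathBuf};

/// Extract and validate the model path from llama.cpp-style arguments.
pub fn resolve_model_path(args: &[String]) -> Result<PathBuf, String> {
    // 1. The `-m` flag must be present.
    let idx = args
        .iter()
        .position(|a| a == "-m")
        .ok_or_else(|| "missing -m flag".to_string())?;

    // 2. A value must follow the flag.
    let raw = args
        .get(idx + 1)
        .ok_or_else(|| "no path provided after -m".to_string())?;

    // 3. The path must exist on the filesystem.
    let path = Path::new(raw);
    if !path.exists() {
        return Err(format!("model path does not exist: {}", raw));
    }

    // 4. Resolve to a canonical absolute path.
    path.canonicalize()
        .map_err(|e| format!("failed to canonicalize {}: {}", raw, e))
}
```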

* Exp: Use short path on Windows

* Fix: Remove error channel and handling in llama.cpp server loading

The previous implementation used a channel to receive error messages from the llama.cpp server's stdout. However, this proved unreliable because path names can contain the same 'error strings' that we check for, even during normal operation. This commit removes the error channel and the associated error handling logic.
The server readiness is still determined by checking for the "server is listening" message in stdout. Errors are now handled by relying on the process exit code and capturing the full stderr output if the process fails to start or exits unexpectedly. This approach provides a more robust and accurate error detection mechanism.

* Add else block in Windows path handling

* Add some path related tests

* Fix windows tests
2025-08-06 12:24:21 +10:00
Louis
eb13189d07 fix: run dev should reinstall extensions 2025-08-06 12:24:21 +10:00