* feat: Add Jan API server Swagger UI
- Serve OpenAPI spec (`static/openapi.json`) directly from the proxy server.
- Implement Swagger UI assets (`swagger-ui.css`, `swagger-ui-bundle.js`, `favicon.ico`) and a simple HTML wrapper under `/docs`.
- Extend the proxy whitelist to include Swagger UI routes.
- Add routing logic for `/openapi.json`, `/docs`, and Swagger UI static files.
- Update whitelisted paths and integrate CORS handling for the new endpoints.
* feat: serve Swagger UI at root path
The Swagger UI endpoint previously lived under `/docs`. The route handling and
exclusion list have been updated so the UI is now served directly at `/`.
This simplifies access, aligns with the expected root URL in the Tauri
frontend, and removes the now‑unused `/docs` path handling.
* feat: add model loading state and translations for local API server
Implemented a loading indicator for model startup, updated the start/stop button to reflect model loading and server starting states, and disabled interactions while pending. Added new translation keys (`loadingModel`, `startingServer`) across all supported locales (en, de, id, pl, vn, zh-CN, zh-TW) and integrated them into the UI. Included a small delay after model start to ensure backend state consistency. This improves user feedback and prevents race conditions during server initialization.
- Add `src-tauri/resources/` to `.gitignore`.
- Introduced utilities to read locally installed backends (`getLocalInstalledBackends`) and fetch remote supported backends (`fetchRemoteSupportedBackends`).
- Refactored `listSupportedBackends` to merge remote and local entries with deduplication and proper sorting.
- Exported `getBackendDir` and integrated it into the extension.
- Added helper `parseBackendVersion` and new method `checkBackendForUpdates` to detect newer backend versions.
- Implemented `installBackend` for manual backend archive installation, including platform‑specific binary path handling.
- Updated command‑line argument logic for `--flash-attn` to respect version‑specific defaults.
- Modified Tauri filesystem `decompress` command to remove overly strict path validation.
* feat: Add GGUF metadata reading functionality
This commit introduces a new Tauri command and a corresponding function to read metadata from GGUF model files.
The new read_gguf_metadata command in the Rust backend uses the byteorder crate to parse the GGUF file format and extract key metadata. This information, including the file's version, tensor count, and a key-value map of other metadata, is then made available to the TypeScript frontend.
This functionality is a foundational step toward providing users with more detailed information about their loaded models directly within the application.
This will be refactored later.
fixes: #6001
* loadMetadata() should return
* Properly throw eror to FE
* Use BufReader to improve performance
* feat: Introduce structured error handling for llamacpp extension
This commit introduces a structured error handling system for the `llamacpp` extension. Instead of returning simple string errors, we now use a custom `LlamacppError` struct with a specific `ErrorCode` enum. This allows the frontend to display more user-friendly and actionable error messages based on the code, rather than raw debug logs.
The changes include:
- A new `ErrorCode` enum to categorize errors (e.g., `OutOfMemory`, `ModelArchNotSupported`, `BinaryNotFound`).
- A `LlamacppError` struct to encapsulate the code, a user-facing message, and optional detailed logs.
- A static method `from_stderr` that intelligently parses llama.cpp's standard error output to identify and map common issues like Out of Memory errors to a specific error code.
- Refactored `ServerError` enum to wrap the new `LlamacppError` and provide a consistent serialization format for the Tauri frontend.
- Updated all relevant functions (`load_llama_model`, `get_devices`) to return the new structured error type, ensuring a more robust and predictable error flow.
- A reduced timeout for model loading from 300 to 180 seconds.
This work lays the groundwork for a more intuitive and helpful user experience, as the application can now provide clear guidance to users when a model fails to load.
* Update src-tauri/src/core/utils/extensions/inference_llamacpp_extension/server.rs
Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>
* Update src-tauri/src/core/utils/extensions/inference_llamacpp_extension/server.rs
Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>
* chore: update FE handle error object from extension
* chore: fix property type
---------
Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>
Co-authored-by: Faisal Amir <urmauur@gmail.com>
* refactor: Use more precise terminology in API server logs and error messages
This commit refactors several log and error messages to use more accurate and consistent terminology.
- Replaced "backend servers" and "backend model servers" with "models" or "sessions" to better reflect the service's internal structure.
- Changed "Proxy server" to "Jan API server" to more accurately describe the server's function.
- Removed a redundant debug log message.
These changes are cosmetic and improve the readability and consistency of the logging output.
* Update src-tauri/src/core/server.rs
Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>
---------
Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>
* refactor: move session management & port allocation to backend
- Remove the in‑process `activeSessions` map and its cleanup logic from the TypeScript side.
- Introduce new Tauri commands in Rust:
- `get_random_port` – picks an unused port using a seeded RNG and checks availability.
- `find_session_by_model` – returns the `SessionInfo` for a given model ID.
- `get_loaded_models` – returns a list of currently loaded model IDs.
- Update the extension’s TypeScript code to use these commands via `invoke`:
- `findSessionByModel`, `load`, `unload`, `chat`, `getLoadedModels`, and `embed` now operate asynchronously and query the backend.
- Remove the old `is_port_available` command and the custom port‑checking loop.
- Simplify `onUnload` – session termination is now handled by the backend.
- Drop unused helpers (`sleep`, `waitForModelLoad`) and related port‑availability code.
- Add missing Rust imports (`rand::{StdRng,Rng,SeedableRng}`, `HashSet`) and improve error handling.
- Register the new commands in `src-tauri/src/lib.rs` (replace `is_port_available` with the three new commands).
This refactor centralises session state and port allocation in the Rust backend, eliminates duplicated logic, and resolves race conditions around model loading and session cleanup.
* Use String(e) for error
Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>
---------
Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>
* Improve Llama.cpp model path handling and validation
This commit refactors the load_llama_model function to improve how it handles and validates the model path.
Previously, the function extracted the model path but did not perform any validation. This change adds the following improvements:
It now checks for the presence of the -m flag.
It verifies that a path is provided after the -m flag.
It validates that the specified model path actually exists on the filesystem.
It ensures that the SessionInfo struct stores the canonical display path of the model, which is a more robust approach.
These changes make the model loading process more reliable and provide better error handling for invalid or missing model paths.
* Exp: Use short path on Windows
* Fix: Remove error channel and handling in llama.cpp server loading
The previous implementation used a channel to receive error messages from the llama.cpp server's stdout. However, this proved unreliable as the path names can contain 'errors strings' that we use to check even during normal operation. This commit removes the error channel and associated error handling logic.
The server readiness is still determined by checking for the "server is listening" message in stdout. Errors are now handled by relying on the process exit code and capturing the full stderr output if the process fails to start or exits unexpectedly. This approach provides a more robust and accurate error detection mechanism.
* Add else block in Windows path handling
* Add some path related tests
* Fix windows tests
* feat: Improve llama.cpp argument handling and add device parsing tests
This commit refactors how arguments are passed to llama.cpp,
specifically by only adding arguments when their values differ from
their defaults. This reduces the verbosity of the command and prevents
potential conflicts or errors when llama.cpp's default behavior aligns
with the desired setting.
Additionally, new tests have been added for parsing device output from
llama.cpp, ensuring the accurate extraction of GPU information (ID,
name, total memory, and free memory). This improves the robustness of
device detection.
The following changes were made:
* **Remove redundant `--ctx-size` argument:** The `--ctx-size`
argument is now only explicitly added if `cfg.ctx_size` is greater
than 0.
* **Conditional argument adding for default values:**
* `--split-mode` is only added if `cfg.split_mode` is not empty
and not 'layer'.
* `--main-gpu` is only added if `cfg.main_gpu` is not undefined
and not 0.
* `--cache-type-k` is only added if `cfg.cache_type_k` is not 'f16'.
* `--cache-type-v` is only added if `cfg.cache_type_v` is not 'f16'
(when `flash_attn` is enabled) or not 'f32' (otherwise). This
also corrects the `flash_attn` condition.
* `--defrag-thold` is only added if `cfg.defrag_thold` is not 0.1.
* `--rope-scaling` is only added if `cfg.rope_scaling` is not
'none'.
* `--rope-scale` is only added if `cfg.rope_scale` is not 1.
* `--rope-freq-base` is only added if `cfg.rope_freq_base` is not 0.
* `--rope-freq-scale` is only added if `cfg.rope_freq_scale` is
not 1.
* **Add `parse_device_output` tests:** Comprehensive unit tests were
added to `src-tauri/src/core/utils/extensions/inference_llamacpp_extension/server.rs`
to validate the parsing of llama.cpp device output under various
scenarios, including multiple devices, single devices, different
backends (CUDA, Vulkan, SYCL), complex GPU names, and error
conditions.
* fixup cache_type_v comparision
* Fix: Llama.cpp server hangs on model load
Resolves an issue where the llama.cpp server would hang indefinitely when loading certain models, as described in the attached ticket. The server's readiness message was not being correctly detected, causing the application to stall.
The previous implementation used a line-buffered reader (BufReader::lines()) to process the stderr stream. This method proved to be unreliable for the specific output of the llama.cpp server.
This commit refactors the stderr handling logic to use a more robust, chunk-based approach (read_until(b'\n', ...)). This ensures that the output is processed as it arrives, reliably capturing critical status messages and preventing the application from hanging during model initialization.
Fixes: #6021
* Handle error gracefully with ServerError
Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>
* Revert "Handle error gracefully with ServerError"
This reverts commit 267a8a8a3262fbe36a445a30b8b3ba9a39697643.
* Revert "Fix: Llama.cpp server hangs on model load"
This reverts commit 44e5447f82f0ae32b6db7ffb213025f130d655c4.
* Add more guards, refactor and fix error sending to FE
---------
Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>