842 Commits

Author SHA1 Message Date
Faisal Amir
b7dae19756
feat: custom downloaded model name (#6588)
* feat: add field edit model name

* fix: update model

* chore: updaet UI form with save button, and handle edit capabilities and  rename folder will need save button

* fix: relocate model

* chore: update and refresh list model provider also update test case

* chore: state loader

* fix: model path

* fix: model config update

* chore: fix remove depencies provider on edit model dialog

* chore: avoid shifted model name or id

---------

Co-authored-by: Louis <louis@jan.ai>
2025-09-26 15:25:44 +07:00
Louis
55c42ba526
fix: lock all of the dependencies (#6561)
* fix: pin web app dependencies

* fix: pin extension versions

* fix: pin extensions-web dependencies

* fix: pin extensions lockfile

* fix: remove unnecessary semicolon
2025-09-26 13:07:29 +07:00
Akarshan Biswas
11b3a60675
fix: refactor, fix and move gguf support utilities to backend (#6584)
* feat: move estimateKVCacheSize to BE

* feat: Migrate model planning to backend

This commit migrates the model load planning logic from the frontend to the Tauri backend. This refactors the `planModelLoad` and `isModelSupported` methods into the `tauri-plugin-llamacpp` plugin, making them directly callable from the Rust core.

The model planning now incorporates a more robust and accurate memory estimation, considering both VRAM and system RAM, and introduces a `batch_size` parameter to the model plan.

**Key changes:**

- **Moved `planModelLoad` to `tauri-plugin-llamacpp`:** The core logic for determining GPU layers, context length, and memory offloading is now in Rust for better performance and accuracy.
- **Moved `isModelSupported` to `tauri-plugin-llamacpp`:** The model support check is also now handled by the backend.
- **Removed `getChatClient` from `AIEngine`:** This optional method was not implemented and has been removed from the abstract class.
- **Improved KV Cache estimation:** The `estimate_kv_cache_internal` function in Rust now accounts for `attention.key_length` and `attention.value_length` if available, and considers sliding window attention for more precise estimates.
- **Introduced `batch_size` in ModelPlan:** The model plan now includes a `batch_size` property, which will be automatically adjusted based on the determined `ModelMode` (e.g., lower for CPU/Hybrid modes).
- **Updated `llamacpp-extension`:** The frontend extension now calls the new Tauri commands for model planning and support checks.
- **Removed `batch_size` from `llamacpp-extension/settings.json`:** The batch size is now dynamically determined by the planning logic and will be set as a model setting directly.
- **Updated `ModelSetting` and `useModelProvider` hooks:** These now handle the new `batch_size` property in model settings.
- **Added new Tauri commands and permissions:** `get_model_size`, `is_model_supported`, and `plan_model_load` are new commands with corresponding permissions.
- **Consolidated `ModelSupportStatus` and `KVCacheEstimate`:** These types are now defined in `src/tauri/plugins/tauri-plugin-llamacpp/src/gguf/types.rs`.

This refactoring centralizes critical model resource management logic, improving consistency and maintainability, and lays the groundwork for more sophisticated model loading strategies.

* feat: refine model planner to handle more memory scenarios

This commit introduces several improvements to the `plan_model_load` function, enhancing its ability to determine a suitable model loading strategy based on system memory constraints. Specifically, it includes:

-   **VRAM calculation improvements:**  Corrects the calculation of total VRAM by iterating over GPUs and multiplying by 1024*1024, improving accuracy.
-   **Hybrid plan optimization:**  Implements a more robust hybrid plan strategy, iterating through GPU layer configurations to find the highest possible GPU usage while remaining within VRAM limits.
-   **Minimum context length enforcement:** Enforces a minimum context length for the model, ensuring that the model can be loaded and used effectively.
-   **Fallback to CPU mode:** If a hybrid plan isn't feasible, it now correctly falls back to a CPU-only mode.
-   **Improved logging:** Enhanced logging to provide more detailed information about the memory planning process, including VRAM, RAM, and GPU layers.
-   **Batch size adjustment:** Updated batch size based on the selected mode, ensuring efficient utilization of available resources.
-   **Error handling and edge cases:**  Improved error handling and edge case management to prevent unexpected failures.
-   **Constants:** Added constants for easier maintenance and understanding.
-   **Power-of-2 adjustment:** Added power of 2 adjustment for max context length to ensure correct sizing for the LLM.

These changes improve the reliability and robustness of the model planning process, allowing it to handle a wider range of hardware configurations and model sizes.

* Add log for raw GPU info from tauri-plugin-hardware

* chore: update linux runner for tauri build

* feat: Improve GPU memory calculation for unified memory

This commit improves the logic for calculating usable VRAM, particularly for systems with **unified memory** like Apple Silicon. Previously, the application would report 0 total VRAM if no dedicated GPUs were found, leading to incorrect calculations and failed model loads.

This change modifies the VRAM calculation to fall back to the total system RAM if no discrete GPUs are detected. This is a common and correct approach for unified memory architectures, where the CPU and GPU share the same memory pool.

Additionally, this commit refactors the logic for calculating usable VRAM and RAM to prevent potential underflow by checking if the total memory is greater than the reserved bytes before subtracting. This ensures the calculation remains safe and correct.

* chore: fix update migration version

* fix: enable unified memory support on model support indicator

* Use total_system_memory in bytes

---------

Co-authored-by: Minh141120 <minh.itptit@gmail.com>
Co-authored-by: Faisal Amir <urmauur@gmail.com>
2025-09-25 12:17:57 +05:30
Louis
7fe58d6bee
fix: allow users cancel backend download (#6582)
* fix: allow users cancel backend download

* fix: should not redownload on cancel
2025-09-25 13:19:14 +07:00
Louis
57110d2bd7
fix: allow users to download the same model from different authors (#6577)
* fix: allow users to download the same model from different authors

* fix: importing models should have author name in the ID

* fix: incorrect model id show

* fix: tests

* fix: default to mmproj f16 instead of bf16

* fix: type

* fix: build error
2025-09-24 17:57:10 +07:00
Roushan Kumar Singh
3f51c35229
feat: support .zip archives for manual backend install (#6534)
* feat(llamacpp): support .zip archives for manual backend install

* Update Lock Files
2025-09-23 18:02:06 +05:30
Akarshan Biswas
885da29f28
feat: add getTokensCount method to compute token usage (#6467)
* feat: add getTokensCount method to compute token usage

Implemented a new async `getTokensCount` function in the LLaMA.cpp extension.
The method validates the model session, checks process health, applies the request template, and tokenizes the resulting prompt to return the token count. Includes detailed error handling for crashed models and API failures, enabling callers to assess token usage before sending completions.

* Fix: typos

* chore: update ui token usage

* chore: remove unused code

* feat: add image token handling for multimodal LlamaCPP models

Implemented support for counting image tokens when using vision-enabled models:
- Extended `SessionInfo` with optional `mmprojPath` to store the multimodal project file.
- Propagated `mmproj_path` from the Tauri plugin into the session info.
- Added import of `chatCompletionRequestMessage` and enhanced token calculation logic in the LlamaCPP extension:
- Detects image content in messages.
- Reads GGUF metadata from `mmprojPath` to compute accurate image token counts.
- Provides a fallback estimation if metadata reading fails.
- Returns the sum of text and image tokens.
- Introduced helper methods `calculateImageTokens` and `estimateImageTokensFallback`.
- Minor clean‑ups such as comment capitalization and debug logging.

* chore: update FE send params message include content type image_url

* fix mmproj path from session info and num tokens calculation

* fix: Correct image token estimation calculation in llamacpp extension

This commit addresses an inaccurate token count for images in the llama.cpp extension.

The previous logic incorrectly calculated the token count based on image patch size and dimensions. This has been replaced with a more precise method that uses the clip.vision.projection_dim value from the model metadata.

Additionally, unnecessary debug logging was removed, and a new log was added to show the mmproj metadata for improved visibility.

* fix per image calc

* fix: crash due to force unwrap

---------

Co-authored-by: Faisal Amir <urmauur@gmail.com>
Co-authored-by: Louis <louis@jan.ai>
2025-09-23 07:52:19 +05:30
Akarshan Biswas
bf7f176741
feat: Prompt progress when streaming (#6503)
* feat: Prompt progress when streaming

- BE changes:
    - Add a `return_progress` flag to `chatCompletionRequest` and a corresponding `prompt_progress` payload in `chatCompletionChunk`. Introduce `chatCompletionPromptProgress` interface to capture cache, processed, time, and total token counts.
    - Update the Llamacpp extension to always request progress data when streaming, enabling UI components to display real‑time generation progress and leverage llama.cpp’s built‑in progress reporting.

* Make return_progress optional

* chore: update ui prompt progress before streaming content

* chore: remove log

* chore: remove progress when percentage >= 100

* chore: set timeout prompt progress

* chore: move prompt progress outside streaming content

* fix: tests

---------

Co-authored-by: Faisal Amir <urmauur@gmail.com>
Co-authored-by: Louis <louis@jan.ai>
2025-09-22 20:37:27 +05:30
Dinh Long Nguyen
645548e931
Merge pull request #6516 from menloresearch/release/v0.6.10 2025-09-18 19:15:54 +07:00
Akarshan Biswas
b9f658f2ae
fix: correct memory suitability checks in llamacpp extension (#6504)
The previous implementation mixed model size and VRAM checks, leading to inaccurate status reporting (e.g., false RED results).
- Simplified import statement for `readGgufMetadata`.
- Fixed RAM/VRAM comparison by removing unnecessary parentheses.
- Replaced ambiguous `modelSize > usableTotalMemory` check with a clear `totalRequired > usableTotalMemory` hard‑limit condition.
- Refactored the status logic to explicitly handle the CPU‑GPU hybrid scenario, returning **YELLOW** when the total requirement fits combined memory but exceeds VRAM.
- Updated comments for better readability and maintenance.
2025-09-17 20:09:33 +05:30
Louis
15164fc0be
feat: system tray icon build flag 2025-09-17 15:54:20 +07:00
Akarshan Biswas
9e3a77a559
fix: set default memory mode and clean up unused import (#6463)
Use fallback value 'high' for memory_util config and remove unused GgufMetadata import.
2025-09-15 19:00:46 +05:30
Akarshan Biswas
489c5a3d9c
fix: KVCache size calculation and refactor (#6438)
- Removed the unused `getKVCachePerToken` helper and replaced it with a unified `estimateKVCache` that returns both total size and per‑token size.
- Fixed the KV cache size calculation to account for all layers, correcting previous under‑estimation.
- Added proper clamping of user‑requested context lengths to the model’s maximum.
- Refactored VRAM budgeting: introduced explicit reserves, fixed engine overhead, and separate multipliers for VRAM and system RAM based on memory mode.
- Implemented a more robust planning flow with clear GPU, Hybrid, and CPU pathways, including fallback configurations when resources are insufficient.
- Updated default context length handling and safety buffers to prevent OOM situations.
- Adjusted usable memory percentage to 90 % and refined logging for easier debugging.
2025-09-15 10:16:13 +05:30
Akarshan Biswas
654e566dcb
fix: correct context shift flag handling in LlamaCPP extension (#6404) (#6431)
* fix: correct context shift flag handling in LlamaCPP extension

The previous implementation added the `--no-context-shift` flag when `cfg.ctx_shift` was disabled, which conflicted with the llama.cpp CLI where the presence of `--context-shift` enables the feature.
The logic is updated to push `--context-shift` only when `cfg.ctx_shift` is true, ensuring the extension passes the correct argument and behaves as expected.

* feat: detect model out of context during generation

---------

Co-authored-by: Dinh Long Nguyen <dinhlongviolin1@gmail.com>
2025-09-12 13:43:31 +05:30
Dinh Long Nguyen
b5b6e1dc19
add mcp for web (#6411)
* add mcp for web

* update /jan/v1 endpoint to /v1

* update mise and makefile

* update yarn lock

* use mcp oauth properly
2025-09-12 12:14:10 +07:00
Faisal Amir
e709d200aa
Merge pull request #6416 from menloresearch/enhancement/experimental-label
enhancement: add label experimental for optimize setting
2025-09-11 16:12:35 +07:00
Akarshan
7c41408a1a
feat: add relative path support for model loading
Implemented `isAbsolutePath` helper to correctly identify POSIX, Windows drive‑letter, and UNC absolute paths. Updated `planModelLoad` to automatically resolve relative model and mmproj paths against the Jan data folder, enhancing usability for users supplying non‑absolute paths. Also refined minor formatting for readability.
2025-09-11 13:45:29 +05:30
Akarshan
abd0cbe599
refactor: rename noOffloadMmproj flag to offloadMmproj and reorder args
The flag `noOffloadMmproj` was misleading – it actually indicates when the mmproj file **is** offloaded to VRAM. Renaming it to `offloadMmproj` clarifies its purpose and aligns the naming with the surrounding code.

Additionally, the `planModelLoad` signature has been reordered to place `mmprojPath` before `requestedCtx`, improving readability and making the optional parameters more intuitive. All related logic, calculations, and log messages have been updated to use the new flag name.
2025-09-11 12:29:53 +05:30
Louis
7fea6e1ab0
fix: clean up unused packages (#6414) 2025-09-11 13:16:26 +07:00
Akarshan Biswas
5ff7935d91
fix: include lm_head and embedding layers in totalLayers count (#6415)
The original calculation used only the `block_count` from the model metadata, which excludes the final LM head and the embedding layer. This caused an underestimation of the total number of layers and consequently an incorrect `layerSize` value. Adding `+2` accounts for these two missing layers, ensuring accurate model size metrics.
2025-09-11 11:40:39 +05:30
Akarshan
13806a3f06
Fixup sorting in determineBestBackend 2025-09-11 09:56:46 +05:30
Akarshan Biswas
3cd099ee87
Update extensions/llamacpp-extension/src/index.ts
Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>
2025-09-11 09:55:57 +05:30
Akarshan
42411b5f33
feat: prioritize Vulkan backend only when GPU has ≥6 GB VRAM
Added a GPU memory check using `getSystemInfo` to ensure Vulkan is selected only on systems with at least 6 GB of VRAM.
* Made `determineBestBackend` asynchronous and updated all callers to `await` it.
* Adjusted backend priority list to include or demote Vulkan based on the memory check.
* Updated Vulkan support detection in `backend.ts` to rely solely on API version (memory check moved to selection logic).
* Imported `getSystemInfo` and refined file‑existence validation.

These changes prevent sub‑optimal Vulkan usage on low‑memory GPUs and improve backend selection reliability.
2025-09-11 09:55:55 +05:30
Akarshan
84874c6039
fix file condition 2025-09-11 09:55:08 +05:30
Akarshan
0eff1bfaa9
Throw error when invalid file 2025-09-11 09:55:08 +05:30
Akarshan
5ef9d8dfc3
Add debug logs and refactor 2025-09-11 09:55:06 +05:30
Akarshan
2e350ab607
Refresh list of backends by calling configureBackends() and some refactoring in installBackend 2025-09-11 09:52:09 +05:30
Faisal Amir
ba4dc6d1eb
enhancement: update ui dialog update llamacpp backend 2025-09-11 09:52:09 +05:30
Akarshan
a6e4f28830
Add guard before checking locally installed backends 2025-09-11 09:52:09 +05:30
Akarshan
4e37c361c4
feat: expose new updateBackend function for manually updating backend 2025-09-11 09:52:09 +05:30
Akarshan
7ac927ff02
feat: enhance llamacpp backend management and installation
- Add `src-tauri/resources/` to `.gitignore`.
- Introduced utilities to read locally installed backends (`getLocalInstalledBackends`) and fetch remote supported backends (`fetchRemoteSupportedBackends`).
- Refactored `listSupportedBackends` to merge remote and local entries with deduplication and proper sorting.
- Exported `getBackendDir` and integrated it into the extension.
- Added helper `parseBackendVersion` and new method `checkBackendForUpdates` to detect newer backend versions.
- Implemented `installBackend` for manual backend archive installation, including platform‑specific binary path handling.
- Updated command‑line argument logic for `--flash-attn` to respect version‑specific defaults.
- Modified Tauri filesystem `decompress` command to remove overly strict path validation.
2025-09-11 09:52:09 +05:30
Akarshan Biswas
7a174e621a
feat: Smart model management (#6390)
* feat: Smart model management

* **New UI option** – `memory_util` added to `settings.json` with a dropdown (high / medium / low) to let users control how aggressively the engine uses system memory.
* **Configuration updates** – `LlamacppConfig` now includes `memory_util`; the extension class stores it in a new `memoryMode` property and handles updates through `updateConfig`.
* **System memory handling**
  * Introduced `SystemMemory` interface and `getTotalSystemMemory()` to report combined VRAM + RAM.
  * Added helper methods `getKVCachePerToken`, `getLayerSize`, and a new `ModelPlan` type.
* **Smart model‑load planner** – `planModelLoad()` computes:
  * Number of GPU layers that can fit in usable VRAM.
  * Maximum context length based on KV‑cache size and the selected memory utilization mode (high/medium/low).
  * Whether KV‑cache must be off‑loaded to CPU and the overall loading mode (GPU, Hybrid, CPU, Unsupported).
  * Detailed logging of the planning decision.
* **Improved support check** – `isModelSupported()` now:
  * Uses the combined VRAM/RAM totals from `getTotalSystemMemory()`.
  * Applies an 80% usable‑memory heuristic.
  * Returns **GREEN** only when both weights and KV‑cache fit in VRAM, **YELLOW** when they fit only in total memory or require CPU off‑load, and **RED** when the model cannot fit at all.
* **Cleanup** – Removed unused `GgufMetadata` import; updated imports and type definitions accordingly.
* **Documentation/comments** – Added explanatory JSDoc comments for the new methods and clarified the return semantics of `isModelSupported`.

* chore: migrate no_kv_offload from llamacpp setting to model setting

* chore: add UI auto optimize model setting

* feat: improve model loading planner with mmproj support and smarter memory budgeting

* Extend `ModelPlan` with optional `noOffloadMmproj` flag to indicate when a multimodal projector can stay in VRAM.
* Add `mmprojPath` parameter to `planModelLoad` and calculate its size, attempting to keep it on GPU when possible.
* Refactor system memory detection:
  * Use `used_memory` (actual free RAM) instead of total RAM for budgeting.
  * Introduced `usableRAM` placeholder for future use.
* Rewrite KV‑cache size calculation:
  * Properly handle GQA models via `attention.head_count_kv`.
  * Compute bytes per token as `nHeadKV * headDim * 2 * 2 * nLayer`.
* Replace the old 70 % VRAM heuristic with a more flexible budget:
  * Reserve a fixed VRAM amount and apply an overhead factor.
  * Derive usable system RAM from total memory minus VRAM.
* Implement a robust allocation algorithm:
  * Prioritize placing the mmproj in VRAM.
  * Search for the best balance of GPU layers and context length.
  * Fallback strategies for hybrid and pure‑CPU modes with detailed safety checks.
* Add extensive validation of model size, KV‑cache size, layer size, and memory mode.
* Improve logging throughout the planning process for easier debugging.
* Adjust final plan return shape to include the new `noOffloadMmproj` field.

* remove unused variable

---------

Co-authored-by: Faisal Amir <urmauur@gmail.com>
2025-09-11 09:48:03 +05:30
Faisal Amir
be851ebcf1 chore: validate gguf file base metadata architecture 2025-09-08 20:16:20 +07:00
Faisal Amir
836990b7d9 chore: update fn check mmproj file 2025-09-08 11:10:00 +07:00
Dinh Long Nguyen
a30eb7f968
feat: Jan Web (reusing Jan Desktop UI) (#6298)
* add platform guards

* add service management

* fix types

* move to zustand for servicehub

* update App Updater

* update tauri missing move

* update app updater

* refactor: move PlatformFeatures to separate const file

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

* change tauri fetch name

* update implementation

* update extension fetch

* make web version run properly

* disabled unused web settings

* fix all tests

* fix lint

* fix tests

* add mock for extension

* fix build

* update make and mise

* fix tsconfig for web-extensions

* fix loader type

* cleanup

* fix test

* update error handling + mcp should be working

* Update mcp init

* use separate is_web_app build property

* Remove fixed model catalog url

* fix additional tests

* fix download issue (event emitter not implemented correctly)

* Update Title html

* fix app logs

* update root tsx render timing

---------

Co-authored-by: Claude <noreply@anthropic.com>
2025-09-05 01:47:46 +07:00
hiento09
1b74772d07
feat: download llamacpp backend fail back to cdn (#6361)
* feat: download llamacpp backend fail back to cdn incase github api encounters errors
2025-09-04 09:39:16 +07:00
Akarshan Biswas
5fae954ac5
fix: Use 80% total memory for compatibility check (#6321)
* fix: Use 80% total memory for compatibility check

* refactor: extract usable memory percentage to named constant

Extract the hardcoded 0.8 multiplier into a named constant
USABLE_MEMORY_PERCENTAGE for better readability and maintainability.
2025-08-28 14:50:00 +05:30
Akarshan Biswas
64a608039b
fix: check for env value before setting (#6266)
* fix: check for env value before setting

* Use empty instead of none
2025-08-21 22:55:49 +05:30
Akarshan Biswas
510c70bdf7
feat: Add model compatibility check and memory estimation (#6243)
* feat: Add model compatibility check and memory estimation

This commit introduces a new feature to check if a given model is supported based on available device memory.

The change includes:
- A new `estimateKVCache` method that calculates the required memory for the model's KV cache. It uses GGUF metadata such as `block_count`, `head_count`, `key_length`, and `value_length` to perform the calculation.
- An `isModelSupported` method that combines the model file size and the estimated KV cache size to determine the total memory required. It then checks if any available device has sufficient free memory to load the model.
- An updated error message for the `version_backend` check to be more user-friendly, suggesting a stable internet connection as a potential solution for backend setup failures.

This functionality helps prevent the application from attempting to load models that would exceed the device's memory capacity, leading to more stable and predictable behavior.

fixes: #5505

* Update extensions/llamacpp-extension/src/index.ts

Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>

* Update extensions/llamacpp-extension/src/index.ts

Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>

* Extend this to available system RAM if GGML device is not available

* fix: Improve model metadata and memory checks

This commit refactors the logic for checking if a model is supported by a system's available memory.

**Key changes:**
- **Remote model support**: The `read_gguf_metadata` function can now fetch metadata from a remote URL by reading the file in chunks.
- **Improved KV cache size calculation**: The KV cache size is now estimated more accurately by using `attention.key_length` and `attention.value_length` from the GGUF metadata, with a fallback to `embedding_length`.
- **Granular memory check statuses**: The `isModelSupported` function now returns a more specific status (`'RED'`, `'YELLOW'`, `'GREEN'`) to indicate whether the model weights or the KV cache are too large for the available memory.
- **Consolidated logic**: The logic for checking local and remote models has been consolidated into a single `isModelSupported` function, improving code clarity and maintainability.

These changes provide more robust and informative model compatibility checks, especially for models hosted on remote servers.

* Update extensions/llamacpp-extension/src/index.ts

Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>

* Make ctx_size optional and use sum free memory across ggml devices

* feat: hub and dropdown model selection handle model compatibility

* feat: update bage model info color

* chore: enable detail page to get compatibility model

* chore: update copy

* chore: update shrink indicator UI

---------

Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>
Co-authored-by: Faisal Amir <urmauur@gmail.com>
2025-08-21 16:13:50 +05:30
Akarshan Biswas
9c25480c7b
fix: Update placeholder text and error message (#6263)
This commit improves the clarity of the llama.cpp extension.

- Corrected a placeholder example from `GGML_VK_VISIBLE_DEVICES='0,1'` to `GGML_VK_VISIBLE_DEVICES=0,1` for better accuracy.
- Changed an ambiguous error message from `"Failed to load llama-server: ${error}"` to the more specific `"Failed to load llamacpp backend"`.
2025-08-21 16:01:31 +05:30
Akarshan Biswas
5c3a6fec32
feat: Add support for custom environmental variables to llama.cpp (#6256)
This commit adds a new setting `llamacpp_env` to the llama.cpp extension, allowing users to specify custom environment variables. These variables are passed to the backend process when it starts.

A new function `parseEnvFromString` is introduced to handle the parsing of the semicolon-separated key-value pairs from the user input. The environment variables are then used in the `load` function and when listing available devices. This enables more flexible configuration of the llama.cpp backend, such as specifying visible GPUs for Vulkan.

This change also updates the Tauri command `get_devices` to accept environment variables, ensuring that device discovery respects the user's settings.
2025-08-21 15:50:37 +05:30
Dinh Long Nguyen
32a2ca95b6
feat: gguf file size + hash validation (#5266) (#6259)
* feat: gguf file size + hash validation

* fix tests fe

* update cargo tests

* handle asyn download for both models and mmproj

* move progress tracker to models

* handle file download cancelled

* add cancellation mid hash run
2025-08-21 16:17:58 +07:00
Louis
6b55812739
Merge pull request #6249 from menloresearch/feat/detect-cpu-arch-run-time
feat: detect cpu arch in runtime
2025-08-21 11:51:13 +07:00
Louis
e6587844d0
Merge branch 'dev' into current-date-instruction 2025-08-21 11:41:30 +07:00
Louis
3a36353b02
fix: backend variant selection 2025-08-21 10:54:35 +07:00
Akarshan Biswas
906b87022d
chore: re enable reasoning_content in backend (#6228)
* chore: re enable reasoning_content in backend

* chore: handle reasoning_content

* chore: refactor get reasoning content

* chore: update PR review

---------

Co-authored-by: Faisal Amir <urmauur@gmail.com>
2025-08-20 13:06:21 +05:30
Akarshan Biswas
0fc3dc6841
Fix: Validate GGUF files before loading (#6238)
This commit adds a GGUF validation check for both the main model file and the `mmproj` file (if present) before they are loaded. This prevents the extension from crashing if an invalid GGUF file is provided.

The `GgufMetadata` interface and `loadMetadata` function were removed as the `readGgufMetadata` is now invoked directly. The code has also been refactored to be more readable, with clearer variable names and more descriptive comments.
2025-08-20 10:31:19 +05:30
Dinh Long Nguyen
b0eec07a01
Add contributing section for jan (#6231) (#6232)
* Add contributing section for jan

* Update CONTRIBUTING.md

Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>

---------

Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>
2025-08-20 10:18:35 +07:00
Faisal Amir
5481ee9e35
Merge pull request #6134 from menloresearch/feat/attachment-ui
feat: attachment UI
2025-08-20 10:04:32 +07:00
Kamal Fariz Mahyuddin
df27def9cb
Merge branch 'dev' into current-date-instruction 2025-08-19 14:40:08 -07:00