6043 Commits

Author SHA1 Message Date
Faisal Amir
e02be47aae fix: remove log 2025-09-15 21:09:08 +07:00
Faisal Amir
5736d7b110 fix: auto update should not block popup 2025-09-15 20:51:27 +07:00
Faisal Amir
18114c0a15 fix: pathname file install BE 2025-09-15 18:05:11 +07:00
Akarshan Biswas
e80a865def
fix: detect allocation failures as out-of-memory errors (#6459)
The Llama.cpp backend can emit the phrase “failed to allocate” when it runs out of memory.
Adding this check ensures such messages are correctly classified as out‑of‑memory errors,
providing more accurate error handling CPU backends.
2025-09-15 12:35:24 +05:30
Nguyen Ngoc Minh
55edc7129e
Merge pull request #6457 from menloresearch/chore/makefile-rust-targets
chore: makefile rust targets macos
2025-09-15 12:03:17 +07:00
Akarshan Biswas
489c5a3d9c
fix: KVCache size calculation and refactor (#6438)
- Removed the unused `getKVCachePerToken` helper and replaced it with a unified `estimateKVCache` that returns both total size and per‑token size.
- Fixed the KV cache size calculation to account for all layers, correcting previous under‑estimation.
- Added proper clamping of user‑requested context lengths to the model’s maximum.
- Refactored VRAM budgeting: introduced explicit reserves, fixed engine overhead, and separate multipliers for VRAM and system RAM based on memory mode.
- Implemented a more robust planning flow with clear GPU, Hybrid, and CPU pathways, including fallback configurations when resources are insufficient.
- Updated default context length handling and safety buffers to prevent OOM situations.
- Adjusted usable memory percentage to 90 % and refined logging for easier debugging.
2025-09-15 10:16:13 +05:30
Minh141120
1db67ea9a2 chore: simplify macos workflow 2025-09-15 11:24:11 +07:00
Faisal Amir
7a2782e6fd
Merge pull request #6456 from menloresearch/enhancement/copy-mcp-permission
enhancement: copy MCP permission
2025-09-15 11:01:09 +07:00
Faisal Amir
a4483b7eb7
Update web-app/src/locales/en/tool-approval.json
Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>
2025-09-15 10:38:36 +07:00
Faisal Amir
a26445e557 chore: make action mutton capitalize 2025-09-15 10:34:27 +07:00
Faisal Amir
44893bc3c3 enhancement: copy MCP permission 2025-09-15 10:33:05 +07:00
Minh141120
4fa78fa892 fix: make install-rust-targets a dependency 2025-09-15 09:44:21 +07:00
Minh141120
6959329fd6 chore: add install-rust-targets step for macOS universal builds 2025-09-15 09:32:51 +07:00
Akarshan Biswas
654e566dcb
fix: correct context shift flag handling in LlamaCPP extension (#6404) (#6431)
* fix: correct context shift flag handling in LlamaCPP extension

The previous implementation added the `--no-context-shift` flag when `cfg.ctx_shift` was disabled, which conflicted with the llama.cpp CLI where the presence of `--context-shift` enables the feature.
The logic is updated to push `--context-shift` only when `cfg.ctx_shift` is true, ensuring the extension passes the correct argument and behaves as expected.

* feat: detect model out of context during generation

---------

Co-authored-by: Dinh Long Nguyen <dinhlongviolin1@gmail.com>
2025-09-12 13:43:31 +05:30
Faisal Amir
ad428f587b
Merge pull request #6426 from menloresearch/fix/error-validate-nested-dom
fix: avoid error validate nested DOM
2025-09-12 12:28:46 +07:00
Faisal Amir
4293fe7edc fix: avoid error validate nested dom 2025-09-12 10:58:34 +07:00
Dinh Long Nguyen
ea72c1ae0f
exclude jan extension web from desktop build (#6419) 2025-09-11 19:51:49 +07:00
Dinh Long Nguyen
db52057030
fix ollama error (#6418) 2025-09-11 18:38:06 +07:00
Faisal Amir
e709d200aa
Merge pull request #6416 from menloresearch/enhancement/experimental-label
enhancement: add label experimental for optimize setting
2025-09-11 16:12:35 +07:00
Dinh Long Nguyen
4856cfbfc4
bug: Deleted model file from imported models blocking model loading (#6317) (#6417) 2025-09-11 15:56:19 +07:00
Faisal Amir
19aa15ffcd chore: update return value 2025-09-11 15:51:21 +07:00
Akarshan
7c41408a1a
feat: add relative path support for model loading
Implemented `isAbsolutePath` helper to correctly identify POSIX, Windows drive‑letter, and UNC absolute paths. Updated `planModelLoad` to automatically resolve relative model and mmproj paths against the Jan data folder, enhancing usability for users supplying non‑absolute paths. Also refined minor formatting for readability.
2025-09-11 13:45:29 +05:30
Akarshan
8f67f29317
feat: add support for mmproj offload setting
Expose the new `mmproj_offload` option in the model settings UI and include it in the `ModelPlan` type. The component now collects the offload flag (`result.offloadMmproj`) and queues it with other setting updates to ensure a single atomic change, preventing race conditions when toggling this feature. This enables users to control MMProj offloading directly from the app.
2025-09-11 13:08:01 +05:30
Faisal Amir
14c7fc0450 chore: update argument 2025-09-11 14:23:56 +07:00
Akarshan
abd0cbe599
refactor: rename noOffloadMmproj flag to offloadMmproj and reorder args
The flag `noOffloadMmproj` was misleading – it actually indicates when the mmproj file **is** offloaded to VRAM. Renaming it to `offloadMmproj` clarifies its purpose and aligns the naming with the surrounding code.

Additionally, the `planModelLoad` signature has been reordered to place `mmprojPath` before `requestedCtx`, improving readability and making the optional parameters more intuitive. All related logic, calculations, and log messages have been updated to use the new flag name.
2025-09-11 12:29:53 +05:30
Faisal Amir
198955285e
Merge pull request #6412 from menloresearch/fix/render-new-line
fix: render new line for user message
2025-09-11 13:29:18 +07:00
Faisal Amir
bc29046c06 enhancement: send params mmptoj_path for optimize setting 2025-09-11 13:23:25 +07:00
Louis
7fea6e1ab0
fix: clean up unused packages (#6414) 2025-09-11 13:16:26 +07:00
Faisal Amir
791563e6ba enhancement: add label experimental for optimize setting 2025-09-11 13:11:37 +07:00
Akarshan Biswas
5ff7935d91
fix: include lm_head and embedding layers in totalLayers count (#6415)
The original calculation used only the `block_count` from the model metadata, which excludes the final LM head and the embedding layer. This caused an underestimation of the total number of layers and consequently an incorrect `layerSize` value. Adding `+2` accounts for these two missing layers, ensuring accurate model size metrics.
2025-09-11 11:40:39 +05:30
Nguyen Ngoc Minh
d856651380
Merge pull request #6413 from menloresearch/ci/add-nightly-external-contrib
ci: add nightly build for external contributors
2025-09-11 13:03:30 +07:00
Minh141120
65a515a9db chore: add upload artifact steps for 3 platforms 2025-09-11 12:21:56 +07:00
Minh141120
773b252555 ci: add nightly build for external contributors 2025-09-11 11:30:43 +07:00
Akarshan Biswas
7a94e74d6b
Merge pull request #6360 from menloresearch/feat/llamacpp_backend
feat: enhance llamacpp backend management and installation
2025-09-11 09:57:16 +05:30
Akarshan
13806a3f06
Fixup sorting in determineBestBackend 2025-09-11 09:56:46 +05:30
Akarshan Biswas
3cd099ee87
Update extensions/llamacpp-extension/src/index.ts
Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>
2025-09-11 09:55:57 +05:30
Akarshan
42411b5f33
feat: prioritize Vulkan backend only when GPU has ≥6 GB VRAM
Added a GPU memory check using `getSystemInfo` to ensure Vulkan is selected only on systems with at least 6 GB of VRAM.
* Made `determineBestBackend` asynchronous and updated all callers to `await` it.
* Adjusted backend priority list to include or demote Vulkan based on the memory check.
* Updated Vulkan support detection in `backend.ts` to rely solely on API version (memory check moved to selection logic).
* Imported `getSystemInfo` and refined file‑existence validation.

These changes prevent sub‑optimal Vulkan usage on low‑memory GPUs and improve backend selection reliability.
2025-09-11 09:55:55 +05:30
Akarshan
84874c6039
fix file condition 2025-09-11 09:55:08 +05:30
Akarshan
0eff1bfaa9
Throw error when invalid file 2025-09-11 09:55:08 +05:30
Akarshan
5ef9d8dfc3
Add debug logs and refactor 2025-09-11 09:55:06 +05:30
dinhlongviolin1
e2e572ccab
refactor: moved get_short_path to utils and use it in decompress 2025-09-11 09:52:10 +05:30
Faisal Amir
6067ffe107
chore: fix conflict 2025-09-11 09:52:09 +05:30
Faisal Amir
cbd2651a63
chore: update copy and refresh list when import from local machine 2025-09-11 09:52:09 +05:30
Akarshan
2e350ab607
Refresh list of backends by calling configureBackends() and some refactoring in installBackend 2025-09-11 09:52:09 +05:30
Faisal Amir
ba4dc6d1eb
enhancement: update ui dialog update llamacpp backend 2025-09-11 09:52:09 +05:30
Akarshan
a6e4f28830
Add guard before checking locally installed backends 2025-09-11 09:52:09 +05:30
Akarshan
4e37c361c4
feat: expose new updateBackend function for manually updating backend 2025-09-11 09:52:09 +05:30
Akarshan
7ac927ff02
feat: enhance llamacpp backend management and installation
- Add `src-tauri/resources/` to `.gitignore`.
- Introduced utilities to read locally installed backends (`getLocalInstalledBackends`) and fetch remote supported backends (`fetchRemoteSupportedBackends`).
- Refactored `listSupportedBackends` to merge remote and local entries with deduplication and proper sorting.
- Exported `getBackendDir` and integrated it into the extension.
- Added helper `parseBackendVersion` and new method `checkBackendForUpdates` to detect newer backend versions.
- Implemented `installBackend` for manual backend archive installation, including platform‑specific binary path handling.
- Updated command‑line argument logic for `--flash-attn` to respect version‑specific defaults.
- Modified Tauri filesystem `decompress` command to remove overly strict path validation.
2025-09-11 09:52:09 +05:30
Akarshan Biswas
7a174e621a
feat: Smart model management (#6390)
* feat: Smart model management

* **New UI option** – `memory_util` added to `settings.json` with a dropdown (high / medium / low) to let users control how aggressively the engine uses system memory.
* **Configuration updates** – `LlamacppConfig` now includes `memory_util`; the extension class stores it in a new `memoryMode` property and handles updates through `updateConfig`.
* **System memory handling**
  * Introduced `SystemMemory` interface and `getTotalSystemMemory()` to report combined VRAM + RAM.
  * Added helper methods `getKVCachePerToken`, `getLayerSize`, and a new `ModelPlan` type.
* **Smart model‑load planner** – `planModelLoad()` computes:
  * Number of GPU layers that can fit in usable VRAM.
  * Maximum context length based on KV‑cache size and the selected memory utilization mode (high/medium/low).
  * Whether KV‑cache must be off‑loaded to CPU and the overall loading mode (GPU, Hybrid, CPU, Unsupported).
  * Detailed logging of the planning decision.
* **Improved support check** – `isModelSupported()` now:
  * Uses the combined VRAM/RAM totals from `getTotalSystemMemory()`.
  * Applies an 80% usable‑memory heuristic.
  * Returns **GREEN** only when both weights and KV‑cache fit in VRAM, **YELLOW** when they fit only in total memory or require CPU off‑load, and **RED** when the model cannot fit at all.
* **Cleanup** – Removed unused `GgufMetadata` import; updated imports and type definitions accordingly.
* **Documentation/comments** – Added explanatory JSDoc comments for the new methods and clarified the return semantics of `isModelSupported`.

* chore: migrate no_kv_offload from llamacpp setting to model setting

* chore: add UI auto optimize model setting

* feat: improve model loading planner with mmproj support and smarter memory budgeting

* Extend `ModelPlan` with optional `noOffloadMmproj` flag to indicate when a multimodal projector can stay in VRAM.
* Add `mmprojPath` parameter to `planModelLoad` and calculate its size, attempting to keep it on GPU when possible.
* Refactor system memory detection:
  * Use `used_memory` (actual free RAM) instead of total RAM for budgeting.
  * Introduced `usableRAM` placeholder for future use.
* Rewrite KV‑cache size calculation:
  * Properly handle GQA models via `attention.head_count_kv`.
  * Compute bytes per token as `nHeadKV * headDim * 2 * 2 * nLayer`.
* Replace the old 70 % VRAM heuristic with a more flexible budget:
  * Reserve a fixed VRAM amount and apply an overhead factor.
  * Derive usable system RAM from total memory minus VRAM.
* Implement a robust allocation algorithm:
  * Prioritize placing the mmproj in VRAM.
  * Search for the best balance of GPU layers and context length.
  * Fallback strategies for hybrid and pure‑CPU modes with detailed safety checks.
* Add extensive validation of model size, KV‑cache size, layer size, and memory mode.
* Improve logging throughout the planning process for easier debugging.
* Adjust final plan return shape to include the new `noOffloadMmproj` field.

* remove unused variable

---------

Co-authored-by: Faisal Amir <urmauur@gmail.com>
2025-09-11 09:48:03 +05:30
Faisal Amir
9e592b2aca fix: render new line for user message 2025-09-11 10:29:34 +07:00