31 Commits

Akarshan Biswas
01050f3103
fix: Gracefully handle offline mode during backend check (#6767)
The `listSupportedBackends` function now includes error handling for the `fetchRemoteSupportedBackends` call.

This addresses an issue where an error thrown during the remote fetch (e.g., due to no network connection in offline mode) would prevent the subsequent loading of locally installed or manually provided llama.cpp backends.

The remote backend versions array will now default to empty if the fetch fails, allowing the rest of the backend initialization process to proceed as expected.
2025-10-09 07:21:53 +05:30
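A minimal TypeScript sketch of the fallback described above; `fetchRemoteSupportedBackends` and `getLocalInstalledBackends` stand in for the extension's real helpers, and the final merge step is illustrative:

```ts
interface BackendEntry {
  version: string
  variant: string
}

// Stand-ins for the extension's actual helpers.
declare function fetchRemoteSupportedBackends(): Promise<BackendEntry[]>
declare function getLocalInstalledBackends(): Promise<BackendEntry[]>

async function listSupportedBackends(): Promise<BackendEntry[]> {
  let remote: BackendEntry[] = []
  try {
    remote = await fetchRemoteSupportedBackends()
  } catch (e) {
    // Offline or network error: default the remote list to empty so
    // locally installed or manually provided backends still load.
    console.warn('Remote backend fetch failed, continuing offline:', e)
  }
  const local = await getLocalInstalledBackends()
  return [...remote, ...local]
}
```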
Roushan Singh
c091b8cd77 refactor: safely strip prefix and extensions from filename 2025-09-26 15:02:23 +05:30
Akarshan Biswas
11b3a60675
fix: refactor, fix and move gguf support utilities to backend (#6584)
* feat: move estimateKVCacheSize to BE

* feat: Migrate model planning to backend

This commit migrates the model load planning logic from the frontend to the Tauri backend. This refactors the `planModelLoad` and `isModelSupported` methods into the `tauri-plugin-llamacpp` plugin, making them directly callable from the Rust core.

The model planning now incorporates a more robust and accurate memory estimation, considering both VRAM and system RAM, and introduces a `batch_size` parameter to the model plan.

**Key changes:**

- **Moved `planModelLoad` to `tauri-plugin-llamacpp`:** The core logic for determining GPU layers, context length, and memory offloading is now in Rust for better performance and accuracy.
- **Moved `isModelSupported` to `tauri-plugin-llamacpp`:** The model support check is also now handled by the backend.
- **Removed `getChatClient` from `AIEngine`:** This optional method was not implemented and has been removed from the abstract class.
- **Improved KV Cache estimation:** The `estimate_kv_cache_internal` function in Rust now accounts for `attention.key_length` and `attention.value_length` if available, and considers sliding window attention for more precise estimates.
- **Introduced `batch_size` in ModelPlan:** The model plan now includes a `batch_size` property, which will be automatically adjusted based on the determined `ModelMode` (e.g., lower for CPU/Hybrid modes).
- **Updated `llamacpp-extension`:** The frontend extension now calls the new Tauri commands for model planning and support checks.
- **Removed `batch_size` from `llamacpp-extension/settings.json`:** The batch size is now dynamically determined by the planning logic and will be set as a model setting directly.
- **Updated `ModelSetting` and `useModelProvider` hooks:** These now handle the new `batch_size` property in model settings.
- **Added new Tauri commands and permissions:** `get_model_size`, `is_model_supported`, and `plan_model_load` are new commands with corresponding permissions.
- **Consolidated `ModelSupportStatus` and `KVCacheEstimate`:** These types are now defined in `src/tauri/plugins/tauri-plugin-llamacpp/src/gguf/types.rs`.

This refactoring centralizes critical model resource management logic, improving consistency and maintainability, and lays the groundwork for more sophisticated model loading strategies.
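A simplified TypeScript sketch of the KV cache estimate described above (the real `estimate_kv_cache_internal` lives in Rust; the field names follow GGUF metadata conventions, and the formula here is illustrative and omits the sliding-window cap):

```ts
interface GgufAttentionMeta {
  blockCount: number      // transformer layers
  headCount: number       // attention heads
  headCountKV: number     // KV heads (GQA-aware)
  embeddingLength: number
  keyLength?: number      // attention.key_length, when present
  valueLength?: number    // attention.value_length, when present
}

function estimateKVCacheBytes(
  m: GgufAttentionMeta,
  ctxLen: number,
  bytesPerElement = 2 // f16 cache
): number {
  const headDim = m.embeddingLength / m.headCount
  const kLen = m.keyLength ?? headDim
  const vLen = m.valueLength ?? headDim
  // One K entry and one V entry per layer, per KV head, per position.
  return m.blockCount * m.headCountKV * ctxLen * (kLen + vLen) * bytesPerElement
}
```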

* feat: refine model planner to handle more memory scenarios

This commit introduces several improvements to the `plan_model_load` function, enhancing its ability to determine a suitable model loading strategy based on system memory constraints. Specifically, it includes:

-   **VRAM calculation improvements:**  Corrects the calculation of total VRAM by iterating over GPUs and multiplying by 1024*1024, improving accuracy.
-   **Hybrid plan optimization:**  Implements a more robust hybrid plan strategy, iterating through GPU layer configurations to find the highest possible GPU usage while remaining within VRAM limits.
-   **Minimum context length enforcement:** Enforces a minimum context length for the model, ensuring that the model can be loaded and used effectively.
-   **Fallback to CPU mode:** If a hybrid plan isn't feasible, it now correctly falls back to a CPU-only mode.
-   **Improved logging:** Enhanced logging to provide more detailed information about the memory planning process, including VRAM, RAM, and GPU layers.
-   **Batch size adjustment:** Updated batch size based on the selected mode, ensuring efficient utilization of available resources.
-   **Error handling and edge cases:**  Improved error handling and edge case management to prevent unexpected failures.
-   **Constants:** Added constants for easier maintenance and understanding.
-   **Power-of-2 adjustment:** Added a power-of-two adjustment for the max context length to ensure correct sizing for the LLM.

These changes improve the reliability and robustness of the model planning process, allowing it to handle a wider range of hardware configurations and model sizes.
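A TypeScript sketch of the hybrid-plan search and power-of-two context adjustment described above (the actual implementation is Rust code in `plan_model_load`; `estimateVramBytes` and the 2048-token minimum are assumptions):

```ts
const MIN_CTX = 2048 // assumed minimum context length

// Stand-in for the Rust-side memory estimator.
declare function estimateVramBytes(gpuLayers: number, ctxLen: number): number

function planHybrid(totalLayers: number, usableVram: number, requestedCtx: number) {
  // Round the requested context down to a power of two, then enforce the minimum.
  const ctx = Math.max(MIN_CTX, 2 ** Math.floor(Math.log2(Math.max(1, requestedCtx))))
  // Walk GPU layer counts downward and keep the highest one that fits in VRAM.
  for (let layers = totalLayers; layers >= 1; layers--) {
    if (estimateVramBytes(layers, ctx) <= usableVram) {
      return { mode: layers === totalLayers ? 'gpu' : 'hybrid', gpuLayers: layers, ctx }
    }
  }
  return { mode: 'cpu', gpuLayers: 0, ctx } // no hybrid plan fits: CPU-only fallback
}
```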

* Add log for raw GPU info from tauri-plugin-hardware

* chore: update linux runner for tauri build

* feat: Improve GPU memory calculation for unified memory

This commit improves the logic for calculating usable VRAM, particularly for systems with **unified memory** like Apple Silicon. Previously, the application would report 0 total VRAM if no dedicated GPUs were found, leading to incorrect calculations and failed model loads.

This change modifies the VRAM calculation to fall back to the total system RAM if no discrete GPUs are detected. This is a common and correct approach for unified memory architectures, where the CPU and GPU share the same memory pool.

Additionally, this commit refactors the logic for calculating usable VRAM and RAM to prevent potential underflow by checking if the total memory is greater than the reserved bytes before subtracting. This ensures the calculation remains safe and correct.
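The change amounts to the following, sketched here in TypeScript (the real code is Rust, where the guard prevents `u64` underflow; field names are assumptions):

```ts
interface GpuInfo {
  totalMemoryBytes: number
}

function usableVramBytes(
  gpus: GpuInfo[],
  totalSystemRamBytes: number,
  reservedBytes: number
): number {
  const totalVram = gpus.reduce((sum, g) => sum + g.totalMemoryBytes, 0)
  // No discrete GPU reported: assume a unified-memory architecture
  // (e.g. Apple Silicon) and treat system RAM as the shared pool.
  const pool = totalVram > 0 ? totalVram : totalSystemRamBytes
  // Guarded subtraction: never go below zero when reserving memory.
  return pool > reservedBytes ? pool - reservedBytes : 0
}
```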

* chore: fix update migration version

* fix: enable unified memory support on model support indicator

* Use total_system_memory in bytes

---------

Co-authored-by: Minh141120 <minh.itptit@gmail.com>
Co-authored-by: Faisal Amir <urmauur@gmail.com>
2025-09-25 12:17:57 +05:30
Louis
7fe58d6bee
fix: allow users to cancel backend download (#6582)
* fix: allow users to cancel backend download

* fix: should not redownload on cancel
2025-09-25 13:19:14 +07:00
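One plausible shape for the cancellation, sketched in TypeScript with `AbortController`; the app's actual downloader plumbing may differ:

```ts
let activeDownload: AbortController | null = null

async function downloadBackend(url: string): Promise<boolean> {
  activeDownload = new AbortController()
  try {
    const res = await fetch(url, { signal: activeDownload.signal })
    const data = await res.arrayBuffer()
    // ...persist `data` to the backend directory (omitted)...
    return true
  } catch (e) {
    if ((e as Error).name === 'AbortError') {
      // User cancelled: report failure without retrying, so the
      // cancellation does not immediately trigger a re-download.
      return false
    }
    throw e
  } finally {
    activeDownload = null
  }
}

function cancelBackendDownload(): void {
  activeDownload?.abort()
}
```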
Roushan Kumar Singh
3f51c35229
feat: support .zip archives for manual backend install (#6534)
* feat(llamacpp): support .zip archives for manual backend install

* Update Lock Files
2025-09-23 18:02:06 +05:30
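A sketch of the dispatch this enables, assuming hypothetical `unzip`/`untarGz` helpers:

```ts
declare function unzip(archive: string, dest: string): Promise<void>
declare function untarGz(archive: string, dest: string): Promise<void>

async function extractBackendArchive(archive: string, dest: string): Promise<void> {
  const name = archive.toLowerCase()
  if (name.endsWith('.zip')) return unzip(archive, dest)
  if (name.endsWith('.tar.gz') || name.endsWith('.tgz')) return untarGz(archive, dest)
  throw new Error(`Unsupported backend archive: ${archive}`)
}
```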
Akarshan
42411b5f33
feat: prioritize Vulkan backend only when GPU has ≥6 GB VRAM
Added a GPU memory check using `getSystemInfo` to ensure Vulkan is selected only on systems with at least 6 GB of VRAM.
* Made `determineBestBackend` asynchronous and updated all callers to `await` it.
* Adjusted backend priority list to include or demote Vulkan based on the memory check.
* Updated Vulkan support detection in `backend.ts` to rely solely on API version (memory check moved to selection logic).
* Imported `getSystemInfo` and refined file‑existence validation.

These changes prevent sub‑optimal Vulkan usage on low‑memory GPUs and improve backend selection reliability.
2025-09-11 09:55:55 +05:30
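A TypeScript sketch of the memory gate; the real check lives inside the now-async `determineBestBackend`, and the `getSystemInfo` shape and priority handling here are assumptions:

```ts
const VULKAN_MIN_VRAM_BYTES = 6 * 1024 * 1024 * 1024 // 6 GB threshold

// Stand-in for the extension's hardware query.
declare function getSystemInfo(): Promise<{ gpus: { vramBytes: number }[] }>

async function prioritizeBackends(candidates: string[]): Promise<string[]> {
  const info = await getSystemInfo()
  const capableGpu = info.gpus.some((g) => g.vramBytes >= VULKAN_MIN_VRAM_BYTES)
  // Keep Vulkan at its normal priority only on capable GPUs;
  // otherwise demote it to the end of the list.
  return capableGpu
    ? candidates
    : [...candidates.filter((b) => b !== 'vulkan'), 'vulkan']
}
```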
Akarshan
5ef9d8dfc3
Add debug logs and refactor 2025-09-11 09:55:06 +05:30
Faisal Amir
ba4dc6d1eb
enhancement: update ui dialog update llamacpp backend 2025-09-11 09:52:09 +05:30
Akarshan
a6e4f28830
Add guard before checking locally installed backends 2025-09-11 09:52:09 +05:30
Akarshan
7ac927ff02
feat: enhance llamacpp backend management and installation
- Add `src-tauri/resources/` to `.gitignore`.
- Introduced utilities to read locally installed backends (`getLocalInstalledBackends`) and fetch remote supported backends (`fetchRemoteSupportedBackends`).
- Refactored `listSupportedBackends` to merge remote and local entries with deduplication and proper sorting.
- Exported `getBackendDir` and integrated it into the extension.
- Added helper `parseBackendVersion` and new method `checkBackendForUpdates` to detect newer backend versions.
- Implemented `installBackend` for manual backend archive installation, including platform‑specific binary path handling.
- Updated command‑line argument logic for `--flash-attn` to respect version‑specific defaults.
- Modified Tauri filesystem `decompress` command to remove overly strict path validation.
2025-09-11 09:52:09 +05:30
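A sketch of the remote/local merge with deduplication (the key shape and sort order are illustrative):

```ts
interface Backend {
  version: string
  variant: string
  isLocal?: boolean
}

function mergeBackends(remote: Backend[], local: Backend[]): Backend[] {
  const byKey = new Map<string, Backend>()
  // Later entries win, so local installs overwrite remote duplicates.
  for (const b of [...remote, ...local]) {
    byKey.set(`${b.version}/${b.variant}`, b)
  }
  // Newest version first, numeric-aware compare (e.g. b6100 > b999).
  return [...byKey.values()].sort((a, b) =>
    b.version.localeCompare(a.version, undefined, { numeric: true })
  )
}
```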
hiento09
1b74772d07
feat: llamacpp backend download falls back to CDN (#6361)
* feat: llamacpp backend download falls back to CDN in case the GitHub API encounters errors
2025-09-04 09:39:16 +07:00
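The fallback pattern, sketched in TypeScript (the URLs and download helper are illustrative):

```ts
declare function downloadFrom(url: string): Promise<void>

async function downloadBackendWithFallback(githubUrl: string, cdnUrl: string): Promise<void> {
  try {
    await downloadFrom(githubUrl)
  } catch (e) {
    // GitHub API error or rate limit: retry the same artifact from the CDN.
    console.warn('GitHub download failed, falling back to CDN:', e)
    await downloadFrom(cdnUrl)
  }
}
```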
Louis
3a36353b02
fix: backend variant selection 2025-08-21 10:54:35 +07:00
Akarshan Biswas
5ad3d282af
fix: re-enable Vulkan backend in integrated GPUs with enough memory (#6215) 2025-08-18 17:31:01 +05:30
Dinh Long Nguyen
e1c8d98bf2
Backend Architecture Refactoring (#6094) (#6162)
* add llamacpp plugin

* Refactor llamacpp plugin

* add utils plugin

* remove utils folder

* add hardware implementation

* add utils folder + move utils function

* organize cargo files

* refactor utils src

* refactor util

* apply fmt

* fmt

* Update gguf + reformat

* add permission for gguf commands

* fix cargo test windows

* revert yarn lock

* remove cargo.lock for hardware plugin

* ignore cargo.lock file

* Fix hardware invoke + refactor hardware + refactor tests, constants

* use api wrapper in extension to invoke hardware call + api wrapper build integration

* add newline at EOF (per Akarshan)

* add vi mock for getSystemInfo
2025-08-15 08:59:01 +07:00
Akarshan Biswas
8d147c1774
fix: Add conditional Vulkan support check for better GPU compatibility (#6066)
Changes:
- Introduce conditional Vulkan support check for discrete GPUs with 6GB+ VRAM

fixes: #6009
2025-08-06 07:20:44 +05:30
Louis
bf9315dbbe fix: add missing cuda backend support 2025-08-04 15:54:21 +07:00
Akarshan Biswas
1eaec5e4f6
Fix: engine unable to find DLLs when running on Windows (#5863)
* Fix: Windows llamacpp not picking up dlls from lib repo

* Fix lib path on Windows

* Add debug info about lib_path

* Normalize lib_path for Windows

* fix Windows lib path normalization

* fix: missing cuda dll files on windows

* throw backend setup errors to UI

* Fix format

* Update extensions/llamacpp-extension/src/index.ts

Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>

* feat: add logger to llamacpp-extension

* fix: platform check

---------

Co-authored-by: Louis <louis@jan.ai>
Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>
2025-07-22 20:05:24 +05:30
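A sketch of the normalization idea for Windows (the real fix is spread across the sub-commits above; the regex and PATH handling here are illustrative):

```ts
function normalizeLibPath(libPath: string): string {
  // Use backslashes and strip a \\?\ long-path prefix if present.
  return libPath.replace(/\//g, '\\').replace(/^\\\\\?\\/, '')
}

function withLibOnPath(env: Record<string, string>, libPath: string): Record<string, string> {
  const lib = normalizeLibPath(libPath)
  // Prepend the lib dir so the spawned server process can resolve its DLLs.
  return { ...env, PATH: `${lib};${env.PATH ?? ''}` }
}
```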
Louis
19cb1c96e0
fix: llama.cpp backend download on windows (#5813)
* fix: llama.cpp backend download on windows

* test: add missing cases

* clean: linter

* fix: build
2025-07-20 16:58:09 +07:00
Louis
8ca507c01c
feat: proxy support for the new downloader (#5795)
* feat: proxy support for the new downloader

* test: remove outdated test

* ci: clean up
2025-07-17 23:10:21 +07:00
Akarshan
37151ba926
Feat: Auto load and download default backend during first launch 2025-07-03 09:13:32 +05:30
Thien Tran
525cc93d4a
fix system cudart detection on linux 2025-07-02 12:27:34 +07:00
Thien Tran
65d6f34878
check for system libraries 2025-07-02 12:27:17 +07:00
Thien Tran
622f4118c0
add placeholder for windows and linux arm 2025-07-02 12:27:17 +07:00
Thien Tran
f7bcf43334
update folder structure. small refactoring 2025-07-02 12:27:16 +07:00
Thien Tran
494a47aaa5
fix download condition 2025-07-02 12:27:14 +07:00
Thien Tran
f32ae402d5
fix CUDA version URL 2025-07-02 12:27:14 +07:00
Thien Tran
27146eb5cc
fix feature parsing 2025-07-02 12:27:14 +07:00
Thien Tran
a75d13f42f
fix version compare 2025-07-02 12:27:14 +07:00
Thien Tran
3490299f66
refactor get supported features. check driver version for cu11 and cu12 2025-07-02 12:27:13 +07:00
Thien Tran
fbfaaf43c5
download CUDA libs if needed 2025-07-02 12:27:13 +07:00
Thien Tran
40cd7e962a
feat: download backend for llama.cpp extension (#5123)
* wip

* update

* add download logic

* add decompress. support delete file

* download backend upon selecting setting

* add some logging and notes

* add note on race condition

* remove then catch

* default to none backend. only download if it's not installed

* merge version and backend. fetch version from GH

* restrict scope of output_dir

* add note on unpack
2025-07-02 12:27:13 +07:00
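A sketch of the install guard from this commit ("only download if it's not installed"), with Node-style existence checking as an illustrative stand-in:

```ts
import { existsSync } from 'node:fs'

async function ensureBackendInstalled(
  backendDir: string,
  download: () => Promise<void>
): Promise<void> {
  if (existsSync(backendDir)) {
    // Already installed: skip the download entirely.
    return
  }
  // Per the original commit's note: concurrent callers can race here,
  // so the real code has to guard against double-downloads.
  await download()
}
```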