6 Commits

Author SHA1 Message Date
Vanalite
fa61163350 fix: Fix openssl issue on mobile after merging 2025-10-05 14:40:39 +07:00
Akarshan Biswas
11b3a60675
fix: refactor, fix and move gguf support utilities to backend (#6584)
* feat: move estimateKVCacheSize to BE

* feat: Migrate model planning to backend

This commit migrates the model load planning logic from the frontend to the Tauri backend. This refactors the `planModelLoad` and `isModelSupported` methods into the `tauri-plugin-llamacpp` plugin, making them directly callable from the Rust core.

The model planning now incorporates a more robust and accurate memory estimation, considering both VRAM and system RAM, and introduces a `batch_size` parameter to the model plan.

**Key changes:**

- **Moved `planModelLoad` to `tauri-plugin-llamacpp`:** The core logic for determining GPU layers, context length, and memory offloading is now in Rust for better performance and accuracy.
- **Moved `isModelSupported` to `tauri-plugin-llamacpp`:** The model support check is also now handled by the backend.
- **Removed `getChatClient` from `AIEngine`:** This optional method was not implemented and has been removed from the abstract class.
- **Improved KV Cache estimation:** The `estimate_kv_cache_internal` function in Rust now accounts for `attention.key_length` and `attention.value_length` if available, and considers sliding window attention for more precise estimates.
- **Introduced `batch_size` in ModelPlan:** The model plan now includes a `batch_size` property, which will be automatically adjusted based on the determined `ModelMode` (e.g., lower for CPU/Hybrid modes).
- **Updated `llamacpp-extension`:** The frontend extension now calls the new Tauri commands for model planning and support checks.
- **Removed `batch_size` from `llamacpp-extension/settings.json`:** The batch size is now dynamically determined by the planning logic and will be set as a model setting directly.
- **Updated `ModelSetting` and `useModelProvider` hooks:** These now handle the new `batch_size` property in model settings.
- **Added new Tauri commands and permissions:** `get_model_size`, `is_model_supported`, and `plan_model_load` are new commands with corresponding permissions.
- **Consolidated `ModelSupportStatus` and `KVCacheEstimate`:** These types are now defined in `src/tauri/plugins/tauri-plugin-llamacpp/src/gguf/types.rs`.

This refactoring centralizes critical model resource management logic, improving consistency and maintainability, and lays the groundwork for more sophisticated model loading strategies.

* feat: refine model planner to handle more memory scenarios

This commit introduces several improvements to the `plan_model_load` function, enhancing its ability to determine a suitable model loading strategy based on system memory constraints. Specifically, it includes:

-   **VRAM calculation improvements:**  Corrects the calculation of total VRAM by iterating over GPUs and multiplying by 1024*1024, improving accuracy.
-   **Hybrid plan optimization:**  Implements a more robust hybrid plan strategy, iterating through GPU layer configurations to find the highest possible GPU usage while remaining within VRAM limits.
-   **Minimum context length enforcement:** Enforces a minimum context length for the model, ensuring that the model can be loaded and used effectively.
-   **Fallback to CPU mode:** If a hybrid plan isn't feasible, it now correctly falls back to a CPU-only mode.
-   **Improved logging:** Enhanced logging to provide more detailed information about the memory planning process, including VRAM, RAM, and GPU layers.
-   **Batch size adjustment:** Updated batch size based on the selected mode, ensuring efficient utilization of available resources.
-   **Error handling and edge cases:**  Improved error handling and edge case management to prevent unexpected failures.
-   **Constants:** Added constants for easier maintenance and understanding.
-   **Power-of-2 adjustment:** Added power of 2 adjustment for max context length to ensure correct sizing for the LLM.

These changes improve the reliability and robustness of the model planning process, allowing it to handle a wider range of hardware configurations and model sizes.

* Add log for raw GPU info from tauri-plugin-hardware

* chore: update linux runner for tauri build

* feat: Improve GPU memory calculation for unified memory

This commit improves the logic for calculating usable VRAM, particularly for systems with **unified memory** like Apple Silicon. Previously, the application would report 0 total VRAM if no dedicated GPUs were found, leading to incorrect calculations and failed model loads.

This change modifies the VRAM calculation to fall back to the total system RAM if no discrete GPUs are detected. This is a common and correct approach for unified memory architectures, where the CPU and GPU share the same memory pool.

Additionally, this commit refactors the logic for calculating usable VRAM and RAM to prevent potential underflow by checking if the total memory is greater than the reserved bytes before subtracting. This ensures the calculation remains safe and correct.

* chore: fix update migration version

* fix: enable unified memory support on model support indicator

* Use total_system_memory in bytes

---------

Co-authored-by: Minh141120 <minh.itptit@gmail.com>
Co-authored-by: Faisal Amir <urmauur@gmail.com>
2025-09-25 12:17:57 +05:30
dinhlongviolin1
e2e572ccab
refactor: moved get_short_path to utils and use it in decompress 2025-09-11 09:52:10 +05:30
Dinh Long Nguyen
5cd81bc6e8
feat: improve testing (#6395)
* add more test rust test

* fix servicehub test

* fix tauri failing on windows
2025-09-09 12:16:25 +07:00
Akarshan Biswas
510c70bdf7
feat: Add model compatibility check and memory estimation (#6243)
* feat: Add model compatibility check and memory estimation

This commit introduces a new feature to check if a given model is supported based on available device memory.

The change includes:
- A new `estimateKVCache` method that calculates the required memory for the model's KV cache. It uses GGUF metadata such as `block_count`, `head_count`, `key_length`, and `value_length` to perform the calculation.
- An `isModelSupported` method that combines the model file size and the estimated KV cache size to determine the total memory required. It then checks if any available device has sufficient free memory to load the model.
- An updated error message for the `version_backend` check to be more user-friendly, suggesting a stable internet connection as a potential solution for backend setup failures.

This functionality helps prevent the application from attempting to load models that would exceed the device's memory capacity, leading to more stable and predictable behavior.

fixes: #5505

* Update extensions/llamacpp-extension/src/index.ts

Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>

* Update extensions/llamacpp-extension/src/index.ts

Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>

* Extend this to available system RAM if GGML device is not available

* fix: Improve model metadata and memory checks

This commit refactors the logic for checking if a model is supported by a system's available memory.

**Key changes:**
- **Remote model support**: The `read_gguf_metadata` function can now fetch metadata from a remote URL by reading the file in chunks.
- **Improved KV cache size calculation**: The KV cache size is now estimated more accurately by using `attention.key_length` and `attention.value_length` from the GGUF metadata, with a fallback to `embedding_length`.
- **Granular memory check statuses**: The `isModelSupported` function now returns a more specific status (`'RED'`, `'YELLOW'`, `'GREEN'`) to indicate whether the model weights or the KV cache are too large for the available memory.
- **Consolidated logic**: The logic for checking local and remote models has been consolidated into a single `isModelSupported` function, improving code clarity and maintainability.

These changes provide more robust and informative model compatibility checks, especially for models hosted on remote servers.

* Update extensions/llamacpp-extension/src/index.ts

Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>

* Make ctx_size optional and use sum free memory across ggml devices

* feat: hub and dropdown model selection handle model compatibility

* feat: update bage model info color

* chore: enable detail page to get compatibility model

* chore: update copy

* chore: update shrink indicator UI

---------

Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>
Co-authored-by: Faisal Amir <urmauur@gmail.com>
2025-08-21 16:13:50 +05:30
Dinh Long Nguyen
e1c8d98bf2
Backend Architecture Refactoring (#6094) (#6162)
* add llamacpp plugin

* Refactor llamacpp plugin

* add utils plugin

* remove utils folder

* add hardware implementation

* add utils folder + move utils function

* organize cargo files

* refactor utils src

* refactor util

* apply fmt

* fmt

* Update gguf + reformat

* add permission for gguf commands

* fix cargo test windows

* revert yarn lock

* remove cargo.lock for hardware plugin

* ignore cargo.lock file

* Fix hardware invoke + refactor hardware + refactor tests, constants

* use api wrapper in extension to invoke hardware call + api wrapper build integration

* add newline at EOF (per Akarshan)

* add vi mock for getSystemInfo
2025-08-15 08:59:01 +07:00