Akarshan Biswas 11b3a60675
fix: refactor, fix and move gguf support utilities to backend (#6584)
* feat: move estimateKVCacheSize to BE

* feat: Migrate model planning to backend

This commit migrates the model load planning logic from the frontend to the Tauri backend, moving the `planModelLoad` and `isModelSupported` methods into the `tauri-plugin-llamacpp` plugin so they are directly callable from the Rust core.

Model planning now uses a more robust and accurate memory estimate that accounts for both VRAM and system RAM, and the model plan gains a `batch_size` parameter.

**Key changes:**

- **Moved `planModelLoad` to `tauri-plugin-llamacpp`:** The core logic for determining GPU layers, context length, and memory offloading is now in Rust for better performance and accuracy.
- **Moved `isModelSupported` to `tauri-plugin-llamacpp`:** The model support check is also now handled by the backend.
- **Removed `getChatClient` from `AIEngine`:** This optional method was not implemented and has been removed from the abstract class.
- **Improved KV Cache estimation:** The `estimate_kv_cache_internal` function in Rust now accounts for `attention.key_length` and `attention.value_length` if available, and considers sliding window attention for more precise estimates.
- **Introduced `batch_size` in ModelPlan:** The model plan now includes a `batch_size` property, which will be automatically adjusted based on the determined `ModelMode` (e.g., lower for CPU/Hybrid modes).
- **Updated `llamacpp-extension`:** The frontend extension now calls the new Tauri commands for model planning and support checks.
- **Removed `batch_size` from `llamacpp-extension/settings.json`:** The batch size is now dynamically determined by the planning logic and will be set as a model setting directly.
- **Updated `ModelSetting` and `useModelProvider` hooks:** These now handle the new `batch_size` property in model settings.
- **Added new Tauri commands and permissions:** `get_model_size`, `is_model_supported`, and `plan_model_load` are new commands with corresponding permissions.
- **Consolidated `ModelSupportStatus` and `KVCacheEstimate`:** These types are now defined in `src/tauri/plugins/tauri-plugin-llamacpp/src/gguf/types.rs`.

This refactoring centralizes critical model resource management logic, improving consistency and maintainability, and lays the groundwork for more sophisticated model loading strategies.
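For orientation, here is a minimal Rust sketch of the shapes involved. The field names, the batch-size values, and the exact KV-cache formula are illustrative assumptions, not the actual definitions in `tauri-plugin-llamacpp`:

```rust
// Hypothetical shapes for illustration only; the real types and the
// `estimate_kv_cache_internal` function in the plugin may differ.

/// How the planner decides to place the model.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ModelMode {
    Gpu,
    Hybrid,
    Cpu,
    Unsupported,
}

/// Result of `plan_model_load`, now including a planner-chosen batch size.
#[derive(Debug, Clone)]
struct ModelPlan {
    gpu_layers: u32,
    max_context_length: u64,
    mode: ModelMode,
    batch_size: u32,
}

/// Rough KV-cache estimate in bytes: per layer, the cache holds `ctx_len`
/// keys and values sized by `attention.key_length` / `attention.value_length`
/// across the KV heads. A sliding window, when present, caps the effective
/// context (an approximation; real models may only window some layers).
fn estimate_kv_cache_bytes(
    n_layers: u64,
    n_kv_heads: u64,
    key_length: u64,
    value_length: u64,
    ctx_len: u64,
    sliding_window: Option<u64>, // Some(w) if the model uses SWA
    bytes_per_element: u64,      // e.g. 2 for an f16 cache
) -> u64 {
    let effective_ctx = sliding_window.map_or(ctx_len, |w| w.min(ctx_len));
    n_layers * effective_ctx * n_kv_heads * (key_length + value_length) * bytes_per_element
}

/// Batch size follows the chosen mode; the numbers here are placeholders.
fn batch_size_for_mode(mode: ModelMode) -> u32 {
    match mode {
        ModelMode::Gpu => 2048,
        ModelMode::Hybrid | ModelMode::Cpu => 256,
        ModelMode::Unsupported => 0,
    }
}
```

The `batch_size_for_mode` helper only shows the direction of the adjustment described above: CPU and hybrid placements get a smaller batch than a fully offloaded model.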

* feat: refine model planner to handle more memory scenarios

This commit introduces several improvements to the `plan_model_load` function, enhancing its ability to determine a suitable model loading strategy based on system memory constraints. Specifically, it includes:

- **VRAM calculation improvements:** Corrects the total-VRAM calculation by summing the memory reported by each GPU and multiplying by 1024*1024 to convert MiB to bytes.
- **Hybrid plan optimization:** Implements a more robust hybrid strategy that iterates over GPU layer configurations to find the highest GPU offload that still fits within the VRAM limit (see the sketch below).
- **Minimum context length enforcement:** Enforces a minimum context length so the planner never produces a context too small for the model to be usable.
- **Fallback to CPU mode:** If no hybrid plan is feasible, the planner now correctly falls back to a CPU-only mode.
- **Improved logging:** Logs more detail about the memory planning process, including VRAM, RAM, and the chosen number of GPU layers.
- **Batch size adjustment:** Adjusts the batch size based on the selected mode, ensuring efficient use of the available resources.
- **Error handling and edge cases:** Improves error handling and edge-case management to prevent unexpected failures.
- **Constants:** Introduces named constants to make the planning logic easier to maintain and understand.
- **Power-of-2 adjustment:** Adds a power-of-two adjustment for the maximum context length to ensure correct sizing for the LLM.

These changes improve the reliability and robustness of the model planning process, allowing it to handle a wider range of hardware configurations and model sizes.
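A rough sketch of the arithmetic behind these points, under stated assumptions (the minimum-context constant, the rounding direction, and the layer-search order are guesses for illustration, not the actual `plan_model_load` implementation):

```rust
// Illustrative-only sketch of the planning arithmetic; the real logic in
// tauri-plugin-llamacpp is more involved and may differ in detail.

const MIN_CONTEXT_LENGTH: u64 = 2048; // hypothetical minimum enforced by the planner

/// Sum the per-GPU memory (reported in MiB) and convert to bytes.
fn total_vram_bytes(gpu_vram_mib: &[u64]) -> u64 {
    gpu_vram_mib.iter().sum::<u64>() * 1024 * 1024
}

/// Round a context length down to a power of two, never below the minimum.
fn round_context_to_pow2(ctx: u64) -> u64 {
    let mut p = MIN_CONTEXT_LENGTH;
    while p * 2 <= ctx {
        p *= 2;
    }
    p
}

/// Hybrid search: walk down from the full layer count until the offloaded
/// layers plus the KV cache fit into usable VRAM.
fn pick_gpu_layers(
    n_layers: u32,
    layer_bytes: u64,    // approximate size of one layer's weights
    kv_cache_bytes: u64, // estimate for the planned context
    usable_vram: u64,
) -> u32 {
    (0..=n_layers)
        .rev()
        .find(|&layers| u64::from(layers) * layer_bytes + kv_cache_bytes <= usable_vram)
        .unwrap_or(0) // nothing fits on the GPU: caller falls back to CPU mode
}
```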

* Add log for raw GPU info from tauri-plugin-hardware

* chore: update linux runner for tauri build

* feat: Improve GPU memory calculation for unified memory

This commit improves the logic for calculating usable VRAM, particularly for systems with **unified memory** like Apple Silicon. Previously, the application would report 0 total VRAM if no dedicated GPUs were found, leading to incorrect calculations and failed model loads.

This change modifies the VRAM calculation to fall back to the total system RAM if no discrete GPUs are detected. This is a common and correct approach for unified memory architectures, where the CPU and GPU share the same memory pool.

Additionally, this commit refactors the logic for calculating usable VRAM and RAM to prevent potential underflow by checking if the total memory is greater than the reserved bytes before subtracting. This ensures the calculation remains safe and correct.
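A minimal sketch of both rules, with hypothetical helper names:

```rust
// Sketch of the unified-memory fallback and the underflow guard described
// above; names are illustrative and the real logic lives in the plugin.

/// If no discrete GPU reports VRAM, treat system RAM as the shared pool
/// (unified memory, e.g. Apple Silicon).
fn effective_total_vram(discrete_vram_bytes: u64, total_system_ram_bytes: u64) -> u64 {
    if discrete_vram_bytes > 0 {
        discrete_vram_bytes
    } else {
        total_system_ram_bytes
    }
}

/// Clamp instead of subtracting blindly, so the usable figure never underflows.
fn usable_memory(total_bytes: u64, reserved_bytes: u64) -> u64 {
    total_bytes.saturating_sub(reserved_bytes)
}
```

`saturating_sub` has the same effect as the explicit greater-than check: when the reservation exceeds the total, the usable figure clamps to zero instead of wrapping around.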

* chore: fix update migration version

* fix: enable unified memory support on model support indicator

* Use total_system_memory in bytes

---------

Co-authored-by: Minh141120 <minh.itptit@gmail.com>
Co-authored-by: Faisal Amir <urmauur@gmail.com>
2025-09-25 12:17:57 +05:30

{
"$schema": "http://json-schema.org/draft-07/schema#",
"title": "PermissionFile",
"description": "Permission file that can define a default permission, a set of permissions or a list of inlined permissions.",
"type": "object",
"properties": {
"default": {
"description": "The default permission set for the plugin",
"anyOf": [
{
"$ref": "#/definitions/DefaultPermission"
},
{
"type": "null"
}
]
},
"set": {
"description": "A list of permissions sets defined",
"type": "array",
"items": {
"$ref": "#/definitions/PermissionSet"
}
},
"permission": {
"description": "A list of inlined permissions",
"default": [],
"type": "array",
"items": {
"$ref": "#/definitions/Permission"
}
}
},
"definitions": {
"DefaultPermission": {
"description": "The default permission set of the plugin.\n\nWorks similarly to a permission with the \"default\" identifier.",
"type": "object",
"required": [
"permissions"
],
"properties": {
"version": {
"description": "The version of the permission.",
"type": [
"integer",
"null"
],
"format": "uint64",
"minimum": 1.0
},
"description": {
"description": "Human-readable description of what the permission does. Tauri convention is to use `<h4>` headings in markdown content for Tauri documentation generation purposes.",
"type": [
"string",
"null"
]
},
"permissions": {
"description": "All permissions this set contains.",
"type": "array",
"items": {
"type": "string"
}
}
}
},
"PermissionSet": {
"description": "A set of direct permissions grouped together under a new name.",
"type": "object",
"required": [
"description",
"identifier",
"permissions"
],
"properties": {
"identifier": {
"description": "A unique identifier for the permission.",
"type": "string"
},
"description": {
"description": "Human-readable description of what the permission does.",
"type": "string"
},
"permissions": {
"description": "All permissions this set contains.",
"type": "array",
"items": {
"$ref": "#/definitions/PermissionKind"
}
}
}
},
"Permission": {
"description": "Descriptions of explicit privileges of commands.\n\nIt can enable commands to be accessible in the frontend of the application.\n\nIf the scope is defined it can be used to fine grain control the access of individual or multiple commands.",
"type": "object",
"required": [
"identifier"
],
"properties": {
"version": {
"description": "The version of the permission.",
"type": [
"integer",
"null"
],
"format": "uint64",
"minimum": 1.0
},
"identifier": {
"description": "A unique identifier for the permission.",
"type": "string"
},
"description": {
"description": "Human-readable description of what the permission does. Tauri internal convention is to use `<h4>` headings in markdown content for Tauri documentation generation purposes.",
"type": [
"string",
"null"
]
},
"commands": {
"description": "Allowed or denied commands when using this permission.",
"default": {
"allow": [],
"deny": []
},
"allOf": [
{
"$ref": "#/definitions/Commands"
}
]
},
"scope": {
"description": "Allowed or denied scoped when using this permission.",
"allOf": [
{
"$ref": "#/definitions/Scopes"
}
]
},
"platforms": {
"description": "Target platforms this permission applies. By default all platforms are affected by this permission.",
"type": [
"array",
"null"
],
"items": {
"$ref": "#/definitions/Target"
}
}
}
},
"Commands": {
"description": "Allowed and denied commands inside a permission.\n\nIf two commands clash inside of `allow` and `deny`, it should be denied by default.",
"type": "object",
"properties": {
"allow": {
"description": "Allowed command.",
"default": [],
"type": "array",
"items": {
"type": "string"
}
},
"deny": {
"description": "Denied command, which takes priority.",
"default": [],
"type": "array",
"items": {
"type": "string"
}
}
}
},
"Scopes": {
"description": "An argument for fine grained behavior control of Tauri commands.\n\nIt can be of any serde serializable type and is used to allow or prevent certain actions inside a Tauri command. The configured scope is passed to the command and will be enforced by the command implementation.\n\n## Example\n\n```json { \"allow\": [{ \"path\": \"$HOME/**\" }], \"deny\": [{ \"path\": \"$HOME/secret.txt\" }] } ```",
"type": "object",
"properties": {
"allow": {
"description": "Data that defines what is allowed by the scope.",
"type": [
"array",
"null"
],
"items": {
"$ref": "#/definitions/Value"
}
},
"deny": {
"description": "Data that defines what is denied by the scope. This should be prioritized by validation logic.",
"type": [
"array",
"null"
],
"items": {
"$ref": "#/definitions/Value"
}
}
}
},
"Value": {
"description": "All supported ACL values.",
"anyOf": [
{
"description": "Represents a null JSON value.",
"type": "null"
},
{
"description": "Represents a [`bool`].",
"type": "boolean"
},
{
"description": "Represents a valid ACL [`Number`].",
"allOf": [
{
"$ref": "#/definitions/Number"
}
]
},
{
"description": "Represents a [`String`].",
"type": "string"
},
{
"description": "Represents a list of other [`Value`]s.",
"type": "array",
"items": {
"$ref": "#/definitions/Value"
}
},
{
"description": "Represents a map of [`String`] keys to [`Value`]s.",
"type": "object",
"additionalProperties": {
"$ref": "#/definitions/Value"
}
}
]
},
"Number": {
"description": "A valid ACL number.",
"anyOf": [
{
"description": "Represents an [`i64`].",
"type": "integer",
"format": "int64"
},
{
"description": "Represents a [`f64`].",
"type": "number",
"format": "double"
}
]
},
"Target": {
"description": "Platform target.",
"oneOf": [
{
"description": "MacOS.",
"type": "string",
"enum": [
"macOS"
]
},
{
"description": "Windows.",
"type": "string",
"enum": [
"windows"
]
},
{
"description": "Linux.",
"type": "string",
"enum": [
"linux"
]
},
{
"description": "Android.",
"type": "string",
"enum": [
"android"
]
},
{
"description": "iOS.",
"type": "string",
"enum": [
"iOS"
]
}
]
},
"PermissionKind": {
"type": "string",
"oneOf": [
{
"description": "Enables the cleanup_llama_processes command without any pre-configured scope.",
"type": "string",
"const": "allow-cleanup-llama-processes",
"markdownDescription": "Enables the cleanup_llama_processes command without any pre-configured scope."
},
{
"description": "Denies the cleanup_llama_processes command without any pre-configured scope.",
"type": "string",
"const": "deny-cleanup-llama-processes",
"markdownDescription": "Denies the cleanup_llama_processes command without any pre-configured scope."
},
{
"description": "Enables the estimate_kv_cache_size command without any pre-configured scope.",
"type": "string",
"const": "allow-estimate-kv-cache-size",
"markdownDescription": "Enables the estimate_kv_cache_size command without any pre-configured scope."
},
{
"description": "Denies the estimate_kv_cache_size command without any pre-configured scope.",
"type": "string",
"const": "deny-estimate-kv-cache-size",
"markdownDescription": "Denies the estimate_kv_cache_size command without any pre-configured scope."
},
{
"description": "Enables the find_session_by_model command without any pre-configured scope.",
"type": "string",
"const": "allow-find-session-by-model",
"markdownDescription": "Enables the find_session_by_model command without any pre-configured scope."
},
{
"description": "Denies the find_session_by_model command without any pre-configured scope.",
"type": "string",
"const": "deny-find-session-by-model",
"markdownDescription": "Denies the find_session_by_model command without any pre-configured scope."
},
{
"description": "Enables the generate_api_key command without any pre-configured scope.",
"type": "string",
"const": "allow-generate-api-key",
"markdownDescription": "Enables the generate_api_key command without any pre-configured scope."
},
{
"description": "Denies the generate_api_key command without any pre-configured scope.",
"type": "string",
"const": "deny-generate-api-key",
"markdownDescription": "Denies the generate_api_key command without any pre-configured scope."
},
{
"description": "Enables the get_all_sessions command without any pre-configured scope.",
"type": "string",
"const": "allow-get-all-sessions",
"markdownDescription": "Enables the get_all_sessions command without any pre-configured scope."
},
{
"description": "Denies the get_all_sessions command without any pre-configured scope.",
"type": "string",
"const": "deny-get-all-sessions",
"markdownDescription": "Denies the get_all_sessions command without any pre-configured scope."
},
{
"description": "Enables the get_devices command without any pre-configured scope.",
"type": "string",
"const": "allow-get-devices",
"markdownDescription": "Enables the get_devices command without any pre-configured scope."
},
{
"description": "Denies the get_devices command without any pre-configured scope.",
"type": "string",
"const": "deny-get-devices",
"markdownDescription": "Denies the get_devices command without any pre-configured scope."
},
{
"description": "Enables the get_loaded_models command without any pre-configured scope.",
"type": "string",
"const": "allow-get-loaded-models",
"markdownDescription": "Enables the get_loaded_models command without any pre-configured scope."
},
{
"description": "Denies the get_loaded_models command without any pre-configured scope.",
"type": "string",
"const": "deny-get-loaded-models",
"markdownDescription": "Denies the get_loaded_models command without any pre-configured scope."
},
{
"description": "Enables the get_model_size command without any pre-configured scope.",
"type": "string",
"const": "allow-get-model-size",
"markdownDescription": "Enables the get_model_size command without any pre-configured scope."
},
{
"description": "Denies the get_model_size command without any pre-configured scope.",
"type": "string",
"const": "deny-get-model-size",
"markdownDescription": "Denies the get_model_size command without any pre-configured scope."
},
{
"description": "Enables the get_random_port command without any pre-configured scope.",
"type": "string",
"const": "allow-get-random-port",
"markdownDescription": "Enables the get_random_port command without any pre-configured scope."
},
{
"description": "Denies the get_random_port command without any pre-configured scope.",
"type": "string",
"const": "deny-get-random-port",
"markdownDescription": "Denies the get_random_port command without any pre-configured scope."
},
{
"description": "Enables the get_session_by_model command without any pre-configured scope.",
"type": "string",
"const": "allow-get-session-by-model",
"markdownDescription": "Enables the get_session_by_model command without any pre-configured scope."
},
{
"description": "Denies the get_session_by_model command without any pre-configured scope.",
"type": "string",
"const": "deny-get-session-by-model",
"markdownDescription": "Denies the get_session_by_model command without any pre-configured scope."
},
{
"description": "Enables the is_model_supported command without any pre-configured scope.",
"type": "string",
"const": "allow-is-model-supported",
"markdownDescription": "Enables the is_model_supported command without any pre-configured scope."
},
{
"description": "Denies the is_model_supported command without any pre-configured scope.",
"type": "string",
"const": "deny-is-model-supported",
"markdownDescription": "Denies the is_model_supported command without any pre-configured scope."
},
{
"description": "Enables the is_process_running command without any pre-configured scope.",
"type": "string",
"const": "allow-is-process-running",
"markdownDescription": "Enables the is_process_running command without any pre-configured scope."
},
{
"description": "Denies the is_process_running command without any pre-configured scope.",
"type": "string",
"const": "deny-is-process-running",
"markdownDescription": "Denies the is_process_running command without any pre-configured scope."
},
{
"description": "Enables the load_llama_model command without any pre-configured scope.",
"type": "string",
"const": "allow-load-llama-model",
"markdownDescription": "Enables the load_llama_model command without any pre-configured scope."
},
{
"description": "Denies the load_llama_model command without any pre-configured scope.",
"type": "string",
"const": "deny-load-llama-model",
"markdownDescription": "Denies the load_llama_model command without any pre-configured scope."
},
{
"description": "Enables the plan_model_load command without any pre-configured scope.",
"type": "string",
"const": "allow-plan-model-load",
"markdownDescription": "Enables the plan_model_load command without any pre-configured scope."
},
{
"description": "Denies the plan_model_load command without any pre-configured scope.",
"type": "string",
"const": "deny-plan-model-load",
"markdownDescription": "Denies the plan_model_load command without any pre-configured scope."
},
{
"description": "Enables the read_gguf_metadata command without any pre-configured scope.",
"type": "string",
"const": "allow-read-gguf-metadata",
"markdownDescription": "Enables the read_gguf_metadata command without any pre-configured scope."
},
{
"description": "Denies the read_gguf_metadata command without any pre-configured scope.",
"type": "string",
"const": "deny-read-gguf-metadata",
"markdownDescription": "Denies the read_gguf_metadata command without any pre-configured scope."
},
{
"description": "Enables the unload_llama_model command without any pre-configured scope.",
"type": "string",
"const": "allow-unload-llama-model",
"markdownDescription": "Enables the unload_llama_model command without any pre-configured scope."
},
{
"description": "Denies the unload_llama_model command without any pre-configured scope.",
"type": "string",
"const": "deny-unload-llama-model",
"markdownDescription": "Denies the unload_llama_model command without any pre-configured scope."
},
{
"description": "Default permissions for the llamacpp plugin\n#### This default permission set includes:\n\n- `allow-cleanup-llama-processes`\n- `allow-load-llama-model`\n- `allow-unload-llama-model`\n- `allow-get-devices`\n- `allow-generate-api-key`\n- `allow-is-process-running`\n- `allow-get-random-port`\n- `allow-find-session-by-model`\n- `allow-get-loaded-models`\n- `allow-get-all-sessions`\n- `allow-get-session-by-model`\n- `allow-read-gguf-metadata`\n- `allow-estimate-kv-cache-size`\n- `allow-get-model-size`\n- `allow-is-model-supported`\n- `allow-plan-model-load`",
"type": "string",
"const": "default",
"markdownDescription": "Default permissions for the llamacpp plugin\n#### This default permission set includes:\n\n- `allow-cleanup-llama-processes`\n- `allow-load-llama-model`\n- `allow-unload-llama-model`\n- `allow-get-devices`\n- `allow-generate-api-key`\n- `allow-is-process-running`\n- `allow-get-random-port`\n- `allow-find-session-by-model`\n- `allow-get-loaded-models`\n- `allow-get-all-sessions`\n- `allow-get-session-by-model`\n- `allow-read-gguf-metadata`\n- `allow-estimate-kv-cache-size`\n- `allow-get-model-size`\n- `allow-is-model-supported`\n- `allow-plan-model-load`"
}
]
}
}
}