Akarshan Biswas 11b3a60675
fix: refactor, fix and move gguf support utilities to backend (#6584)
* feat: move estimateKVCacheSize to BE

* feat: Migrate model planning to backend

This commit migrates the model load planning logic from the frontend to the Tauri backend. It moves the `planModelLoad` and `isModelSupported` methods into the `tauri-plugin-llamacpp` plugin, making them directly callable from the Rust core.

The model planning now incorporates a more robust and accurate memory estimation, considering both VRAM and system RAM, and introduces a `batch_size` parameter to the model plan.

**Key changes:**

- **Moved `planModelLoad` to `tauri-plugin-llamacpp`:** The core logic for determining GPU layers, context length, and memory offloading is now in Rust for better performance and accuracy.
- **Moved `isModelSupported` to `tauri-plugin-llamacpp`:** The model support check is also now handled by the backend.
- **Removed `getChatClient` from `AIEngine`:** This optional method was not implemented and has been removed from the abstract class.
- **Improved KV Cache estimation:** The `estimate_kv_cache_internal` function in Rust now accounts for `attention.key_length` and `attention.value_length` if available, and considers sliding-window attention for more precise estimates (a simplified sizing sketch appears after this commit's summary).
- **Introduced `batch_size` in ModelPlan:** The model plan now includes a `batch_size` property, which will be automatically adjusted based on the determined `ModelMode` (e.g., lower for CPU/Hybrid modes).
- **Updated `llamacpp-extension`:** The frontend extension now calls the new Tauri commands for model planning and support checks.
- **Removed `batch_size` from `llamacpp-extension/settings.json`:** The batch size is now dynamically determined by the planning logic and will be set as a model setting directly.
- **Updated `ModelSetting` and `useModelProvider` hooks:** These now handle the new `batch_size` property in model settings.
- **Added new Tauri commands and permissions:** `get_model_size`, `is_model_supported`, and `plan_model_load` are new commands with corresponding permissions (see the sketch after this list).
- **Consolidated `ModelSupportStatus` and `KVCacheEstimate`:** These types are now defined in `src/tauri/plugins/tauri-plugin-llamacpp/src/gguf/types.rs`.
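The new surface looks roughly like the following. This is a minimal sketch only: field names, parameter names, and return types (`gpu_layers`, `max_context_length`, `requested_ctx`, the plain `bool`/`String` results, and so on) are assumptions and may differ from the actual types in `tauri-plugin-llamacpp`.

use serde::{Deserialize, Serialize};

#[derive(Debug, Clone, Copy, Serialize, Deserialize)]
pub enum ModelMode {
    Gpu,
    Hybrid,
    Cpu,
    Unsupported,
}

#[derive(Debug, Serialize, Deserialize)]
pub struct ModelPlan {
    pub gpu_layers: u32,          // layers offloaded to the GPU
    pub max_context_length: u64,  // context length the plan allows
    pub offload_kv_cache: bool,   // whether the KV cache lives in VRAM (assumed field)
    pub batch_size: u32,          // adjusted per ModelMode (lower for CPU/Hybrid)
    pub mode: ModelMode,
}

// Commands exposed to the frontend; signatures are illustrative only.
#[tauri::command]
async fn plan_model_load(model_path: String, requested_ctx: Option<u64>) -> Result<ModelPlan, String> {
    todo!("estimate VRAM/RAM needs and pick gpu_layers, context length, batch_size")
}

#[tauri::command]
async fn is_model_supported(model_path: String) -> Result<bool, String> {
    // In the plugin this likely returns a richer ModelSupportStatus value.
    todo!("compare estimated requirements against usable memory")
}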

This refactoring centralizes critical model resource management logic, improving consistency and maintainability, and lays the groundwork for more sophisticated model loading strategies.
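For the KV-cache change mentioned above, the sizing follows the usual per-layer formula. The sketch below is a simplified illustration rather than the plugin's exact implementation, and it treats sliding-window attention as capping the effective context for every layer, whereas real models may only apply the window to some layers.

// All parameter names are illustrative; the plugin reads the real values
// (layer count, KV heads, key/value lengths, window size) from GGUF metadata.
fn estimate_kv_cache_bytes(
    n_layers: u64,
    n_kv_heads: u64,
    key_length: u64,         // falls back to embedding_dim / n_heads when the key is absent
    value_length: u64,
    ctx_len: u64,
    swa_window: Option<u64>, // sliding-window size, if the model declares one
    bytes_per_element: u64,  // 2 for an f16 cache
) -> u64 {
    // A sliding window caps how many positions each layer has to cache.
    let effective_ctx = swa_window.map_or(ctx_len, |w| w.min(ctx_len));
    n_layers * n_kv_heads * (key_length + value_length) * effective_ctx * bytes_per_element
}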

* feat: refine model planner to handle more memory scenarios

This commit introduces several improvements to the `plan_model_load` function, enhancing its ability to determine a suitable model loading strategy based on system memory constraints. Specifically, it includes:

-   **VRAM calculation improvements:**  Corrects the total VRAM calculation by summing memory across all detected GPUs and converting the reported values to bytes (multiplying by 1024*1024), improving accuracy.
-   **Hybrid plan optimization:**  Implements a more robust hybrid plan strategy, iterating through GPU layer configurations to find the highest possible GPU usage that still fits within VRAM limits (see the sketch after this list).
-   **Minimum context length enforcement:** Enforces a minimum context length for the model, ensuring that the model can be loaded and used effectively.
-   **Fallback to CPU mode:** If a hybrid plan isn't feasible, it now correctly falls back to a CPU-only mode.
-   **Improved logging:** Enhanced logging to provide more detailed information about the memory planning process, including VRAM, RAM, and GPU layers.
-   **Batch size adjustment:** Updated batch size based on the selected mode, ensuring efficient utilization of available resources.
-   **Error handling and edge cases:**  Improved error handling and edge case management to prevent unexpected failures.
-   **Constants:** Added constants for easier maintenance and understanding.
-   **Power-of-2 adjustment:** Added a power-of-two adjustment for the maximum context length to ensure correct sizing for the LLM (a small rounding sketch follows the closing summary below).
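The hybrid search described above can be pictured as a descending scan over offload candidates. This is a rough sketch under assumed byte-level inputs, not the plugin's actual code:

// Walk down from full offload until the estimated VRAM footprint fits.
// Returns 0 (pure CPU fallback) when even a single offloaded layer does not fit.
fn pick_gpu_layers(
    total_layers: u32,
    layer_weight_bytes: u64,   // average weight size of one transformer layer
    kv_per_layer_bytes: u64,   // KV cache attributable to one offloaded layer
    fixed_overhead_bytes: u64, // output weights, compute buffers, etc.
    usable_vram_bytes: u64,
) -> u32 {
    for gpu_layers in (0..=total_layers).rev() {
        let need = fixed_overhead_bytes
            + u64::from(gpu_layers) * (layer_weight_bytes + kv_per_layer_bytes);
        if need <= usable_vram_bytes {
            return gpu_layers;
        }
    }
    0
}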

These changes improve the reliability and robustness of the model planning process, allowing it to handle a wider range of hardware configurations and model sizes.
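The power-of-two adjustment from the list above amounts to flooring the computed maximum context length to a power of two and clamping it to the enforced minimum. A minimal sketch, where MIN_CONTEXT_LENGTH stands in for whatever minimum the planner actually enforces:

const MIN_CONTEXT_LENGTH: u64 = 2048; // assumed value; the real constant may differ

// Largest power of two that is <= n (guarded against n == 0).
fn floor_to_power_of_two(n: u64) -> u64 {
    if n == 0 { 0 } else { 1u64 << (63 - n.leading_zeros()) }
}

fn clamp_context_length(raw_max_ctx: u64) -> u64 {
    floor_to_power_of_two(raw_max_ctx).max(MIN_CONTEXT_LENGTH)
}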

* Add log for raw GPU info from tauri-plugin-hardware

* chore: update linux runner for tauri build

* feat: Improve GPU memory calculation for unified memory

This commit improves the logic for calculating usable VRAM, particularly for systems with **unified memory** like Apple Silicon. Previously, the application would report 0 total VRAM if no dedicated GPUs were found, leading to incorrect calculations and failed model loads.

This change modifies the VRAM calculation to fall back to the total system RAM if no discrete GPUs are detected. This is a common and correct approach for unified memory architectures, where the CPU and GPU share the same memory pool.

Additionally, this commit refactors the logic for calculating usable VRAM and RAM to prevent potential underflow by checking if the total memory is greater than the reserved bytes before subtracting. This ensures the calculation remains safe and correct.
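Put together, the unified-memory fallback and the underflow guard look roughly like this. Field names and the reservation amount are assumptions, not the exact values used by the plugin:

// Usable VRAM, falling back to system RAM on unified-memory machines
// (e.g. Apple Silicon) where no discrete GPU is reported.
fn usable_vram_bytes(total_gpu_vram: u64, total_system_memory: u64, reserved: u64) -> u64 {
    let pool = if total_gpu_vram > 0 { total_gpu_vram } else { total_system_memory };
    // saturating_sub keeps the result at 0 instead of underflowing
    // when the reservation exceeds the available pool.
    pool.saturating_sub(reserved)
}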

* chore: fix update migration version

* fix: enable unified memory support on model support indicator

* Use total_system_memory in bytes

---------

Co-authored-by: Minh141120 <minh.itptit@gmail.com>
Co-authored-by: Faisal Amir <urmauur@gmail.com>
2025-09-25 12:17:57 +05:30

Jan - Local AI Assistant



Getting Started - Docs - Changelog - Bug reports - Discord

Jan is an AI assistant that can run 100% offline on your device. Download and run LLMs with full control and privacy.

Installation

The easiest way to get started is by downloading one of the following versions for your respective operating system:

Platform           Stable         Nightly
Windows            jan.exe        jan.exe
macOS              jan.dmg        jan.dmg
Linux (deb)        jan.deb        jan.deb
Linux (AppImage)   jan.AppImage   jan.AppImage

Download from jan.ai or GitHub Releases.

Features

  • Local AI Models: Download and run LLMs (Llama, Gemma, Qwen, etc.) from HuggingFace
  • Cloud Integration: Connect to OpenAI, Anthropic, Mistral, Groq, and others
  • Custom Assistants: Create specialized AI assistants for your tasks
  • OpenAI-Compatible API: Local server at localhost:1337 for other applications (a minimal client sketch appears after this list)
  • Model Context Protocol: MCP integration for enhanced capabilities
  • Privacy First: Everything runs locally when you want it to
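As a quick illustration of the local API, here is a minimal Rust client using the reqwest and serde_json crates. The /v1/chat/completions path and the model id follow the usual OpenAI-compatible convention and are assumptions here, so adjust them to whatever your Jan instance exposes:

// Cargo.toml (assumed): reqwest = { version = "0.12", features = ["blocking", "json"] }, serde_json = "1"
fn main() -> Result<(), Box<dyn std::error::Error>> {
    let body = serde_json::json!({
        "model": "your-local-model-id", // whichever model you downloaded in Jan
        "messages": [{ "role": "user", "content": "Hello from the local API" }]
    });
    let resp: serde_json::Value = reqwest::blocking::Client::new()
        .post("http://localhost:1337/v1/chat/completions") // OpenAI-compatible endpoint (assumed path)
        .json(&body)
        .send()?
        .json()?;
    println!("{}", resp["choices"][0]["message"]["content"]);
    Ok(())
}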

Build from Source

For those who enjoy the scenic route:

Prerequisites

  • Node.js ≥ 20.0.0
  • Yarn ≥ 1.22.0
  • Make ≥ 3.81
  • Rust (for Tauri)

Run with Make

git clone https://github.com/menloresearch/jan
cd jan
make dev

This handles everything: installs dependencies, builds core components, and launches the app.

Available make targets:

  • make dev - Full development setup and launch
  • make build - Production build
  • make test - Run tests and linting
  • make clean - Delete everything and start fresh

Run with Mise (easier)

You can also run the project with mise, which is a bit easier since it automatically manages Node.js, Rust, and other dependency versions:

git clone https://github.com/menloresearch/jan
cd jan

# Install mise (if not already installed)
curl https://mise.run | sh

# Install tools and start development
mise install    # installs Node.js, Rust, and other tools
mise dev        # runs the full development setup

Available mise commands:

  • mise dev - Full development setup and launch
  • mise build - Production build
  • mise test - Run tests and linting
  • mise clean - Delete everything and start fresh
  • mise tasks - List all available tasks

Manual Commands

yarn install
yarn build:tauri:plugin:api
yarn build:core
yarn build:extensions
yarn dev

System Requirements

Minimum specs for a decent experience:

  • macOS: 13.6+ (8GB RAM for 3B models, 16GB for 7B, 32GB for 13B)
  • Windows: 10+ with GPU support for NVIDIA/AMD/Intel Arc
  • Linux: Most distributions work, GPU acceleration available

For detailed compatibility, check our installation guides.

Troubleshooting

If things go sideways:

  1. Check our troubleshooting docs
  2. Copy your error logs and system specs
  3. Ask for help in our Discord #🆘|jan-help channel

Contributing

Contributions welcome. See CONTRIBUTING.md for the full spiel.

Contact

License

Apache 2.0 - Because sharing is caring.

Acknowledgements

Built on the shoulders of giants:
