This icon doesn't do anything on chatInput; it is only an indicator that the proactive capability is activated. It can be safely removed since the model dropdown already indicates this.
This commit introduces Japanese as a supported language in the web application.
Key changes include:
- Addition of a new `ja` locale with 15 translated JSON resource files, making the application accessible to Japanese-speaking users.
- Update of the `LanguageSwitcher.tsx` component to include '日本語' in the language selection dropdown menu, allowing users to switch to the new language.
- The localization files were added by creating a new `ja` directory under `web-app/src/locales` and translating the content from the `en` directory.
Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
* feat: support multimodal tool results and improve tool message handling
- Added a temporary `ToolResult` type that mirrors the structure returned by tools (text, image data, URLs, errors).
- Implemented `convertToolPartToApiContentPart` to translate each tool output part into the format expected by the OpenAI chat completion API.
- Updated `CompletionMessagesBuilder.addToolMessage` to accept a full `ToolResult` instead of a plain string and to:
- Detect multimodal content (base64 images, image URLs) and build a structured `content` array.
- Properly handle plain‑text results, tool execution errors, and unexpected formats with sensible fallbacks.
- Cast the final content to `any` for the `tool` role as required by the API.
- Modified `postMessageProcessing` to pass the raw tool result (`result as any`) to `addToolMessage`, avoiding premature extraction of only the first text part.
- Refactored several formatting and type‑annotation sections:
- Added multiline guard for empty user messages to insert a placeholder.
- Split the image URL construction into a clearer multiline object.
- Adjusted method signatures and added minor line‑breaks for readability.
- Included extensive comments explaining the new logic and edge‑case handling.
These changes enable the chat system to handle richer tool outputs (e.g., images, mixed content) and provide more robust error handling.
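A minimal sketch of the conversion step, assuming hypothetical part shapes for `ToolResult` (the real type in the codebase may differ); the output shapes follow the OpenAI content-part format:

```typescript
// Hypothetical part shapes; the actual ToolResult type may differ.
type ToolResultPart =
  | { type: 'text'; text: string }
  | { type: 'image'; data: string; mimeType: string } // base64 payload
  | { type: 'image_url'; url: string }

// Translate one tool output part into an OpenAI-style content part.
function convertToolPartToApiContentPart(part: ToolResultPart) {
  switch (part.type) {
    case 'text':
      return { type: 'text' as const, text: part.text }
    case 'image':
      // Base64 image data becomes a data URL the API can consume.
      return {
        type: 'image_url' as const,
        image_url: { url: `data:${part.mimeType};base64,${part.data}` },
      }
    case 'image_url':
      return { type: 'image_url' as const, image_url: { url: part.url } }
    default:
      // Unexpected formats fall back to their JSON representation.
      return { type: 'text' as const, text: JSON.stringify(part) }
  }
}
```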
* Satisfy ts linter
* Make ts linter happy x2
* chore: update test message creation
---------
Co-authored-by: Faisal Amir <urmauur@gmail.com>
This commit introduces a change to prevent **Markdown** rendering issues where a dollar sign followed by a number (like **`$1`**) is incorrectly interpreted as **LaTeX** by the rendering engine.
---
The `normalizeLatex` function in `RenderMarkdown.tsx` now explicitly escapes these sequences (e.g., **`$1`** becomes **`\$1`**), ensuring they are displayed literally instead of being processed as mathematical expressions. This improves the fidelity of text that might contain currency or similar numerical notations.
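A minimal sketch of the escaping step, under the assumption that `normalizeLatex` applies a regex like the following; the real implementation also guards against genuine math spans and code:

```typescript
// Escape dollar-number sequences so they render literally instead of
// being picked up as inline LaTeX by the math renderer.
function escapeCurrency(text: string): string {
  // Replacement '\\$$$1' = literal backslash, literal '$' ($$), group ($1).
  return text.replace(/\$(\d[\d,.]*)/g, '\\$$$1')
}

// e.g. escapeCurrency('costs $1 or $20.50') -> 'costs \$1 or \$20.50'
```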
* enable new prompt input while waiting for an answer
* correct spelling of handleSendMessage function
* remove test for disabling input while streaming content
- Added shallow equality guard for `connectedServers` state to prevent redundant updates when the fetched server list hasn't changed.
- Updated error handling for server fetch to only clear the state when it actually contains data.
- Introduced `newHasActiveModels` variable and conditional updater for `hasActiveModels` to avoid unnecessary state changes.
- Adjusted error handling for active model fetch to only set `hasActiveModels` to `false` when the current state differs.
These changes reduce needless re‑renders and improve component performance.
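A minimal sketch of the shallow-equality guard, assuming `connectedServers` holds an array of server names; the actual state shape and setter may differ:

```typescript
// Shallow equality: same length and same names in the same order.
const sameServers = (a: string[], b: string[]): boolean =>
  a.length === b.length && a.every((name, i) => name === b[i])

function updateConnectedServers(
  prev: string[],
  fetched: string[],
  setState: (next: string[]) => void
): void {
  // Returning early avoids a redundant state write and the re-render it causes.
  if (sameServers(prev, fetched)) return
  setState(fetched)
}
```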
The `listSupportedBackends` function now includes error handling for the `fetchRemoteSupportedBackends` call.
This addresses an issue where an error thrown during the remote fetch (e.g., due to no network connection in offline mode) would prevent the subsequent loading of locally installed or manually provided llama.cpp backends.
The remote backend versions array will now default to empty if the fetch fails, allowing the rest of the backend initialization process to proceed as expected.
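A minimal sketch of the fallback, with the two fetchers passed in as parameters for illustration; the real signatures and return types may differ:

```typescript
async function listSupportedBackends(
  fetchRemote: () => Promise<string[]>,
  listLocal: () => Promise<string[]>
): Promise<string[]> {
  let remoteVersions: string[] = []
  try {
    remoteVersions = await fetchRemote()
  } catch (e) {
    // Offline mode or network error: default to an empty remote list so
    // locally installed and manually provided backends still load.
    console.warn('remote backend fetch failed, continuing offline:', e)
  }
  const localVersions = await listLocal()
  // Merge with deduplication.
  return [...new Set([...remoteVersions, ...localVersions])]
}
```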
This commit introduces a new field, `is_embedding`, to the `SessionInfo` structure to clearly mark sessions running dedicated embedding models.
Key changes:
- Adds `is_embedding` to the `SessionInfo` interface in `AIEngine.ts` and the Rust backend.
- Updates the `loadLlamaModel` command signatures to pass this new flag.
- Modifies the llama.cpp extension's **auto-unload logic** to explicitly **filter out** and **not unload** any currently loaded embedding models when a new text generation model is loaded. This is a critical performance fix to prevent the embedding model (e.g., used for RAG) from being repeatedly reloaded.
Also includes minor code style cleanup/reformatting in `jan-provider-web/provider.ts` for improved readability.
* feat: Add support for llamacpp MoE offloading setting
Introduces the n_cpu_moe configuration setting for the llamacpp provider. This allows users to specify the number of Mixture of Experts (MoE) layers whose weights should be offloaded to the CPU via the --n-cpu-moe flag in llama.cpp.
This is useful for running large MoE models by balancing resource usage, for example, by keeping attention on the GPU and offloading expert FFNs to the CPU.
The changes include:
- Updating the llamacpp-extension to accept and pass the --n-cpu-moe argument.
- Adding the input field to the Model Settings UI (ModelSetting.tsx).
- Including model setting migration logic and bumping the store version to 4.
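A minimal sketch of the argument mapping, where the `--n-cpu-moe` and `--cpu-moe` flag names come from llama.cpp but the config shape is illustrative:

```typescript
// Map the provider settings onto llama.cpp CLI flags.
function moeArgs(cfg: { cpu_moe?: boolean; n_cpu_moe?: number }): string[] {
  const args: string[] = []
  if (cfg.cpu_moe) {
    args.push('--cpu-moe') // keep all MoE expert weights on the CPU
  } else if (cfg.n_cpu_moe && cfg.n_cpu_moe > 0) {
    // Offload expert weights of the first N MoE layers to the CPU.
    args.push('--n-cpu-moe', String(cfg.n_cpu_moe))
  }
  return args
}
```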
* remove unused import
* feat: add cpu-moe boolean flag
* chore: remove unused migration cont_batching
* chore: fix migration delete old key and add new one
* chore: fix migration
---------
Co-authored-by: Faisal Amir <urmauur@gmail.com>
* feat: Init mobile app from current Tauri v2 framework
Feat:
- Using Tauri v2 by default
- Add new configuration to initiate mobile app
- Add dependencies needed for mobile build
Test:
- Confirm to be built successfully
- Confirm to keep settings for desktop and build successfully
- Reuse most of components from desktop version
* fix: Fix tests
* feat: Add android target
* fix: Reconfigure and add toolchain to wake up Android app
* fix: Fix parsing datatype inconsistent across platforms
* feat: Adjust UI for mobile res
Feature:
- Adjust homescreen and chatscreen for mobile devices
- Fix tests for both FE and BE
Self-test:
- Confirm runnable on both Android and iOS
- Confirm runnable on desktop app
- All test suites passed
- Working with ChatGPT API
* fix: Restore dedupe command
* chore: Adjust paddings to save some space for the top nav bar
* Update web-app/src/routes/index.tsx
Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>
* chore: keep gen icon on Android
* chore: add command to ease the mobile dev
* chore: Separate configuration for android build in release mode
* chore: Shrink the Android app size to minimal, release type
* feat: Disable zoom and setup mobile viewport
* chore: Configure iOS to use the same build mechanic to remove unnecessary plugin
* fix: Remove redundant yarn command for ios dev build
* enhancement: fit mobile layout
* chore: update chatscreen padding
* chore: update platform check to use config instead of the navigator user agent
* chore: update height of thread detail
* remove gen android
* Update web-app/src/index.css
Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>
* fix: Add frontendDist to ios configuration
* feat: Experiment removing hardware permission
* fix: Android releasable build
* feat: Add dev-android to makefile
* feat: Add dev-ios to makefile for ios development
* refactor(utils): add helper to remove extensions from file paths
* chore: fix Encoded logging
* refactor: safely strip prefix and extensions from filename
* chore: add logging for TauriDialog Service
* Update handbook content with Nextra callout and content improvements
- Convert blockquote to Nextra callout in open-superintelligence.mdx
- Add Edison link and improve content flow
- Refine language for better clarity
* docs: enhance overview page with improved structure and internal linking
- Restructured main content with cleaner formatting
- Added comprehensive internal linking for better navigation
- Improved visual hierarchy and readability
- Enhanced acknowledgements section with better organization
- Updated product suite section with consistent formatting
* Update handbook navigation structure and meta.json files
- Updated handbook/_meta.json to properly organize navigation
- Fixed duplicate entries by removing files that belong in subfolders
- Updated why folder title to 'Why does Jan exist?'
- Cleaned up why/_meta.json with proper titles for Open Superintelligence and Open-Source sections
* docs: fix broken internal links and remove privacy page
- Fix broken links in troubleshooting.mdx pointing to install pages
- Remove privacy.mdx page and update _meta.json navigation
- Update various documentation links for consistency
- Ensure all internal links use proper absolute paths
* Optimize installation pages SEO meta titles and descriptions
✨ SEO Improvements:
- Mac: 'Run AI models locally on your Mac - Jan'
- Linux: 'Run AI models locally on Linux - Jan'
- Windows: 'Run AI models locally on Windows - Jan'
🎯 Meta descriptions now include:
- Target keywords (local AI, LLM, offline, ChatGPT-like)
- Platform-specific details (Apple Silicon, Ubuntu/Debian, Windows 10/11)
- Key benefits (GPU acceleration, privacy, no internet required)
📍 Sidebar navigation titles unchanged - only SEO meta data optimized
* Clean up installation page titles and descriptions
- Revert titles to clean sidebar navigation (Mac, Linux, Windows)
- Improve meta descriptions to be concise but SEO-friendly
- Keep key terms: local AI, offline, GPU acceleration, platform details
* Update README.md
* Update README.md
* trigger PR banner
* docs: update missing redirect links
* enhancement: social media navbar and update menu footer
* Update docs/src/components/Navbar.tsx
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Update docs/src/components/Navbar.tsx
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* fix: Apply model name change correctly
# Conflicts:
# web-app/src/lib/utils.ts
* feat: Disable text selection on Toaster
Disable text selection on the Toaster so the swipe-to-dismiss gesture on toasts behaves consistently.
* fix: scroll issue padding not re render correctly (#6639)
* trigger PR banner
* Improve FAQ section and content updates for offline ChatGPT alternative post
* Add SEO-optimized Twitter meta titles for installation pages
- Add Twitter meta tags to Windows, Linux, and Mac installation pages
- Optimize meta titles: 'Jan on [Platform]' for better SEO
- Maintain consistent meta descriptions across all platforms
- Keep original page titles unchanged for user experience
* Update content files
- Update tabby server example
- Update troubleshooting documentation
- Update NVIDIA TensorRT-LLM benchmarking post
* Add ChatGPT alternative blog post and update installation docs
* Update ChatGPT alternative blog post
* Rename blog post from chatgpt-alternative-jan.mdx to chatgpt-alternatives.mdx
* feat: add real-time ChatGPT status checker blog post
- Add new blog post: 'is-chatgpt-down-use-jan'
- Create OpenAIStatusChecker React component with real-time data
- Use CORS proxy to fetch live OpenAI status from status.openai.com
- Include SEO-optimized status indicators and error messages
- Add ChatGPT downtime promotion for Jan alternative
- Component features: auto-refresh, fallback handling, dark mode support
* chore: fix typo
* chore: fix failed build
* refactor: deprecate Vulkan external binaries (#6638)
* refactor: deprecate vulkan binary
refactor: clean up vulkan lib
chore: cleanup
chore: clean up
chore: clean up
fix: build
* fix: skip binaries download env
* Update src-tauri/utils/src/system.rs
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Update src-tauri/utils/src/system.rs
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* feat: Add tests for the model displayName modification
* fix: Fix linter error
* fix: Fix nvidia and vulkan after upgrade to be compatible with mobile compiling too
* Fix OG image paths and move images to general folder
- Move OG images from _assets/ to public/assets/images/general/
- Update all blog post references to use correct paths
- Remove duplicate images from _assets/ folder
- Fix image paths in content to use /assets/images/general/ format
- Update Twitter image references to match OG image paths
Files updated:
- chatgpt-alternatives.mdx
- deepresearch.mdx
- deepseek-r1-locally.mdx
- how-we-benchmark-kernels.mdx
- is-chatgpt-down-use-jan.mdx
- offline-chatgpt-alternative.mdx
- qwen3-settings.mdx
- run-ai-models-locally.mdx
- run-gpt-oss-locally.mdx
* fix: remove Jan prefix from blog post titles for better SEO
- Blog posts now use only frontmatter title without 'Jan -' prefix
- Other pages maintain existing branding (Jan Desktop, Jan Server, Jan)
- Improves SEO for blog content while preserving site branding
* update blog post content
* Feat: web temporary chat (#6650)
* temporary chat stage 1
* temporary page in root
* temporary chat
* handle redirection properly
* temporary chat header
* Update extensions-web/src/conversational-web/extension.ts
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* update routetree
* better error handling
* fix stretched assistant on desktop
* update yarn link to workspace for better link consistency
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Add Guides category to blog navigation
- Add 'guides' category to staticCategories array in Blog component
- Update plopfile.js to include guides in category choices
- Add guides category entry to _meta.json
- Position guides category after research in navigation
* docs: update redirect links
* Update AI for Law blog post with images and content improvements
- Add hero image for AI for Law blog post
- Add images to assistant creation and contract review sections
- Improve content structure and readability
- Add proper image assets for legal AI use cases
- Update ogImage and twitter image references
* fix: revert the modification of vulkan
* Add AI for Teachers blog post with images and video
- Create comprehensive AI for Teachers blog post
- Add hero image and assistant creation interface images
- Include video demonstration of Jan for teachers
- Add proper ogImage and twitter image references
- Cover lesson planning, grading, parent communication, and classroom resources
- Focus on privacy and offline AI for educational use
* fix: Fix linter and tests
* fix: Restore default permission on desktop build
Restore desktop capabilities
Restore linter correctness
Restore different capabilities on each platform
* fix: Fix cargo test
* feat: web add search button for extension (#6671)
* add search button for web extension
* change button color and behavior
* Update extensions-web/src/mcp-web/components/WebSearchButton.tsx
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* add missing EOF newline (#6673)
* fix lint issue
* fix: mcp bin path (#6667)
* fix: mcp bin path
* chore: clean up unused structs
* fix: bin name
* fix: tests
* remove test conflict
* add missing closing test
* fix tauri test
* feat: disable all web mcp by default (new users) (#6677)
* fix: chat completion usage - token speed (#6675)
* resolve TypeScript and Rust warnings (#6612)
* chore: fix warnings
* fix: add missing scrollContainerRef dependencies to React hooks
* fix: typo
* fix: remove unsupported fetch option and enable AsyncIterable types
- Removed `connectTimeout` from fetch init (not supported in RequestInit)
- Updated tsconfig to target ES2018
* chore: refactor rename
* fix(hooks): update dependency arrays for useThreadScrolling effects
* Add type.d.ts to extend RequestInit with connectionTimeout
* remove commented unused import
* fix: Fix editing model without saving should restore original name
* fix: thread item overfetching (#6699)
* fix: thread item overfetching
* chore: cleanup left over import
* feat: improve projects (#6698)
* decouple successfully
* only show movable projects for project items
* handle deleting conversations when a project is removed
* fix left panel assignment
* fix lint
* fix gg tag (#6702)
* refactor: resolve rust analyzer warnings and improve code quality (#6696)
- Update string formatting to use modern interpolation syntax
- Simplify expressions and remove unnecessary intermediate variables
- Improve logging statements for better readability
- Clean up code across core modules (app, downloads, mcp, server, etc.)
* docs: add Jan v0.7.0 changelog
* docs: update Jan v0.7.0 changelog content
* docs: rename changelog file to remove trailing dash
* feat: use sql for mobile storage
* feat: organize code for proper import
Move platform checker for db access to helper
Add tests for the threads controller
* feat: better structure for MobileCoreService
MobileCoreService should inherit TauriCoreService to match Tauri architecture patterns
* fix: Extract model capabilities correctly for various providers on various platforms
* fix: yarn lint
* ci: remove upload msi
* fix: extensions missing on Unix dev (#6724)
* fix: extensions missing on Unix dev
* re-add bun and uv for mcp
* fix: Local API Server - disable settings on run (#6707)
* fix: Fix thread tests with a proper mock folder
* changelog: release 0.7.1
* chore: fix wrong version in detailed changelog
* fix: update detail changelog 0.7.1
* fix(ui): restore missing border on model selector (#6692)
* fix: Fix openssl issue on mobile after merging
* fix: Remove yarn.lock changes
* chore(ui): refine className for dropdown menu with animation states
* chore: Reposition 'Remove project' option for better usability
* feat: add project search and scrollable thread lists
- Add search bar to filter projects by name in real-time
- Implement scrollable thread container with max 4 visible threads
- Add empty state for no search results
- Add clear button (X) to reset search query
* (chore): rename translation keys to collapseProject/expandProject
* Fix Translation changes across locales
* (chore): remove duplicate keys from de-DE/common.json
* Add SearchProjects to missing locales
* fix: native system theme and check OS support for blur
* fix: new window theme
* fix: open new window theme
* fix: test use case appearance
* chore: fix window type theme service
* chore: update permission windows
* chore: fix desktop capabilities
* chore: check support blur using hardware api
* chore: check support blur on FE
* chore: fix new chat to update the last selected model in the dropdown
* fix: recents title when no results are found
---------
Co-authored-by: Vanalite <dhnghia0604@gmail.com>
Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>
Co-authored-by: Faisal Amir <urmauur@gmail.com>
Co-authored-by: Roushan Singh <github.rtron18@gmail.com>
Co-authored-by: eckartal <emre@jan.ai>
Co-authored-by: Emre Can Kartal <159995642+eckartal@users.noreply.github.com>
Co-authored-by: Roushan Kumar Singh <158602016+github-roushan@users.noreply.github.com>
Co-authored-by: Nguyen Ngoc Minh <91668012+Minh141120@users.noreply.github.com>
Co-authored-by: Minh141120 <minh.itptit@gmail.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Dinh Long Nguyen <dinhlongviolin1@gmail.com>
* feat: Adjust RAM/VRAM calculation for unified memory systems
This commit refactors the logic for calculating **total RAM** and **total VRAM** in `is_model_supported` and `plan_model_load` commands, specifically targeting systems with **unified memory** (like modern macOS devices where the GPU list may be empty).
The changes are as follows:
* **Total RAM Calculation:** If no GPUs are detected (`sys_info.gpus.is_empty()` is true), **total RAM** is now set to 0. This avoids confusing total system memory with dedicated GPU memory when planning model placement.
* **Total VRAM Calculation:** If no GPUs are detected, **total VRAM** is still calculated as the system's **total memory (RAM)**, as this shared memory acts as VRAM on unified memory architectures.
This adjustment improves the accuracy of memory availability checks and model planning on unified memory systems.
* fix: total usable memory in case there is no system vram reported
* chore: temporarily change to self-hosted runner mac
* ci: revert back to github hosted runner macos
---------
Co-authored-by: Louis <louis@jan.ai>
Co-authored-by: Minh141120 <minh.itptit@gmail.com>
The KV cache size calculation in estimate_kv_cache_internal now includes a fallback mechanism for models that do not explicitly define key_length and value_length in the GGUF metadata.
If these attention keys are missing, the head dimension (and thus key/value length) is calculated using the formula embedding_length / total_heads. This improves robustness and compatibility with GGUF models that don't have the proper keys in metadata.
Also adds logging of the full model metadata for easier debugging of the estimation process.
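A minimal sketch of the fallback, using the GGUF keys named above; the exact metadata accessors are assumptions:

```typescript
// Fallback head dimension when attention.key_length / attention.value_length
// are missing from the GGUF metadata; key names are assumptions.
function headDim(meta: Record<string, number>): number {
  const explicit = meta['attention.key_length']
  if (explicit) return explicit
  // head_dim = embedding_length / total_heads, per the fallback above.
  return Math.floor(meta['embedding_length'] / meta['attention.head_count'])
}
```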
* feat: add field edit model name
* fix: update model
* chore: update UI form with a save button; editing capabilities and renaming the folder now require saving
* fix: relocate model
* chore: update and refresh model provider list; also update test case
* chore: state loader
* fix: model path
* fix: model config update
* chore: fix removal of provider dependencies in the edit model dialog
* chore: avoid shifted model name or id
---------
Co-authored-by: Louis <louis@jan.ai>
* feat: move estimateKVCacheSize to BE
* feat: Migrate model planning to backend
This commit migrates the model load planning logic from the frontend to the Tauri backend. This refactors the `planModelLoad` and `isModelSupported` methods into the `tauri-plugin-llamacpp` plugin, making them directly callable from the Rust core.
The model planning now incorporates a more robust and accurate memory estimation, considering both VRAM and system RAM, and introduces a `batch_size` parameter to the model plan.
**Key changes:**
- **Moved `planModelLoad` to `tauri-plugin-llamacpp`:** The core logic for determining GPU layers, context length, and memory offloading is now in Rust for better performance and accuracy.
- **Moved `isModelSupported` to `tauri-plugin-llamacpp`:** The model support check is also now handled by the backend.
- **Removed `getChatClient` from `AIEngine`:** This optional method was not implemented and has been removed from the abstract class.
- **Improved KV Cache estimation:** The `estimate_kv_cache_internal` function in Rust now accounts for `attention.key_length` and `attention.value_length` if available, and considers sliding window attention for more precise estimates.
- **Introduced `batch_size` in ModelPlan:** The model plan now includes a `batch_size` property, which will be automatically adjusted based on the determined `ModelMode` (e.g., lower for CPU/Hybrid modes).
- **Updated `llamacpp-extension`:** The frontend extension now calls the new Tauri commands for model planning and support checks.
- **Removed `batch_size` from `llamacpp-extension/settings.json`:** The batch size is now dynamically determined by the planning logic and will be set as a model setting directly.
- **Updated `ModelSetting` and `useModelProvider` hooks:** These now handle the new `batch_size` property in model settings.
- **Added new Tauri commands and permissions:** `get_model_size`, `is_model_supported`, and `plan_model_load` are new commands with corresponding permissions.
- **Consolidated `ModelSupportStatus` and `KVCacheEstimate`:** These types are now defined in `src/tauri/plugins/tauri-plugin-llamacpp/src/gguf/types.rs`.
This refactoring centralizes critical model resource management logic, improving consistency and maintainability, and lays the groundwork for more sophisticated model loading strategies.
* feat: refine model planner to handle more memory scenarios
This commit introduces several improvements to the `plan_model_load` function, enhancing its ability to determine a suitable model loading strategy based on system memory constraints. Specifically, it includes:
- **VRAM calculation improvements:** Corrects the calculation of total VRAM by iterating over GPUs and multiplying by 1024*1024, improving accuracy.
- **Hybrid plan optimization:** Implements a more robust hybrid plan strategy, iterating through GPU layer configurations to find the highest possible GPU usage while remaining within VRAM limits.
- **Minimum context length enforcement:** Enforces a minimum context length for the model, ensuring that the model can be loaded and used effectively.
- **Fallback to CPU mode:** If a hybrid plan isn't feasible, it now correctly falls back to a CPU-only mode.
- **Improved logging:** Enhanced logging to provide more detailed information about the memory planning process, including VRAM, RAM, and GPU layers.
- **Batch size adjustment:** Updated batch size based on the selected mode, ensuring efficient utilization of available resources.
- **Error handling and edge cases:** Improved error handling and edge case management to prevent unexpected failures.
- **Constants:** Added constants for easier maintenance and understanding.
- **Power-of-2 adjustment:** Added power of 2 adjustment for max context length to ensure correct sizing for the LLM.
These changes improve the reliability and robustness of the model planning process, allowing it to handle a wider range of hardware configurations and model sizes.
* Add log for raw GPU info from tauri-plugin-hardware
* chore: update linux runner for tauri build
* feat: Improve GPU memory calculation for unified memory
This commit improves the logic for calculating usable VRAM, particularly for systems with **unified memory** like Apple Silicon. Previously, the application would report 0 total VRAM if no dedicated GPUs were found, leading to incorrect calculations and failed model loads.
This change modifies the VRAM calculation to fall back to the total system RAM if no discrete GPUs are detected. This is a common and correct approach for unified memory architectures, where the CPU and GPU share the same memory pool.
Additionally, this commit refactors the logic for calculating usable VRAM and RAM to prevent potential underflow by checking if the total memory is greater than the reserved bytes before subtracting. This ensures the calculation remains safe and correct.
* chore: fix update migration version
* fix: enable unified memory support on model support indicator
* Use total_system_memory in bytes
---------
Co-authored-by: Minh141120 <minh.itptit@gmail.com>
Co-authored-by: Faisal Amir <urmauur@gmail.com>
* fix: allow users to download the same model from different authors
* fix: importing models should have author name in the ID
* fix: incorrect model id shown
* fix: tests
* fix: default to mmproj f16 instead of bf16
* fix: type
* fix: build error
* feat: normalize LaTeX fragments in markdown rendering
Added a preprocessing step that converts LaTeX delimiters `\[…\]` to `$$…$$` and `\(...\)` to `$…$` before rendering. The function skips code blocks, inline code, and HTML tags to avoid unintended transformations. This improves authoring experience by supporting common LaTeX syntax without requiring explicit `$` delimiters.
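A minimal sketch of the delimiter conversion, assuming simple non-greedy regexes; the real `normalizeLatex` additionally skips code blocks, inline code, and HTML tags:

```typescript
// Convert LaTeX display and inline delimiters to dollar syntax.
function convertDelimiters(text: string): string {
  return text
    .replace(/\\\[([\s\S]*?)\\\]/g, (_m, inner) => `$$${inner}$$`) // \[...\] -> $$...$$
    .replace(/\\\(([\s\S]*?)\\\)/g, (_m, inner) => `$${inner}$`)   // \(...\) -> $...$
}
```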
* fix: correct inline LaTeX normalization replacement
The replacement function for inline math (`\(...\)`) incorrectly accepted a fourth
parameter (`post`) and appended it to the result, which could introduce stray
characters or `undefined` into the rendered output. Updated the function to
use only the captured prefix and inner content and removed the extraneous
`${post}` interpolation, ensuring clean LaTeX conversion.
* feat: optimize markdown rendering with LaTeX caching and memoized code blocks
- Added cache to normalizeLatex to avoid reprocessing repeated content
- Introduced CodeComponent with stable IDs and memoization to reduce re-renders
- Replaced per-render code block ID mapping with hash-based IDs
- Memoized copy handler and normalized markdown content
- Simplified plugin/component setup with stable references
- Added custom comparison for RenderMarkdown memoization to prevent unnecessary updates
* refactor: memoize content only
---------
Co-authored-by: Louis <louis@jan.ai>
* feat: implement conversation endpoint
* use conversation aware endpoint
* fetch message correctly
* preserve first message
* fix logout
* fix broadcast issue locally + auth not refreshing profile on other tabs + clean up and sync messages
* add is dev tag
* feat: add getTokensCount method to compute token usage
Implemented a new async `getTokensCount` function in the LLaMA.cpp extension.
The method validates the model session, checks process health, applies the request template, and tokenizes the resulting prompt to return the token count. Includes detailed error handling for crashed models and API failures, enabling callers to assess token usage before sending completions.
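A minimal sketch of the tokenization call, using llama.cpp's `/tokenize` server endpoint; the session validation, health check, and templating steps described above are omitted:

```typescript
// Count tokens for a templated prompt via the llama.cpp server.
async function getTokensCount(
  baseUrl: string,
  prompt: string
): Promise<number> {
  const res = await fetch(`${baseUrl}/tokenize`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ content: prompt }),
  })
  if (!res.ok) throw new Error(`tokenize failed: ${res.status}`)
  const { tokens } = (await res.json()) as { tokens: number[] }
  return tokens.length
}
```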
* Fix: typos
* chore: update ui token usage
* chore: remove unused code
* feat: add image token handling for multimodal LlamaCPP models
Implemented support for counting image tokens when using vision-enabled models:
- Extended `SessionInfo` with optional `mmprojPath` to store the multimodal project file.
- Propagated `mmproj_path` from the Tauri plugin into the session info.
- Added import of `chatCompletionRequestMessage` and enhanced token calculation logic in the LlamaCPP extension:
- Detects image content in messages.
- Reads GGUF metadata from `mmprojPath` to compute accurate image token counts.
- Provides a fallback estimation if metadata reading fails.
- Returns the sum of text and image tokens.
- Introduced helper methods `calculateImageTokens` and `estimateImageTokensFallback`.
- Minor clean‑ups such as comment capitalization and debug logging.
* chore: update FE to send message params that include content type image_url
* fix mmproj path from session info and num tokens calculation
* fix: Correct image token estimation calculation in llamacpp extension
This commit addresses an inaccurate token count for images in the llama.cpp extension.
The previous logic incorrectly calculated the token count based on image patch size and dimensions. This has been replaced with a more precise method that uses the clip.vision.projection_dim value from the model metadata.
Additionally, unnecessary debug logging was removed, and a new log was added to show the mmproj metadata for improved visibility.
* fix per image calc
* fix: crash due to force unwrap
---------
Co-authored-by: Faisal Amir <urmauur@gmail.com>
Co-authored-by: Louis <louis@jan.ai>
* feat: Prompt progress when streaming
- BE changes:
- Add a `return_progress` flag to `chatCompletionRequest` and a corresponding `prompt_progress` payload in `chatCompletionChunk`. Introduce `chatCompletionPromptProgress` interface to capture cache, processed, time, and total token counts.
- Update the Llamacpp extension to always request progress data when streaming, enabling UI components to display real‑time generation progress and leverage llama.cpp’s built‑in progress reporting.
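A minimal sketch of the new types, with field meanings inferred from the commit text; the actual interfaces may differ:

```typescript
interface chatCompletionPromptProgress {
  cache: number     // prompt tokens reused from the cache
  processed: number // prompt tokens evaluated so far
  time: number      // elapsed processing time
  total: number     // total prompt tokens to process
}

interface chatCompletionChunk {
  choices: Array<{ delta: { content?: string } }>
  // Present on streamed chunks when return_progress was requested.
  prompt_progress?: chatCompletionPromptProgress
}
```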
* Make return_progress optional
* chore: update ui prompt progress before streaming content
* chore: remove log
* chore: remove progress when percentage >= 100
* chore: set timeout prompt progress
* chore: move prompt progress outside streaming content
* fix: tests
---------
Co-authored-by: Faisal Amir <urmauur@gmail.com>
Co-authored-by: Louis <louis@jan.ai>
* feat: Add Jan API server Swagger UI
- Serve OpenAPI spec (`static/openapi.json`) directly from the proxy server.
- Implement Swagger UI assets (`swagger-ui.css`, `swagger-ui-bundle.js`, `favicon.ico`) and a simple HTML wrapper under `/docs`.
- Extend the proxy whitelist to include Swagger UI routes.
- Add routing logic for `/openapi.json`, `/docs`, and Swagger UI static files.
- Update whitelisted paths and integrate CORS handling for the new endpoints.
* feat: serve Swagger UI at root path
The Swagger UI endpoint previously lived under `/docs`. The route handling and
exclusion list have been updated so the UI is now served directly at `/`.
This simplifies access, aligns with the expected root URL in the Tauri
frontend, and removes the now‑unused `/docs` path handling.
* feat: add model loading state and translations for local API server
Implemented a loading indicator for model startup, updated the start/stop button to reflect model loading and server starting states, and disabled interactions while pending. Added new translation keys (`loadingModel`, `startingServer`) across all supported locales (en, de, id, pl, vn, zh-CN, zh-TW) and integrated them into the UI. Included a small delay after model start to ensure backend state consistency. This improves user feedback and prevents race conditions during server initialization.
The previous implementation mixed model size and VRAM checks, leading to inaccurate status reporting (e.g., false RED results).
- Simplified import statement for `readGgufMetadata`.
- Fixed RAM/VRAM comparison by removing unnecessary parentheses.
- Replaced ambiguous `modelSize > usableTotalMemory` check with a clear `totalRequired > usableTotalMemory` hard‑limit condition.
- Refactored the status logic to explicitly handle the CPU‑GPU hybrid scenario, returning **YELLOW** when the total requirement fits combined memory but exceeds VRAM.
- Updated comments for better readability and maintenance.
* feat: system tray icon build flag
* fix: missing tray feature for testing
* fix: failed tests
* fix: left click should not show menu
* ci: add system dependencies for appindicator lib on Linux
---------
Co-authored-by: Minh141120 <minh.itptit@gmail.com>
The Llama.cpp backend can emit the phrase “failed to allocate” when it runs out of memory.
Adding this check ensures such messages are correctly classified as out‑of‑memory errors,
providing more accurate error handling on CPU backends.
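A minimal sketch of the classification, assuming a simple substring check; the real parsing lives in the Rust backend:

```typescript
// Classify backend stderr output as an out-of-memory error.
function isOutOfMemory(stderr: string): boolean {
  const patterns = ['out of memory', 'failed to allocate']
  const lower = stderr.toLowerCase()
  return patterns.some((p) => lower.includes(p))
}
```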
- Removed the unused `getKVCachePerToken` helper and replaced it with a unified `estimateKVCache` that returns both total size and per‑token size.
- Fixed the KV cache size calculation to account for all layers, correcting previous under‑estimation.
- Added proper clamping of user‑requested context lengths to the model’s maximum.
- Refactored VRAM budgeting: introduced explicit reserves, fixed engine overhead, and separate multipliers for VRAM and system RAM based on memory mode.
- Implemented a more robust planning flow with clear GPU, Hybrid, and CPU pathways, including fallback configurations when resources are insufficient.
- Updated default context length handling and safety buffers to prevent OOM situations.
- Adjusted usable memory percentage to 90% and refined logging for easier debugging.
* fix: correct context shift flag handling in LlamaCPP extension
The previous implementation added the `--no-context-shift` flag when `cfg.ctx_shift` was disabled, which conflicted with the llama.cpp CLI where the presence of `--context-shift` enables the feature.
The logic is updated to push `--context-shift` only when `cfg.ctx_shift` is true, ensuring the extension passes the correct argument and behaves as expected.
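A minimal sketch of the corrected flag handling:

```typescript
// Presence of --context-shift enables the feature in the llama.cpp CLI,
// so the flag is pushed only when ctx_shift is true; nothing otherwise.
function contextShiftArgs(cfg: { ctx_shift: boolean }): string[] {
  return cfg.ctx_shift ? ['--context-shift'] : []
}
```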
* feat: detect model out of context during generation
---------
Co-authored-by: Dinh Long Nguyen <dinhlongviolin1@gmail.com>
Implemented `isAbsolutePath` helper to correctly identify POSIX, Windows drive‑letter, and UNC absolute paths. Updated `planModelLoad` to automatically resolve relative model and mmproj paths against the Jan data folder, enhancing usability for users supplying non‑absolute paths. Also refined minor formatting for readability.
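A minimal sketch of the helper, covering the three path families named above:

```typescript
// Detect POSIX, Windows drive-letter, and UNC absolute paths.
function isAbsolutePath(p: string): boolean {
  if (p.startsWith('/')) return true          // POSIX: /home/user/model.gguf
  if (/^[A-Za-z]:[\\/]/.test(p)) return true  // Windows: C:\models or C:/models
  if (p.startsWith('\\\\')) return true       // UNC: \\server\share\model.gguf
  return false
}
```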
Expose the new `mmproj_offload` option in the model settings UI and include it in the `ModelPlan` type. The component now collects the offload flag (`result.offloadMmproj`) and queues it with other setting updates to ensure a single atomic change, preventing race conditions when toggling this feature. This enables users to control MMProj offloading directly from the app.
The flag `noOffloadMmproj` was misleading – it actually indicates when the mmproj file **is** offloaded to VRAM. Renaming it to `offloadMmproj` clarifies its purpose and aligns the naming with the surrounding code.
Additionally, the `planModelLoad` signature has been reordered to place `mmprojPath` before `requestedCtx`, improving readability and making the optional parameters more intuitive. All related logic, calculations, and log messages have been updated to use the new flag name.
The original calculation used only the `block_count` from the model metadata, which excludes the final LM head and the embedding layer. This caused an underestimation of the total number of layers and consequently an incorrect `layerSize` value. Adding `+2` accounts for these two missing layers, ensuring accurate model size metrics.
Added a GPU memory check using `getSystemInfo` to ensure Vulkan is selected only on systems with at least 6 GB of VRAM.
* Made `determineBestBackend` asynchronous and updated all callers to `await` it.
* Adjusted backend priority list to include or demote Vulkan based on the memory check.
* Updated Vulkan support detection in `backend.ts` to rely solely on API version (memory check moved to selection logic).
* Imported `getSystemInfo` and refined file‑existence validation.
These changes prevent sub‑optimal Vulkan usage on low‑memory GPUs and improve backend selection reliability.
- Add `src-tauri/resources/` to `.gitignore`.
- Introduced utilities to read locally installed backends (`getLocalInstalledBackends`) and fetch remote supported backends (`fetchRemoteSupportedBackends`).
- Refactored `listSupportedBackends` to merge remote and local entries with deduplication and proper sorting.
- Exported `getBackendDir` and integrated it into the extension.
- Added helper `parseBackendVersion` and new method `checkBackendForUpdates` to detect newer backend versions.
- Implemented `installBackend` for manual backend archive installation, including platform‑specific binary path handling.
- Updated command‑line argument logic for `--flash-attn` to respect version‑specific defaults.
- Modified Tauri filesystem `decompress` command to remove overly strict path validation.
* feat: Smart model management
* **New UI option** – `memory_util` added to `settings.json` with a dropdown (high / medium / low) to let users control how aggressively the engine uses system memory.
* **Configuration updates** – `LlamacppConfig` now includes `memory_util`; the extension class stores it in a new `memoryMode` property and handles updates through `updateConfig`.
* **System memory handling**
* Introduced `SystemMemory` interface and `getTotalSystemMemory()` to report combined VRAM + RAM.
* Added helper methods `getKVCachePerToken`, `getLayerSize`, and a new `ModelPlan` type.
* **Smart model‑load planner** – `planModelLoad()` computes:
* Number of GPU layers that can fit in usable VRAM.
* Maximum context length based on KV‑cache size and the selected memory utilization mode (high/medium/low).
* Whether KV‑cache must be off‑loaded to CPU and the overall loading mode (GPU, Hybrid, CPU, Unsupported).
* Detailed logging of the planning decision.
* **Improved support check** – `isModelSupported()` now:
* Uses the combined VRAM/RAM totals from `getTotalSystemMemory()`.
* Applies an 80% usable‑memory heuristic.
* Returns **GREEN** only when both weights and KV‑cache fit in VRAM, **YELLOW** when they fit only in total memory or require CPU off‑load, and **RED** when the model cannot fit at all.
* **Cleanup** – Removed unused `GgufMetadata` import; updated imports and type definitions accordingly.
* **Documentation/comments** – Added explanatory JSDoc comments for the new methods and clarified the return semantics of `isModelSupported`.
* chore: migrate no_kv_offload from llamacpp setting to model setting
* chore: add UI auto optimize model setting
* feat: improve model loading planner with mmproj support and smarter memory budgeting
* Extend `ModelPlan` with optional `noOffloadMmproj` flag to indicate when a multimodal projector can stay in VRAM.
* Add `mmprojPath` parameter to `planModelLoad` and calculate its size, attempting to keep it on GPU when possible.
* Refactor system memory detection:
* Use `used_memory` (actual free RAM) instead of total RAM for budgeting.
* Introduced `usableRAM` placeholder for future use.
* Rewrite KV‑cache size calculation:
* Properly handle GQA models via `attention.head_count_kv`.
* Compute bytes per token as `nHeadKV * headDim * 2 * 2 * nLayer` (see the sketch after this list).
* Replace the old 70 % VRAM heuristic with a more flexible budget:
* Reserve a fixed VRAM amount and apply an overhead factor.
* Derive usable system RAM from total memory minus VRAM.
* Implement a robust allocation algorithm:
* Prioritize placing the mmproj in VRAM.
* Search for the best balance of GPU layers and context length.
* Fallback strategies for hybrid and pure‑CPU modes with detailed safety checks.
* Add extensive validation of model size, KV‑cache size, layer size, and memory mode.
* Improve logging throughout the planning process for easier debugging.
* Adjust final plan return shape to include the new `noOffloadMmproj` field.
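A minimal sketch of the per-token KV-cache math from the list above, assuming an f16 cache (two bytes per value); variable names are illustrative:

```typescript
// K and V each store nHeadKV * headDim two-byte values per layer:
// 2 (K and V) * 2 (bytes per f16) * nHeadKV * headDim * nLayer
function kvCacheBytesPerToken(
  nLayer: number,  // block_count
  nHeadKV: number, // attention.head_count_kv (GQA-aware)
  headDim: number  // per-head key/value dimension
): number {
  return nHeadKV * headDim * 2 * 2 * nLayer
}

// Example: 32 layers, 8 KV heads, head dim 128
// -> 8 * 128 * 2 * 2 * 32 = 131072 bytes (~128 KiB) per token.
```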
* remove unused variable
---------
Co-authored-by: Faisal Amir <urmauur@gmail.com>
* call jan api
* fix lint
* ci: add jan server web
* chore: add Dockerfile
* clean up ui ux and support for reasoning fields, make app spa
* add logo
* chore: update tag for preview image
* chore: update k8s service name
* chore: update image tag and image name
* fixed test
---------
Co-authored-by: Minh141120 <minh.itptit@gmail.com>
Co-authored-by: Nguyen Ngoc Minh <91668012+Minh141120@users.noreply.github.com>
* fix: Use 80% total memory for compatibility check
* refactor: extract usable memory percentage to named constant
Extract the hardcoded 0.8 multiplier into a named constant
USABLE_MEMORY_PERCENTAGE for better readability and maintainability.
* fix: check for env value before setting (#6266)
* fix: check for env value before setting
* Use empty instead of none
* fix: update linux build script to be consistent with CI (#6269)
The local build script for Linux was failing due to a bundling error. This commit updates the `build:tauri:linux` script in `package.json` to be consistent with the CI build pipeline, which resolves the issue.
The updated script now includes:
- **`NO_STRIP=1`**: This environment variable prevents the `linuxdeploy` utility from stripping debugging symbols, which was a potential cause of the bundling failure.
- **`--verbose`**: This flag provides more detailed output during the build, which can be useful for debugging similar issues in the future.
* fix: compatibility imported model
* fix: update copy mmproj setting desc
* fix: toggle vision for remote model
* chore: add tooltip visions
* chore: show model setting only for local provider
* fix/update-ui-info
* chore: update filter hub while searching
* fix: system monitor window permission
* chore: update credit description
* chore: bundle license to app
* chore: bundle license to resource mac
* chore: bundle license to linux dist
* chore: relocate LICENSE to resources folder
* fix: handle paste image on linux
* fix: handle copy image from browser in linux
* feat: add regression checklist
* fix: mcp clean up dropdown of available tools and sort the list
* fix: sort list when add server
* fix: mcp sort list
* fix: handle conditional UI for regenerated response
* fix: code generation of more than 300 lines
* fix: handle checking compatible gated model
* chore: update test
* chore: remove log
* chore: fix status
* chore: fix status model id
* make validation message infinite (#6318)
---------
Co-authored-by: Akarshan Biswas <akarshan.biswas@gmail.com>
Co-authored-by: Faisal Amir <urmauur@gmail.com>
Co-authored-by: Minh141120 <minh.itptit@gmail.com>
Co-authored-by: Nguyen Ngoc Minh <91668012+Minh141120@users.noreply.github.com>
- Fix OG image URL to full https://jan.ai/post/_assets/jan-research.jpeg for Twitter preview
- Update publication date to 2025-08-22
- Ensure social media platforms can properly display the image
- Updated title to 'Jan v1 for Deep Research'
- Added professional cookbook-style formatting inspired by OpenAI guide
- Added performance summary with benchmark results (91.1% vs 83.2%)
- Added new OG image (jan-research.jpeg)
- Improved content structure and readability
* feat: Add model compatibility check and memory estimation
This commit introduces a new feature to check if a given model is supported based on available device memory.
The change includes:
- A new `estimateKVCache` method that calculates the required memory for the model's KV cache. It uses GGUF metadata such as `block_count`, `head_count`, `key_length`, and `value_length` to perform the calculation.
- An `isModelSupported` method that combines the model file size and the estimated KV cache size to determine the total memory required. It then checks if any available device has sufficient free memory to load the model.
- An updated error message for the `version_backend` check to be more user-friendly, suggesting a stable internet connection as a potential solution for backend setup failures.
This functionality helps prevent the application from attempting to load models that would exceed the device's memory capacity, leading to more stable and predictable behavior.
fixes: #5505
* Update extensions/llamacpp-extension/src/index.ts
Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>
* Update extensions/llamacpp-extension/src/index.ts
Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>
* Extend this to available system RAM if GGML device is not available
* fix: Improve model metadata and memory checks
This commit refactors the logic for checking if a model is supported by a system's available memory.
**Key changes:**
- **Remote model support**: The `read_gguf_metadata` function can now fetch metadata from a remote URL by reading the file in chunks.
- **Improved KV cache size calculation**: The KV cache size is now estimated more accurately by using `attention.key_length` and `attention.value_length` from the GGUF metadata, with a fallback to `embedding_length`.
- **Granular memory check statuses**: The `isModelSupported` function now returns a more specific status (`'RED'`, `'YELLOW'`, `'GREEN'`) to indicate whether the model weights or the KV cache are too large for the available memory.
- **Consolidated logic**: The logic for checking local and remote models has been consolidated into a single `isModelSupported` function, improving code clarity and maintainability.
These changes provide more robust and informative model compatibility checks, especially for models hosted on remote servers.
* Update extensions/llamacpp-extension/src/index.ts
Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>
* Make ctx_size optional and use sum free memory across ggml devices
* feat: hub and dropdown model selection now handle model compatibility
* feat: update badge model info color
* chore: enable detail page to get compatibility model
* chore: update copy
* chore: update shrink indicator UI
---------
Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>
Co-authored-by: Faisal Amir <urmauur@gmail.com>
This commit improves the clarity of the llama.cpp extension.
- Corrected a placeholder example from `GGML_VK_VISIBLE_DEVICES='0,1'` to `GGML_VK_VISIBLE_DEVICES=0,1` for better accuracy.
- Changed an ambiguous error message from `"Failed to load llama-server: ${error}"` to the more specific `"Failed to load llamacpp backend"`.
This commit adds a new setting `llamacpp_env` to the llama.cpp extension, allowing users to specify custom environment variables. These variables are passed to the backend process when it starts.
A new function `parseEnvFromString` is introduced to handle the parsing of the semicolon-separated key-value pairs from the user input. The environment variables are then used in the `load` function and when listing available devices. This enables more flexible configuration of the llama.cpp backend, such as specifying visible GPUs for Vulkan.
This change also updates the Tauri command `get_devices` to accept environment variables, ensuring that device discovery respects the user's settings.
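A minimal sketch of `parseEnvFromString`, assuming semicolon-separated `KEY=VALUE` pairs as described; edge-case handling in the real code may differ:

```typescript
// Parse "KEY1=VALUE1; KEY2=VALUE2" into an environment map.
function parseEnvFromString(input: string): Record<string, string> {
  const env: Record<string, string> = {}
  for (const pair of input.split(';')) {
    const trimmed = pair.trim()
    if (!trimmed) continue
    const eq = trimmed.indexOf('=')
    if (eq <= 0) continue // skip malformed entries without KEY=
    env[trimmed.slice(0, eq).trim()] = trimmed.slice(eq + 1).trim()
  }
  return env
}

// e.g. parseEnvFromString('GGML_VK_VISIBLE_DEVICES=0,1; LLAMA_LOG=1')
//   -> { GGML_VK_VISIBLE_DEVICES: '0,1', LLAMA_LOG: '1' }
```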
This commit adds a GGUF validation check for both the main model file and the `mmproj` file (if present) before they are loaded. This prevents the extension from crashing if an invalid GGUF file is provided.
The `GgufMetadata` interface and `loadMetadata` function were removed as the `readGgufMetadata` is now invoked directly. The code has also been refactored to be more readable, with clearer variable names and more descriptive comments.
- Add userIntendedPositionRef to track user's deliberate scroll position during streaming
- Add wasStreamingRef to detect when streaming finishes
- Preserve user's reading position when they scroll away from bottom during AI response
- Restore intended position with smooth scroll when streaming completes
- Reset position tracking on thread changes and component initialization
Fixes #5939
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
This change modifies how the API key is passed to the llama-server process. Previously, it was sent as a command line argument (--api-key). This approach has been updated to pass the key via an environment variable (LLAMA_API_KEY).
This improves security by preventing the API key from being visible in the process list (ps aux on Linux, Task Manager on Windows, etc.), where it could potentially be exposed to other users or processes on the same system.
The commit also updates the Rust backend to read the API key from the environment variable instead of parsing it from the command line arguments.
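An illustrative sketch of the change: the key moves out of argv (visible in the process list) and into the child process environment; the launch shape here is assumed:

```typescript
// Build launch parameters for llama-server with the key in the environment.
function buildServerLaunch(
  apiKey: string,
  args: string[]
): { args: string[]; env: Record<string, string> } {
  return {
    args, // no --api-key argument anymore
    env: { LLAMA_API_KEY: apiKey }, // read by the backend at startup
  }
}
```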
This commit introduces a new configuration option offload_mmproj to the llamacpp extension.
The offload_mmproj setting allows users to control whether the multimodal projector model is offloaded to the GPU. By default, it's offloaded for better performance. If set to false, the projector model will remain on the CPU, which can be useful in low GPU memory scenarios, though image processing might take longer.
Additionally, this commit adds validate_mmproj_path to ensure the provided --mmproj path is valid and accessible, preventing issues during model loading.
This change also refactors some invoke calls for improved readability.
- Fix default color selection logic in all color picker components
- Use existing helper functions (isDefaultColor, isDefaultColorPrimary, etc.)
- Ensure default colors are properly highlighted when active
- Apply fix to all color pickers: primary, accent, destructive, main view, and background
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
* feat: Add GGUF metadata reading functionality
This commit introduces a new Tauri command and a corresponding function to read metadata from GGUF model files.
The new read_gguf_metadata command in the Rust backend uses the byteorder crate to parse the GGUF file format and extract key metadata. This information, including the file's version, tensor count, and a key-value map of other metadata, is then made available to the TypeScript frontend.
This functionality is a foundational step toward providing users with more detailed information about their loaded models directly within the application.
This will be refactored later.
Fixes: #6001
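For reference, the fixed-size header that `read_gguf_metadata` starts from can be sketched as follows — a TypeScript stand-in for the Rust/byteorder parsing, assuming the little-endian GGUF v2/v3 layout and omitting the metadata key-value map:
```ts
import { closeSync, openSync, readSync } from 'node:fs'

interface GgufHeader {
  version: number     // u32
  tensorCount: bigint // u64
  kvCount: bigint     // u64 metadata key-value count
}

// Reads only the 24-byte fixed header: magic "GGUF", version, tensor count, kv count.
function readGgufHeader(path: string): GgufHeader {
  const fd = openSync(path, 'r')
  try {
    const buf = Buffer.alloc(24)
    if (readSync(fd, buf, 0, 24, 0) !== 24 || buf.toString('ascii', 0, 4) !== 'GGUF') {
      throw new Error('not a valid GGUF file')
    }
    return {
      version: buf.readUInt32LE(4),
      tensorCount: buf.readBigUInt64LE(8),
      kvCount: buf.readBigUInt64LE(16),
    }
  } finally {
    closeSync(fd)
  }
}
```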
* loadMetadata() should return
* Properly throw error to FE
* Use BufReader to improve performance
* fix: Improve error message for invalid version/backend format
This commit changes the error message displayed when the `version_backend` configuration is invalid. The new message is more user-friendly and suggests a simple solution, such as restarting the application, which is more helpful to the user than the previous technical error message.
* fix typo
* fix: HF token is not used while searching repositories
* chore: whitelist jan model with tool use support by default
* fix: tests
* fix: duplicate model while searching
* fix: deprecate addSource tests since the function was removed
* feat: Introduce structured error handling for llamacpp extension
This commit introduces a structured error handling system for the `llamacpp` extension. Instead of returning simple string errors, we now use a custom `LlamacppError` struct with a specific `ErrorCode` enum. This allows the frontend to display more user-friendly and actionable error messages based on the code, rather than raw debug logs.
The changes include:
- A new `ErrorCode` enum to categorize errors (e.g., `OutOfMemory`, `ModelArchNotSupported`, `BinaryNotFound`).
- A `LlamacppError` struct to encapsulate the code, a user-facing message, and optional detailed logs.
- A static method `from_stderr` that intelligently parses llama.cpp's standard error output to identify and map common issues like Out of Memory errors to a specific error code.
- Refactored `ServerError` enum to wrap the new `LlamacppError` and provide a consistent serialization format for the Tauri frontend.
- Updated all relevant functions (`load_llama_model`, `get_devices`) to return the new structured error type, ensuring a more robust and predictable error flow.
- A reduced timeout for model loading from 300 to 180 seconds.
This work lays the groundwork for a more intuitive and helpful user experience, as the application can now provide clear guidance to users when a model fails to load.
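A TypeScript-side sketch of the shape the frontend now receives; the real `ErrorCode`/`LlamacppError` are Rust types, and the stderr patterns matched below are illustrative assumptions:
```ts
type ErrorCode = 'OutOfMemory' | 'ModelArchNotSupported' | 'BinaryNotFound' | 'Unknown'

interface LlamacppError {
  code: ErrorCode
  message: string  // user-facing
  details?: string // optional raw logs
}

// Assumed stderr markers — the Rust from_stderr does the real classification.
function fromStderr(stderr: string): LlamacppError {
  if (/out of memory|failed to allocate/i.test(stderr)) {
    return { code: 'OutOfMemory', message: 'Not enough memory to load the model.', details: stderr }
  }
  if (/unknown model architecture/i.test(stderr)) {
    return { code: 'ModelArchNotSupported', message: 'Model architecture is not supported by this backend.', details: stderr }
  }
  return { code: 'Unknown', message: 'Failed to load llamacpp backend', details: stderr }
}
```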
* Update src-tauri/src/core/utils/extensions/inference_llamacpp_extension/server.rs
Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>
* Update src-tauri/src/core/utils/extensions/inference_llamacpp_extension/server.rs
Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>
* chore: update FE handle error object from extension
* chore: fix property type
---------
Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>
Co-authored-by: Faisal Amir <urmauur@gmail.com>
* refactor: Use more precise terminology in API server logs and error messages
This commit refactors several log and error messages to use more accurate and consistent terminology.
- Replaced "backend servers" and "backend model servers" with "models" or "sessions" to better reflect the service's internal structure.
- Changed "Proxy server" to "Jan API server" to more accurately describe the server's function.
- Removed a redundant debug log message.
These changes are cosmetic and improve the readability and consistency of the logging output.
* Update src-tauri/src/core/server.rs
Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>
---------
Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>
* refactor: move session management & port allocation to backend
- Remove the in‑process `activeSessions` map and its cleanup logic from the TypeScript side.
- Introduce new Tauri commands in Rust:
- `get_random_port` – picks an unused port using a seeded RNG and checks availability.
- `find_session_by_model` – returns the `SessionInfo` for a given model ID.
- `get_loaded_models` – returns a list of currently loaded model IDs.
- Update the extension’s TypeScript code to use these commands via `invoke`:
- `findSessionByModel`, `load`, `unload`, `chat`, `getLoadedModels`, and `embed` now operate asynchronously and query the backend.
- Remove the old `is_port_available` command and the custom port‑checking loop.
- Simplify `onUnload` – session termination is now handled by the backend.
- Drop unused helpers (`sleep`, `waitForModelLoad`) and related port‑availability code.
- Add missing Rust imports (`rand::{StdRng,Rng,SeedableRng}`, `HashSet`) and improve error handling.
- Register the new commands in `src-tauri/src/lib.rs` (replace `is_port_available` with the three new commands).
This refactor centralises session state and port allocation in the Rust backend, eliminates duplicated logic, and resolves race conditions around model loading and session cleanup.
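On the TypeScript side this reduces to thin `invoke` wrappers, roughly as below; the Tauri v2 import path and any `SessionInfo` fields beyond pid/port/model_id are assumptions:
```ts
import { invoke } from '@tauri-apps/api/core'

interface SessionInfo {
  pid: number
  port: number
  model_id: string
}

const getRandomPort = () => invoke<number>('get_random_port')
const findSessionByModel = (modelId: string) =>
  invoke<SessionInfo | null>('find_session_by_model', { modelId })
const getLoadedModels = () => invoke<string[]>('get_loaded_models')
```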
* Use String(e) for error
Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>
---------
Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>
* feat: Add support for overriding tensor buffer type
This commit introduces a new configuration option, `override_tensor_buffer_t`, which allows users to specify a regex for matching tensor names to override their buffer type. This is an advanced setting primarily useful for optimizing the performance of large models, particularly Mixture of Experts (MoE) models.
By overriding the tensor buffer type, users can keep critical parts of the model, like the attention layers, on the GPU while offloading other parts, such as the expert feed-forward networks, to the CPU. This can lead to significant speed improvements for massive models.
Additionally, this change refines the error message to be more specific when a model fails to load. The previous message "Failed to load llama-server" has been updated to "Failed to load model" to be more accurate.
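As a sketch, the setting maps onto llama.cpp's `--override-tensor` (`-ot`) flag, which takes `<tensor-regex>=<buffer-type>` pairs; the regex below is an illustrative MoE example, not a default:
```ts
// Keep attention layers on GPU, push expert FFN tensors to CPU (example only).
const cfg = { override_tensor_buffer_t: '\\.ffn_.*_exps\\.=CPU' }

const args: string[] = []
if (cfg.override_tensor_buffer_t) {
  args.push('--override-tensor', cfg.override_tensor_buffer_t)
}
```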
* chore: update FE to support override-tensor
---------
Co-authored-by: Faisal Amir <urmauur@gmail.com>
- Complete beginner guide for running OpenAI's gpt-oss locally
- Step-by-step instructions using Jan AI
- Alternative installation methods (llama.cpp, Ollama, LM Studio)
- Performance benchmarks and troubleshooting guide
- SEO-optimized with FAQ section and comparison tables
- 4 supporting screenshots showing the installation process
* Improve Llama.cpp model path handling and validation
This commit refactors the load_llama_model function to improve how it handles and validates the model path.
Previously, the function extracted the model path but did not perform any validation. This change adds the following improvements:
It now checks for the presence of the -m flag.
It verifies that a path is provided after the -m flag.
It validates that the specified model path actually exists on the filesystem.
It ensures that the SessionInfo struct stores the canonical display path of the model, which is a more robust approach.
These changes make the model loading process more reliable and provide better error handling for invalid or missing model paths.
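A TypeScript sketch of the described checks (the real validation lives in the Rust `load_llama_model`): require `-m`, require a following path, require that the path exists:
```ts
import { existsSync } from 'node:fs'

function validateModelArg(args: string[]): string {
  const i = args.indexOf('-m')
  if (i === -1) throw new Error('missing -m flag')
  const path = args[i + 1]
  if (!path || path.startsWith('-')) throw new Error('no model path provided after -m')
  if (!existsSync(path)) throw new Error(`model path does not exist: ${path}`)
  return path
}
```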
* Exp: Use short path on Windows
* Fix: Remove error channel and handling in llama.cpp server loading
The previous implementation used a channel to receive error messages from the llama.cpp server's stdout. However, this proved unreliable, as path names can contain the very 'error' strings we check for, even during normal operation. This commit removes the error channel and the associated error-handling logic.
The server readiness is still determined by checking for the "server is listening" message in stdout. Errors are now handled by relying on the process exit code and capturing the full stderr output if the process fails to start or exits unexpectedly. This approach provides a more robust and accurate error detection mechanism.
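A sketch of that readiness/failure split; per the commit, readiness is detected from the "server is listening" line, while failure is judged by exit code plus captured stderr:
```ts
import type { ChildProcess } from 'node:child_process'

function waitForReady(proc: ChildProcess): Promise<void> {
  return new Promise((resolve, reject) => {
    let stderr = ''
    proc.stdout?.on('data', (chunk: Buffer) => {
      if (chunk.toString().includes('server is listening')) resolve()
    })
    proc.stderr?.on('data', (chunk: Buffer) => {
      stderr += chunk.toString()
    })
    proc.on('exit', (code) => {
      // No substring scanning for errors — paths can legitimately contain them.
      if (code !== 0) reject(new Error(`llama-server exited (${code}): ${stderr}`))
    })
  })
}
```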
* Add else block in Windows path handling
* Add some path related tests
* Fix windows tests
* feat: Improve llama.cpp argument handling and add device parsing tests
This commit refactors how arguments are passed to llama.cpp,
specifically by only adding arguments when their values differ from
their defaults. This reduces the verbosity of the command and prevents
potential conflicts or errors when llama.cpp's default behavior aligns
with the desired setting.
Additionally, new tests have been added for parsing device output from
llama.cpp, ensuring the accurate extraction of GPU information (ID,
name, total memory, and free memory). This improves the robustness of
device detection.
The following changes were made:
* **Remove redundant `--ctx-size` argument:** The `--ctx-size`
argument is now only explicitly added if `cfg.ctx_size` is greater
than 0.
* **Conditional argument adding for default values:**
* `--split-mode` is only added if `cfg.split_mode` is not empty
and not 'layer'.
* `--main-gpu` is only added if `cfg.main_gpu` is not undefined
and not 0.
* `--cache-type-k` is only added if `cfg.cache_type_k` is not 'f16'.
* `--cache-type-v` is only added if `cfg.cache_type_v` is not 'f16'
(when `flash_attn` is enabled) or not 'f32' (otherwise). This
also corrects the `flash_attn` condition.
* `--defrag-thold` is only added if `cfg.defrag_thold` is not 0.1.
* `--rope-scaling` is only added if `cfg.rope_scaling` is not
'none'.
* `--rope-scale` is only added if `cfg.rope_scale` is not 1.
* `--rope-freq-base` is only added if `cfg.rope_freq_base` is not 0.
* `--rope-freq-scale` is only added if `cfg.rope_freq_scale` is
not 1.
* **Add `parse_device_output` tests:** Comprehensive unit tests were
added to `src-tauri/src/core/utils/extensions/inference_llamacpp_extension/server.rs`
to validate the parsing of llama.cpp device output under various
scenarios, including multiple devices, single devices, different
backends (CUDA, Vulkan, SYCL), complex GPU names, and error
conditions.
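A condensed sketch of the resulting argument builder; the defaults mirror the bullet list above, while the `cfg` shape is assumed:
```ts
interface LlamacppConfig {
  ctx_size: number
  split_mode: string
  main_gpu?: number
  cache_type_k: string
  cache_type_v: string
  flash_attn: boolean
  defrag_thold: number
  rope_scaling: string
  rope_scale: number
  rope_freq_base: number
  rope_freq_scale: number
}

// Only emit a flag when the value differs from llama.cpp's default.
function buildArgs(cfg: LlamacppConfig): string[] {
  const args: string[] = []
  if (cfg.ctx_size > 0) args.push('--ctx-size', String(cfg.ctx_size))
  if (cfg.split_mode && cfg.split_mode !== 'layer') args.push('--split-mode', cfg.split_mode)
  if (cfg.main_gpu !== undefined && cfg.main_gpu !== 0) args.push('--main-gpu', String(cfg.main_gpu))
  if (cfg.cache_type_k !== 'f16') args.push('--cache-type-k', cfg.cache_type_k)
  const vDefault = cfg.flash_attn ? 'f16' : 'f32'
  if (cfg.cache_type_v !== vDefault) args.push('--cache-type-v', cfg.cache_type_v)
  if (cfg.defrag_thold !== 0.1) args.push('--defrag-thold', String(cfg.defrag_thold))
  if (cfg.rope_scaling !== 'none') args.push('--rope-scaling', cfg.rope_scaling)
  if (cfg.rope_scale !== 1) args.push('--rope-scale', String(cfg.rope_scale))
  if (cfg.rope_freq_base !== 0) args.push('--rope-freq-base', String(cfg.rope_freq_base))
  if (cfg.rope_freq_scale !== 1) args.push('--rope-freq-scale', String(cfg.rope_freq_scale))
  return args
}
```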
* fixup cache_type_v comparison
* Fix: Llama.cpp server hangs on model load
Resolves an issue where the llama.cpp server would hang indefinitely when loading certain models, as described in the attached ticket. The server's readiness message was not being correctly detected, causing the application to stall.
The previous implementation used a line-buffered reader (BufReader::lines()) to process the stderr stream. This method proved to be unreliable for the specific output of the llama.cpp server.
This commit refactors the stderr handling logic to use a more robust, chunk-based approach (read_until(b'\n', ...)). This ensures that the output is processed as it arrives, reliably capturing critical status messages and preventing the application from hanging during model initialization.
Fixes: #6021
* Handle error gracefully with ServerError
Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>
* Revert "Handle error gracefully with ServerError"
This reverts commit 267a8a8a3262fbe36a445a30b8b3ba9a39697643.
* Revert "Fix: Llama.cpp server hangs on model load"
This reverts commit 44e5447f82f0ae32b6db7ffb213025f130d655c4.
* Add more guards, refactor and fix error sending to FE
---------
Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>
This commit introduces a significant restructuring of the documentation deployment and content strategy to support a gradual migration from Nextra to Astro.
- **New Astro Workflow (`jan-astro-docs.yml`)**: Implemented a new, separate GitHub Actions workflow to build and deploy the Astro site from the `/website` directory to a new subdomain (`v2.jan.ai`). This isolates the new site from the existing one, allowing for independent development and testing.
- **Removed Combined Workflow**: Deleted the previous, more complex combined workflow (`jan-combined-docs.yml`) and its associated test scripts to simplify the deployment process and eliminate routing conflicts.
- **Astro Config Update**: Simplified the Astro configuration (`astro.config.mjs`) by removing the conditional `base` path. The Astro site is now configured to deploy to the root of its own subdomain.
- **Mirrored Content**: Recreated the entire `/products` section from the Astro site within the Nextra site at `/docs/src/pages/products`. This provides content parity and a consistent user experience on both platforms during the transition period.
- **File Structure**: Established a clear, organized structure for platforms, models, and tools within the Nextra `products` directory.
- **Nextra Sidebar Fix**: Implemented the correct `_meta.json` structure for the new products section. Created nested meta files to build a collapsible sidebar, fixing the UI bug that caused duplicated navigation items.
- **"Coming Soon" Pages**: Added clear, concise "Coming Soon" and "In Development" banners and content for upcoming products like Jan V1, Mobile, Server, and native Tools, ensuring consistent messaging across both sites.
- **.gitignore**: Updated the root `.gitignore` to properly exclude build artifacts, caches, and environment files for both the Nextra (`/docs`) and Astro (`/website`) projects.
- **Repository Cleanup**: Removed temporary and unused files related to the previous combined deployment attempt.
This new architecture provides a stable, predictable, and low-risk path for migrating our documentation to Astro while ensuring the current production site remains unaffected.
* fix: generate a response button should appear when an incomplete tool call message is present
* fix: wording
* fix: do not send duplicate messages on regenerating
* fix: tests
* fix: remove CREATE_NEW_PROCESS_GROUP flag for proper Ctrl-C handling
CREATE_NEW_PROCESS_GROUP prevented GenerateConsoleCtrlEvent from working,
causing graceful shutdown failures. Removed to enable proper signal handling.
* Revert "fix: remove CREATE_NEW_PROCESS_GROUP flag for proper Ctrl-C handling"
This reverts commit 82ace3e72e4bf7338f422d5c79bdd6a0f8a2440e.
* fix: use direct process termination instead of console events
Simplified Windows process cleanup by removing console attachment logic
and using direct child.kill() method. More reliable for headless processes.
* Fix missing imports
* switch to tokio::time
* Don't wait while forcefully terminate process using kill API on Windows
Disabled use of windows-sys crate as graceful shutdown on Windows is unreliable in this context.
Updated cleanup.rs and server.rs to directly call child.kill().await for terminating processes on Windows.
Improved logging for process termination and error handling during kill and wait.
Removed timeout-based graceful shutdown attempt on Windows since TerminateProcess is inherently forceful and immediate.
This ensures more predictable process cleanup behavior on Windows platforms.
* final cleanups
- Updated .gitignore to remove yarn exclusion
- Added .vscode settings for Prettier formatting and Rust support
- Added .yarnrc.yml for workspace hoisting configuration
- Preserved all website directory work while syncing project improvements
This change improves the robustness of the llama.cpp extension's server port selection.
Previously, the `getRandomPort()` method only checked for ports already in use by active sessions, which could lead to model load failures if the chosen port was occupied by another external process.
This change introduces a new Tauri command, `is_port_available`, which performs a system-level check to ensure the randomly selected port is truly free before attempting to start the llama-server. It also adds a retry mechanism with a maximum number of attempts (20,000) to find an available port, throwing an error if no suitable port is found within the specified range after all attempts.
This enhancement prevents port conflicts and improves the reliability and user experience of the llama.cpp extension within Jan.
Closes #5965
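A sketch of the retry loop on the TypeScript side; `MAX_ATTEMPTS` is from the commit, while the port range and the Tauri v2 import path are assumptions:
```ts
import { invoke } from '@tauri-apps/api/core'

async function getRandomPort(usedPorts: Set<number>): Promise<number> {
  const MAX_ATTEMPTS = 20000
  for (let i = 0; i < MAX_ATTEMPTS; i++) {
    const port = 3000 + Math.floor(Math.random() * 30000) // assumed range
    if (usedPorts.has(port)) continue
    // System-level check: the port must be truly free, not just unused by us.
    if (await invoke<boolean>('is_port_available', { port })) return port
  }
  throw new Error('No available port found after 20000 attempts')
}
```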
- Added rich Products section with detailed platform coverage
- Enhanced all documentation sections with improved formatting
- Added new images and visual content throughout
- Reorganized Local Server docs into main docs flow
- Removed .vscode settings and added api-server-ui.png asset
* fix: assistant with last used and fix metadata
* chore: revert instruction and desc
* chore: fix current assistant state
* chore: update metadata message assistant
* chore: update test case
Previously, the `autoUnload` flag was not being updated when set via config,
causing models to be auto-unloaded regardless of the intended behavior.
This patch ensures the setting is respected at runtime.
- Removed docs/src/pages/products/ directory and all contents
- Removed products entry from docs/src/pages/_meta.json
- Reverted docs/package.json, docs/bun.lock, docs/yarn.lock to dev state
This creates a cleaner separation between:
- Current branch: Astro site changes only
- rp/nextra-product-section branch: Nextra product section changes only
This commit addresses a race condition where, with "Auto-Unload Old Models" enabled, rapidly attempting to load multiple models could result in more than one model being loaded simultaneously.
Previously, the unloading logic did not account for models that were still in the process of loading when a new load operation was initiated. This allowed new models to start loading before the previous ones had fully completed their unload cycle.
To resolve this:
- A `loadingModels` map has been introduced to track promises for models currently in the loading state.
- The `load` method now checks if a model is already being loaded and, if so, returns the existing promise, preventing duplicate load operations for the same model.
- The `performLoad` method (which encapsulates the actual loading logic) now ensures that when `autoUnload` is active, it waits for any *other* models that are concurrently loading to finish before proceeding to unload all currently loaded models. This guarantees that the auto-unload mechanism properly unloads all models, including those initiated in quick succession, thereby preventing the race condition.
This fixes the issue where clicking the start button very fast on multiple models would bypass the auto-unload functionality.
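A sketch of the promise-deduplication pattern described above, with the spawn details elided:
```ts
interface SessionInfo { pid: number; port: number; model_id: string }

class LlamacppLoader {
  private loadingModels = new Map<string, Promise<SessionInfo>>()
  private autoUnload = true

  async load(modelId: string): Promise<SessionInfo> {
    // Return the in-flight promise instead of starting a duplicate load.
    const inFlight = this.loadingModels.get(modelId)
    if (inFlight) return inFlight
    const p = this.performLoad(modelId).finally(() => this.loadingModels.delete(modelId))
    this.loadingModels.set(modelId, p)
    return p
  }

  private async performLoad(modelId: string): Promise<SessionInfo> {
    if (this.autoUnload) {
      // Let the *other* in-flight loads settle first so auto-unload sees them all.
      const others = [...this.loadingModels.entries()]
        .filter(([id]) => id !== modelId)
        .map(([, p]) => p)
      await Promise.allSettled(others)
      await this.unloadAllModels()
    }
    // ...spawn llama-server and return its session (elided)
    return { pid: -1, port: 0, model_id: modelId }
  }

  private async unloadAllModels(): Promise<void> {
    /* elided */
  }
}
```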
* fix: migrate app settings to the new version
* fix: edge cases
* fix: migrate HF import model on Windows
* fix hardware page broken after downgraded
* test: correct test
* fix: backward compatible hardware info
This commit addresses a potential race condition that could lead to "connection errors" when unloading a llamacpp model.
The issue arose because the `activeSessions` map still held the session info of the model during unload. This could lead to "connection errors" when the backend takes time to unload while there is still an ongoing request to the model.
The fix involves:
1. **Deleting the `pid` from `activeSessions` before calling backend's unload:** This ensures that the model is cleared from the map before we start unloading.
2. **Failure handling**: If somehow the backend fails to unload, the session info for that model is added back to prevent any race conditions.
This commit improves the robustness and reliability of the unloading process by preventing potential conflicts.
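A sketch of the delete-then-restore ordering; the map is assumed to be keyed by pid, as in the surrounding commits:
```ts
import { invoke } from '@tauri-apps/api/core'

interface SessionInfo { pid: number; port: number; model_id: string }

async function unloadModel(
  activeSessions: Map<number, SessionInfo>,
  pid: number
): Promise<void> {
  const session = activeSessions.get(pid)
  if (!session) return
  activeSessions.delete(pid) // 1) clear first so new requests can't race the unload
  try {
    await invoke('unload_llama_model', { pid })
  } catch (e) {
    activeSessions.set(pid, session) // 2) restore on failure to keep state consistent
    throw e
  }
}
```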
* fix: Allow N-GPU Layers (NGL) to be set to 0 in llama.cpp
The `n_gpu_layers` (NGL) setting in the llama.cpp extension was incorrectly preventing users from disabling GPU layers by automatically defaulting to 100 when set to 0.
This was caused by a condition that only pushed `cfg.n_gpu_layers` if it was greater than 0 (`cfg.n_gpu_layers > 0`).
This commit updates the condition to `cfg.n_gpu_layers >= 0`, allowing 0 to be a valid and accepted value for NGL. This ensures that users can effectively disable GPU offloading when desired.
* fix: default ngl
---------
Co-authored-by: Louis <louis@jan.ai>
The 'Auto-Unload Old Models' setting in the llama.cpp extension failed to persist due to a typo in its key name within `settings.json`. The key was incorrectly `auto_unload_models` instead of `auto_unload`.
This commit corrects the key name to `auto_unload`, ensuring that user-configured changes to this setting are properly saved, retrieved, and persist across application restarts.
This resolves the issue where the setting would revert to its previous value after being changed.
* feat: Enhance Llama.cpp backend management with persistence
This commit introduces significant improvements to how the Llama.cpp extension manages and updates its backend installations, focusing on user preference persistence and smarter auto-updates.
Key changes include:
* **Persistent Backend Type Preference:** The extension now stores the user's preferred backend type (e.g., `cuda`, `cpu`, `metal`) in `localStorage`. This ensures that even after updates or restarts, the system attempts to use the user's previously selected backend type, if available.
* **Intelligent Auto-Update:** The auto-update mechanism has been refined to prioritize updating to the **latest version of the *currently selected backend type*** rather than always defaulting to the "best available" backend (which might change). This respects user choice while keeping the chosen backend type up-to-date.
* **Improved Initial Installation/Configuration:** For fresh installations or cases where the `version_backend` setting is invalid, the system now intelligently determines and installs the best available backend, then persists its type.
* **Refined Old Backend Cleanup:** The `removeOldBackends` function has been renamed to `removeOldBackend` and modified to specifically clean up *older versions of the currently selected backend type*, preventing the accumulation of unnecessary files while preserving other backend types the user might switch to.
* **Robust Local Storage Handling:** New private methods (`getStoredBackendType`, `setStoredBackendType`, `clearStoredBackendType`) are introduced to safely interact with `localStorage`, including error handling for potential `localStorage` access issues.
* **Version Filtering Utility:** A new utility `findLatestVersionForBackend` helps in identifying the latest available version for a specific backend type from a list of supported backends.
These changes provide a more stable, user-friendly, and maintainable backend management experience for the Llama.cpp extension.
Fixes: #5883
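A sketch of the guarded `localStorage` helpers; the storage key name is an assumption:
```ts
const KEY = 'llamacpp_backend_type' // hypothetical key name

function getStoredBackendType(): string | null {
  try {
    return localStorage.getItem(KEY)
  } catch {
    return null // storage may be unavailable (disabled, private mode, ...)
  }
}

function setStoredBackendType(backendType: string): void {
  try {
    // Write only on change (per the follow-up optimization commit).
    if (localStorage.getItem(KEY) !== backendType) {
      localStorage.setItem(KEY, backendType)
    }
  } catch {
    /* ignore storage failures */
  }
}
```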
* fix: cortex models migration should be done once
* feat: Optimize Llama.cpp backend preference storage and UI updates
This commit refines the Llama.cpp extension's backend management by:
* **Optimizing `localStorage` Writes:** The system now only writes the backend type preference to `localStorage` if the new value is different from the currently stored one. This reduces unnecessary `localStorage` operations.
* **Ensuring UI Consistency on Initial Setup:** When a fresh installation or an invalid backend configuration is detected, the UI settings are now explicitly updated to reflect the newly determined `effectiveBackendString`, ensuring the displayed setting matches the active configuration.
These changes improve performance by reducing redundant storage operations and enhance user experience by maintaining UI synchronization with the backend state.
* Revert "fix: provider settings should be refreshed on page load (#5887)"
This reverts commit ce6af62c7df4a7e7ea8c0896f307309d6bf38771.
* fix: add loader version backend llamacpp
* fix: wrong key name
* fix: model setting issues
* fix: virtual dom hub
* chore: cleanup
* chore: hide device offload setting
---------
Co-authored-by: Louis <louis@jan.ai>
Co-authored-by: Faisal Amir <urmauur@gmail.com>
* feat: add support for querying available backend devices
This change introduces a new `get_devices` method to the `llamacpp_extension` engine that allows the frontend to query and display a list of available devices (e.g., Vulkan, CUDA, SYCL) from the compiled `llama-server` binary.
* Added `DeviceList` interface to represent GPU/device metadata.
* Implemented `getDevices(): Promise<DeviceList[]>` method.
* Splits `version/backend`, ensures backend is ready.
* Invokes the new Tauri command `get_devices`.
* Introduced a new `get_devices` Tauri command.
* Parses `llama-server --list-devices` output to extract available devices with memory info.
* Introduced `DeviceInfo` struct (`id`, `name`, `mem`, `free`) and exposed it via serialization.
* Robust parsing logic using string processing (non-regex) to locate memory stats.
* Registered the new command in the `tauri::Builder` in `lib.rs`.
* Fixed logic to correctly parse multiple devices from the llama-server output.
* Handles common failure modes: binary not found, malformed memory info, etc.
This sets the foundation for device selection, memory-aware model loading, and improved diagnostics in Jan AI engine setup flows.
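On the TypeScript side this reduces to a typed `invoke` call, roughly as below; field types beyond the names listed above, and the argument name, are assumptions:
```ts
import { invoke } from '@tauri-apps/api/core'

// Mirrors the Rust DeviceInfo struct (id, name, mem, free).
interface DeviceList {
  id: string   // e.g. 'Vulkan0' — exact type is an assumption
  name: string // GPU name
  mem: number  // total memory
  free: number // free memory
}

async function getDevices(backendPath: string): Promise<DeviceList[]> {
  return invoke<DeviceList[]>('get_devices', { backendPath }) // argument name hypothetical
}
```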
* Update extensions/llamacpp-extension/src/index.ts
Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>
---------
Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>
* ci: update artifact name for Linux and Windows build
* ci: enhance logic for naming convention for mac, linux and windows builds
* fix: resolve nested template expression in artifact names
* Fix: Windows llamacpp not picking up dlls from lib repo
* Fix lib path on Windows
* Add debug info about lib_path
* Normalize lib_path for Windows
* fix window lib path normalization
* fix: missing cuda dll files on windows
* throw backend setup errors to UI
* Fix format
* Update extensions/llamacpp-extension/src/index.ts
Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>
* feat: add logger to llamacpp-extension
* fix: platform check
---------
Co-authored-by: Louis <louis@jan.ai>
Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>
* fix: support load model configurations
* chore: remove log
* chore: sampling params add from send completion
* chore: remove comment
* chore: remove comment on predefined file
* chore: update test model service
* refactor: Improve Llama.cpp backend management and auto-update
This commit refactors the Llama.cpp extension to enhance backend management and streamline the auto-update process.
Key changes include:
Refactored configureBackends: The logic for determining the best available backend and populating settings is now more modular, preventing duplicate executions.
Dedicated Auto-update Handling: Introduced a handleAutoUpdate method to encapsulate the auto-update logic, including downloading the latest available backend and updating the internal configuration and settings.
Robust Old Backend Cleanup: The removeOldBackends method is improved to ensure only the currently used backend version and type are kept, effectively managing disk space. A delay is added for Windows to prevent file conflicts during cleanup.
Final Installation Check: A ensureFinalBackendInstallation method is added to guarantee the selected backend is installed, acting as a final safeguard after auto-update or if auto-update is disabled.
Minor Fixes:
Added console.log for save_path during decompression for better debugging.
Ensured the output directory exists before decompression in the Rust backend.
Removed extraneous console log for session info.
Updated Cargo.toml and tauri.conf.json versions.
These changes lead to a more reliable and efficient Llama.cpp backend experience within the application, particularly for users with auto-update enabled.
* fix isBackendInstalled parameters
* Address bot's comments
* Address bot comments of using try finally block
On Windows, spawning the llamacpp server was causing an unwanted terminal window
to appear. This is now fixed by combining `CREATE_NO_WINDOW` with
`CREATE_NEW_PROCESS_GROUP` using `.creation_flags(...)`, ensuring that the
process runs in the background without a console window.
This change only applies to 64-bit Windows builds.
* feat: support per-model overrides in llama.cpp load()
Extend the `load()` method in the llama.cpp extension to accept optional
`overrideSettings`, allowing fine-grained per-model configuration.
This enables users to override provider-level settings such as `ctx_size`,
`chat_template`, `n_gpu_layers`, etc., when loading a specific model.
Fixes: #5818 (Feature Request - Jan v0.6.6)
Use cases enabled:
- Different context sizes per model (e.g., 4K vs 32K)
- Model-specific chat templates (ChatML, Alpaca, etc.)
- Performance tuning (threads, GPU layers)
- Better memory management per deployment
Maintains full backward compatibility with existing provider config.
* swap overrideSettings and isEmbedding argument
* fix: Enhance stream error handling and parsing
This commit improves the robustness of stream processing in the llamacpp-extension.
- Adds explicit handling for 'error:' prefixed lines in the stream, parsing the contained JSON error and throwing an appropriate JavaScript Error.
- Centralizes JSON parsing of 'data:' and 'error:' lines, ensuring consistent error propagation by re-throwing parsing exceptions.
- Ensures the async iterator terminates correctly upon encountering stream errors or malformed JSON.
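A sketch of the per-line handling; the `data:`/`error:` prefixes are from the commit, while the `[DONE]` sentinel is assumed from the OpenAI-style stream format:
```ts
// Returns a parsed chunk, 'done' for the end sentinel, or null for skippable lines.
// Parse errors propagate so the async iterator terminates.
function parseStreamLine(line: string): object | 'done' | null {
  const trimmed = line.trim()
  if (!trimmed) return null
  if (trimmed.startsWith('error:')) {
    const payload = JSON.parse(trimmed.slice('error:'.length).trim())
    throw new Error(payload?.message ?? 'stream error')
  }
  if (trimmed.startsWith('data:')) {
    const body = trimmed.slice('data:'.length).trim()
    if (body === '[DONE]') return 'done'
    return JSON.parse(body)
  }
  return null
}
```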
* Address bot comments and cleanup
* feat: add autoqa
* chore: add auto start computer_server
* chore: add ci autoqa windows
* chore: add ci support for both windows and linux
* chore: add ci support for macos
* chore: refactor auto qa
* chore: refactor autoqa workflow
* chore: fix upload turn
* refactor: move thinking toggle to runtime settings for per-message control
Replaces the static `reasoning_budget` config with a dynamic `enable_thinking` flag under `chat_template_kwargs`, allowing models like Jan-nano and Qwen3 to enable/disable thinking behavior at runtime, even mid-conversation.
Requires UI update
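For illustration, the request payload gains a per-call flag along these lines; the model name and other fields are placeholders:
```ts
const body = {
  model: 'jan-nano', // placeholder
  messages: [{ role: 'user', content: 'Hello' }],
  chat_template_kwargs: { enable_thinking: false }, // toggled per request, even mid-conversation
  stream: true,
}
```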
* remove engine argument
* fix: Prevent spamming /health endpoint and improve startup and resolve compiler warnings
This commit introduces a delay and improved logic to the /health endpoint checks in the llamacpp extension, preventing excessive requests during model loading.
Additionally, it addresses several Rust compiler warnings by:
- Commenting out an unused `handle_app_quit` function in `src/core/mcp.rs`.
- Explicitly declaring `target_port`, `session_api_key`, and `buffered_body` as mutable in `src/core/server.rs`.
- Commenting out unused `tokio` imports in `src/core/setup.rs`.
- Enhancing the `load_llama_model` function in `src/core/utils/extensions/inference_llamacpp_extension/server.rs` to better monitor stdout/stderr for readiness and errors, and handle timeouts.
- Commenting out an unused `std::path::Prefix` import and adjusting `normalize_path` in `src/core/utils/mod.rs`.
- Updating the application version to 0.6.904 in `tauri.conf.json`.
* fix grammar!
Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>
* fix grammar 2
Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>
* reimport prefix but only on Windows
* remove instead of commenting
* remove redundant check
* sync app version in cargo.toml with tauri.conf
---------
Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>
* feat: Improve llamacpp server error reporting and model load stability
This commit introduces significant improvements to how the llamacpp server
process is managed and how its errors are reported.
Key changes:
- **Enhanced Error Reporting:** The llamacpp server's stdout and stderr
are now piped and captured. If the llamacpp process exits prematurely
or fails to start, its stderr output is captured and returned as a
`LlamacppError`. This provides much more specific and actionable
diagnostic information for users and developers.
- **Increased Model Load Timeout:** The `waitForModelLoad` timeout has
been increased from 30 seconds to 240 seconds (4 minutes). This
addresses issues where larger models or slower systems would
prematurely time out during the model loading phase.
- **API Secret Update:** The internal API secret for the llamacpp
extension has been updated from 'Jan' to 'JustAskNow'.
- **Version Bump:** The application version in `tauri.conf.json` has
been incremented to `0.6.901`.
* fix: should not spam load requests
* test: add test to cover the fix
* refactor: clean up
* test: add more test case
---------
Co-authored-by: Louis <louis@jan.ai>
- pulls fix for #5463 out of the github release workflow and into
the make/yarn build process
- implements a wrapper script that pins linuxdeploy and injects
a new location for XDG_CACHE_HOME into the build pipeline,
allowing manipulation of .cache/tauri without tainting the host's
.cache
- adds ./.cache (project_root/.cache) to make clean and mise clean
task
- remove .devcontainer/buildAppImage.sh, obsolete now that extra
build steps have been removed from the github workflow and
incorporated in the normal build process
- remove appimagetool from .devcontainer/postCreateCommand.sh,
as it was only used by .devcontainer/buildAppImage.sh
- pulled appimage packaging steps out of release workflow into new
src-tauri/build-utils/buildAppImage.sh
- cleaned up yarn scripts:
- moved multi platform yarn scripts out of yarn build:tauri:<platform>
into generic yarn build:tauri
- split yarn build:tauri:linux:win32 into separate yarn scripts so it's
clearer what is specific to which platform
- added src-tauri/build-utils/buildAppImage.sh to new yarn build:tauri:linux
yarn script
This is also a good entry point to add flatpak builds in the future.
Part of #5641
Allows for better per-platform default config. Currently the
defaults serve Windows/macOS fine, while they have to be tweaked
in order to build for Linux.
make build-tauri now runs successfully where it errored out before.
AppImages made with make alone are still incomplete, however, as there
are post-processing steps in the GitHub release workflow that bundle
additional resources.
- split platform specific config out of tauri.conf.json into auxiliary
platform specific config files, natively supported by tauri
- pull improved defaults out of template-tauri-build-linux-x64.yml
into new tauri.linux.conf.json
- fix tauri-build-linux-x64.yml to utilize new tauri.linux.conf.json
Things to ponder:
- Now, the v1/models endpoint of the API server will return an empty
list if no models are loaded
- Streaming v1/chat/completion routing works as well as v1/models; needs
further testing
- Changed `pid` field in `SessionInfo` from `string` to `number`/`i32` in TypeScript and Rust.
- Updated `activeSessions` map key from `string` to `number` to align with new PID type.
- Adjusted process monitoring logic to correctly handle numeric PIDs.
- Removed fallback UUID-based PID generation in favor of numeric fallback (-1).
- Added PID cleanup logic in `is_process_running` when the process is no longer alive.
- Bumped application version from 0.5.16 to 0.6.900 in `tauri.conf.json`.
Adds a new configuration option `chat_template` to the Llama.cpp extension, allowing users to define a custom Jinja chat template for the model.
The template can be provided via a new input field in the settings, and if set, it will be passed to the Llama.cpp backend using the `--chat-template` argument. This enhances flexibility for users who require specific chat formatting beyond the GGUF default.
The `chat_template` is added to the `LlamacppConfig` type and conditionally pushed to the command arguments if it's provided. The placeholder text provides an example of a Jinja template structure.
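A sketch of the wiring, with an illustrative ChatML-style Jinja template (not the extension's actual placeholder):
```ts
// Example template only — real templates come from user input in settings.
const chatTemplate =
  '{% for m in messages %}<|im_start|>{{ m.role }}\n{{ m.content }}<|im_end|>\n{% endfor %}'

const args: string[] = []
if (chatTemplate) {
  args.push('--chat-template', chatTemplate) // falls back to the GGUF default when unset
}
```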
The current implementation of Ctrl-C handling was not properly tested on Windows x86_64 architectures. To address this, the code has been modified to use `i32` instead of `BOOL` to handle the result of the `GenerateConsoleCtrlEvent` function, ensuring that the return value is correctly checked across different platforms.
This change updates the Cargo.toml dependencies on Windows to include additional features from the `windows-sys` crate, enabling `CreateProcess` flags like `CREATE_NEW_PROCESS_GROUP` for proper process management.
The code now properly sends Ctrl+C to the llama process on Windows, and also includes error handling for when the Ctrl+C command fails. Additionally, it now uses the `Windows` API to kill the process when it times out, and properly handles the wait for the process to exit.
Updated Rust code to apply Windows-specific logic only on x86_64 targets using #[cfg(all(windows, target_arch = "x86_64"))]. Modified dev:tauri script in package.json to remove CLEAN=true and added CLEAN=true to beforeDevCommand in tauri.conf.json for consistency. Minor formatting changes in tauri.conf.json.
This commit introduces embedding functionality to the llamacpp extension. It allows users to generate embeddings for text inputs using the 'sentence-transformer-mini' model. The changes include:
- Adding a new `embed` method to the `llamacpp_extension` class.
- Implementing model loading and API interaction for embeddings.
- Handling potential errors during API requests.
- Adding necessary types for embedding responses and data.
- The load method now accepts a boolean parameter to determine if it should load an embedding model.
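A sketch of the embed flow described above; the endpoint shape follows the OpenAI-compatible `/v1/embeddings` API, the model name is from the commit, and the response handling is an assumption:
```ts
async function embed(texts: string[], port: number, apiKey: string): Promise<number[][]> {
  const res = await fetch(`http://localhost:${port}/v1/embeddings`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json', Authorization: `Bearer ${apiKey}` },
    body: JSON.stringify({ input: texts, model: 'sentence-transformer-mini' }),
  })
  if (!res.ok) throw new Error(`embedding request failed: ${res.status}`)
  const json = await res.json()
  return json.data.map((d: { embedding: number[] }) => d.embedding)
}
```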
The changes include:
- Renaming interfaces (sessionInfo -> SessionInfo, unloadResult -> UnloadResult) for consistency
- Adding getLoadedModels() method to retrieve active model IDs
- Updating variable names from modelId to model_id for alignment
- Updating cleanup paths to use XDG-standard locations
- Improving type consistency across extension implementation
Rename variable, struct, and enum names from camelCase to snake_case throughout the llamacpp extension codebase to align with Rust naming conventions. This change improves readability and consistency without altering functionality.
Add comprehensive sampling parameters for fine-grained control over AI output generation, including dynamic temperature, Mirostat sampling, repetition penalties, and advanced prompt handling. These parameters enable more precise tuning of model behavior and output quality.
Change the llama_server_process state from an Option<Child> to a HashMap<String, Child> to support managing multiple server instances by PID. This allows precise process tracking and termination, replacing the previous single-process limitation.
Previously, only one server process could be tracked at a time. Now, each process is stored with its PID as the key, enabling:
- Accurate session matching during unloading
- Proper termination of specific processes
- Better error handling for mismatched PIDs
The load_llama_model function now inserts processes into the map, and unload_llama_model removes them by PID.
The loop now extracts session info to retrieve the model ID, ensuring correct unloading of sessions by their associated model identifiers rather than session IDs. This aligns the cleanup process with the actual model resources being managed.
The generateApiKey method now incorporates the model's port to create a unique,
port-specific API key, enhancing security by ensuring keys are tied to both
model ID and port. This change supports better isolation between models
running on different ports. Code formatting improvements were also made
for consistency and readability.
The changes standardize identifier names across the codebase for clarity:
- Replaced `sessionId` with `pid` to reflect process ID usage
- Changed `modelName` to `modelId` for consistency with identifier naming
- Renamed `api_key` to `apiKey` for camelCase consistency
- Updated corresponding methods to use these new identifiers
- Improved type safety and readability by aligning variable names with their semantic meaning
This change allows the port to be specified via command line arguments, providing flexibility. The port is parsed from the arguments, defaulting to 8080 if not provided.
The changes improve the robustness of command-line argument parsing in the Llama model server by replacing direct index access with safe iteration methods. A new generate_api_key function was added to handle API key generation securely. The sessionId parameter was standardized to match the renamed property in the client code.
- Changed load method to accept modelId instead of loadOptions for better clarity and simplicity
- Renamed engineBasePath parameter to backendPath for consistency with the backend's directory structure
- Added getRandomPort method to ensure unique ports for each session to prevent conflicts
- Refactored configuration and model loading logic to improve maintainability and reduce redundancy
The `ImportOptions` interface was updated to include `modelPath` and `mmprojPath`. These options are required for importing models and multi-modal projectors.
The `loadOptions` interface in `AIEngine.ts` now includes an optional `mmprojPath` property. This allows users to provide a path to their MMProject file when loading a model, which is required for certain model types. The `llamacpp-extension/src/index.ts` has been updated to pass this option to the llamacpp server if provided.
* wip
* update
* add download logic
* add decompress. support delete file
* download backend upon selecting setting
* add some logging and notes
* add note on race condition
* remove then catch
* default to none backend. only download if it's not installed
* merge version and backend. fetch version from GH
* restrict scope of output_dir
* add note on unpack
This commit introduces API key generation for the Llama.cpp extension. The API key is now generated on the server side using HMAC-SHA256 and a secret key to ensure security and uniqueness. The frontend now passes the model ID and API secret to the server to generate the key. This addresses the requirement for secure model access and authorization.
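A sketch of the derivation; the message layout is an assumption, and including the port follows the earlier commit that made keys port-specific:
```ts
import { createHmac } from 'node:crypto'

function generateApiKey(modelId: string, port: number, apiSecret: string): string {
  // HMAC-SHA256 over model ID + port, keyed by the shared secret.
  return createHmac('sha256', apiSecret).update(`${modelId}:${port}`).digest('hex')
}
```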
* add pull and abortPull
* add model import (download only)
* write model.yaml. support local model import
* remove cortex-related command
* add TODO
* remove cortex-related command
* feat: improve local provider connectivity with CORS bypass
- Add @tauri-apps/plugin-http dependency
- Implement dual fetch strategy for local vs remote providers
- Auto-detect local providers (localhost, Ollama:11434, LM Studio:1234)
- Make API key optional for local providers
- Add comprehensive test coverage for provider fetching
refactor: simplify fetchModelsFromProvider by removing preflight check logic
* feat: extend config options to include custom fetch function for CORS handling
* feat: conditionally use Tauri's fetch for openai-compatible providers to handle CORS
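A sketch of the local-provider detection named above; the host list is an assumption, the ports are from the commit (Ollama 11434, LM Studio 1234):
```ts
function isLocalProvider(baseUrl: string): boolean {
  try {
    const u = new URL(baseUrl)
    const localHosts = ['localhost', '127.0.0.1', '0.0.0.0']
    const localPorts = ['11434', '1234'] // Ollama, LM Studio
    return localHosts.includes(u.hostname) || localPorts.includes(u.port)
  } catch {
    return false
  }
}
```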
* Feat: auto restart mcp (#5226)
* feat: implement retry mechanism for MCP server activation with exponential backoff
feat: enhance MCP server activation with configurable retry attempts
feat: implement MCP server restart monitoring and cleanup functionality
feat: enhance MCP server restart logic with improved monitoring and configuration handling
feat: add manual deactivation for MCP servers to prevent automatic restarts
* feat: enhance MCP server startup with initial attempt tracking and health monitoring
* 🐛fix: prevent render error when additional information missing from hardware (#5413)
---------
Co-authored-by: Sam Hoang Van <samhv.ict@gmail.com>
Co-authored-by: Faisal Amir <urmauur@gmail.com>
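A sketch of the retry-with-exponential-backoff pattern used for MCP server activation above; the base delay, cap, and attempt count are assumptions, and `activateServer` is a hypothetical callback:
```ts
async function activateWithRetry(
  activateServer: () => Promise<void>,
  maxAttempts = 5
): Promise<void> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await activateServer()
    } catch (e) {
      if (attempt === maxAttempts - 1) throw e
      const delay = Math.min(1000 * 2 ** attempt, 30_000) // 1s, 2s, 4s, ... capped
      await new Promise((r) => setTimeout(r, delay))
    }
  }
}
```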
* feat: add initial mise.toml configuration with core tasks and development setup
* docs: update Quick Start section to recommend mise for development setup
* Refactor translation imports and update text for localization across settings and system monitor routes
- Changed translation import from 'react-i18next' to '@/i18n/react-i18next-compat' in multiple files.
- Updated various text strings to use translation keys for better localization support in:
- Local API Server settings
- MCP Servers settings
- Privacy settings
- Provider settings
- Shortcuts settings
- System Monitor
- Thread details
- Ensured consistent use of translation keys for all user-facing text.
Update web-app/src/routes/settings/appearance.tsx
Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>
Update web-app/src/routes/settings/appearance.tsx
Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>
Update web-app/src/locales/vn/settings.json
Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>
Update web-app/src/containers/dialogs/DeleteMCPServerConfirm.tsx
Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>
Update web-app/src/locales/id/common.json
Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>
* Add Chinese (Simplified and Traditional) localization files for various components
- Created `tools.json`, `updater.json`, `assistants.json`, `chat.json`, `common.json`, `hub.json`, `logs.json`, `mcp-servers.json`, `provider.json`, `providers.json`, `settings.json`, `setup.json`, `system-monitor.json`, `tool-approval.json` in both `zh-CN` and `zh-TW` locales.
- Added translations for tool approval, updater notifications, assistant management, chat interface, common UI elements, hub interactions, logging messages, MCP server configurations, provider management, settings options, setup instructions, and system monitoring.
* Refactor localization strings for improved clarity and consistency in English, Indonesian, and Vietnamese settings files
* Fix missing key and reword
* fix pr comment
---------
Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>
* chore: enable shortcut zoom (#5261)
* chore: enable shortcut zoom
* chore: update shortcut setting
* fix: thinking block (#5263)
* Merge pull request #5262 from menloresearch/chore/sync-new-hub-data
chore: sync new hub data
* ✨enhancement: model run improvement (#5268)
* fix: mcp tool error handling
* fix: error message
* fix: trigger download from recommend model
* fix: can't scroll hub
* fix: show progress
* ✨enhancement: prompt users to increase context size
* ✨enhancement: rearrange action buttons for a better UX
* 🔧chore: clean up logics
---------
Co-authored-by: Faisal Amir <urmauur@gmail.com>
* fix: glitch download from onboarding (#5269)
* ✨enhancement: Model sources should not be hard coded from frontend (#5270)
* 🐛fix: default onboarding model should use recommended quantizations (#5273)
* 🐛fix: default onboarding model should use recommended quantizations
* ✨enhancement: show context shift option in provider settings
* 🔧chore: wording
* 🔧 config: add to gitignore
* 🐛fix: Jan-nano repo name changed (#5274)
* 🚧 wip: disable showSpeedToken in ChatInput
* 🐛 fix: commented out the wrong import
* fix: masking value MCP env field (#5276)
* ✨ feat: add token speed to each message that persist
* ♻️ refactor: to follow prettier convention
* 🐛 fix: exclude deleted field
* 🧹 clean: all the missed console.log
* ✨enhancement: out of context troubleshooting (#5275)
* ✨enhancement: out of context troubleshooting
* 🔧refactor: clean up
* ✨enhancement: add setting chat width container (#5289)
* ✨enhancement: add setting conversation width
* ✨enhancement: clean up logs and improve accessibility
* ✨enhancement: move const beta version
* 🐛fix: optional additional_information gpu (#5291)
* 🐛fix: showing release notes for beta and prod (#5292)
* 🐛fix: showing release notes for beta and prod
* ♻️refactor: make an utils env
* ♻️refactor: hide MCP for production
* ♻️refactor: simplify the boolean expression fetch release note
* 🐛fix: typo in build type check (#5297)
* 🐛fix: remove onboarding local model and hide the edit capabilities model (#5301)
* 🐛fix: remove onboarding local model and hide the edit capabilities model
* ♻️refactor: conditional search params setup screen
* 🐛fix: hide token speed when assistant params stream false (#5302)
* 🐛fix: glitch padding speed token (#5307)
* 🐛fix: immediately show download progress (#5308)
* 🐛fix: safely convert values to numbers and handle NaN cases (#5309)
* chore: correct binary name for stable version (#5303) (#5311)
Co-authored-by: hiento09 <136591877+hiento09@users.noreply.github.com>
* 🐛fix: llama.cpp default NGL setting does not offload all layers to GPU (#5310)
* 🐛fix: llama.cpp default NGL setting does not offload all layers to GPU
* chore: cover more cases
* chore: clean up
* fix: should not show GPU section on Mac
* 🐛fix: update default extension settings (#5315)
* fix: update default extension settings
* chore: hide language setting on Prod
* 🐛fix: allow script posthog (#5316)
* Sync 0.5.18 to 0.6.0 (#5320)
* chore: correct binary name for stable version (#5303)
* ci: enable devtool on prod build (#5317)
* ci: enable devtool on prod build
---------
Co-authored-by: hiento09 <136591877+hiento09@users.noreply.github.com>
Co-authored-by: Nguyen Ngoc Minh <91668012+Minh141120@users.noreply.github.com>
* fix: glitch model download issue (#5322)
* 🐛 fix(updater): terminate sidecar processes before update to avoid file access errors (#5325)
* 🐛 fix: disable sorting for threads in SortableItem and clean up thread order handling (#5326)
* improved wording in UI elements (#5323)
* fix: sorted-thread-not-stable (#5336)
* 🐛fix: update wording desc vulkan (#5338)
* 🐛fix: update wording desc vulkan
* ✨enhancement: update copy
* 🐛fix: handle NaN value tokenspeed (#5339)
* 🐛 fix: Windows path problem
* feat(server): filter /models endpoint to show only downloaded models (#5343)
- Add filtering logic to proxy server for GET /models requests
- Keep only models with status "downloaded" in response
- Remove Content-Length header to prevent mismatch after filtering
- Support both ListModelsResponseDto and direct array formats
- Add comprehensive tests for filtering functionality
- Fix Content-Length header conflict causing empty responses
Fixes issue where all models were returned regardless of download status.
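A sketch of the downloaded-only filter, supporting both response shapes named above; the field names are assumptions:
```ts
type Model = { id: string; status?: string }

function filterDownloaded(body: Model[] | { data: Model[] }): Model[] | { data: Model[] } {
  const keep = (m: Model) => m.status === 'downloaded'
  // Handle both the ListModelsResponseDto shape and a bare array.
  return Array.isArray(body) ? body.filter(keep) : { ...body, data: body.data.filter(keep) }
}
```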
* 🐛fix: render streaming token speed based on thread ID & assistant metadata (#5346)
* fix(server): add gzip decompression support for /models endpoint filtering (#5349)
- Add gzip detection using magic number check (0x1f 0x8b)
- Implement gzip decompression before JSON parsing
- Add gzip re-compression for filtered responses
- Fix "invalid utf-8 sequence" error when upstream returns gzipped content
- Maintain Content-Encoding consistency for compressed responses
- Add comprehensive gzip handling with flate2 library
Resolves issue where filtering failed on gzip-compressed model responses.
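The sniff itself is tiny; a TypeScript rendering of the same check (the Rust code uses flate2): gzip streams begin with the magic bytes 0x1f 0x8b.
```ts
import { gunzipSync } from 'node:zlib'

function decodeBody(body: Buffer): string {
  const isGzip = body.length >= 2 && body[0] === 0x1f && body[1] === 0x8b
  return (isGzip ? gunzipSync(body) : body).toString('utf8')
}
```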
* fix(proxy): implement true HTTP streaming for chat completions API (#5350)
* fix: glitch toggle gpus (#5353)
* fix: glitch toggle gpu
* fix: Using the GPU's array index as a key for gpuLoading
* enhancement: added try-finally
* fix: built in models capabilities (#5354)
* 🐛fix: setting provider hide model capabilities (#5355)
* 🐛fix: setting provider hide model capabilities
* 🐛fix: hide tools icon on dropdown model providers
* fix: stop server on app close or reload
* ✨enhancement: reset heading class
---------
Co-authored-by: Louis <louis@jan.ai>
* fix: stop api server on page unload (#5356)
* fix: stop api server on page unload
* fix: check api server status on reload
* refactor: api server state
* fix: should not pop the guard
* 🐛fix: avoid render html title thread (#5375)
* 🐛fix: avoid render html title thread
* chore: minor bump - tokenjs for manual adding models
---------
Co-authored-by: Louis <louis@jan.ai>
---------
Co-authored-by: Faisal Amir <urmauur@gmail.com>
Co-authored-by: LazyYuuki <huy2840@gmail.com>
Co-authored-by: Bui Quang Huy <34532913+LazyYuuki@users.noreply.github.com>
Co-authored-by: hiento09 <136591877+hiento09@users.noreply.github.com>
Co-authored-by: Nguyen Ngoc Minh <91668012+Minh141120@users.noreply.github.com>
Co-authored-by: Sam Hoang Van <samhv.ict@gmail.com>
Co-authored-by: Ramon Perez <ramonpzg@protonmail.com>
* 🐛fix: showing release notes for beta and prod
* ♻️refactor: make a utils env
* ♻️refactor: hide MCP for production
* ♻️refactor: simplify the boolean expression for fetching release notes
* chore: simple onboarding local model
* chore: update new model and improve the e2e onboarding flow for local models
* fix: default tool support models
---------
Co-authored-by: Louis <louis@jan.ai>
* fix(server): enhance CORS handling for local API network access
- Fix CORS preflight validation to use Host header for target validation
- Use Origin header correctly for CORS response headers
- Improve host validation to support both host:port and host-only formats
- Filter upstream CORS headers to prevent duplicate Access-Control-Allow-Origin
- Add CORS headers to all error responses for consistent behavior
- Fix host matching logic to handle trusted hosts with and without ports
- Ensure single Access-Control-Allow-Origin header per response
This resolves CORS preflight failures that were blocking cross-origin
requests to the local API server, enabling proper network access from
web applications and external tools.
Fixes: OPTIONS requests being rejected due to incorrect host validation
Resolves: "access control allow origin cannot contain more than one origin" error
* fix(proxy): bypass host and authorization checks for root path in CORS preflight
* fix(proxy): bypass host and authorization checks for whitelisted paths
* feat: integrate fuzzy search into model dropdown
- Replace DropdownMenu with Popover for better search UX
- Include search input with clear functionality
- Reorganize layout with capabilities at end of row
- Maintain provider grouping and model selection functionality
Improves model discovery and selection with instant search across
model names, providers, and capabilities.
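For reference, the fzf npm package that powers this kind of client-side search has a small API; a sketch with hypothetical model data:

```typescript
import { Fzf } from 'fzf'

interface SearchableModel {
  id: string
  provider: string
}

// Hypothetical entries; the real ones come from the provider list.
const models: SearchableModel[] = [
  { id: 'llama-3.2-3b-instruct', provider: 'llama.cpp' },
  { id: 'gpt-4o-mini', provider: 'openai' },
]

// Search against model.id; each result also carries `positions`, the set of
// matched character indices that highlighting can use later.
const fzf = new Fzf(models, { selector: (m) => m.id })
const matches = fzf.find('lla') // instant fuzzy search as the user types
console.log(matches.map((m) => m.item.id))
```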
* chore: enhance input search style
* feat: enhance model dropdown with search highlighting and fixed positioning
- Add FZF search highlighting with text-accent color for matched characters
- Fix dropdown to only appear below (prevent upward positioning)
- Import highlightFzfMatch utility for search result highlighting
- Update SearchableModel interface to include highlightedId property
- Modify FZF selector to target model.id for more accurate highlighting
- Use dangerouslySetInnerHTML to render highlighted search matches
- Add avoidCollisions=false to PopoverContent for consistent positioning
---------
Co-authored-by: Faisal Amir <urmauur@gmail.com>
* feat: Jan API Server should have API Key setting
* chore: reveal api key icon
* Update src-tauri/src/core/server.rs
Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>
* chore: add apiKey validation
---------
Co-authored-by: Faisal Amir <urmauur@gmail.com>
Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>
* chore: hide some model capabilities
* chore: update model setting description
* chore: disable vision and embedding and add tooltip
---------
Co-authored-by: Louis <louis@jan.ai>
* fix: relocate jan data folder failed
* fix: avoid infinite recursion
* chore: kill background processes to unblock factory reset
* chore: stop models before reset factory
* chore: clean up
* chore: clean up
* fix: show error
* chore: get active models should not have retry
* chore: remove legacy themes
* refactor: clean up dependencies
* chore: remove cuda 11 dependency - fix linux LD_LIBRARY_PATH
* fix: load models issue on Linux
* chore: do not download cuda 11 by default
* chore: remove cuda 11 from installer
* fix: cuda lookup on Linux
* refactor: deprecate legacy packages and clean up build scripts
* chore: remove joi publish workflow
* chore: core publish run on dispatch only
* chore: correct version bump on web package
* chore: make dev for tauri target
* enhancement: UX for changing the data folder with confirmation, and reveal logs in Finder
* chore: update button open logs local api server
* Update web-app/src/components/ui/button.tsx
Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>
* chore: handle error when changing the data folder location fails
---------
Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>
* feat: migrate legacy local storage data to new app
* chore: refactor localstorage db read
* chore: clean up
* chore: migrate api key setting
* chore: apply proxy configs
* chore: fix key
* chore: Jan's code is now under the Apache license (#5042)
* make the model selector popup responsive and wider for bigger screens (#5025)
* make the model selector popup responsive and wider for bigger screens
* fix linting issue
* use vscode config files to recommend prettier plugin and use it to auto format on save
---------
Co-authored-by: Ethan Garber <ethancgarber@gmail.com>
* chore: update Jan change logs v0.5.17
* Update README.md (#5072)
Updated license to Apache 2.0
---------
Co-authored-by: ethanova <ethanova@users.noreply.github.com>
Co-authored-by: Ethan Garber <ethancgarber@gmail.com>
Co-authored-by: David <david@menlo.ai>
Co-authored-by: Emre Can Kartal <159995642+eckartal@users.noreply.github.com>
* chore: allow users to input HF repo id / url
* chore: allow users to search HuggingFace models
* chore: normalize input
* chore: normalize input from FE
* chore: clean up
* chore: clean up
* fix: conflict
* fix: model name from metadata instead of id
* chore: enable rehype raw for hub description card
* fix: broken link
* chore: remove log
---------
Co-authored-by: Faisal Amir <urmauur@gmail.com>
* chore: enhance tool use loop
* fix: newly created custom provider is not saved
* chore: bump llama.cpp b5488
* chore: normalize reasoning assistant response
* chore: fix tool call parse in stream mode
* fix: give tool call default generated id
* fix: system instruction should be on top of the history
* chore: allow users to add parameters
* feat: Implement Cortex server auto-restart and webview notification
Implements a robust auto-restart mechanism for the Cortex server (sidecar)
managed by the Tauri backend.
Key changes:
Backend (src-tauri):
- Modified `core/setup.rs` to:
- Loop sidecar spawning, attempting up to `MAX_RESTARTS` (5) times with a
`RESTART_DELAY_MS` (5 seconds) between attempts.
- Monitor the sidecar process for unexpected termination (crashes or
non-zero exit codes).
- Reset the restart attempt count to 0 in `AppState` upon a successful
server spawn.
- Emit a "cortex_max_restarts_reached" event to the webview if the
server fails to start after `MAX_RESTARTS`.
- Updated `core/state.rs` to include `cortex_restart_count: Arc<Mutex<u32>>`
in `AppState` to track restart attempts.
- Added a new Tauri command `reset_cortex_restart_count` in `core/cmd.rs`
to allow the webview (or other parts of the app) to reset this counter.
- Registered the new command and initialized the `cortex_restart_count`
in `lib.rs`.
Frontend (web-app):
- Created a new component `CortexFailureDialog.tsx` in
`src/containers/dialogs/` to:
- Listen for the "cortex_max_restarts_reached" event from Tauri.
- Display a dialog informing the user that the local AI engine (Cortex)
failed to start after multiple attempts.
- Offer options to "Contact Support" (opens jan.ai/support),
"Restart Jan" (invokes the `relaunch` Tauri command), or "Okay"
(dismisses the dialog).
- Integrated the `CortexFailureDialog` into the `RootLayout` in
`src/routes/__root.tsx` so it's globally available.
- Corrected button variants in `__root.tsx` to use `variant="default"`
with appropriate classNames for outline styling, resolving TypeScript
errors.
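On the frontend side, the event wiring might look like this sketch (module paths assume Tauri v2; Tauri v1 imports `invoke` from `@tauri-apps/api/tauri`):

```typescript
import { listen, type UnlistenFn } from '@tauri-apps/api/event'
import { invoke } from '@tauri-apps/api/core' // '@tauri-apps/api/tauri' on Tauri v1

// Subscribe once (e.g. in CortexFailureDialog) and open the dialog when the
// backend reports that all restart attempts are exhausted.
async function watchCortexFailures(openDialog: () => void): Promise<UnlistenFn> {
  return listen('cortex_max_restarts_reached', () => {
    openDialog()
  })
}

// When the user opts to retry (e.g. before invoking `relaunch`), clear the
// counter so the backend will attempt spawning again.
async function resetRestartCounter(): Promise<void> {
  await invoke('reset_cortex_restart_count')
}
```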
* refactor: Improve async handling and logging in setup_sidecar function
This commit replaces Fuse.js with Fzf for client-side thread searching
within the web application, offering potentially improved performance and
a different fuzzy matching algorithm.
Key changes include:
- Updated `package.json` to remove `fuse.js` and add `fzf`.
- Refactored `useThreads.ts` hook:
- Replaced Fuse.js instantiation and search logic with Fzf.
- Integrated a new `highlightFzfMatch` utility to return thread
titles with HTML highlighting for matched characters.
- Created `utils/highlight.ts` for the `highlightFzfMatch` function.
- Updated `ThreadList.tsx`:
- Renders highlighted thread titles using `dangerouslySetInnerHTML`.
- Ensures the rename functionality uses and edits a plain text
version of the title, stripping any highlight tags.
- Updated `index.css`:
- Modified the `.search-highlight` class to use `font-bold` and
`color-mix(in srgb, currentColor 80%, white 20%)` for a
subtly brighter text effect on highlighted matches, replacing
previous styling.
This provides a more robust search experience with clear visual feedback
for matched terms in the thread list.
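A plausible shape for the `highlightFzfMatch` utility, given that fzf reports matched character positions (a sketch, not the exact implementation):

```typescript
// Wrap each matched character (by index from fzf's `positions` set) in a
// span picked up by the .search-highlight CSS class.
export function highlightFzfMatch(text: string, positions: Set<number>): string {
  return [...text]
    .map((ch, i) =>
      positions.has(i) ? `<span class="search-highlight">${ch}</span>` : ch
    )
    .join('')
}

// Rename flows need the plain title back, so strip the injected tags before
// the highlighted string (rendered via dangerouslySetInnerHTML) is edited.
export function stripHighlight(html: string): string {
  return html.replace(/<[^>]+>/g, '')
}
```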
Modified the `onDragEnd` handler in `ThreadList.tsx` to use the complete global thread list from the `useThreads` store. This resolves a bug where dragging and dropping threads could lead to items disappearing from "Favorites" or "Recents" sections by ensuring the reordering logic operates on the full dataset.
Additionally, the default `order: 1` initialization for new threads in `useThreads.ts` and `services/threads.ts` has been commented out; this makes threads sort from newest to oldest.
* make the model selector popup responsive and wider for bigger screens
* fix linting issue
* use vscode config files to recommend prettier plugin and use it to auto format on save
---------
Co-authored-by: Ethan Garber <ethancgarber@gmail.com>
* fix: should not spawn many llama.cpp servers for the same model
* chore: test step placeholder for the new revamp
* chore: coverage check should not fail pipeline
* feat: run mcp with bundled bun and uv
* chore: clean up
* chore: pull binaries windows, linux (#4963)
* fix: get bun and uv from execution path
* fix: macos
* fix: typo
---------
Co-authored-by: vansangpfiev <vansangpfiev@gmail.com>
* chore: bump cortex 1.0.11-rc10
* chore: bump to latest cortex release
* feat: Cortex API Authorization
* chore: correct CI CD repo name
* chore: correct new menloresearch repo name
* feat: rotate api token for each run (#4820)
* feat: rotate api token for each run
* chore: correct github repo url
* chore: correct github api url
* chore: should not filter out models on first launch
* chore: bump cortex release
* chore: should get hardware information on launch (#4821)
* chore: should have an option to not revalidate hardware information
* chore: cortex.cpp gpu activation could cause a race condition (#4825)
* fix: jan beta logo displayed in jan release (#4828)
---------
Co-authored-by: David <davidpt.janai@gmail.com>
Co-authored-by: Nguyen Ngoc Minh <91668012+Minh141120@users.noreply.github.com>
* chore: migrate engine settings on update
* chore: queue engine migration to ensure it only executes when the server is on
* chore: ensure queue is empty instead of running in the queue
* refactor: different Jan instances should have different Cortex server port configurations
* chore: update workflow to use env input
* chore: update env for cortex port setting
* docs: add DeepSeek R1 local installation guide
- Add comprehensive guide for running DeepSeek R1 locally
- Include step-by-step instructions with screenshots
- Add VRAM requirements and model selection guide
- Include system prompt setup instructions
* docs: add comprehensive guide on running AI models locally
* docs: address PR feedback for DeepSeek R1 and local AI guides
- Improve language and terminology throughout
- Add Linux support information
- Enhance technical explanations
- Update introduction for better flow
- Fix parameters section in run-ai-models-locally.mdx
* docs: improve local AI guides content and linking
- Update titles and introductions for better SEO
- Add opinionated guidance section for beginners
- Link DeepSeek guide with general local AI guide
- Fix typos and improve readability
* docs: add new changelog posts and images
- Add DeepSeek R1 changelog (v0.5.14)
- Add key issues resolved changelog (v0.5.13)
- Add corresponding changelog images
---------
Co-authored-by: Louis <louis@jan.ai>
* docs: add DeepSeek R1 local installation guide
- Add comprehensive guide for running DeepSeek R1 locally
- Include step-by-step instructions with screenshots
- Add VRAM requirements and model selection guide
- Include system prompt setup instructions
* docs: add comprehensive guide on running AI models locally
* docs: address PR feedback for DeepSeek R1 and local AI guides
- Improve language and terminology throughout
- Add Linux support information
- Enhance technical explanations
- Update introduction for better flow
- Fix parameters section in run-ai-models-locally.mdx
* docs: improve local AI guides content and linking
- Update titles and introductions for better SEO
- Add opinionated guidance section for beginners
- Link DeepSeek guide with general local AI guide
- Fix typos and improve readability
* fix: remove git conflict markers from deepseek guide frontmatter
* docs: improve local AI guide for beginners
Key improvements:
- Add detailed explanation of GGUF and why it's needed
- Improve content structure and readability
- Add visual guides with SEO-friendly images
- Enhance llama.cpp explanation with GitHub link
- Fix heading hierarchy for better navigation
- Add practical examples and common questions
- Update image paths and captions for better SEO
Technical details:
- Add proper image alt text and captions
- Link to llama.cpp GitHub repository
- Clarify model size requirements
- Simplify hardware requirements section
- Improve heading structure (h1-h5)
- Add step-by-step model installation guide
* docs: add offline ChatGPT alternative guide with Jan
- Add comprehensive guide on using Jan as offline ChatGPT alternative
- Include step-by-step instructions for setup
- Add images for document chat feature
- Optimize content for SEO with relevant keywords
* docs: update description to emphasize computer-local aspect
---------
Co-authored-by: Louis <louis@jan.ai>
- Resolve conflicts in deepseek-r1-locally.mdx and run-ai-models-locally.mdx
- Keep SEO-optimized content and structure
- Add new changelog posts for v0.5.13 and v0.5.14
- Update titles and introductions for better SEO
- Add opinionated guidance section for beginners
- Link DeepSeek guide with general local AI guide
- Fix typos and improve readability
---------
Co-authored-by: Louis <louis@jan.ai>
Find up to 3 likely duplicate issues for a given GitHub issue.
To do this, follow these steps precisely:
1. Use an agent to check if the GitHub issue (a) is closed, (b) does not need to be deduped (e.g. because it is broad product feedback without a specific solution, or positive feedback), or (c) already has a duplicates comment that you made earlier. If so, do not proceed.
2. Use an agent to view the GitHub issue, and ask the agent to return a summary of the issue
3. Then, launch 5 parallel agents to search GitHub for duplicates of this issue, using diverse keywords and search approaches, using the summary from step 2
4. Next, feed the results from steps 2 and 3 into another agent, so that it can filter out false positives that are likely not actually duplicates of the original issue. If there are no duplicates remaining, do not proceed.
5. Finally, comment back on the issue with a list of up to three duplicate issues (or zero, if there are no likely duplicates)
Notes (be sure to tell this to your agents, too):
- Use `gh` to interact with GitHub, rather than web fetch
- Do not use other tools beyond `gh` (e.g. don't use other MCP servers, file edit, etc.)
- Make a todo list first
- For your comment, follow the following format precisely (assuming for this example that you found 3 suspected duplicates):
value:"**Tip:** Download any HuggingFace model in app ([see guides](https://jan.ai/docs/models/manage-models#add-models)). Use this form for unsupported models only."
- type:textarea
validations:
required:true
attributes:
label:"Model Requests"
description:"If applicable, include the source URL, licenses, and any other relevant information"
Jan-beta app version {{ VERSION }} has been released. Use the following links to download the app at faster speeds, or visit the GitHub release page for more information:
- Windows: https://delta.jan.ai/beta/jan-beta-win-x64-{{ VERSION }}.exe
- macOS Universal: https://delta.jan.ai/beta/jan-beta-mac-universal-{{ VERSION }}.dmg
- Linux Deb: https://delta.jan.ai/beta/jan-beta-linux-amd64-{{ VERSION }}.deb
- Linux AppImage: https://delta.jan.ai/beta/jan-beta-linux-x86_64-{{ VERSION }}.AppImage
COMMENT="This is the build for this pull request. You can download it from the Artifacts section here: [Build URL](https://github.com/${{ github.repository }}/actions/runs/${RUN_ID})."
Jan-beta app version {{ VERSION }} has been released. Use the following links to download the app at faster speeds, or visit the GitHub release page for more information:
- Windows: https://delta.jan.ai/beta/Jan-beta_{{ VERSION }}_x64-setup.exe
- macOS Universal: https://delta.jan.ai/beta/Jan-beta_{{ VERSION }}_universal.dmg
- Linux Deb: https://delta.jan.ai/beta/Jan-beta_{{ VERSION }}_amd64.deb
- Linux AppImage: https://delta.jan.ai/beta/Jan-beta_{{ VERSION }}_amd64.AppImage
COMMENT="This is the build for this pull request. You can download it from the Artifacts section here: [Build URL](https://github.com/${{ github.repository }}/actions/runs/${RUN_ID})."
First off, thank you for considering contributing to Jan. It's people like you that make Jan such an amazing project.
Jan is an AI assistant that can run 100% offline on your device. Think ChatGPT, but private, local, and under your complete control. If you're thinking about contributing, you're already awesome - let's make AI accessible to everyone, one commit at a time.
## Quick Links to Component Guides
- **[Web App](./web-app/CONTRIBUTING.md)** - React UI and logic
- **[Core SDK](./core/CONTRIBUTING.md)** - TypeScript SDK and extension system
- **[Extensions](./extensions/CONTRIBUTING.md)** - Supportive modules for the frontend
Apache 2.0 - Because sharing is caring. See [LICENSE](./LICENSE) for the legal stuff.
## Additional Notes
Thank you for contributing to Jan!
We're building something pretty cool here - an AI assistant that respects your privacy and runs entirely on your machine. Every contribution, no matter how small, helps make AI more accessible to everyone.
Thanks for being part of the journey. Let's build the future of local AI together! 🚀
@tgz_count=$$(find pre-install -type f -name "*.tgz" | wc -l); \
dir_count=$$(find extensions -mindepth 1 -maxdepth 1 -type d -exec test -e '{}/package.json' \; -print | wc -l); \
if [ $$tgz_count -ne $$dir_count ]; then \
	echo "Number of .tgz files in pre-install ($$tgz_count) does not match the number of subdirectories in extensions ($$dir_count)"; \
	exit 1; \
else \
	echo "Extension build successful"; \
fi
⚠️ <b> Jan is currently in Development</b>: Expect breaking changes and bugs!
Jan is bringing the best of open-source AI in an easy-to-use product. Download and run LLMs with **full control** and **privacy**.
## Installation
Jan is a ChatGPT-alternative that runs 100% offline on your device. Our goal is to make it easy for a layperson to download and run LLMs and use AI with **full control** and **privacy**.
Jan is powered by [Cortex](https://github.com/janhq/cortex.cpp), our embeddable local AI engine that runs on any hardware.
From PCs to multi-GPU clusters, Jan & Cortex support universal architectures:
- [x] NVIDIA GPUs (fast)
- [x] Apple M-series (fast)
- [x] Apple Intel
- [x] Linux Debian
- [x] Windows x64
#### Features:
- [Model Library](https://jan.ai/docs/models/manage-models#add-models) with popular LLMs like Llama, Gemma, Mistral, or Qwen
- Connect to [Remote AI APIs](https://jan.ai/docs/remote-models/openai) like Groq and OpenRouter
- Local API Server with OpenAI-equivalent API
- [Extensions](https://jan.ai/docs/extensions) for customizing Jan
## Download
The easiest way to get started is by downloading one of the following versions for your respective operating system:
Download the latest version of Jan at https://jan.ai/ or visit the [GitHub Releases](https://github.com/janhq/jan/releases) to download any previous release.
## Demo
*Real-time Video: Jan v0.5.7 on a Mac M2, 16GB Sonoma 14.2*
- **Local AI Models**: Download and run LLMs (Llama, Gemma, Qwen, GPT-oss etc.) from HuggingFace
- **Cloud Integration**: Connect to GPT models via OpenAI, Claude models via Anthropic, Mistral, Groq, and others
- **Custom Assistants**: Create specialized AI assistants for your tasks
- **OpenAI-Compatible API**: Local server at `localhost:1337` for other applications
- **Model Context Protocol**: MCP integration for agentic capabilities
- **Privacy First**: Everything runs locally when you want it to
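As a quick taste of the OpenAI-compatible server, a minimal sketch from an ES module (assumes the server is running on the default port shown above, and the model id is a hypothetical one you have downloaded):

```typescript
const res = await fetch('http://localhost:1337/v1/chat/completions', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    model: 'llama-3.2-3b-instruct', // hypothetical local model id
    messages: [{ role: 'user', content: 'Hello from the local API!' }],
  }),
})
const data = await res.json()
console.log(data.choices[0].message.content)
```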
## Quicklinks
## Build from Source
### Jan
For those who enjoy the scenic route:
- [Jan Website](https://jan.ai/)
- [Jan GitHub](https://github.com/janhq/jan)
- [Documentation](https://jan.ai/docs)
- [Jan Changelog](https://jan.ai/changelog)
- [Jan Blog](https://jan.ai/blog)
### Prerequisites
### Cortex.cpp
Jan is powered by **Cortex.cpp**. It is a C++ command-line interface (CLI) designed as an alternative to [Ollama](https://ollama.com/). By default, it runs on the llama.cpp engine but also supports other engines, including ONNX and TensorRT-LLM, making it a multi-engine platform.
- glibc 2.27 or higher (check with `ldd --version`)
- gcc 11, g++ 11, cpp 11 or higher, refer to this [link](https://jan.ai/guides/troubleshooting/gpu-not-used/#specific-requirements-for-linux) for more information
- To enable GPU support:
- Nvidia GPU with CUDA Toolkit 11.7 or higher
- Nvidia driver 470.63.01 or higher
This handles everything: installs dependencies, builds core components, and launches the app.
**Available make targets:**
- `make dev` - Full development setup and launch
- `make build` - Production build
- `make test` - Run tests and linting
- `make clean` - Delete everything and start fresh
### Manual Commands
```bash
yarn install                   # install workspace dependencies
yarn build:tauri:plugin:api    # build the Tauri plugin API bindings
yarn build:core                # build the core SDK
yarn build:extensions          # bundle the built-in extensions
yarn dev                       # launch the desktop app in development mode
```
## System Requirements
**Minimum specs for a decent experience:**
- **macOS**: 13.6+ (8GB RAM for 3B models, 16GB for 7B, 32GB for 13B)
- **Windows**: 10+ with GPU support for NVIDIA/AMD/Intel Arc
- **Linux**: Most distributions work, GPU acceleration available
For detailed compatibility, check our [installation guides](https://jan.ai/docs/desktop/mac).
## Troubleshooting
As Jan is in active development, you might run into some common issues:
- [Troubleshooting a broken build](https://jan.ai/docs/troubleshooting#broken-build)
- [Documentation](https://jan.ai/docs) - The manual you should read
- [API Reference](https://jan.ai/api-reference) - For the technically inclined
- [Changelog](https://jan.ai/changelog) - What we broke and fixed
- [Discord](https://discord.gg/FTk2MvZwJH) - Where the community lives
## Contact
- Bugs & requests: file a GitHub ticket
- For discussion: join our Discord [here](https://discord.gg/FTk2MvZwJH)
- For business inquiries: email hello@jan.ai
- For jobs: please email hr@jan.ai
## Trust & Safety
Beware of scams!
- We will never request your personal information.
- Our product is completely free; no paid version exists.
- We do not have a token or ICO.
- We are a [bootstrapped company](https://en.wikipedia.org/wiki/Bootstrapping), and don't have any external investors (*yet*). We're open to exploring opportunities with strategic partners who want to tackle [our mission](https://jan.ai/about#mission) together.
<summary>Local AI Assistant that runs 100% offline on your device</summary>
<description>
<p>
Jan is a ChatGPT-alternative that runs 100% offline on your device. Our goal is to make it easy for anyone to download and run LLMs and use AI with full control and privacy.
</p>
<p>Features:</p>
<ul>
<li>Model Library with popular LLMs like Llama, Gemma, Mistral, or Qwen</li>
<li>Connect to Remote AI APIs like Groq and OpenRouter</li>
<li>Local API Server with OpenAI-equivalent API</li>
Before testing, set up the following in the old version to make sure we can verify that the data is properly migrated:
- [ ] Changing appearance / theme to something that is obviously different from default set-up
- [ ] Ensure there are a few chat threads
- [ ] Ensure there are a few favourite / starred threads
- [ ] Ensure there are 2 models downloaded
- [ ] Ensure there are 2 imported models on the local provider (llama.cpp)
- [ ] Modify MCP servers list and add some ENV value to MCP servers
- [ ] Modify Local API Server
- [ ] HTTPS proxy config value
- [ ] Add 2 custom assistants to Jan
- [ ] Create a new chat with the custom assistant
- [ ] Change the `App Data` to some other folder
- [ ] Create a Custom Provider
- [ ] Disable some model providers
- [NEW] Change llama.cpp settings of 2 models
#### Validate that the update does not corrupt existing user data or settings (before and after update show the same information):
- [ ] Threads
- [ ] Previously used models and assistants are shown correctly
- [ ] Can resume chat in threads with the previous context
- [ ] Assistants
- Settings:
- [ ] Appearance
- [ ] MCP Servers
- [ ] Local API Server
- [ ] HTTPS Proxy
- [ ] Custom Provider Set-up
#### In `Hub`:
- [ ] Can see model from HF listed properly
- [ ] Downloaded models will show `Use` instead of `Download`
- [ ] Toggling `Downloaded` in the right corner shows the correct list of downloaded models
#### In `Settings -> General`:
- [ ] Ensure the `App Data` path is the same
- [ ] Click Open Logs, App Log will show
#### In `Settings -> Model Providers`:
- [ ] Llama.cpp still lists downloaded models and the user can chat with the models
- [ ] Llama.cpp still lists imported models and the user can chat with the models
- [ ] Remote providers still retain previously set-up API keys and the user can chat with models from the provider without having to re-enter API keys
- [ ] Enabled and Disabled Model Providers stay the same as before update
#### In `Settings -> Extensions`, check that following exists:
- [ ] Conversational
- [ ] Jan Assistant
- [ ] Download Manager
- [ ] llama.cpp Inference Engine
## B. `Settings`
#### In `General`:
- [ ] Ensure `Community` links work and point to the correct website
- [ ] Ensure the `Check for Updates` function detects the correct latest version
- [ ] [ENG] Create a folder with non-standard characters in its title (e.g. Chinese characters) => change the `App Data` location to that folder => test that models can still load and run properly.
#### In `Appearance`:
- [ ] Toggle between different `Theme` options to check that they change accordingly and that all elements of the UI are legible with the right contrast:
- [ ] Light
- [ ] Dark
- [ ] System (should follow your OS system settings)
- [ ] Change the following values => close the application => re-open the application => ensure that the change is persisted across sessions:
- [ ] Theme
- [ ] Font Size
- [ ] Window Background
- [ ] App Main View
- [ ] Primary
- [ ] Accent
- [ ] Destructive
- [ ] Chat Width
- [ ] Ensure that when this value is changed, there is no broken UI caused by it
- [ ] Code Block
- [ ] Show Line Numbers
- [ENG] Ensure that clicking `Reset` in the `Appearance` section resets it back to the default values
- [ENG] Ensure that clicking `Reset` in the `Code Block` section resets it back to the default values
#### In `Model Providers`:
In `Llama.cpp`:
- [ ] After downloading a model from hub, the model is listed with the correct name under `Models`
- [ ] Can import `gguf` model with no error
- [ ] Imported model will be listed with correct name under the `Models`
- [ ] Check that clicking `Delete` removes the model from the list
- [ ] Deleted model doesn't appear in the selectable models section in chat input (even in old threads that use the model previously)
- [ ] Ensure that user can re-import deleted imported models
- [ ] Enable `Auto-Unload Old Models`, and ensure that only one model can run / start at a time. If two models are running when this is enabled, both of them will be stopped.
- [ ] Disable `Auto-Unload Old Models`, and ensure that multiple models can run at the same time.
- [ ] Enable `Context Shift` and ensure that context can run for long without encountering memory errors. Use the `banana test`: turn on the fetch MCP => ask a local model to fetch and summarize the history of the banana (bananas have a very long history on wiki, it turns out). It should run out of context memory sufficiently fast if `Context Shift` is not enabled.
- [ ] Ensure that the user can change the Jinja chat template of an individual model and that it doesn't affect the templates of other models
- [ ] Ensure that there is a recommended `llama.cpp` for each system and that it works out of the box for users.
- [ ] [0.6.9] Take a `gguf` file and delete the `.gguf` extension from the file name, import it into Jan, and verify that it works.
In Remote Model Providers:
- [ ] Check that the following providers are present:
- [ ] OpenAI
- [ ] Anthropic
- [ ] Cohere
- [ ] OpenRouter
- [ ] Mistral
- [ ] Groq
- [ ] Gemini
- [ ] Hugging Face
- [ ] Models should appear as available in the selectable dropdown in chat input once some value is entered in the API key field (it could even be the wrong API key).
- [ ] Once a valid API key is used, user can select a model from that provider and chat without any error.
- [ ] Delete a model and ensure that it doesn't show up in the `Models` list view or in the selectable dropdown in chat input.
- [ ] Ensure that a deleted model is also not selectable and does not appear in old threads that used it.
- [ ] Manually adding a new model works and the user can chat with the newly added model without error (you can add back the model you just deleted for testing)
- [ ] [0.6.9] Make sure that Ollama set up as a custom provider works with Jan
In Custom Providers:
- [ ] Ensure that the user can create a new custom provider with the right baseURL and API key.
- [ ] Click `Refresh` should retrieve a list of available models from the Custom Providers.
- [ ] User can chat with the custom provider's models
- [ ] Ensure that Custom Providers can be deleted and won't reappear in a new session
In general:
- [ ] A disabled Model Provider should not show up as selectable in the chat input of new and old threads alike (old threads' chat input should show `Select Model` instead of the disabled model)
#### In `Shortcuts`:
Make sure the following shortcut key combo is visible and works:
- [ ] New chat
- [ ] Toggle Sidebar
- [ ] Zoom In
- [ ] Zoom Out
- [ ] Send Message
- [ ] New Line
- [ ] Navigation
#### In `Hardware`:
Ensure that the following information shows up in the hardware section:
- [ ] Operating System
- [ ] CPU
- [ ] Memory
- [ ] GPU (If the machine has one)
- [ ] Enable and disable GPUs and ensure that models still run correctly in both modes
- [ ] Enabling or disabling GPUs should not affect the UI of the application
#### In `MCP Servers`:
- [ ] Ensure that a user can create an MCP server successfully when entering the correct information
- [ ] Ensure that `Env` values are masked by `*` in the quick view.
- [ ] If an `Env` value is missing, there should be an error pop-up.
- [ ] Ensure that a deleted MCP server disappears from the `MCP Server` list without any error
- [ ] Ensure that before an MCP server is deleted, it disables itself first and won't appear in the tool list after deletion.
- [ ] Ensure that when the content of an MCP server is edited, the changes are updated and reflected accordingly in the UI and when running it.
- [ ] Toggling an MCP server between enabled and disabled works properly
- [ ] A disabled MCP should not appear in the available tool list in chat input
- [ ] A disabled MCP should not be callable even when force-prompted by the model (ensure there is no ghost MCP server)
- [ ] Ensure that enabled MCP servers start automatically upon application startup
- [ ] An enabled MCP should show functions in the available tool list
- [ ] User can use a model and call different tools from multiple enabled MCP servers in the same thread
- [ ] If `Allow All MCP Tool Permissions` is disabled, in every new thread, before a tool is called, a confirmation dialog should pop up to confirm the action.
- [ ] When the user clicks `Deny`, the tool call will not be executed and a message indicating so is returned in the tool call result.
- [ ] When the user clicks `Allow Once` on the pop-up, a confirmation dialog will appear again the next time the tool is called.
- [ ] When the user clicks `Always Allow` on the pop-up, the tool will retain permission and won't ask for confirmation again. (This applies at the individual tool level, not at the MCP server level.)
- [ ] If `Allow All MCP Tool Permissions` is enabled, in every new thread, no confirmation dialog should pop up when a tool is called.
- [ ] When the pop-up appears, make sure that the `Tool Parameters` are also shown in detail in the pop-up.
- [ ] [0.6.9] Go to Enter JSON configuration when creating a new MCP server => paste the JSON config inside => click `Save` => server works
- [ ] [0.6.9] If an individual JSON config is malformed, the MCP server should not be activated
- [ ] [0.6.9] Make sure that MCP server can be used with streamable-http transport => connect to Smithery and test MCP server
#### In `Local API Server`:
- [ ] User can `Start Server` and chat with the default endpoint
- [ ] User should see the correct model name at `v1/models`
- [ ] User should be able to chat with it at `v1/chat/completions`
- [ ] `Open Logs` shows the correct query logs sent to the server and returned from the server
- [ ] Make sure that changing any parameter in `Server Configuration` is reflected when clicking `Start Server`
- [ ] [0.6.9] When starting with the saved configuration, the last used model also starts automatically (users do not have to manually start a model before starting the server)
- [ ] [0.6.9] Make sure that you can send an image to a Local API Server and it also works (can set up Local API Server as a Custom Provider in Jan to test)
#### In `HTTPS Proxy`:
- [ ] Model download request goes through proxy endpoint
## C. Hub
- [ ] User can click `Download` to download a model
- [ ] User can cancel a model in the middle of downloading
- [ ] User can add a Hugging Face model's details to the list by pasting a model name / model URL into the search bar and pressing Enter
- [ ] Clicking on a listing will open up the model card information within Jan and render the HTML properly
- [ ] Clicking download works in the `Show variants` section
- [ ] Clicking download works inside the model card HTML
- [ ] [0.6.9] Check that the model recommendation based on user hardware works as expected in the Model Hub
## D. Threads
#### In the left bar:
- [ ] User can delete an old thread, and it won't reappear even after an app restart
- [ ] Changing the title of a thread should update its last modification date and re-organise its position into the correct chronological order on the left bar.
- [ ] The title of a new thread is the first message from the user.
- [ ] Users can star / unstar threads accordingly
- [ ] Starred threads should move to the `Favourite` section and other threads should stay in `Recent`
- [ ] Ensure that the thread search feature returns accurate results based on thread titles and contents (including from both `Favourite` and `Recent`)
- [ ] `Delete All` should delete only threads in the `Recents` section
- [ ] `Unstar All` should un-star all of the `Favourites` threads and return them to `Recent`
#### In a thread:
- [ ] When `New Chat` is clicked, the assistant is set as the last selected assistant, the model selected is set as the last used model, and the user can immediately chat with the model.
- [ ] User can conduct a multi-turn conversation in a single thread without loss of data (given that `Context Shift` is not enabled)
- [ ] User can change to a different model in the middle of a conversation in a thread and the model works.
- [ ] User can click the `Regenerate` button on a returned message from the model to get a new response based on the previous context.
- [ ] User can change `Assistant` in the middle of a conversation in a thread and the new assistant setting will be applied instead.
- [ ] The chat window can render and show all the content of a selected thread (including scrolling up and down on long threads)
- [ ] Old threads retain their settings as of the last update / usage
- [ ] Assistant option
- [ ] Model option (except if the model / model provider has been deleted or disabled)
- [ ] User can send messages with different types of text content (e.g. text, emoji, ...)
- [ ] When requesting the model to generate a markdown table, the table is correctly formatted as returned from the model.
- [ ] When the model generates code, ensure that the code snippets are properly formatted according to the `Appearance -> Code Block` setting.
- [ ] Users can edit their old messages and regenerate the answer based on the new message
- [ ] User can click `Copy` to copy the model response
- [ ] User can click `Delete` to delete either the user message or the model response.
- [ ] The token speed appears while a response from the model is being generated, and the final value is shown under the response.
- [ ] Make sure that when users type Chinese or Japanese characters with an IME keyboard and press `Enter`, the `Send` button doesn't trigger automatically after each word.
- [ ] [0.6.9] Attach an image to the chat input and see if you can chat with it using a remote model
- [ ] [0.6.9] Attach an image to the chat input and see if you can chat with it using a local model
- [ ] [0.6.9] Check that you can paste an image to text box from your system clipboard (Copy - Paste)
- [ ] [0.6.9] Make sure that user can favourite a model in the model selection in chat input
## E. Assistants
- [ ] There is always at least one default Assistant which is Jan
- [ ] The default Jan assistant has `stream = True` by default
- [ ] User can create / edit a new assistant with different parameter and instruction choices.
- [ ] When the user deletes the default Assistant, the next Assistant in line becomes the default Assistant and applies its settings to new chats accordingly.
- [ ] User can create / edit an assistant from within a chat window (on the top left)
## F. After checking everything else
In `Settings -> General`:
- [ ] Change the location of the `App Data` to some other path that is not the default path
- [ ] Click on `Reset` button in `Other` to factory reset the app:
- [ ] All threads deleted
- [ ] All Assistants deleted except for the default Jan Assistant
- [ ] `App Data` location is reset back to default path
- [ ] Appearance reset
- [ ] Model Providers information all reset
- [ ] Llama.cpp setting reset
- [ ] API keys cleared
- [ ] All Custom Providers deleted
- [ ] MCP Servers reset
- [ ] Local API Server reset
- [ ] HTTPS Proxy reset
- [ ] After closing the app, all models are unloaded properly
- [ ] Locate the data folder using the `App Data` path information => delete the folder => reopen the app to check that the folder is re-created with all the necessary data.
- [ ] Ensure that the uninstallation process removes the app successfully from the system.
## G. New App Installation
- [ ] Clean up by deleting all the leftover folders created by Jan
- [ ] Do some basic checks to see that all functions still behave as expected. To be extra careful, you can go through the whole list again. However, it is more advisable to just check that all the core functionality like `Thread` and `Model Providers` works as intended.
# II. After release
- [ ] Check that the App Updater works and user can update to the latest release without any problem
- [ ] App restarts after the user finishes an update
- [ ] Repeat section `A. Initial update / migration Data check` above to verify that the update is done correctly on the live version
This directory contains platform-specific scripts used by the AutoQA GitHub Actions workflow. These scripts help maintain a cleaner and more maintainable workflow file by extracting complex inline scripts into separate files.
## Directory Structure
```text
autoqa/scripts/
├── setup_permissions.sh # Setup executable permissions for all scripts
├── windows_cleanup.ps1 # Windows: Clean existing Jan installations
├── windows_download.ps1 # Windows: Download Jan app installer
├── windows_install.ps1 # Windows: Install Jan app