837 Commits

Author SHA1 Message Date
Roushan Kumar Singh
3f51c35229
feat: support .zip archives for manual backend install (#6534)
* feat(llamacpp): support .zip archives for manual backend install

* Update Lock Files
2025-09-23 18:02:06 +05:30
Akarshan Biswas
885da29f28
feat: add getTokensCount method to compute token usage (#6467)
* feat: add getTokensCount method to compute token usage

Implemented a new async `getTokensCount` function in the LLaMA.cpp extension.
The method validates the model session, checks process health, applies the request template, and tokenizes the resulting prompt to return the token count. It includes detailed error handling for crashed models and API failures, enabling callers to assess token usage before sending completions.
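For illustration, a minimal sketch of that flow, assuming llama-server's `POST /tokenize` endpoint (which returns `{ tokens: number[] }`); the `Session` shape and the `isAlive` callback are illustrative stand-ins for the extension's real types, and the prompt is assumed to already have the chat template applied:

```typescript
// Hypothetical sketch — only getTokensCount's name comes from the commit.
interface Session { pid: number; apiUrl: string }

async function getTokensCount(
  session: Session | undefined,
  prompt: string,
  isAlive: (pid: number) => Promise<boolean>
): Promise<number> {
  if (!session) throw new Error('Model is not loaded')
  if (!(await isAlive(session.pid))) throw new Error('Model process has crashed')
  // llama-server exposes POST /tokenize returning { tokens: number[] }
  const res = await fetch(`${session.apiUrl}/tokenize`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ content: prompt }),
  })
  if (!res.ok) throw new Error(`Tokenize request failed: ${res.status}`)
  const { tokens } = (await res.json()) as { tokens: number[] }
  return tokens.length
}
```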

* Fix: typos

* chore: update ui token usage

* chore: remove unused code

* feat: add image token handling for multimodal LlamaCPP models

Implemented support for counting image tokens when using vision-enabled models:
- Extended `SessionInfo` with optional `mmprojPath` to store the path of the multimodal projector file.
- Propagated `mmproj_path` from the Tauri plugin into the session info.
- Added import of `chatCompletionRequestMessage` and enhanced the token calculation logic in the LlamaCPP extension (see the sketch after this list):
  - Detects image content in messages.
  - Reads GGUF metadata from `mmprojPath` to compute accurate image token counts.
  - Provides a fallback estimation if metadata reading fails.
  - Returns the sum of text and image tokens.
- Introduced helper methods `calculateImageTokens` and `estimateImageTokensFallback`.
- Minor clean‑ups such as comment capitalization and debug logging.
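A hedged sketch of the two helpers named above. The log doesn't spell out how `clip.vision.projection_dim` maps to a per-image token count (a later fix in this PR bases the estimate on that key), so the arithmetic here is a placeholder:

```typescript
// Hypothetical shape of calculateImageTokens / estimateImageTokensFallback.
type GgufMetadata = Record<string, string | number | undefined>

function calculateImageTokens(meta: GgufMetadata, imageCount: number): number {
  const projectionDim = Number(meta['clip.vision.projection_dim'])
  if (Number.isFinite(projectionDim) && projectionDim > 0) {
    return imageCount * projectionDim // placeholder mapping — assumption
  }
  return imageCount * estimateImageTokensFallback()
}

function estimateImageTokensFallback(): number {
  return 1024 // conservative per-image guess when metadata is unreadable
}
```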

* chore: update FE send params so message content includes type `image_url`

* fix mmproj path from session info and token count calculation

* fix: Correct image token estimation calculation in llamacpp extension

This commit addresses an inaccurate token count for images in the llama.cpp extension.

The previous logic incorrectly calculated the token count based on image patch size and dimensions. This has been replaced with a more precise method that uses the `clip.vision.projection_dim` value from the model metadata.

Additionally, unnecessary debug logging was removed, and a new log was added to show the mmproj metadata for improved visibility.

* fix per image calc

* fix: crash due to force unwrap

---------

Co-authored-by: Faisal Amir <urmauur@gmail.com>
Co-authored-by: Louis <louis@jan.ai>
2025-09-23 07:52:19 +05:30
Akarshan Biswas
bf7f176741
feat: Prompt progress when streaming (#6503)
* feat: Prompt progress when streaming

- BE changes:
    - Add a `return_progress` flag to `chatCompletionRequest` and a corresponding `prompt_progress` payload in `chatCompletionChunk`. Introduce a `chatCompletionPromptProgress` interface to capture cache, processed, time, and total token counts (sketched after this list).
    - Update the Llamacpp extension to always request progress data when streaming, enabling UI components to display real‑time generation progress and leverage llama.cpp’s built‑in progress reporting.
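A sketch of the types implied by the description; field names are inferred from "cache, processed, time, and total token counts" and may differ from the source:

```typescript
// Inferred shapes — not copied from the extension's source.
interface chatCompletionPromptProgress {
  cache: number     // prompt tokens reused from the KV cache
  processed: number // prompt tokens processed so far
  time: number      // processing time elapsed
  total: number     // total prompt tokens to process
}

interface chatCompletionChunk {
  // ...existing streaming fields...
  prompt_progress?: chatCompletionPromptProgress
}

interface chatCompletionRequest {
  // ...existing request fields...
  return_progress?: boolean // made optional in the follow-up commit below
}
```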

* Make return_progress optional

* chore: update ui prompt progress before streaming content

* chore: remove log

* chore: remove progress when percentage >= 100

* chore: set timeout prompt progress

* chore: move prompt progress outside streaming content

* fix: tests

---------

Co-authored-by: Faisal Amir <urmauur@gmail.com>
Co-authored-by: Louis <louis@jan.ai>
2025-09-22 20:37:27 +05:30
Dinh Long Nguyen
645548e931
Merge pull request #6516 from menloresearch/release/v0.6.10 2025-09-18 19:15:54 +07:00
Akarshan Biswas
b9f658f2ae
fix: correct memory suitability checks in llamacpp extension (#6504)
The previous implementation mixed model size and VRAM checks, leading to inaccurate status reporting (e.g., false RED results).
- Simplified import statement for `readGgufMetadata`.
- Fixed RAM/VRAM comparison by removing unnecessary parentheses.
- Replaced ambiguous `modelSize > usableTotalMemory` check with a clear `totalRequired > usableTotalMemory` hard‑limit condition.
- Refactored the status logic to explicitly handle the CPU‑GPU hybrid scenario, returning **YELLOW** when the total requirement fits combined memory but exceeds VRAM (see the sketch below).
- Updated comments for better readability and maintenance.
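A minimal sketch of the corrected decision, using the names from the description above:

```typescript
type SupportStatus = 'GREEN' | 'YELLOW' | 'RED'

function memorySuitability(
  totalRequired: number,     // model weights + KV cache, in bytes
  usableVRAM: number,
  usableTotalMemory: number  // combined VRAM + RAM after the usable-% haircut
): SupportStatus {
  if (totalRequired > usableTotalMemory) return 'RED' // hard limit
  if (totalRequired > usableVRAM) return 'YELLOW'     // CPU–GPU hybrid scenario
  return 'GREEN'                                      // everything fits in VRAM
}
```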
2025-09-17 20:09:33 +05:30
Louis
15164fc0be
feat: system tray icon build flag 2025-09-17 15:54:20 +07:00
Akarshan Biswas
9e3a77a559
fix: set default memory mode and clean up unused import (#6463)
Use fallback value 'high' for memory_util config and remove unused GgufMetadata import.
2025-09-15 19:00:46 +05:30
Akarshan Biswas
489c5a3d9c
fix: KVCache size calculation and refactor (#6438)
- Removed the unused `getKVCachePerToken` helper and replaced it with a unified `estimateKVCache` that returns both total size and per‑token size (sketched after this list).
- Fixed the KV cache size calculation to account for all layers, correcting previous under‑estimation.
- Added proper clamping of user‑requested context lengths to the model’s maximum.
- Refactored VRAM budgeting: introduced explicit reserves, fixed engine overhead, and separate multipliers for VRAM and system RAM based on memory mode.
- Implemented a more robust planning flow with clear GPU, Hybrid, and CPU pathways, including fallback configurations when resources are insufficient.
- Updated default context length handling and safety buffers to prevent OOM situations.
- Adjusted usable memory percentage to 90% and refined logging for easier debugging.
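A sketch of the unified estimator; the signature is illustrative, and the bytes-per-token formula follows the one given in another commit in this log (K and V stored at fp16, hence the `2 * 2` factor):

```typescript
interface KVCacheEstimate { perTokenBytes: number; totalBytes: number }

function estimateKVCache(
  nLayer: number,       // all layers, fixing the earlier under-estimation
  nHeadKV: number,      // attention.head_count_kv (GQA-aware)
  headDim: number,
  requestedCtx: number,
  maxCtx: number
): KVCacheEstimate {
  const ctx = Math.min(requestedCtx, maxCtx) // clamp to the model's maximum
  const perTokenBytes = nHeadKV * headDim * 2 * 2 * nLayer
  return { perTokenBytes, totalBytes: perTokenBytes * ctx }
}
```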
2025-09-15 10:16:13 +05:30
Akarshan Biswas
654e566dcb
fix: correct context shift flag handling in LlamaCPP extension (#6404) (#6431)
* fix: correct context shift flag handling in LlamaCPP extension

The previous implementation added the `--no-context-shift` flag when `cfg.ctx_shift` was disabled, which conflicted with the llama.cpp CLI where the presence of `--context-shift` enables the feature.
The logic is updated to push `--context-shift` only when `cfg.ctx_shift` is true, ensuring the extension passes the correct argument and behaves as expected.
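The fix reduces to a one-line inversion; a minimal sketch with an assumed `cfg` shape:

```typescript
function contextShiftArgs(cfg: { ctx_shift?: boolean }): string[] {
  // Previously: if (!cfg.ctx_shift) args.push('--no-context-shift')
  // llama.cpp enables the feature by the *presence* of --context-shift,
  // so the flag is pushed only when ctx_shift is true.
  return cfg.ctx_shift ? ['--context-shift'] : []
}
```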

* feat: detect model out of context during generation

---------

Co-authored-by: Dinh Long Nguyen <dinhlongviolin1@gmail.com>
2025-09-12 13:43:31 +05:30
Dinh Long Nguyen
b5b6e1dc19
add mcp for web (#6411)
* add mcp for web

* update /jan/v1 endpoint to /v1

* update mise and makefile

* update yarn lock

* use mcp oauth properly
2025-09-12 12:14:10 +07:00
Faisal Amir
e709d200aa
Merge pull request #6416 from menloresearch/enhancement/experimental-label
enhancement: add experimental label for the optimize setting
2025-09-11 16:12:35 +07:00
Akarshan
7c41408a1a
feat: add relative path support for model loading
Implemented `isAbsolutePath` helper to correctly identify POSIX, Windows drive‑letter, and UNC absolute paths. Updated `planModelLoad` to automatically resolve relative model and mmproj paths against the Jan data folder, enhancing usability for users supplying non‑absolute paths. Also refined minor formatting for readability.
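A plausible sketch of the helper; the exact checks are assumptions covering the three forms named above:

```typescript
function isAbsolutePath(p: string): boolean {
  return (
    p.startsWith('/') ||         // POSIX absolute path
    /^[A-Za-z]:[\\/]/.test(p) || // Windows drive letter, e.g. C:\ or C:/
    p.startsWith('\\\\')         // UNC path, e.g. \\server\share
  )
}

// Relative paths are then resolved against the Jan data folder (illustrative):
// const resolved = isAbsolutePath(modelPath) ? modelPath : join(janDataFolder, modelPath)
```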
2025-09-11 13:45:29 +05:30
Akarshan
abd0cbe599
refactor: rename noOffloadMmproj flag to offloadMmproj and reorder args
The flag `noOffloadMmproj` was misleading – it actually indicates when the mmproj file **is** offloaded to VRAM. Renaming it to `offloadMmproj` clarifies its purpose and aligns the naming with the surrounding code.

Additionally, the `planModelLoad` signature has been reordered to place `mmprojPath` before `requestedCtx`, improving readability and making the optional parameters more intuitive. All related logic, calculations, and log messages have been updated to use the new flag name.
2025-09-11 12:29:53 +05:30
Louis
7fea6e1ab0
fix: clean up unused packages (#6414) 2025-09-11 13:16:26 +07:00
Akarshan Biswas
5ff7935d91
fix: include lm_head and embedding layers in totalLayers count (#6415)
The original calculation used only the `block_count` from the model metadata, which excludes the final LM head and the embedding layer. This caused an underestimation of the total number of layers and consequently an incorrect `layerSize` value. Adding `+2` accounts for these two missing layers, ensuring accurate model size metrics.
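As a sketch of the corrected arithmetic (the function signature is illustrative):

```typescript
// block_count covers only the transformer blocks; the embedding layer and the
// LM head are added back before dividing.
function layerSize(modelSizeBytes: number, blockCount: number): number {
  const totalLayers = blockCount + 2 // +1 embedding layer, +1 lm_head
  return modelSizeBytes / totalLayers
}
```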
2025-09-11 11:40:39 +05:30
Akarshan
13806a3f06
Fixup sorting in determineBestBackend 2025-09-11 09:56:46 +05:30
Akarshan Biswas
3cd099ee87
Update extensions/llamacpp-extension/src/index.ts
Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>
2025-09-11 09:55:57 +05:30
Akarshan
42411b5f33
feat: prioritize Vulkan backend only when GPU has ≥6 GB VRAM
Added a GPU memory check using `getSystemInfo` to ensure Vulkan is selected only on systems with at least 6 GB of VRAM.
* Made `determineBestBackend` asynchronous and updated all callers to `await` it.
* Adjusted backend priority list to include or demote Vulkan based on the memory check.
* Updated Vulkan support detection in `backend.ts` to rely solely on API version (memory check moved to selection logic).
* Imported `getSystemInfo` and refined file‑existence validation.

These changes prevent sub‑optimal Vulkan usage on low‑memory GPUs and improve backend selection reliability.
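A hedged sketch of the gating; the backend identifiers in the priority lists are illustrative, not Jan's real variant names:

```typescript
const MIN_VULKAN_VRAM = 6 * 1024 ** 3 // 6 GB in bytes

async function backendPriority(
  getGpuVRAMBytes: () => Promise<number> // assumed accessor over getSystemInfo
): Promise<string[]> {
  const vram = await getGpuVRAMBytes()
  return vram >= MIN_VULKAN_VRAM
    ? ['vulkan', 'avx2', 'avx', 'noavx'] // Vulkan prioritized
    : ['avx2', 'avx', 'noavx', 'vulkan'] // Vulkan demoted on low-VRAM GPUs
}
```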
2025-09-11 09:55:55 +05:30
Akarshan
84874c6039
fix file condition 2025-09-11 09:55:08 +05:30
Akarshan
0eff1bfaa9
Throw error when invalid file 2025-09-11 09:55:08 +05:30
Akarshan
5ef9d8dfc3
Add debug logs and refactor 2025-09-11 09:55:06 +05:30
Akarshan
2e350ab607
Refresh the list of backends by calling configureBackends(); refactor installBackend 2025-09-11 09:52:09 +05:30
Faisal Amir
ba4dc6d1eb
enhancement: update UI dialog for llamacpp backend update 2025-09-11 09:52:09 +05:30
Akarshan
a6e4f28830
Add guard before checking locally installed backends 2025-09-11 09:52:09 +05:30
Akarshan
4e37c361c4
feat: expose new updateBackend function for manually updating backend 2025-09-11 09:52:09 +05:30
Akarshan
7ac927ff02
feat: enhance llamacpp backend management and installation
- Add `src-tauri/resources/` to `.gitignore`.
- Introduced utilities to read locally installed backends (`getLocalInstalledBackends`) and fetch remote supported backends (`fetchRemoteSupportedBackends`).
- Refactored `listSupportedBackends` to merge remote and local entries with deduplication and proper sorting.
- Exported `getBackendDir` and integrated it into the extension.
- Added helper `parseBackendVersion` and new method `checkBackendForUpdates` to detect newer backend versions (sketched after this list).
- Implemented `installBackend` for manual backend archive installation, including platform‑specific binary path handling.
- Updated command‑line argument logic for `--flash-attn` to respect version‑specific defaults.
- Modified Tauri filesystem `decompress` command to remove overly strict path validation.
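A sketch of the version helpers, assuming llama.cpp's `b<build>` release-tag scheme (e.g. `b4523`); the actual parsing in the extension may differ:

```typescript
function parseBackendVersion(tag: string): number | null {
  const m = /^b(\d+)$/.exec(tag.trim())
  return m ? Number(m[1]) : null
}

// checkBackendForUpdates-style comparison: is the remote build newer?
function hasNewerBackend(installed: string, remote: string): boolean {
  const a = parseBackendVersion(installed)
  const b = parseBackendVersion(remote)
  return a !== null && b !== null && b > a
}
```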
2025-09-11 09:52:09 +05:30
Akarshan Biswas
7a174e621a
feat: Smart model management (#6390)
* feat: Smart model management

* **New UI option** – `memory_util` added to `settings.json` with a dropdown (high / medium / low) to let users control how aggressively the engine uses system memory.
* **Configuration updates** – `LlamacppConfig` now includes `memory_util`; the extension class stores it in a new `memoryMode` property and handles updates through `updateConfig`.
* **System memory handling**
  * Introduced `SystemMemory` interface and `getTotalSystemMemory()` to report combined VRAM + RAM.
  * Added helper methods `getKVCachePerToken`, `getLayerSize`, and a new `ModelPlan` type.
* **Smart model‑load planner** – `planModelLoad()` computes (see the plan‑shape sketch after this list):
  * Number of GPU layers that can fit in usable VRAM.
  * Maximum context length based on KV‑cache size and the selected memory utilization mode (high/medium/low).
  * Whether KV‑cache must be off‑loaded to CPU and the overall loading mode (GPU, Hybrid, CPU, Unsupported).
  * Detailed logging of the planning decision.
* **Improved support check** – `isModelSupported()` now:
  * Uses the combined VRAM/RAM totals from `getTotalSystemMemory()`.
  * Applies an 80% usable‑memory heuristic.
  * Returns **GREEN** only when both weights and KV‑cache fit in VRAM, **YELLOW** when they fit only in total memory or require CPU off‑load, and **RED** when the model cannot fit at all.
* **Cleanup** – Removed unused `GgufMetadata` import; updated imports and type definitions accordingly.
* **Documentation/comments** – Added explanatory JSDoc comments for the new methods and clarified the return semantics of `isModelSupported`.
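A sketch of the plan shape implied by the bullets above; field names are inferred, not taken from the source:

```typescript
type LoadMode = 'GPU' | 'Hybrid' | 'CPU' | 'Unsupported'

interface ModelPlan {
  gpuLayers: number            // layers that fit in usable VRAM
  maxContextLength: number     // bounded by KV-cache size and memory_util mode
  offloadKVCacheToCPU: boolean // whether the KV cache must leave VRAM
  mode: LoadMode
}
```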

* chore: migrate no_kv_offload from llamacpp setting to model setting

* chore: add UI auto optimize model setting

* feat: improve model loading planner with mmproj support and smarter memory budgeting

* Extend `ModelPlan` with optional `noOffloadMmproj` flag to indicate when a multimodal projector can stay in VRAM.
* Add `mmprojPath` parameter to `planModelLoad` and calculate its size, attempting to keep it on GPU when possible.
* Refactor system memory detection:
  * Use `used_memory` (actual free RAM) instead of total RAM for budgeting.
  * Introduced `usableRAM` placeholder for future use.
* Rewrite KV‑cache size calculation:
  * Properly handle GQA models via `attention.head_count_kv`.
  * Compute bytes per token as `nHeadKV * headDim * 2 * 2 * nLayer`.
* Replace the old 70% VRAM heuristic with a more flexible budget (sketched after this list):
  * Reserve a fixed VRAM amount and apply an overhead factor.
  * Derive usable system RAM from total memory minus VRAM.
* Implement a robust allocation algorithm:
  * Prioritize placing the mmproj in VRAM.
  * Search for the best balance of GPU layers and context length.
  * Fallback strategies for hybrid and pure‑CPU modes with detailed safety checks.
* Add extensive validation of model size, KV‑cache size, layer size, and memory mode.
* Improve logging throughout the planning process for easier debugging.
* Adjust final plan return shape to include the new `noOffloadMmproj` field.
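A hedged sketch of that budget; the reserve and overhead values are placeholders, not the numbers used in the extension:

```typescript
function memoryBudgets(totalVRAM: number, totalMemory: number) {
  const VRAM_RESERVE = 512 * 1024 * 1024 // fixed reserve in bytes (assumed)
  const OVERHEAD_FACTOR = 1.1            // engine overhead factor (assumed)
  const usableVRAM = Math.max(0, (totalVRAM - VRAM_RESERVE) / OVERHEAD_FACTOR)
  const usableRAM = Math.max(0, totalMemory - totalVRAM) // RAM = total − VRAM
  return { usableVRAM, usableRAM }
}
```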

* remove unused variable

---------

Co-authored-by: Faisal Amir <urmauur@gmail.com>
2025-09-11 09:48:03 +05:30
Faisal Amir
be851ebcf1 chore: validate GGUF file base metadata (architecture) 2025-09-08 20:16:20 +07:00
Faisal Amir
836990b7d9 chore: update fn check mmproj file 2025-09-08 11:10:00 +07:00
Dinh Long Nguyen
a30eb7f968
feat: Jan Web (reusing Jan Desktop UI) (#6298)
* add platform guards

* add service management

* fix types

* move to zustand for servicehub

* update App Updater

* update tauri missing move

* update app updater

* refactor: move PlatformFeatures to separate const file

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

* change tauri fetch name

* update implementation

* update extension fetch

* make web version run properly

* disabled unused web settings

* fix all tests

* fix lint

* fix tests

* add mock for extension

* fix build

* update make and mise

* fix tsconfig for web-extensions

* fix loader type

* cleanup

* fix test

* update error handling + mcp should be working

* Update mcp init

* use separate is_web_app build property

* Remove fixed model catalog url

* fix additional tests

* fix download issue (event emitter not implemented correctly)

* Update Title html

* fix app logs

* update root tsx render timing

---------

Co-authored-by: Claude <noreply@anthropic.com>
2025-09-05 01:47:46 +07:00
hiento09
1b74772d07
feat: llamacpp backend download falls back to CDN (#6361)
* feat: fall back to CDN for the llamacpp backend download in case the GitHub API encounters errors
2025-09-04 09:39:16 +07:00
Akarshan Biswas
5fae954ac5
fix: Use 80% total memory for compatibility check (#6321)
* fix: Use 80% total memory for compatibility check

* refactor: extract usable memory percentage to named constant

Extract the hardcoded 0.8 multiplier into a named constant
USABLE_MEMORY_PERCENTAGE for better readability and maintainability.
2025-08-28 14:50:00 +05:30
Akarshan Biswas
64a608039b
fix: check for env value before setting (#6266)
* fix: check for env value before setting

* Use empty instead of none
2025-08-21 22:55:49 +05:30
Akarshan Biswas
510c70bdf7
feat: Add model compatibility check and memory estimation (#6243)
* feat: Add model compatibility check and memory estimation

This commit introduces a new feature to check if a given model is supported based on available device memory.

The change includes:
- A new `estimateKVCache` method that calculates the required memory for the model's KV cache. It uses GGUF metadata such as `block_count`, `head_count`, `key_length`, and `value_length` to perform the calculation.
- An `isModelSupported` method that combines the model file size and the estimated KV cache size to determine the total memory required. It then checks if any available device has sufficient free memory to load the model.
- An updated error message for the `version_backend` check to be more user-friendly, suggesting a stable internet connection as a potential solution for backend setup failures.

This functionality helps prevent the application from attempting to load models that would exceed the device's memory capacity, leading to more stable and predictable behavior.

fixes: #5505

* Update extensions/llamacpp-extension/src/index.ts

Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>

* Update extensions/llamacpp-extension/src/index.ts

Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>

* Extend this to available system RAM if GGML device is not available

* fix: Improve model metadata and memory checks

This commit refactors the logic for checking if a model is supported by a system's available memory.

**Key changes:**
- **Remote model support**: The `read_gguf_metadata` function can now fetch metadata from a remote URL by reading the file in chunks.
- **Improved KV cache size calculation**: The KV cache size is now estimated more accurately by using `attention.key_length` and `attention.value_length` from the GGUF metadata, with a fallback to `embedding_length`.
- **Granular memory check statuses**: The `isModelSupported` function now returns a more specific status (`'RED'`, `'YELLOW'`, `'GREEN'`) to indicate whether the model weights or the KV cache are too large for the available memory.
- **Consolidated logic**: The logic for checking local and remote models has been consolidated into a single `isModelSupported` function, improving code clarity and maintainability.

These changes provide more robust and informative model compatibility checks, especially for models hosted on remote servers.

* Update extensions/llamacpp-extension/src/index.ts

Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>

* Make ctx_size optional and use the sum of free memory across GGML devices

* feat: hub and dropdown model selection handle model compatibility

* feat: update badge model info color

* chore: enable detail page to get model compatibility

* chore: update copy

* chore: update shrink indicator UI

---------

Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>
Co-authored-by: Faisal Amir <urmauur@gmail.com>
2025-08-21 16:13:50 +05:30
Akarshan Biswas
9c25480c7b
fix: Update placeholder text and error message (#6263)
This commit improves the clarity of the llama.cpp extension.

- Corrected a placeholder example from `GGML_VK_VISIBLE_DEVICES='0,1'` to `GGML_VK_VISIBLE_DEVICES=0,1` for better accuracy.
- Changed an ambiguous error message from `"Failed to load llama-server: ${error}"` to the more specific `"Failed to load llamacpp backend"`.
2025-08-21 16:01:31 +05:30
Akarshan Biswas
5c3a6fec32
feat: Add support for custom environmental variables to llama.cpp (#6256)
This commit adds a new setting `llamacpp_env` to the llama.cpp extension, allowing users to specify custom environment variables. These variables are passed to the backend process when it starts.

A new function `parseEnvFromString` is introduced to handle the parsing of the semicolon-separated key-value pairs from the user input. The environment variables are then used in the `load` function and when listing available devices. This enables more flexible configuration of the llama.cpp backend, such as specifying visible GPUs for Vulkan.

This change also updates the Tauri command `get_devices` to accept environment variables, ensuring that device discovery respects the user's settings.
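A minimal sketch of such a parser for semicolon-separated `KEY=VALUE` pairs; the real `parseEnvFromString` may handle quoting or edge cases differently:

```typescript
function parseEnvFromString(input: string): Record<string, string> {
  const env: Record<string, string> = {}
  for (const pair of input.split(';')) {
    const trimmed = pair.trim()
    if (!trimmed) continue
    const eq = trimmed.indexOf('=')
    if (eq <= 0) continue // skip malformed entries with no key or no '='
    env[trimmed.slice(0, eq).trim()] = trimmed.slice(eq + 1).trim()
  }
  return env
}

// e.g. parseEnvFromString('GGML_VK_VISIBLE_DEVICES=0,1')
//   → { GGML_VK_VISIBLE_DEVICES: '0,1' }
```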
2025-08-21 15:50:37 +05:30
Dinh Long Nguyen
32a2ca95b6
feat: gguf file size + hash validation (#5266) (#6259)
* feat: gguf file size + hash validation

* fix tests fe

* update cargo tests

* handle async download for both models and mmproj

* move progress tracker to models

* handle file download cancelled

* add cancellation mid hash run
2025-08-21 16:17:58 +07:00
Louis
6b55812739
Merge pull request #6249 from menloresearch/feat/detect-cpu-arch-run-time
feat: detect CPU arch at runtime
2025-08-21 11:51:13 +07:00
Louis
e6587844d0
Merge branch 'dev' into current-date-instruction 2025-08-21 11:41:30 +07:00
Louis
3a36353b02
fix: backend variant selection 2025-08-21 10:54:35 +07:00
Akarshan Biswas
906b87022d
chore: re-enable reasoning_content in backend (#6228)
* chore: re-enable reasoning_content in backend

* chore: handle reasoning_content

* chore: refactor get reasoning content

* chore: update PR review

---------

Co-authored-by: Faisal Amir <urmauur@gmail.com>
2025-08-20 13:06:21 +05:30
Akarshan Biswas
0fc3dc6841
Fix: Validate GGUF files before loading (#6238)
This commit adds a GGUF validation check for both the main model file and the `mmproj` file (if present) before they are loaded. This prevents the extension from crashing if an invalid GGUF file is provided.

The `GgufMetadata` interface and `loadMetadata` function were removed as the `readGgufMetadata` is now invoked directly. The code has also been refactored to be more readable, with clearer variable names and more descriptive comments.
2025-08-20 10:31:19 +05:30
Dinh Long Nguyen
b0eec07a01
Add contributing section for jan (#6231) (#6232)
* Add contributing section for jan

* Update CONTRIBUTING.md

Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>

---------

Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>
2025-08-20 10:18:35 +07:00
Faisal Amir
5481ee9e35
Merge pull request #6134 from menloresearch/feat/attachment-ui
feat: attachment UI
2025-08-20 10:04:32 +07:00
Kamal Fariz Mahyuddin
df27def9cb
Merge branch 'dev' into current-date-instruction 2025-08-19 14:40:08 -07:00
Akarshan Biswas
e761c439d7
feat: Pass API key via environment variable instead of command line argument (#6225)
This change modifies how the API key is passed to the llama-server process. Previously, it was sent as a command line argument (--api-key). This approach has been updated to pass the key via an environment variable (LLAMA_API_KEY).

This improves security by preventing the API key from being visible in the process list (ps aux on Linux, Task Manager on Windows, etc.), where it could potentially be exposed to other users or processes on the same system.

The commit also updates the Rust backend to read the API key from the environment variable instead of parsing it from the command line arguments.
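Illustrative only — the real flow goes through the Tauri backend — but a Node-style sketch shows the shape of the change:

```typescript
import { spawn } from 'node:child_process'

// Hypothetical wrapper: the key travels via LLAMA_API_KEY rather than argv,
// so it no longer shows up in `ps aux` or Task Manager.
function startLlamaServer(binPath: string, args: string[], apiKey?: string) {
  const env = { ...process.env }
  if (apiKey) env.LLAMA_API_KEY = apiKey // read by the Rust backend per this commit
  // Note: no '--api-key' in args anymore.
  return spawn(binPath, args, { env })
}
```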
2025-08-19 20:57:06 +05:30
Louis
df41bad465
fix: import 2025-08-19 22:16:24 +07:00
Louis
a210e2f13a
fix: timeout for completion request 2025-08-19 22:06:26 +07:00
Akarshan
cffbe1a77b Increase timeout for fetch in llamacpp 2025-08-19 20:00:10 +07:00
Faisal Amir
fdc8e07f86 chore: update model setting to include offload-mmproj 2025-08-19 20:00:08 +07:00