Nicholai/jan - jan - Gitea: Git with a cup of tea

Author	SHA1	Message	Date
Louis	a210e2f13a	fix: timeout for completion request	2025-08-19 22:06:26 +07:00
Akarshan	cffbe1a77b	Increase timeout for fetch in llamacpp	2025-08-19 20:00:10 +07:00
Faisal Amir	fdc8e07f86	chore: update model setting include offload-mmproj	2025-08-19 20:00:08 +07:00
Akarshan	9afeb5e514	feat: Add offload_mmproj option and validation This commit introduces a new configuration option offload_mmproj to the llamacpp extension. The offload_mmproj setting allows users to control whether the multimodal projector model is offloaded to the GPU. By default, it's offloaded for better performance. If set to false, the projector model will remain on the CPU, which can be useful in low GPU memory scenarios, though image processing might take longer. Additionally, this commit adds validate_mmproj_path to ensure the provided --mmproj path is valid and accessible, preventing issues during model loading. This change also refactors some invoke calls for improved readability.	2025-08-19 19:51:29 +07:00
Louis	55390de070	Merge pull request #6222 from menloresearch/feat/model-tool-use-detection feat: #5917 - model tool use capability should be auto detected	2025-08-19 13:55:08 +07:00
Louis	bfe671d7b4	feat: #5917 - model tool use capability should be auto detected	2025-08-19 09:51:36 +07:00
Akarshan Biswas	5ad3d282af	fix: re-enable Vulkan backend in integrated GPUs with enough memory (#6215 )	2025-08-18 17:31:01 +05:30
Dinh Long Nguyen	e1c8d98bf2	Backend Architecture Refactoring (#6094 ) (#6162 ) * add llamacpp plugin * Refactor llamacpp plugin * add utils plugin * remove utils folder * add hardware implementation * add utils folder + move utils function * organize cargo files * refactor utils src * refactor util * apply fmt * fmt * Update gguf + reformat * add permission for gguf commands * fix cargo test windows * revert yarn lock * remove cargo.lock for hardware plugin * ignore cargo.lock file * Fix hardware invoke + refactor hardware + refactor tests, constants * use api wrapper in extension to invoke hardware call + api wrapper build integration * add newline at EOF (per Akarshan) * add vi mock for getSystemInfo	2025-08-15 08:59:01 +07:00
Faisal Amir	a66d83c598	Merge pull request #6172 from menloresearch/fix/model-id-special-char fix: handle modelId special char	2025-08-14 12:33:58 +07:00
Akarshan Biswas	f4661912b0	feat: Add GGUF metadata reading functionality (#6120 ) * feat: Add GGUF metadata reading functionality This commit introduces a new Tauri command and a corresponding function to read metadata from GGUF model files. The new read_gguf_metadata command in the Rust backend uses the byteorder crate to parse the GGUF file format and extract key metadata. This information, including the file's version, tensor count, and a key-value map of other metadata, is then made available to the TypeScript frontend. This functionality is a foundational step toward providing users with more detailed information about their loaded models directly within the application. This will be refactored later. fixes: #6001 * loadMetadata() should return * Properly throw error to FE * Use BufReader to improve performance	2025-08-13 22:54:20 +05:30
Akarshan Biswas	02ded9b545	fix: Improve error message for invalid version/backend format (#6149 ) * fix: Improve error message for invalid version/backend format This commit changes the error message displayed when the `version_backend` configuration is invalid. The new message is more user-friendly and suggests a simple solution, such as restarting the application, which is more helpful to the user than the previous technical error message. * fix typo	2025-08-12 21:38:22 +05:30
Akarshan Biswas	0cfc745954	feat: Introduce structured error handling for llamacpp extension (#6087 ) * feat: Introduce structured error handling for llamacpp extension This commit introduces a structured error handling system for the `llamacpp` extension. Instead of returning simple string errors, we now use a custom `LlamacppError` struct with a specific `ErrorCode` enum. This allows the frontend to display more user-friendly and actionable error messages based on the code, rather than raw debug logs. The changes include: - A new `ErrorCode` enum to categorize errors (e.g., `OutOfMemory`, `ModelArchNotSupported`, `BinaryNotFound`). - A `LlamacppError` struct to encapsulate the code, a user-facing message, and optional detailed logs. - A static method `from_stderr` that intelligently parses llama.cpp's standard error output to identify and map common issues like Out of Memory errors to a specific error code. - Refactored `ServerError` enum to wrap the new `LlamacppError` and provide a consistent serialization format for the Tauri frontend. - Updated all relevant functions (`load_llama_model`, `get_devices`) to return the new structured error type, ensuring a more robust and predictable error flow. - A reduced timeout for model loading from 300 to 180 seconds. This work lays the groundwork for a more intuitive and helpful user experience, as the application can now provide clear guidance to users when a model fails to load. 
* Update src-tauri/src/core/utils/extensions/inference_llamacpp_extension/server.rs Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com> * Update src-tauri/src/core/utils/extensions/inference_llamacpp_extension/server.rs Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com> * chore: update FE handle error object from extension * chore: fix property type --------- Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com> Co-authored-by: Faisal Amir <urmauur@gmail.com>	2025-08-07 23:28:25 +05:30
Akarshan Biswas	6a699d8004	refactor: move session management & port allocation to backend (#6083 ) * refactor: move session management & port allocation to backend - Remove the in‑process `activeSessions` map and its cleanup logic from the TypeScript side. - Introduce new Tauri commands in Rust: - `get_random_port` – picks an unused port using a seeded RNG and checks availability. - `find_session_by_model` – returns the `SessionInfo` for a given model ID. - `get_loaded_models` – returns a list of currently loaded model IDs. - Update the extension’s TypeScript code to use these commands via `invoke`: - `findSessionByModel`, `load`, `unload`, `chat`, `getLoadedModels`, and `embed` now operate asynchronously and query the backend. - Remove the old `is_port_available` command and the custom port‑checking loop. - Simplify `onUnload` – session termination is now handled by the backend. - Drop unused helpers (`sleep`, `waitForModelLoad`) and related port‑availability code. - Add missing Rust imports (`rand::{StdRng,Rng,SeedableRng}`, `HashSet`) and improve error handling. - Register the new commands in `src-tauri/src/lib.rs` (replace `is_port_available` with the three new commands). This refactor centralises session state and port allocation in the Rust backend, eliminates duplicated logic, and resolves race conditions around model loading and session cleanup. * Use String(e) for error Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com> --------- Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>	2025-08-07 13:06:21 +05:30
Akarshan Biswas	1f1605bdf9	feat: Add support for overriding tensor buffer type (#6062 ) * feat: Add support for overriding tensor buffer type This commit introduces a new configuration option, `override_tensor_buffer_t`, which allows users to specify a regex for matching tensor names to override their buffer type. This is an advanced setting primarily useful for optimizing the performance of large models, particularly Mixture of Experts (MoE) models. By overriding the tensor buffer type, users can keep critical parts of the model, like the attention layers, on the GPU while offloading other parts, such as the expert feed-forward networks, to the CPU. This can lead to significant speed improvements for massive models. Additionally, this change refines the error message to be more specific when a model fails to load. The previous message "Failed to load llama-server" has been updated to "Failed to load model" to be more accurate. * chore: update FE to support override-tensor --------- Co-authored-by: Faisal Amir <urmauur@gmail.com>	2025-08-07 10:31:34 +05:30
Akarshan Biswas	8d147c1774	fix: Add conditional Vulkan support check for better GPU compatibility (#6066 ) Changes: - Introduce conditional Vulkan support check for discrete GPUs with 6GB+ VRAM fixes: #6009	2025-08-06 07:20:44 +05:30
Faisal Amir	5d001dfd5a	✨feat: jinja template customize per model instead provider level (#6053 )	2025-08-05 21:21:41 +07:00
Faisal Amir	99567a1102	✨feat: recommended label llamacpp setting (#6052 ) * ✨feat: recommended label llamacpp * chore: remove log	2025-08-05 13:55:33 +07:00
Louis	813c911487	Merge pull request #6046 from menloresearch/fix/support-missing-llamacpp-cuda-backends fix: support missing llamacpp cuda backends	2025-08-05 12:37:31 +07:00
Louis	4a4bc35cce	fix: should check for invalid backend to cover previous missing backend case	2025-08-05 11:41:02 +07:00
Akarshan Biswas	5e533bdedc	feat: Improve llama.cpp argument handling and add device parsing tests (#6041 ) * feat: Improve llama.cpp argument handling and add device parsing tests This commit refactors how arguments are passed to llama.cpp, specifically by only adding arguments when their values differ from their defaults. This reduces the verbosity of the command and prevents potential conflicts or errors when llama.cpp's default behavior aligns with the desired setting. Additionally, new tests have been added for parsing device output from llama.cpp, ensuring the accurate extraction of GPU information (ID, name, total memory, and free memory). This improves the robustness of device detection. The following changes were made: * Remove redundant `--ctx-size` argument: The `--ctx-size` argument is now only explicitly added if `cfg.ctx_size` is greater than 0. * Conditional argument adding for default values: * `--split-mode` is only added if `cfg.split_mode` is not empty and not 'layer'. * `--main-gpu` is only added if `cfg.main_gpu` is not undefined and not 0. * `--cache-type-k` is only added if `cfg.cache_type_k` is not 'f16'. * `--cache-type-v` is only added if `cfg.cache_type_v` is not 'f16' (when `flash_attn` is enabled) or not 'f32' (otherwise). This also corrects the `flash_attn` condition. * `--defrag-thold` is only added if `cfg.defrag_thold` is not 0.1. * `--rope-scaling` is only added if `cfg.rope_scaling` is not 'none'. * `--rope-scale` is only added if `cfg.rope_scale` is not 1. * `--rope-freq-base` is only added if `cfg.rope_freq_base` is not 0. * `--rope-freq-scale` is only added if `cfg.rope_freq_scale` is not 1. 
* Add `parse_device_output` tests: Comprehensive unit tests were added to `src-tauri/src/core/utils/extensions/inference_llamacpp_extension/server.rs` to validate the parsing of llama.cpp device output under various scenarios, including multiple devices, single devices, different backends (CUDA, Vulkan, SYCL), complex GPU names, and error conditions. * fixup cache_type_v comparison	2025-08-04 19:47:04 +05:30
Louis	45c2b02842	test: add tests for new changes	2025-08-04 16:01:04 +07:00
Louis	bf9315dbbe	fix: add missing cuda backend support	2025-08-04 15:54:21 +07:00
Faisal Amir	787c4ee073	fix: wrong desc setting cont_batching (#6034 )	2025-08-02 21:48:43 +07:00
Louis	9c0d09c487	refactor: clean up cortex (#6003 ) * refactor: clean up cortex * chore: clean up * refactor: clean up	2025-07-31 21:58:12 +07:00
Louis	7a3d9d765c	fix: failed provider models list due to broken cortex import (#5983 )	2025-07-30 17:37:44 +07:00
Akarshan Biswas	f61ce886a0	feat: Enhance port selection with availability check (#5966 ) This change improves the robustness of the llama.cpp extension's server port selection. Previously, the `getRandomPort()` method only checked for ports already in use by active sessions, which could lead to model load failures if the chosen port was occupied by another external process. This change introduces a new Tauri command, `is_port_available`, which performs a system-level check to ensure the randomly selected port is truly free before attempting to start the llama-server. It also adds a retry mechanism with a maximum number of attempts (20,000) to find an available port, throwing an error if no suitable port is found within the specified range after all attempts. This enhancement prevents port conflicts and improves the reliability and user experience of the llama.cpp extension within Jan. Closes #5965	2025-07-29 18:01:52 +05:30
Akarshan Biswas	07421d7f53	fix: set autoUnload in onLoad() (#5956 ) The variable was not initialized, which resulted in it always being set to true at startup. This change fixes it.	2025-07-28 20:54:21 +05:30
Akarshan Biswas	fa896b3bf3	fix: correctly apply `auto_unload` setting from config (#5953 ) Previously, the `autoUnload` flag was not being updated when set via config, causing models to be auto-unloaded regardless of the intended behavior. This patch ensures the setting is respected at runtime.	2025-07-28 19:17:29 +05:30
Akarshan Biswas	432c942330	fix: Prevent race condition with auto-unload during rapid model loading (#5947 ) This commit addresses a race condition where, with "Auto-Unload Old Models" enabled, rapidly attempting to load multiple models could result in more than one model being loaded simultaneously. Previously, the unloading logic did not account for models that were still in the process of loading when a new load operation was initiated. This allowed new models to start loading before the previous ones had fully completed their unload cycle. To resolve this: - A `loadingModels` map has been introduced to track promises for models currently in the loading state. - The `load` method now checks if a model is already being loaded and, if so, returns the existing promise, preventing duplicate load operations for the same model. - The `performLoad` method (which encapsulates the actual loading logic) now ensures that when `autoUnload` is active, it waits for any other models that are concurrently loading to finish before proceeding to unload all currently loaded models. This guarantees that the auto-unload mechanism properly unloads all models, including those initiated in quick succession, thereby preventing the race condition. This fixes the issue where clicking the start button very fast on multiple models would bypass the auto-unload functionality.	2025-07-28 12:59:48 +05:30
Louis	1fc37a9349	fix: migrate app settings to the new version (#5936 ) * fix: migrate app settings to the new version * fix: edge cases * fix: migrate HF import model on Windows * fix hardware page broken after downgraded * test: correct test * fix: backward compatible hardware info	2025-07-27 21:13:05 +07:00
Akarshan Biswas	c9b44eec52	fix: Remove sInfo from activeSessions before unloading (#5938 ) This commit addresses a potential race condition that could lead to "connection errors" when unloading a llamacpp model. The issue arose because the `activeSessions` map still has the session info of the model during unload. This could lead to "connection errors" when the backend is taking time to unload while there is an ongoing request to the model. The fix involves: 1. Deleting the `pid` from `activeSessions` before calling backend's unload: This ensures that the model is cleared from the map before we start unloading. 2. Failure handling: If somehow the backend fails to unload, the session info for that model is added back to prevent any race conditions. This commit improves the robustness and reliability of the unloading process by preventing potential conflicts.	2025-07-27 14:37:34 +05:30
Akarshan Biswas	8ec4a36826	fix: Frontend updates when llama.cpp backend auto-downloads (#5926 )	2025-07-26 08:48:29 +07:00
Akarshan Biswas	3982ed4c6f	fix: Allow N-GPU Layers (NGL) to be set to 0 in llama.cpp (#5907 ) * fix: Allow N-GPU Layers (NGL) to be set to 0 in llama.cpp The `n_gpu_layers` (NGL) setting in the llama.cpp extension was incorrectly preventing users from disabling GPU layers by automatically defaulting to 100 when set to 0. This was caused by a condition that only pushed `cfg.n_gpu_layers` if it was greater than 0 (`cfg.n_gpu_layers > 0`). This commit updates the condition to `cfg.n_gpu_layers >= 0`, allowing 0 to be a valid and accepted value for NGL. This ensures that users can effectively disable GPU offloading when desired. * fix: default ngl --------- Co-authored-by: Louis <louis@jan.ai>	2025-07-25 16:24:53 +05:30
Akarshan Biswas	4d4cf896af	fix: Persist 'Auto-Unload Old Models' setting in llama.cpp (#5906 ) The 'Auto-Unload Old Models' setting in the llama.cpp extension failed to persist due to a typo in its key name within `settings.json`. The key was incorrectly `auto_unload_models` instead of `auto_unload`. This commit corrects the key name to `auto_unload`, ensuring that user-configured changes to this setting are properly saved, retrieved, and persist across application restarts. This resolves the issue where the setting would revert to its previous value after being changed.	2025-07-25 11:03:15 +05:30
Akarshan Biswas	a1af70f7a9	feat: Enhance Llama.cpp backend management with persistence (#5886 ) * feat: Enhance Llama.cpp backend management with persistence This commit introduces significant improvements to how the Llama.cpp extension manages and updates its backend installations, focusing on user preference persistence and smarter auto-updates. Key changes include: * Persistent Backend Type Preference: The extension now stores the user's preferred backend type (e.g., `cuda`, `cpu`, `metal`) in `localStorage`. This ensures that even after updates or restarts, the system attempts to use the user's previously selected backend type, if available. * Intelligent Auto-Update: The auto-update mechanism has been refined to prioritize updating to the *latest version of the currently selected backend type* rather than always defaulting to the "best available" backend (which might change). This respects user choice while keeping the chosen backend type up-to-date. * Improved Initial Installation/Configuration: For fresh installations or cases where the `version_backend` setting is invalid, the system now intelligently determines and installs the best available backend, then persists its type. * Refined Old Backend Cleanup: The `removeOldBackends` function has been renamed to `removeOldBackend` and modified to specifically clean up older versions of the currently selected backend type, preventing the accumulation of unnecessary files while preserving other backend types the user might switch to. * Robust Local Storage Handling: New private methods (`getStoredBackendType`, `setStoredBackendType`, `clearStoredBackendType`) are introduced to safely interact with `localStorage`, including error handling for potential `localStorage` access issues. * Version Filtering Utility: A new utility `findLatestVersionForBackend` helps in identifying the latest available version for a specific backend type from a list of supported backends.
These changes provide a more stable, user-friendly, and maintainable backend management experience for the Llama.cpp extension. Fixes: #5883 * fix: cortex models migration should be done once * feat: Optimize Llama.cpp backend preference storage and UI updates This commit refines the Llama.cpp extension's backend management by: * Optimizing `localStorage` Writes: The system now only writes the backend type preference to `localStorage` if the new value is different from the currently stored one. This reduces unnecessary `localStorage` operations. * Ensuring UI Consistency on Initial Setup: When a fresh installation or an invalid backend configuration is detected, the UI settings are now explicitly updated to reflect the newly determined `effectiveBackendString`, ensuring the displayed setting matches the active configuration. These changes improve performance by reducing redundant storage operations and enhance user experience by maintaining UI synchronization with the backend state. * Revert "fix: provider settings should be refreshed on page load (#5887)" This reverts commit ce6af62c7df4a7e7ea8c0896f307309d6bf38771. * fix: add loader version backend llamacpp * fix: wrong key name * fix: model setting issues * fix: virtual dom hub * chore: cleanup * chore: hide device offload setting --------- Co-authored-by: Louis <louis@jan.ai> Co-authored-by: Faisal Amir <urmauur@gmail.com>	2025-07-24 18:33:35 +07:00
Faisal Amir	399671488c	fix: gpu detected from backend version (#5882 ) * fix: gpu detected from backend version * chore: remove readonly props from dynamic field	2025-07-24 10:45:48 +07:00
Akarshan Biswas	1d0bb53f2a	feat: add support for querying available backend devices (#5877 ) * feat: add support for querying available backend devices This change introduces a new `get_devices` method to the `llamacpp_extension` engine that allows the frontend to query and display a list of available devices (e.g., Vulkan, CUDA, SYCL) from the compiled `llama-server` binary. * Added `DeviceList` interface to represent GPU/device metadata. * Implemented `getDevices(): Promise<DeviceList[]>` method. * Splits `version/backend`, ensures backend is ready. * Invokes the new Tauri command `get_devices`. * Introduced a new `get_devices` Tauri command. * Parses `llama-server --list-devices` output to extract available devices with memory info. * Introduced `DeviceInfo` struct (`id`, `name`, `mem`, `free`) and exposed it via serialization. * Robust parsing logic using string processing (non-regex) to locate memory stats. * Registered the new command in the `tauri::Builder` in `lib.rs`. * Fixed logic to correctly parse multiple devices from the llama-server output. * Handles common failure modes: binary not found, malformed memory info, etc. This sets the foundation for device selection, memory-aware model loading, and improved diagnostics in Jan AI engine setup flows. * Update extensions/llamacpp-extension/src/index.ts Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com> --------- Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>	2025-07-23 19:20:12 +05:30
Louis	af116dd7dc	fix: jan should have a general assistant instruction (#5872 ) * fix: default Jan assistant prompt * test: update tests	2025-07-23 13:55:20 +07:00
Faisal Amir	43b7eb6e18	🐛fix: remove sampling parameters from llamacpp extension (#5871 )	2025-07-23 12:13:42 +07:00
Louis	fe95031c6e	feat: migrate cortex models to llamacpp extension (#5838 ) * feat: migrate cortex models to new llama.cpp extension * test: add tests * clean: remove duplicated import	2025-07-22 23:35:08 +07:00
Akarshan Biswas	1eaec5e4f6	Fix: engine unable to find dlls when running on Windows (#5863 ) * Fix: Windows llamacpp not picking up dlls from lib repo * Fix lib path on Windows * Add debug info about lib_path * Normalize lib_path for Windows * fix Windows lib path normalization * fix: missing cuda dll files on windows * throw backend setup errors to UI * Fix format * Update extensions/llamacpp-extension/src/index.ts Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com> * feat: add logger to llamacpp-extension * fix: platform check --------- Co-authored-by: Louis <louis@jan.ai> Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>	2025-07-22 20:05:24 +05:30
Akarshan Biswas	f59739d2b0	refactor: Improve Llama.cpp backend management and auto-update (#5845 ) * refactor: Improve Llama.cpp backend management and auto-update This commit refactors the Llama.cpp extension to enhance backend management and streamline the auto-update process. Key changes include: Refactored configureBackends: The logic for determining the best available backend and populating settings is now more modular, preventing duplicate executions. Dedicated Auto-update Handling: Introduced a handleAutoUpdate method to encapsulate the auto-update logic, including downloading the latest available backend and updating the internal configuration and settings. Robust Old Backend Cleanup: The removeOldBackends method is improved to ensure only the currently used backend version and type are kept, effectively managing disk space. A delay is added for Windows to prevent file conflicts during cleanup. Final Installation Check: A ensureFinalBackendInstallation method is added to guarantee the selected backend is installed, acting as a final safeguard after auto-update or if auto-update is disabled. Minor Fixes: Added console.log for save_path during decompression for better debugging. Ensured the output directory exists before decompression in the Rust backend. Removed extraneous console log for session info. Updated Cargo.toml and tauri.conf.json versions. These changes lead to a more reliable and efficient Llama.cpp backend experience within the application, particularly for users with auto-update enabled. * fix isBackendInstalled parameters * Address bot's comments * Address bot comments of using try finally block	2025-07-22 14:35:34 +05:30
Akarshan Biswas	81d6ed3785	feat: support per-model overrides in llama.cpp load() (#5820 ) * feat: support per-model overrides in llama.cpp load() Extend the `load()` method in the llama.cpp extension to accept optional `overrideSettings`, allowing fine-grained per-model configuration. This enables users to override provider-level settings such as `ctx_size`, `chat_template`, `n_gpu_layers`, etc., when loading a specific model. Fixes: #5818 (Feature Request - Jan v0.6.6) Use cases enabled: - Different context sizes per model (e.g., 4K vs 32K) - Model-specific chat templates (ChatML, Alpaca, etc.) - Performance tuning (threads, GPU layers) - Better memory management per deployment Maintains full backward compatibility with existing provider config. * swap overrideSettings and isEmbedding argument	2025-07-21 08:59:50 +05:30
Louis	bc4fe52f8d	fix: llama.cpp integration model load and chat experience (#5823 ) * fix: stop generating should not stop running models * fix: ensure backend ready before loading model * fix: backend setting should not block onLoad	2025-07-21 09:29:26 +07:00
Louis	19cb1c96e0	fix: llama.cpp backend download on windows (#5813 ) * fix: llama.cpp backend download on windows * test: add missing cases * clean: linter * fix: build	2025-07-20 16:58:09 +07:00
Akarshan Biswas	8f1a36c8e3	fix: Improve stream error handling and parsing (#5807 ) * fix: Enhance stream error handling and parsing This commit improves the robustness of stream processing in the llamacpp-extension. - Adds explicit handling for 'error:' prefixed lines in the stream, parsing the contained JSON error and throwing an appropriate JavaScript Error. - Centralizes JSON parsing of 'data:' and 'error:' lines, ensuring consistent error propagation by re-throwing parsing exceptions. - Ensures the async iterator terminates correctly upon encountering stream errors or malformed JSON. * Address bot comments and cleanup	2025-07-18 18:36:33 +05:30
Akarshan Biswas	bcb60378c0	fix: Add --reasoning-format none to support rendering of reasoning content (#5803 )	2025-07-18 08:22:37 +05:30
Louis	8ca507c01c	feat: proxy support for the new downloader (#5795 ) * feat: proxy support for the new downloader * test: remove outdated test * ci: clean up	2025-07-17 23:10:21 +07:00
Akarshan Biswas	92703bceb2	refactor: move thinking toggle to runtime settings for dynamic control (#5800 ) * refactor: move thinking toggle to runtime settings for per-message control Replaces the static `reasoning_budget` config with a dynamic `enable_thinking` flag under `chat_template_kwargs`, allowing models like Jan-nano and Qwen3 to enable/disable thinking behavior at runtime, even mid-conversation. Requires UI update * remove engine argument	2025-07-17 20:18:24 +05:30
Akarshan Biswas	b736d09168	fix: Prevent spamming /health endpoint and improve startup and resolve compiler warnings (#5784 ) * fix: Prevent spamming /health endpoint and improve startup and resolve compiler warnings This commit introduces a delay and improved logic to the /health endpoint checks in the llamacpp extension, preventing excessive requests during model loading. Additionally, it addresses several Rust compiler warnings by: - Commenting out an unused `handle_app_quit` function in `src/core/mcp.rs`. - Explicitly declaring `target_port`, `session_api_key`, and `buffered_body` as mutable in `src/core/server.rs`. - Commenting out unused `tokio` imports in `src/core/setup.rs`. - Enhancing the `load_llama_model` function in `src/core/utils/extensions/inference_llamacpp_extension/server.rs` to better monitor stdout/stderr for readiness and errors, and handle timeouts. - Commenting out an unused `std::path::Prefix` import and adjusting `normalize_path` in `src/core/utils/mod.rs`. - Updating the application version to 0.6.904 in `tauri.conf.json`. * fix grammar! Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com> * fix grammar 2 Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com> * reimport prefix but only on Windows * remove instead of commenting * remove redundant check * sync app version in cargo.toml with tauri.conf --------- Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>	2025-07-16 18:18:11 +05:30
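
Commit 5e533bdedc above describes passing an argument to llama.cpp only when its value differs from the default. The rule set listed in that message can be sketched roughly as follows. This is an illustration, not the extension's actual code: the `LlamacppConfig` interface and the `buildOptionalArgs` function name are hypothetical, while the field names, flags, and default values are taken from the commit message.

```typescript
// Hypothetical config shape; only the field names come from the commit message.
interface LlamacppConfig {
  ctx_size: number
  split_mode: string
  main_gpu?: number
  flash_attn: boolean
  cache_type_k: string
  cache_type_v: string
  defrag_thold: number
  rope_scaling: string
  rope_scale: number
  rope_freq_base: number
  rope_freq_scale: number
}

// Hypothetical helper: emit a flag only when the value differs from the default
// named in the commit message.
function buildOptionalArgs(cfg: LlamacppConfig): string[] {
  const args: string[] = []
  if (cfg.ctx_size > 0) args.push('--ctx-size', String(cfg.ctx_size))
  if (cfg.split_mode && cfg.split_mode !== 'layer') {
    args.push('--split-mode', cfg.split_mode)
  }
  if (cfg.main_gpu !== undefined && cfg.main_gpu !== 0) {
    args.push('--main-gpu', String(cfg.main_gpu))
  }
  if (cfg.cache_type_k !== 'f16') args.push('--cache-type-k', cfg.cache_type_k)
  // The default for the V cache depends on flash_attn: 'f16' when enabled, 'f32' otherwise.
  const defaultCacheTypeV = cfg.flash_attn ? 'f16' : 'f32'
  if (cfg.cache_type_v !== defaultCacheTypeV) {
    args.push('--cache-type-v', cfg.cache_type_v)
  }
  if (cfg.defrag_thold !== 0.1) args.push('--defrag-thold', String(cfg.defrag_thold))
  if (cfg.rope_scaling !== 'none') args.push('--rope-scaling', cfg.rope_scaling)
  if (cfg.rope_scale !== 1) args.push('--rope-scale', String(cfg.rope_scale))
  if (cfg.rope_freq_base !== 0) args.push('--rope-freq-base', String(cfg.rope_freq_base))
  if (cfg.rope_freq_scale !== 1) args.push('--rope-freq-scale', String(cfg.rope_freq_scale))
  return args
}
```

With every field at its default the function returns an empty array, which is the reduced command-line verbosity the commit message aims for.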

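Commit 432c942330 above mentions a `loadingModels` map that lets `load` return the existing promise when the same model is already loading. A minimal sketch of that deduplication idea follows. The `ModelLoader` class and the string session result are hypothetical, and the auto-unload waiting logic from the commit is deliberately left out.

```typescript
// Hypothetical wrapper class; only `loadingModels`, `load`, and `performLoad`
// are names taken from the commit message.
class ModelLoader {
  // Promises for models that are currently in the loading state.
  private loadingModels = new Map<string, Promise<string>>()

  constructor(private performLoad: (modelId: string) => Promise<string>) {}

  load(modelId: string): Promise<string> {
    // A load for this model is already in flight: reuse its promise.
    const inFlight = this.loadingModels.get(modelId)
    if (inFlight) return inFlight

    // Start the load on the next microtask so synchronous throws become rejections,
    // then clear the map entry whether the load succeeds or fails.
    const tracked = Promise.resolve()
      .then(() => this.performLoad(modelId))
      .then(
        (session) => {
          this.loadingModels.delete(modelId)
          return session
        },
        (err) => {
          this.loadingModels.delete(modelId)
          throw err
        }
      )
    this.loadingModels.set(modelId, tracked)
    return tracked
  }
}
```

Two rapid calls to `load` with the same model ID receive the same promise object, so `performLoad` runs only once for that model.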

789 Commits