* feat: implement conversation endpoint
* use conversation aware endpoint
* fetch message correctly
* preserve first message
* fix logout
* fix broadcast issue locally + auth not refreshing profile on other tabs + clean up and sync messages
* add is dev tag
* feat: add getTokensCount method to compute token usage
Implemented a new async `getTokensCount` function in the llama.cpp extension.
The method validates the model session, checks process health, applies the request template, and tokenizes the resulting prompt to return the token count. It includes detailed error handling for crashed models and API failures, enabling callers to assess token usage before sending completions.
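A minimal sketch of what such a method might look like, assuming a local llama.cpp server exposing a `/tokenize` endpoint; the `SessionInfo` fields and the `isProcessRunning` helper are illustrative, not the extension's actual API:

```ts
// Illustrative session shape; the real extension defines its own SessionInfo.
interface SessionInfo {
  pid: number
  port: number
  apiKey: string
}

// Assumed helper for checking whether the model process is still alive.
declare function isProcessRunning(pid: number): Promise<boolean>

async function getTokensCount(
  session: SessionInfo | undefined,
  templatedPrompt: string // prompt after the request template has been applied
): Promise<number> {
  // Validate the model session.
  if (!session) throw new Error('Model is not loaded')

  // Check process health so a crashed model surfaces a clear error.
  if (!(await isProcessRunning(session.pid))) {
    throw new Error('Model process has crashed; reload the model and retry')
  }

  // Tokenize the templated prompt via the local server and return the count.
  const res = await fetch(`http://localhost:${session.port}/tokenize`, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${session.apiKey}`,
    },
    body: JSON.stringify({ content: templatedPrompt }),
  })
  if (!res.ok) throw new Error(`Tokenize request failed with status ${res.status}`)

  const data = (await res.json()) as { tokens: number[] }
  return data.tokens.length
}
```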
* Fix: typos
* chore: update ui token usage
* chore: remove unused code
* feat: add image token handling for multimodal LlamaCPP models
Implemented support for counting image tokens when using vision-enabled models (a sketch follows this list):
- Extended `SessionInfo` with an optional `mmprojPath` to store the multimodal projector file path.
- Propagated `mmproj_path` from the Tauri plugin into the session info.
- Added import of `chatCompletionRequestMessage` and enhanced token calculation logic in the LlamaCPP extension:
- Detects image content in messages.
- Reads GGUF metadata from `mmprojPath` to compute accurate image token counts.
- Provides a fallback estimation if metadata reading fails.
- Returns the sum of text and image tokens.
- Introduced helper methods `calculateImageTokens` and `estimateImageTokensFallback`.
- Minor clean‑ups such as comment capitalization and debug logging.
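A hedged sketch of the overall flow described above; the message shapes, helper names (`countTextTokens`, `calculateImageTokens`), and fallback constant are assumptions, not the extension's actual API:

```ts
// Content parts as sent by the FE (text plus image_url entries).
type ContentPart =
  | { type: 'text'; text: string }
  | { type: 'image_url'; image_url: { url: string } }

interface ChatMessage {
  role: string
  content: string | ContentPart[]
}

// Assumed helpers defined elsewhere in the extension.
declare function countTextTokens(messages: ChatMessage[]): Promise<number>
declare function calculateImageTokens(mmprojPath: string, imageCount: number): Promise<number>

// Rough per-image guess used when mmproj metadata cannot be read (constant is hypothetical).
function estimateImageTokensFallback(imageCount: number): number {
  return imageCount * 512
}

async function getTokensCount(messages: ChatMessage[], mmprojPath?: string): Promise<number> {
  // Detect image content in the request messages.
  const imageCount = messages
    .flatMap((m) => (Array.isArray(m.content) ? m.content : []))
    .filter((p) => p.type === 'image_url').length

  const textTokens = await countTextTokens(messages)
  if (imageCount === 0) return textTokens

  let imageTokens: number
  try {
    // Read GGUF metadata from the mmproj file for a more accurate per-image count.
    imageTokens = mmprojPath
      ? await calculateImageTokens(mmprojPath, imageCount)
      : estimateImageTokensFallback(imageCount)
  } catch {
    // Fallback estimation if metadata reading fails.
    imageTokens = estimateImageTokensFallback(imageCount)
  }

  // Return the sum of text and image tokens.
  return textTokens + imageTokens
}
```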
* chore: update FE send params so message content includes type `image_url`
* fix mmproj path from session info and token count calculation
* fix: Correct image token estimation calculation in llamacpp extension
This commit addresses an inaccurate token count for images in the llama.cpp extension.
The previous logic incorrectly calculated the token count based on image patch size and dimensions. This has been replaced with a more precise method that uses the clip.vision.projection_dim value from the model metadata.
Additionally, unnecessary debug logging was removed, and a new log was added to show the mmproj metadata for improved visibility.
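A sketch of the corrected helper under the assumptions above; the commit does not spell out the exact arithmetic, so treating `clip.vision.projection_dim` as the per-image count is an assumption:

```ts
// GGUF metadata reader is assumed to exist elsewhere in the extension.
declare function readGgufMetadata(path: string): Promise<Record<string, unknown>>

async function calculateImageTokens(mmprojPath: string, imageCount: number): Promise<number> {
  const meta = await readGgufMetadata(mmprojPath)
  // New log showing the mmproj metadata for improved visibility.
  console.log('mmproj metadata:', meta)

  // Old (inaccurate) approach: estimate from image patch size and dimensions.
  // New approach: derive the per-image count from clip.vision.projection_dim;
  // how that value maps to a token count is an assumption in this sketch.
  const perImage = Number(meta['clip.vision.projection_dim'])
  if (!Number.isFinite(perImage) || perImage <= 0) {
    throw new Error('clip.vision.projection_dim missing from mmproj metadata')
  }
  return imageCount * perImage
}
```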
* fix per-image calc
* fix: crash due to force unwrap
---------
Co-authored-by: Faisal Amir <urmauur@gmail.com>
Co-authored-by: Louis <louis@jan.ai>
* feat: Prompt progress when streaming
- BE changes:
- Add a `return_progress` flag to `chatCompletionRequest` and a corresponding `prompt_progress` payload in `chatCompletionChunk`. Introduce a `chatCompletionPromptProgress` interface that captures cached and processed prompt token counts, elapsed time, and the total token count.
- Update the Llamacpp extension to always request progress data when streaming, enabling UI components to display real‑time generation progress and leverage llama.cpp’s built‑in progress reporting.
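
The new types might look roughly like this; field names follow the description above, and everything else on the existing interfaces is elided:

```ts
interface chatCompletionPromptProgress {
  cache: number     // prompt tokens reused from the cache
  processed: number // prompt tokens processed so far
  time: number      // elapsed prompt-processing time, as reported by the server
  total: number     // total prompt tokens to process
}

interface chatCompletionRequest {
  // ...existing completion fields...
  return_progress?: boolean // set by the extension whenever streaming is enabled
}

interface chatCompletionChunk {
  // ...existing streamed delta fields...
  prompt_progress?: chatCompletionPromptProgress
}
```

The UI can then derive a rough percentage from `processed / total` and hide the indicator once it reaches 100, as the later chores below describe.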
* Make return_progress optional
* chore: update ui prompt progress before streaming content
* chore: remove log
* chore: remove progress when percentage >= 100
* chore: set timeout for prompt progress
* chore: move prompt progress outside streaming content
* fix: tests
---------
Co-authored-by: Faisal Amir <urmauur@gmail.com>
Co-authored-by: Louis <louis@jan.ai>
* fix: support loading model configurations
* chore: remove log
* chore: add sampling params from send completion
* chore: remove comment
* chore: remove comment on predefined file
* chore: update test model service
* refactor: move thinking toggle to runtime settings for per-message control
Replaces the static `reasoning_budget` config with a dynamic `enable_thinking` flag under `chat_template_kwargs`, allowing models like Jan-nano and Qwen3 to enable/disable thinking behavior at runtime, even mid-conversation.
Requires a UI update.
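An illustrative request body under this scheme (model id and message are placeholders):

```ts
const body = {
  model: 'jan-nano', // placeholder model id
  messages: [{ role: 'user', content: 'Summarize this thread.' }],
  stream: true,
  // Runtime toggle that replaces the static reasoning_budget config:
  chat_template_kwargs: {
    enable_thinking: false, // can be flipped per message, even mid-conversation
  },
}
```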
* remove engine argument
- Changed `pid` field in `SessionInfo` from `string` to `number`/`i32` in TypeScript and Rust (sketched after this list).
- Updated `activeSessions` map key from `string` to `number` to align with new PID type.
- Adjusted process monitoring logic to correctly handle numeric PIDs.
- Removed fallback UUID-based PID generation in favor of numeric fallback (-1).
- Added PID cleanup logic in `is_process_running` when the process is no longer alive.
- Bumped application version from 0.5.16 to 0.6.900 in `tauri.conf.json`.
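A sketch of the pid-related shapes after this change, showing only the parts mentioned above:

```ts
interface SessionInfo {
  pid: number // was string; now the numeric OS process id (i32 on the Rust side)
  // ...other session fields unchanged...
}

// Keyed by the numeric PID instead of a string.
const activeSessions = new Map<number, SessionInfo>()

// Numeric fallback that replaces the old UUID-based placeholder.
const FALLBACK_PID = -1
```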
This commit introduces embedding functionality to the llamacpp extension. It allows users to generate embeddings for text inputs using the 'sentence-transformer-mini' model (see the sketch after this list). The changes include:
- Adding a new `embed` method to the `llamacpp_extension` class.
- Implementing model loading and API interaction for embeddings.
- Handling potential errors during API requests.
- Adding necessary types for embedding responses and data.
- The load method now accepts a boolean parameter to determine whether it should load an embedding model.
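A hedged sketch of such an `embed` method, assuming an OpenAI-style `/v1/embeddings` endpoint on the local server; the session shape and `ensureModelLoaded` helper are illustrative:

```ts
interface EmbeddingData {
  embedding: number[]
  index: number
}

interface EmbeddingResponse {
  data: EmbeddingData[]
  model: string
}

interface EmbeddingSession {
  port: number
  apiKey: string
}

// Assumed helper that loads (or reuses) a model; the boolean flags an embedding load.
declare function ensureModelLoaded(modelId: string, isEmbedding: boolean): Promise<EmbeddingSession>

async function embed(texts: string[]): Promise<number[][]> {
  const session = await ensureModelLoaded('sentence-transformer-mini', true)

  const res = await fetch(`http://localhost:${session.port}/v1/embeddings`, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${session.apiKey}`,
    },
    body: JSON.stringify({ input: texts, model: 'sentence-transformer-mini' }),
  })
  if (!res.ok) {
    throw new Error(`Embedding request failed: ${res.status} ${await res.text()}`)
  }

  const json = (await res.json()) as EmbeddingResponse
  return json.data.map((d) => d.embedding)
}
```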
The changes include:
- Renaming interfaces (sessionInfo -> SessionInfo, unloadResult -> UnloadResult) for consistency
- Adding a getLoadedModels() method to retrieve active model IDs (sketched after this list)
- Updating variable names from modelId to model_id for alignment
- Updating cleanup paths to use XDG-standard locations
- Improving type consistency across extension implementation
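A minimal sketch of `getLoadedModels()`, assuming active sessions are tracked in a map whose entries carry a `model_id`:

```ts
function getLoadedModels(activeSessions: Map<number, { model_id: string }>): string[] {
  // Return the id of every model that currently has a live session.
  return Array.from(activeSessions.values()).map((s) => s.model_id)
}
```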
Add comprehensive sampling parameters for fine-grained control over AI output generation, including dynamic temperature, Mirostat sampling, repetition penalties, and advanced prompt handling. These parameters enable more precise tuning of model behavior and output quality.
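An illustrative subset of such parameters; the names follow common llama.cpp server conventions, but the extension's exact field list may differ:

```ts
interface SamplingParams {
  temperature?: number
  dynatemp_range?: number      // dynamic temperature range
  dynatemp_exponent?: number   // dynamic temperature exponent
  mirostat?: 0 | 1 | 2         // Mirostat sampling mode
  mirostat_tau?: number
  mirostat_eta?: number
  repeat_penalty?: number      // repetition penalty strength
  repeat_last_n?: number       // window of tokens the penalty applies to
  presence_penalty?: number
  frequency_penalty?: number
  top_k?: number
  top_p?: number
  min_p?: number
}
```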
The changes standardize identifier names across the codebase for clarity:
- Replaced `sessionId` with `pid` to reflect process ID usage
- Changed `modelName` to `modelId` for consistency with identifier naming
- Renamed `api_key` to `apiKey` for camelCase consistency
- Updated corresponding methods to use these new identifiers
- Improved type safety and readability by aligning variable names with their semantic meaning
- Changed load method to accept modelId instead of loadOptions for better clarity and simplicity
- Renamed engineBasePath parameter to backendPath for consistency with the backend's directory structure
- Added getRandomPort method to ensure unique ports for each session to prevent conflicts
- Refactored configuration and model loading logic to improve maintainability and reduce redundancy
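A minimal sketch of a `getRandomPort` helper, assuming the extension tracks ports already claimed by active sessions; a real implementation would also confirm the port is free at the OS level:

```ts
function getRandomPort(usedPorts: Set<number>): number {
  const MIN = 10000
  const MAX = 65535
  for (let attempt = 0; attempt < 100; attempt++) {
    // Pick a random high port and skip any already assigned to a session.
    const port = Math.floor(Math.random() * (MAX - MIN + 1)) + MIN
    if (!usedPorts.has(port)) return port
  }
  throw new Error('Unable to find a free port')
}
```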
The `ImportOptions` interface was updated to include `modelPath` and `mmprojPath`. These options are required for importing models and multimodal projector files.
The `loadOptions` interface in `AIEngine.ts` now includes an optional `mmprojPath` property. This allows users to provide a path to their mmproj (multimodal projector) file when loading a model, which is required for certain model types. The `llamacpp-extension/src/index.ts` has been updated to pass this option to the llamacpp server if provided.
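The option shapes from the two paragraphs above, sketched with only the fields mentioned (everything else elided; optionality of `mmprojPath` on `ImportOptions` is an assumption):

```ts
interface ImportOptions {
  modelPath: string    // path of the GGUF model to import
  mmprojPath?: string  // multimodal projector file, when importing a vision model
}

interface loadOptions {
  // ...existing load settings...
  mmprojPath?: string  // forwarded to the llama.cpp server when provided
}
```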
This commit introduces API key generation for the Llama.cpp extension. The API key is now generated on the server side using HMAC-SHA256 and a secret key to ensure security and uniqueness. The frontend now passes the model ID and API secret to the server to generate the key. This addresses the requirement for secure model access and authorization.
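A hedged sketch of the derivation (HMAC-SHA256 over the model ID with a shared secret); the function name and Node environment are illustrative, since the actual generation happens on the server side:

```ts
import { createHmac } from 'node:crypto'

// Derive a deterministic, per-model API key from the model id and a secret.
function generateApiKey(modelId: string, apiSecret: string): string {
  return createHmac('sha256', apiSecret).update(modelId).digest('hex')
}
```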
* add pull and abortPull
* add model import (download only)
* write model.yaml. support local model import
* remove cortex-related command
* add TODO
* remove cortex-related command