Implement a comprehensive artifacts system that allows users to create, edit,
and manage persistent documents alongside conversations, inspired by Claude's
artifacts feature.
Key Features:
- AI model integration with system prompts teaching artifact usage
- Inline preview cards in chat messages with collapsible previews
- On-demand side panel overlay (replaces split view)
- Preview/Code toggle for rendered markdown vs raw content
- Artifacts sidebar for managing multiple artifacts per thread
- Monaco editor integration for code artifacts
- Autosave with debounced writes and conflict detection (sketched after this list)
- Diff preview system for proposed updates
- Keyboard shortcuts and quick switcher
- Export/import functionality
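The autosave path lives in the TypeScript store, but the debounce idea is language-neutral. A minimal sketch in Rust (the backend language used below), assuming a tokio channel of edits and a hypothetical `persist` helper, neither of which is the actual implementation:

```rust
use std::time::Duration;
use tokio::{sync::mpsc, time};

// Debounced autosave: collapse a burst of edits into a single write that
// fires only once the channel has been quiet for `delay`.
async fn autosave_loop(mut edits: mpsc::Receiver<String>, delay: Duration) {
    while let Some(mut latest) = edits.recv().await {
        // Keep absorbing newer edits until no edit arrives for `delay`.
        while let Ok(Some(newer)) = time::timeout(delay, edits.recv()).await {
            latest = newer;
        }
        persist(&latest).await; // hypothetical write (atomic write + hash check)
    }
}

async fn persist(_content: &str) { /* elided: see the atomic-write sketch below */ }
```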
Backend (Rust):
- New artifacts module in src-tauri/src/core/artifacts/
- 12 Tauri commands for CRUD, proposals, and export/import
- Atomic file writes with SHA256 hashing (see the sketch after this list)
- Sharded history storage with pruning policy
- Path traversal validation and UTF-8 enforcement
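A minimal sketch of the atomic-write-plus-hash pattern, assuming the `sha2` crate and a write-to-temp-then-rename strategy; the function name and temp-file naming are illustrative, not the actual module's API:

```rust
use std::{fs, io::Write, path::Path};
use sha2::{Digest, Sha256};

// Write atomically: flush to a sibling temp file, then rename over the target.
// Returns the SHA-256 of the new content, usable later for conflict detection.
fn atomic_write(path: &Path, content: &[u8]) -> std::io::Result<String> {
    let tmp = path.with_extension("tmp"); // sketch only: assumes no extension clash
    let mut f = fs::File::create(&tmp)?;
    f.write_all(content)?;
    f.sync_all()?; // ensure bytes hit disk before the rename
    fs::rename(&tmp, path)?; // atomic on the same filesystem
    let hash = Sha256::digest(content);
    let hex: String = hash.iter().map(|b| format!("{:02x}", b)).collect();
    Ok(hex)
}
```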
Frontend (TypeScript/React):
- ArtifactSidePanel: Claude-style overlay panel that slides in from the right
- InlineArtifactCard: Preview cards embedded in chat
- ArtifactsSidebar: Floating list for artifact switching
- Enhanced ArtifactActionMessage: Parses AI metadata from content
- useArtifacts: Zustand store with autosave and conflict resolution
Types:
- Extended ThreadMessage.metadata with ArtifactAction
- ProposedContentRef supports inline and temp storage
- MIME-like content_type values for extensibility
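On the Rust side these shapes might look roughly like the following; a hedged sketch, where field and variant names beyond `content_type` and `ProposedContentRef` are assumptions rather than the actual definitions:

```rust
use serde::{Deserialize, Serialize};

// Sketch of the artifact metadata carried on a chat message.
#[derive(Serialize, Deserialize)]
struct ArtifactAction {
    artifact_id: String,
    action: String,       // e.g. "create" | "update"
    content_type: String, // MIME-like, e.g. "text/markdown"
    proposed: Option<ProposedContentRef>,
}

// Proposed content can ride inline in the message or point at temp storage.
#[derive(Serialize, Deserialize)]
#[serde(tag = "kind", rename_all = "snake_case")]
enum ProposedContentRef {
    Inline { content: String },
    Temp { path: String },
}
```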
Platform:
- Fixed platform detection to check window.__TAURI__
- Web service stubs for browser mode compatibility
- Updated assistant system prompts in both extension and web-app
This implements the complete workflow modeled on Claude's:
1. AI only creates artifacts when explicitly requested
2. Inline cards appear in chat with preview buttons
3. Side panel opens on demand, not automatic split
4. Users can toggle Preview/Code views and edit content
5. Autosave and version management prevent data loss
This commit introduces a new field, `is_embedding`, to the `SessionInfo` structure to clearly mark sessions running dedicated embedding models.
Key changes:
- Adds `is_embedding` to the `SessionInfo` interface in `AIEngine.ts` and the Rust backend.
- Updates the `loadLlamaModel` command signatures to pass this new flag.
- Modifies the llama.cpp extension's **auto-unload logic** to explicitly **filter out**, and **not unload**, any currently loaded embedding models when a new text generation model is loaded. This is a critical performance fix: it prevents the embedding model (e.g., the one used for RAG) from being repeatedly reloaded.
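The extension itself is TypeScript, but the filter is simple enough to sketch in Rust alongside the backend's `SessionInfo`; field names other than `is_embedding` are assumptions:

```rust
// Simplified SessionInfo carrying the new flag.
struct SessionInfo {
    model_id: String,
    is_embedding: bool,
}

// When auto-unloading to make room for a new text-generation model,
// skip embedding sessions so a RAG embedding model is never evicted.
fn sessions_to_unload(loaded: &[SessionInfo]) -> Vec<&SessionInfo> {
    loaded.iter().filter(|s| !s.is_embedding).collect()
}
```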
Also includes minor code style cleanup/reformatting in `jan-provider-web/provider.ts` for improved readability.
* feat: Adjust RAM/VRAM calculation for unified memory systems
This commit refactors the logic for calculating **total RAM** and **total VRAM** in `is_model_supported` and `plan_model_load` commands, specifically targeting systems with **unified memory** (like modern macOS devices where the GPU list may be empty).
The changes are as follows:
* **Total RAM Calculation:** If no GPUs are detected (`sys_info.gpus.is_empty()` is true), **total RAM** is now set to 0. Because that same memory is reported as VRAM (see below), counting it as RAM as well would double-count it when planning model placement.
* **Total VRAM Calculation:** If no GPUs are detected, **total VRAM** is still calculated as the system's **total memory (RAM)**, as this shared memory acts as VRAM on unified memory architectures.
This adjustment improves the accuracy of memory availability checks and model planning on unified memory systems.
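Reduced to pseudocode, the change amounts to the following sketch; only `sys_info.gpus.is_empty()` is quoted from the actual code, and the field names are assumptions:

```rust
struct GpuInfo { total_vram_bytes: u64 }
struct SysInfo { total_memory_bytes: u64, gpus: Vec<GpuInfo> }

// Returns (total_ram, total_vram) as seen by the model planner.
fn plan_memory(sys_info: &SysInfo) -> (u64, u64) {
    if sys_info.gpus.is_empty() {
        // Unified memory (e.g. Apple Silicon): all system memory is usable as
        // VRAM, so report it once, as VRAM, and zero out RAM to avoid
        // double-counting the same bytes.
        (0, sys_info.total_memory_bytes)
    } else {
        let vram = sys_info.gpus.iter().map(|g| g.total_vram_bytes).sum();
        (sys_info.total_memory_bytes, vram)
    }
}
```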
* fix: total usable memory when no system VRAM is reported
* chore: temporarily change to self-hosted runner mac
* ci: revert back to github hosted runner macos
---------
Co-authored-by: Louis <louis@jan.ai>
Co-authored-by: Minh141120 <minh.itptit@gmail.com>
The KV cache size calculation in estimate_kv_cache_internal now includes a fallback mechanism for models that do not explicitly define key_length and value_length in the GGUF metadata.
If these attention keys are missing, the head dimension (and thus the key and value lengths) is derived as embedding_length / total_heads. This improves robustness and compatibility with GGUF models whose metadata lacks these keys.
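As a sketch of the fallback and the resulting size estimate (parameter and function names are illustrative; the real code reads these values from GGUF metadata, and the size formula is the standard KV-cache estimate, assumed rather than quoted from the source):

```rust
// Key/value length per head, falling back to embedding_length / total_heads
// when the metadata omits the attention key/value length keys.
fn kv_lengths(
    embedding_length: u64,
    total_heads: u64,
    key_length: Option<u64>,
    value_length: Option<u64>,
) -> (u64, u64) {
    let head_dim = embedding_length / total_heads;
    (key_length.unwrap_or(head_dim), value_length.unwrap_or(head_dim))
}

// Bytes for the KV cache: one K and one V vector per layer, per KV head,
// per context token, at `bytes_per_element` (e.g. 2 for f16).
fn kv_cache_bytes(n_layers: u64, n_kv_heads: u64, ctx_len: u64,
                  key_len: u64, value_len: u64, bytes_per_element: u64) -> u64 {
    n_layers * n_kv_heads * ctx_len * (key_len + value_len) * bytes_per_element
}
```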
Also adds logging of the full model metadata for easier debugging of the estimation process.