reorganised content and new sections

parent 94ca9ba902
commit 0793ce47e2

website/src/content/docs/browser/index.mdx (new file, 41 lines)
@@ -0,0 +1,41 @@
---
title: Jan Browser Extension
description: Bring your favorite AI models to any website with Jan's browser extension.
keywords:
  [
    Jan Browser Extension,
    Jan AI,
    Browser AI,
    Chrome extension,
    Firefox addon,
    local AI,
    ChatGPT alternative
  ]
banner:
  content: 'Coming in September 2025. Currently testing it with selected users and internally. 🤓'
---

import { Aside, Card, CardGrid } from '@astrojs/starlight/components';

![Jan Browser Extension](../../../assets/jan-app-new.png)

## Your AI Models, Anywhere on the Web

The Jan Browser Extension brings AI assistance directly to your browsing experience.
Connect to your local Jan installation or any remote AI provider to get contextual help
on any website without switching tabs.

<Aside type="note">
**Jan Browser Extension is not yet available.** We are working hard to bring you seamless
AI integration across all your web browsing.
</Aside>

Access your preferred models without leaving your current page. Whether you're using local
Jan models or remote providers, get instant AI assistance while reading, writing, or researching
online.

### Core Features Planned

- **Universal Access**: Use any Jan-compatible model from any website
- **Context Integration**: Highlight text and get AI assistance instantly
- **Privacy Options**: Choose between local processing or remote providers
- **Seamless Experience**: No tab switching or workflow interruption required
@@ -17,11 +17,13 @@ keywords:
    large language model,
    LLM,
  ]
banner:
  content: |
    We just launched something cool! 👋 Jan now <a href="./jan/multi-modal">supports image 🖼️ attachments</a> 🎉
---

import { Aside } from '@astrojs/starlight/components';

# Jan

![Jan's Cover Image](./_assets/jan-app-new.png)
website/src/content/docs/jan/custom-provider.mdx (new file, 288 lines)
@@ -0,0 +1,288 @@
---
title: Custom Providers
description: Connect Jan to any OpenAI-compatible AI service, from major cloud providers to local inference servers.
keywords:
  [
    Jan,
    custom providers,
    OpenAI API,
    Together AI,
    vLLM,
    LMStudio,
    transformers,
    SGLang,
    API integration,
    local AI,
    cloud AI,
  ]
sidebar:
  badge:
    text: New
    variant: tip
---

import { Aside } from '@astrojs/starlight/components';

Jan's custom provider system lets you connect to any OpenAI-compatible API service. Whether you're using cloud providers like Together AI, Fireworks, or Replicate, or running local inference servers like vLLM, LMStudio, or transformers, Jan can integrate with them seamlessly.

## What You Can Connect

**Cloud Providers:**
- Together AI, Fireworks, Replicate
- Perplexity, DeepInfra, Anyscale
- Any OpenAI-compatible API service

**Local Inference Servers:**
- vLLM, LMStudio, Ollama
- SGLang, transformers, text-generation-webui
- TensorRT-LLM, LocalAI

**Self-Hosted Solutions:**
- Your own API deployments
- Enterprise AI gateways
- Custom model endpoints

## Setup Process

### Add a New Provider

Navigate to **Settings > Model Providers** and click **Add Provider**.

![Add](../../../assets/custom-provider-add.png)

Enter a name for your provider. We'll use Together AI as our example.

![Together](../../../assets/custom-provider-together.png)

### Get Your API Credentials

For cloud providers, you'll need an account and API key. Here's Together AI's dashboard showing your credits and API key location.

![Together API](../../../assets/custom-provider-together-api.png)

<Aside type="caution">
Keep your API keys secure and never share them publicly. Most providers charge based on usage, so monitor your credits and spending.
</Aside>

### Configure the Provider

Back in Jan, fill in your provider's details:

**API Base URL:** The endpoint for your service (e.g., `https://api.together.xyz/`)
**API Key:** Your authentication token

![Custom provider configuration](../../../assets/custom-provider-configuration.png)

Common endpoints for popular services:
- **Together AI:** `https://api.together.xyz/`
- **Fireworks:** `https://api.fireworks.ai/`
- **Replicate:** `https://api.replicate.com/`
- **Local vLLM:** `http://localhost:8000/` (default)
- **LMStudio:** `http://localhost:1234/` (default)
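
Before wiring an endpoint into Jan, it can help to sanity-check it outside the app. The snippet below is only a sketch: it assumes the provider implements the standard OpenAI chat-completions route, that the `openai` Python package is installed, and that your provider expects a `/v1` suffix on the base URL (Together AI does; some local servers differ). The key and model ID are placeholders.

```python
# Minimal connectivity check for an OpenAI-compatible endpoint (sketch).
# Assumes `pip install openai` and that the server exposes /v1/chat/completions.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.together.xyz/v1",  # swap for your provider or local server
    api_key="YOUR_API_KEY",                  # placeholder - use your real key
)

response = client.chat.completions.create(
    model="Qwen/Qwen3-235B-A22B-Thinking-2507",  # any model ID your provider serves
    messages=[{"role": "user", "content": "Reply with the single word: ready"}],
    max_tokens=10,
)
print(response.choices[0].message.content)
```

If this prints a reply, the same base URL and key should work in Jan's provider settings.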

### Add Model IDs

Click the `+` button to add specific models you want to access. Each provider offers different models with various capabilities.

![Model ID](../../../assets/custom-provider-together-config.png)

For Together AI, we're adding `Qwen/Qwen3-235B-A22B-Thinking-2507`, one of the most capable reasoning models available.

### Configure Model Features

After adding a model, click the pencil icon to enable additional features like tools or vision capabilities.

![Edit Model](../../../assets/custom-provider-method.png)

Enable tools if your model supports function calling. This allows integration with Jan's MCP system for web search, code execution, and more.

![Tools Enabled](../../../assets/custom-provider-tools.png)

### Start Using Your Custom Model

Open a new chat and select your custom model from the provider dropdown.

![Model Selection](../../../assets/custom-provider-model-selection.png)

If you enabled tools, click the tools icon to activate MCP integrations. Here we have Serper MCP enabled for web search capabilities.

![Serper MCP](../../../assets/custom-provider-serper-mcp.png)

<Aside type="note">
Learn how to set up web search with our [Serper MCP tutorial](./mcp-examples/search/serper).
</Aside>

### Example in Action

Here's the Qwen model thinking through a complex query, searching the web, and providing detailed information about Sydney activities.

![Qwen Example](../../../assets/custom-provider-qwen-example.png)

**Prompt used:** "What is happening in Sydney, Australia this week? What fun activities could I attend?"

The model demonstrated reasoning, web search integration, and comprehensive response formatting—all through Jan's custom provider system.

## Provider-Specific Setup

### Together AI
- **Endpoint:** `https://api.together.xyz/`
- **Popular Models:** `meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo`, `Qwen/Qwen2.5-Coder-32B-Instruct`
- **Features:** Fast inference, competitive pricing, latest models
- **Best For:** Production applications, latest model access

### Fireworks AI
- **Endpoint:** `https://api.fireworks.ai/`
- **Popular Models:** `accounts/fireworks/models/llama-v3p1-405b-instruct`, `accounts/fireworks/models/qwen2p5-coder-32b-instruct`
- **Features:** Ultra-fast inference, function calling support
- **Best For:** Real-time applications, tool usage

### vLLM (Local)
- **Endpoint:** `http://localhost:8000/` (configurable)
- **Setup:** Install vLLM, run `vllm serve MODEL_NAME --api-key YOUR_KEY`
- **Models:** Any HuggingFace model compatible with vLLM
- **Best For:** Self-hosted deployments, custom models

### LMStudio (Local)
- **Endpoint:** `http://localhost:1234/` (default)
- **Setup:** Download LMStudio, load a model, start local server
- **Models:** GGUF models from HuggingFace
- **Best For:** Easy local setup, GUI management

### Ollama (Local)
- **Endpoint:** `http://localhost:11434/` (with OpenAI compatibility)
- **Setup:** Install Ollama, run `OLLAMA_HOST=0.0.0.0 ollama serve`
- **Models:** Ollama model library (llama3, qwen2.5, etc.)
- **Best For:** Simple local deployment, model management

## Example Prompts to Try

### Advanced Reasoning
```
I'm planning to start a sustainable urban garden on my apartment balcony. Consider my location (temperate climate), space constraints (4x6 feet), budget ($200), and goals (year-round fresh herbs and vegetables). Provide a detailed plan including plant selection, container setup, watering system, and seasonal rotation schedule.
```

### Research and Analysis
```
Compare the environmental impact of electric vehicles vs hydrogen fuel cell vehicles in 2024. Include manufacturing emissions, energy sources, infrastructure requirements, and lifecycle costs. Provide specific data and cite recent studies.
```

### Creative Problem Solving
```
Design a mobile app that helps people reduce food waste. Consider user psychology, practical constraints, monetization, and social impact. Include wireframes description, key features, and go-to-market strategy.
```

### Technical Deep Dive
```
Explain how large language models use attention mechanisms to understand context. Start with the basics and build up to transformer architecture, including mathematical foundations and practical implications for different model sizes.
```

### Planning and Strategy
```
I have 6 months to learn machine learning from scratch and land an ML engineering job. Create a week-by-week study plan including theory, practical projects, portfolio development, and job search strategy. Consider my background in software development.
```

## Advanced Configuration

### Authentication Methods

**API Key Header (Most Common):**
- Standard: `Authorization: Bearer YOUR_KEY`
- Custom: `X-API-Key: YOUR_KEY`

**Query Parameters:**
- Some services use `?api_key=YOUR_KEY`

**Custom Headers:**
- Enterprise gateways may require specific headers

### Request Customization

Most providers support OpenAI's standard parameters:
- `temperature`: Response creativity (0.0-1.0)
- `max_tokens`: Response length limit
- `top_p`: Token selection probability
- `frequency_penalty`: Repetition control
- `presence_penalty`: Topic diversity
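
To see how these parameters travel over the wire, here is a sketch of the same request made directly with `requests`. The endpoint, key, and model are placeholders, and the Bearer header can be swapped for whatever scheme your gateway expects (see Authentication Methods above).

```python
# Sketch: OpenAI-style parameters expressed as a raw chat-completions request.
import requests

resp = requests.post(
    "https://api.together.xyz/v1/chat/completions",   # placeholder endpoint
    headers={"Authorization": "Bearer YOUR_API_KEY"},  # or {"X-API-Key": "YOUR_API_KEY"}
    json={
        "model": "meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo",
        "messages": [{"role": "user", "content": "Summarize MCP in one sentence."}],
        "temperature": 0.7,
        "max_tokens": 256,
        "top_p": 0.95,
        "frequency_penalty": 0,
        "presence_penalty": 0,
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```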

### Model Naming Conventions

Different providers use various naming schemes:
- **HuggingFace:** `organization/model-name`
- **Together AI:** `meta-llama/Llama-2-70b-chat-hf`
- **Ollama:** `llama3:latest`
- **Local:** Often just the model name

## Troubleshooting

### Connection Issues
- Verify the API endpoint URL is correct
- Check if the service is running (for local providers)
- Confirm network connectivity and firewall settings

### Authentication Failures
- Ensure API key is copied correctly (no extra spaces)
- Check if the key has necessary permissions
- Verify the authentication method matches provider requirements

### Model Not Found
- Confirm the model ID exists on the provider
- Check spelling and capitalization
- Some models require special access or approval

### Rate Limiting
- Most providers have usage limits
- Implement delays between requests if needed (see the backoff sketch below)
- Consider upgrading to higher tier plans
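
If you script against a provider and hit its limits, a simple exponential backoff loop is usually enough. This is a sketch, assuming the provider signals rate limiting with HTTP 429; adjust the retry count and delays to your plan.

```python
# Sketch: retry with exponential backoff on HTTP 429 (rate-limited) responses.
import time
import requests

def post_with_backoff(url, headers, payload, max_retries=5):
    delay = 1.0
    for _ in range(max_retries):
        resp = requests.post(url, headers=headers, json=payload, timeout=60)
        if resp.status_code != 429:
            resp.raise_for_status()
            return resp.json()
        time.sleep(delay)  # back off before retrying
        delay *= 2         # 1s, 2s, 4s, ...
    raise RuntimeError("still rate-limited after retries")
```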

### Performance Issues
- Local providers may need more powerful hardware
- Cloud providers vary in response times
- Check provider status pages for service issues

## Cost Management

### Cloud Provider Pricing
- Most charge per token (input + output)
- Prices vary significantly between models
- Monitor usage through provider dashboards

### Local Provider Costs
- Hardware requirements (RAM, GPU)
- Electricity consumption
- Initial setup and maintenance time

### Optimization Tips
- Use smaller models for simple tasks
- Implement caching for repeated queries
- Set appropriate max_tokens limits
- Monitor and track usage patterns

## Best Practices

### Security
- Store API keys securely
- Use environment variables in production
- Rotate keys regularly
- Monitor for unauthorized usage

### Performance
- Choose models appropriate for your tasks
- Implement proper error handling
- Cache responses when possible
- Use streaming for long responses (see the sketch below)
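
For long responses, streaming lets you render tokens as they arrive instead of waiting for the full completion. A sketch with the `openai` client against an OpenAI-compatible server; the base URL, key, and model ID are placeholders.

```python
# Sketch: stream tokens from an OpenAI-compatible server as they are generated.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="YOUR_KEY")
stream = client.chat.completions.create(
    model="YOUR_MODEL_ID",
    messages=[{"role": "user", "content": "Explain GGUF in two sentences."}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```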

### Reliability
- Have fallback providers configured
- Implement retry logic
- Monitor service availability
- Test regularly with different models

## Next Steps

Once you have custom providers configured, explore advanced integrations:
- Combine with [MCP tools](./mcp-examples/search/serper) for enhanced capabilities
- Set up multiple providers for different use cases
- Create custom assistants with provider-specific models
- Build workflows that leverage different model strengths

Custom providers unlock Jan's full potential, letting you access cutting-edge models and maintain complete control over your AI infrastructure. Whether you prefer cloud convenience or local privacy, Jan adapts to your workflow.
@@ -1,207 +0,0 @@
---
title: Local AI Engine
description: Understand and configure Jan's local AI engine for running models on your hardware.
keywords:
  [
    Jan,
    Customizable Intelligence, LLM,
    local AI,
    privacy focus,
    free and open source,
    private and offline,
    conversational AI,
    no-subscription fee,
    large language models,
    Llama CPP integration,
    llama.cpp Engine,
    Intel CPU,
    AMD CPU,
    NVIDIA GPU,
    AMD GPU Radeon,
    Apple Silicon,
    Intel Arc GPU,
  ]
---

import { Aside, Tabs, TabItem } from '@astrojs/starlight/components';

## What is llama.cpp?

llama.cpp is the engine that runs AI models locally on your computer. Think of it as the software
that takes an AI model file and makes it work on your hardware - whether that's your CPU,
graphics card, or Apple's M-series chips.

Originally created by Georgi Gerganov, llama.cpp is designed to run large language models
efficiently on consumer hardware without requiring specialized AI accelerators or cloud connections.

## Why This Matters

- **Privacy**: Your conversations never leave your computer
- **Cost**: No monthly subscription fees or API costs
- **Speed**: No internet required once models are downloaded
- **Control**: Choose exactly which models to run and how they behave

## Accessing Engine Settings

Find llama.cpp settings at **Settings** > **Model Providers** > **Llama.cpp**:

![llama.cpp](./_assets/llama.cpp-01-updated.png)

<Aside type="note">
These are advanced settings. You typically only need to adjust them if models aren't working properly or you want to optimize performance for your specific hardware.
</Aside>

## Engine Management

| Feature | What It Does | When You Need It |
|---------|-------------|------------------|
| **Engine Version** | Shows which version of llama.cpp you're running | Check compatibility with newer models |
| **Check Updates** | Downloads newer engine versions | When new models require updated engine |
| **Backend Selection** | Choose the version optimized for your hardware | After installing new graphics cards or when performance is poor |

## Hardware Backends

Jan offers different backend versions optimized for your specific hardware. Think of these as different "drivers" - each one is tuned for particular processors or graphics cards.

<Aside type="caution">
Using the wrong backend can make models run slowly or fail to load. Pick the one that matches your hardware.
</Aside>

<Tabs>

<TabItem label="Windows">

### NVIDIA Graphics Cards (Recommended for Speed)
Choose based on your CUDA version (check NVIDIA Control Panel):

**For CUDA 12.0:**
- `llama.cpp-avx2-cuda-12-0` (most common)
- `llama.cpp-avx512-cuda-12-0` (newer Intel/AMD CPUs)

**For CUDA 11.7:**
- `llama.cpp-avx2-cuda-11-7` (most common)
- `llama.cpp-avx512-cuda-11-7` (newer Intel/AMD CPUs)

### CPU Only (No Graphics Card Acceleration)
- `llama.cpp-avx2` (most modern CPUs)
- `llama.cpp-avx512` (newer Intel/AMD CPUs)
- `llama.cpp-avx` (older CPUs)
- `llama.cpp-noavx` (very old CPUs)

### Other Graphics Cards
- `llama.cpp-vulkan` (AMD, Intel Arc, some others)

<Aside type="note">
**Quick Test**: Start with `avx2-cuda-12-0` if you have an NVIDIA card, or `avx2` for CPU-only. If it doesn't work, try the `avx` variant.
</Aside>

</TabItem>

<TabItem label="Linux">

### NVIDIA Graphics Cards
Same CUDA options as Windows:
- `llama.cpp-avx2-cuda-12-0` (most common)
- `llama.cpp-avx2-cuda-11-7` (older drivers)

### CPU Only
- `llama.cpp-avx2` (most modern CPUs)
- `llama.cpp-avx512` (newer Intel/AMD CPUs)
- `llama.cpp-arm64` (ARM processors like Raspberry Pi)

### Other Graphics Cards
- `llama.cpp-vulkan` (AMD, Intel graphics)

</TabItem>

<TabItem label="MacOS">

### Apple Silicon (M1/M2/M3/M4)
- `llama.cpp-mac-arm64` (recommended)

### Intel Macs
- `llama.cpp-mac-amd64`

<Aside type="note">
Apple Silicon Macs automatically use the GPU through Metal - no additional setup needed.
</Aside>

</TabItem>

</Tabs>

## Performance Settings

These control how efficiently models run:

| Setting | What It Does | Recommended Value | Impact |
|---------|-------------|------------------|---------|
| **Continuous Batching** | Process multiple requests at once | Enabled | Faster when using multiple tools or having multiple conversations |
| **Parallel Operations** | How many requests to handle simultaneously | 4 | Higher = more multitasking, but uses more memory |
| **CPU Threads** | How many processor cores to use | Auto-detected | More threads can speed up CPU processing |

## Memory Settings

These control how models use your computer's memory:

| Setting | What It Does | Recommended Value | When to Change |
|---------|-------------|------------------|----------------|
| **Flash Attention** | More efficient memory usage | Enabled | Leave enabled unless you have problems |
| **Caching** | Remember recent conversations | Enabled | Speeds up follow-up questions |
| **KV Cache Type** | Memory precision trade-off | f16 | Change to q8_0 or q4_0 if running out of memory |
| **mmap** | Load models more efficiently | Enabled | Helps with large models |
| **Context Shift** | Handle very long conversations | Disabled | Enable for very long chats or multiple tool calls |

### KV Cache Types Explained
- **f16**: Most stable, uses more memory
- **q8_0**: Balanced memory usage and quality
- **q4_0**: Uses least memory, slight quality loss
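
For a rough sense of why the cache type matters, the back-of-the-envelope calculation below compares the three options. It is only a sketch: the layer, head, and dimension numbers are illustrative, the quantized sizes ignore block-scale overhead, and real usage depends on the specific model.

```python
# Rough KV-cache size estimate (illustrative numbers, not any particular model).
# bytes ≈ 2 (K and V) * layers * kv_heads * head_dim * context_len * bytes_per_value
def kv_cache_gb(layers=32, kv_heads=8, head_dim=128, context_len=8192, bytes_per_value=2):
    return 2 * layers * kv_heads * head_dim * context_len * bytes_per_value / 1024**3

for name, nbytes in [("f16", 2), ("q8_0", 1), ("q4_0", 0.5)]:
    print(f"{name}: ~{kv_cache_gb(bytes_per_value=nbytes):.2f} GB at 8k context")
```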

## Troubleshooting Common Issues

**Models won't load:**
- Try a different backend (switch from CUDA to CPU or vice versa)
- Check if you have enough RAM/VRAM
- Update to latest engine version

**Very slow performance:**
- Make sure you're using GPU acceleration (CUDA/Metal/Vulkan backend)
- Increase GPU Layers in model settings
- Close other memory-intensive programs

**Out of memory errors:**
- Reduce Context Size in model settings
- Switch KV Cache Type to q8_0 or q4_0
- Try a smaller model variant

**Random crashes:**
- Switch to a more stable backend (try avx instead of avx2)
- Disable overclocking if you have it enabled
- Update graphics drivers

## Quick Setup Guide

**For most users:**
1. Use the default backend that Jan installs
2. Leave all performance settings at defaults
3. Only adjust if you experience problems

**If you have an NVIDIA graphics card:**
1. Download the appropriate CUDA backend
2. Make sure GPU Layers is set high in model settings
3. Enable Flash Attention

**If models are too slow:**
1. Check you're using GPU acceleration
2. Try enabling Continuous Batching
3. Close other applications using memory

**If running out of memory:**
1. Change KV Cache Type to q8_0
2. Reduce Context Size in model settings
3. Try a smaller model

<Aside type="note">
Most users can run Jan successfully without changing any of these settings. The defaults are chosen
to work well on typical hardware.
</Aside>
@@ -174,8 +174,6 @@ Configuration for GPU support:

<Tabs>
<TabItem label="NVIDIA GPU">

### Step 1: Verify Hardware & Install Dependencies

**1.1. Check GPU Detection**
@@ -221,8 +219,6 @@ Configuration for GPU support:
<Aside type="note">
CUDA offers better performance than Vulkan.
</Aside>

</TabItem>

<TabItem label="AMD GPU">
@@ -20,7 +20,6 @@ sidebar:

import { Aside } from '@astrojs/starlight/components';

## Why Jan Nano?
@@ -5,7 +5,6 @@ description: Compact 1.7B model optimized for web search with tool calling

import { Aside } from '@astrojs/starlight/components';

## Overview
@@ -13,8 +12,6 @@ Lucy is a 1.7B parameter model built on Qwen3-1.7B, optimized for web search thr

## Performance

Lucy achieves competitive performance on SimpleQA despite its small size:

![Lucy Benchmark](./_assets/lucy-benchmark.png)
@@ -1,43 +1,47 @@
---
title: Models Overview
description: Manage AI models in Jan - local and cloud options
keywords:
  [
    Jan,
    AI models,
    local models,
    cloud models,
    GGUF,
    Llama.cpp,
    model management,
    OpenAI,
    Anthropic,
    model selection,
    hardware requirements,
    privacy,
  ]
---

import { Aside } from '@astrojs/starlight/components';

AI models power Jan's conversations. You can run models locally on your device for privacy, or connect to cloud providers for more power.

## Quick Start

**New to Jan?** Start with **Jan-v1** (4B) - it runs on most computers
**Limited hardware?** Use cloud models with your API keys
**Privacy focused?** Download any local model - your data never leaves your device

## Local Models

Local models are managed through [Llama.cpp](https://github.com/ggerganov/llama.cpp), and these models are in a format called GGUF. When you run them locally, they will use your computer's memory (RAM) and processing power, so please make sure that you download models that match the hardware specifications for your operating system:
- [Mac](/docs/desktop/mac#compatibility)
- [Windows](/docs/desktop/windows#compatibility)
- [Linux](/docs/desktop/linux#compatibility)

### Adding Local Models

#### 1. Download from Jan Hub (Recommended)

The easiest way to get started is using Jan's built-in model hub (connected to [HuggingFace's Model Hub](https://huggingface.co/models)):
1. Go to the **Hub** tab
2. Browse available models and click on any model to see details
3. Choose a model that fits your needs & hardware specifications
4. Click **Download** on your chosen model
@@ -47,141 +51,140 @@ Jan will indicate if a model might be **Slow on your device** or **Not enough RA

![Download Models](./_assets/model-management-01.png)

#### 2. Import from [Hugging Face](https://huggingface.co/)

You can download models with a direct link from Hugging Face:

**Note:** Some models require a Hugging Face Access Token. Enter your token in **Settings > Model Providers > Hugging Face** before importing.

1. Visit [Hugging Face Models](https://huggingface.co/models)
2. Find a GGUF model that fits your computer
3. Copy the **model ID** (e.g., TheBloke/Mistral-7B-v0.1-GGUF)
4. In Jan, paste the model ID to the **Search** bar in **Hub** page
5. Select your preferred quantized version to download

**Copy the model ID:**
![Copy Model's ID](./_assets/hf-1.png)

**Paste it in Jan's Hub Search Bar:**
![Paste the model's ID in Jan Hub's search bar](./_assets/hf-2.png)

#### 3. Import Local Files

If you already have GGUF model files on your computer:
1. Go to **Settings > Model Providers > Llama.cpp**
2. Click **Import** and select your GGUF file(s)
3. Choose how to import:
   - **Link Files:** Creates symbolic links (saves space)
   - **Duplicate:** Copies files to Jan's directory
4. Click **Import** to complete

![Import Model](./_assets/llama.cpp-03.png)

![Import Model](./_assets/llama.cpp-04.png)

![Import Model](./_assets/llama.cpp-05.png)

#### 4. Manual Setup

For advanced users who want to add models not available in Jan Hub:

##### Step 1: Create Model File

1. Navigate to the [Jan Data Folder](./data-folder)
2. Open `models` folder
3. Create a new folder for your model
4. Add your `model.gguf` file
5. Add a `model.yml` configuration file. Example:

```yaml
model_path: llamacpp/models/Jan-v1-4B-Q4_K_M/model.gguf
name: Jan-v1-4B-Q4_K_M
size_bytes: 2497281632
```

That's it! Jan now uses a simplified YAML format. All other parameters (temperature, context length, etc.) can be configured directly in the UI when you select the model.

##### Step 2: Customize in the UI

Once your model is added:
1. Select it in a chat
2. Click the gear icon next to the model
3. Adjust any parameters you need

<Aside type="note">
The simplified `model.yml` format makes model management easier. All advanced settings are now accessible through Jan's UI rather than requiring manual JSON editing.
</Aside>

### Delete Local Models

1. Go to **Settings > Model Providers > Llama.cpp**
2. Find the model you want to remove
3. Click the three dots icon and select **Delete Model**

![Delete Model](./_assets/model-management-03.png)

## Cloud Models

Jan supports connecting to various AI cloud providers through OpenAI-compatible APIs, including OpenAI (GPT-4o, o1), Anthropic (Claude), Groq, Mistral, and more.

<Aside type="note">
When using cloud models, be aware of associated costs and rate limits from the providers. See detailed guides for each provider in the [Cloud Providers section](./remote-models/anthropic).
</Aside>

### Setting Up Cloud Models

1. Navigate to **Settings**
2. Under **Model Providers** in the left sidebar, choose your provider
3. Enter your API key
4. Activated cloud models appear in your model selector

![Anthropic](./_assets/model-management-04.png)

Once you add your API key, you can select any of that provider's models in the chat interface:

![Anthropic Models](./_assets/model-management-05.png)

## Choosing Between Local and Cloud

### Local Models
**Best for:**
- Privacy-sensitive work
- Offline usage
- Unlimited conversations without costs
- Full control over model behavior

**Requirements:**
- 8GB RAM minimum (16GB+ recommended)
- 10-50GB storage per model
- CPU or GPU for processing

### Cloud Models
**Best for:**
- Advanced capabilities (GPT-4, Claude 3)
- Limited hardware
- Occasional use
- Latest model versions

**Requirements:**
- Internet connection
- API keys from providers
- Usage-based payment

## Hardware Guidelines

| RAM | Recommended Model Size |
|-----|----------------------|
| 8GB | 1-3B parameters |
| 16GB | 7B parameters |
| 32GB | 13B parameters |
| 64GB+ | 30B+ parameters |

<Aside type="tip">
Start with smaller models and upgrade as needed. Jan shows compatibility warnings before downloading. For a rough sense of what fits, see the sketch below.
</Aside>
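
As a rough rule of thumb (a sketch, not a guarantee - actual needs depend on quantization, context length, and what else your machine is running), you can estimate RAM from the parameter count like this:

```python
# Back-of-the-envelope RAM estimate for a quantized GGUF model.
# Assumes roughly 0.6 bytes/parameter (Q4-style quantization) plus ~20% overhead
# for the KV cache and runtime buffers.
def estimated_ram_gb(params_billion, bytes_per_param=0.6, overhead=1.2):
    return params_billion * 1e9 * bytes_per_param * overhead / 1024**3

for size in (1, 3, 7, 13, 30):
    print(f"{size}B params -> ~{estimated_ram_gb(size):.1f} GB RAM")
```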

## Next Steps

- [Explore Jan Models](./jan-models/jan-v1) - Our optimized models
- [Set up Cloud Providers](./remote-models/openai) - Connect external services
- [Learn Model Parameters](./explanation/model-parameters) - Fine-tune behavior
- [Create AI Assistants](./assistants) - Customize models with instructions
@@ -40,10 +40,6 @@ complexity of managing Python environments or dependencies locally.

![e2b](../../../../assets/mcp-e2b-settings.png)

2. **Get API Key**: Sign up at [e2b.dev](https://e2b.dev/), generate an API key

![e2b](../../../../assets/mcp-e2b-apikey.png)
@@ -56,7 +56,6 @@ Linear MCP offers extensive project management capabilities:

## Prerequisites

- Linear account (free for up to 250 issues)
- Model with strong tool calling support
- Active internet connection
@@ -80,10 +79,6 @@ Once logged in, you'll see your workspace:

### Enable MCP in Jan

1. Go to **Settings > MCP Servers**
2. Toggle **Allow All MCP Tool Permission** ON
@@ -32,7 +32,6 @@ import { Aside } from '@astrojs/starlight/components';

## Prerequisites

- Todoist account (free or premium)
- Model with strong tool calling support
- Node.js installed
@@ -65,10 +64,6 @@ Once logged in, you'll see your main dashboard:

### Enable MCP in Jan

1. Go to **Settings > MCP Servers**
2. Toggle **Allow All MCP Tool Permission** ON
@@ -9,7 +9,9 @@ sidebar:

import { Aside } from '@astrojs/starlight/components';

[Serper](https://serper.dev) provides Google search results through a simple API, making it
perfect for giving AI models access to current web information. The Serper MCP integration
enables Jan models to search the web and retrieve real-time information.
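
For context, underneath the MCP server each query boils down to a single REST call to Serper. The snippet below is only an illustrative sketch; check Serper's own documentation for the current endpoint and response fields before relying on it, and treat the key as a placeholder.

```python
# Sketch: what a Serper search looks like as a direct REST call.
import requests

resp = requests.post(
    "https://google.serper.dev/search",  # endpoint per Serper's docs; verify before use
    headers={"X-API-KEY": "YOUR_SERPER_KEY", "Content-Type": "application/json"},
    json={"q": "events in Sydney this week"},
    timeout=30,
)
for result in resp.json().get("organic", [])[:3]:
    print(result.get("title"), "-", result.get("link"))
```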

## Available Tools
@@ -18,7 +20,6 @@ import { Aside } from '@astrojs/starlight/components';

## Prerequisites

- Serper API key from [serper.dev](https://serper.dev)
- Model with tool calling support (recommended: Jan v1)
@@ -28,13 +29,6 @@ Serper offers 2,500 free searches upon signup - enough for extensive testing and

## Setup

### Enable MCP

1. Go to **Settings** > **MCP Servers**
@@ -78,12 +72,7 @@ Jan v1 is optimized for tool calling and works excellently with Serper:

### Enable Tool Calling

Tool calling is now enabled by default in Jan.

## Usage
@@ -102,7 +91,8 @@ What are the latest developments in quantum computing this week?

**Comparative Analysis:**
```
What are the main differences between the Rust programming language and C++? Be spicy, hot
takes are encouraged. 😌
```
@@ -140,12 +130,10 @@ What restaurants opened in San Francisco this month? Focus on Japanese cuisine.
**No search results:**
- Verify API key is correct
- Check remaining credits at serper.dev

**Tools not appearing:**
- Restart Jan after configuration changes
- Ensure MCP Server shows as active

**Poor search quality:**
- Use more specific search terms
@@ -164,4 +152,6 @@ Each search query consumes one API credit. Monitor usage at serper.dev dashboard

## Next Steps

Serper MCP enables models to access current web information, making them powerful research
assistants. Combine with other MCP tools for even more capabilities - use Serper for search,
then E2B for data analysis, or Jupyter for visualization.
@@ -1,25 +1,39 @@
---
title: Model Context Protocol
description: Extend Jan's capabilities with tools and external integrations through MCP.
keywords:
  [
    Jan,
    MCP,
    Model Context Protocol,
    tools,
    integrations,
    AI tools,
    local AI,
    privacy focus,
    free and open source,
    private and offline,
    conversational AI,
    large language models,
    external APIs,
  ]
---

import { Aside } from '@astrojs/starlight/components';

## Tools in Jan

Jan supports powerful tool integrations that extend your AI's capabilities beyond simple text generation. These tools are implemented through the **Model Context Protocol (MCP)**, allowing your AI to search the web, execute code, manage files, and interact with external services.

**Available tool categories:**
- **Web & Search** - Real-time web search, browser automation
- **Code & Analysis** - Jupyter notebooks, code execution, data analysis
- **Productivity** - Task management, calendar integration, note-taking
- **Creative** - Design tools, content generation, media manipulation
- **File Management** - Document processing, file operations, data extraction

Tools work with both local and cloud models, though compatibility varies. Cloud models like GPT-4 and Claude typically offer the best tool-calling performance, while newer local models are rapidly improving their tool capabilities.

```mermaid
graph TD
  subgraph "What is MCP?"
@@ -62,139 +76,223 @@ graph TD
  style Templates fill:transparent
```

## What is MCP?

Jan supports the **Model Context Protocol (MCP)**, an open standard that allows AI models to interact with external tools and data sources in a secure, standardized way.

MCP solves the integration challenge by creating a common interface between AI models and external tools. Instead of building custom connectors for every model-tool combination, MCP provides a universal protocol that any compatible model can use with any compatible tool.

**How it works:**
- **MCP Servers** provide tools, data sources, and capabilities
- **MCP Clients** (like Jan) connect models to these servers
- **Standardized Protocol** ensures compatibility across different implementations

This architecture means you can easily add new capabilities to your AI without complex integrations, and tools built for one AI system work with others that support MCP.
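
To make the server/client split concrete, here is a sketch of a tiny MCP server written with the FastMCP helper from the official Python SDK (assuming `pip install "mcp[cli]"`). The tool name and logic are made up for illustration; check the SDK documentation for the current API before building on it.

```python
# Sketch of a minimal MCP server exposing one tool (hypothetical example).
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("word-tools")

@mcp.tool()
def count_words(text: str) -> int:
    """Count the words in a piece of text."""
    return len(text.split())

if __name__ == "__main__":
    mcp.run()  # speaks MCP over stdio, so a client like Jan can launch it as a subprocess
```

A client such as Jan would start this script as a subprocess, discover the `count_words` tool over the protocol, and let the model call it during a conversation.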
|
||||||
|
|
||||||
## Core Benefits of MCP
|
## Core Benefits
|
||||||
|
|
||||||
Integrating MCP provides a structured way to extend the capabilities of the models you use in Jan. Here are the three
|
**Standardization:** MCP eliminates the "M x N" integration problem where every AI model needs unique connectors for every tool. One standard interface works everywhere.
|
||||||
|
|
||||||
* **Standardization:** MCP aims to solve the "M x N" integration problem, where every model (M) needs a
|
**Extensibility:** Add powerful new capabilities to your AI models. Search local codebases, query databases, interact with web APIs, automate browser tasks, and more.
|
||||||
unique connector for every tool (N). By adapting to a single standard, any compliant model can interface with any compliant tool.
|
|
||||||
* **Extensibility:** This allows you to augment your models with new abilities. For instance, an AI can be granted
|
**Flexibility:** Swap models and tools easily. Your MCP setup works whether you're using local models, Claude, GPT-4, or future AI systems.
|
||||||
access to search your local codebase, query a database, or interact with web APIs, all through the same protocol.
|
|
||||||
* **Flexibility:** Because the interface is standardized, you can swap out models or tools with minimal friction,
|
**Security:** User-controlled permissions ensure you decide which tools can access what resources. Tools run in isolated environments with explicit consent.
|
||||||
making your workflows more modular and adaptable over time.
|
|
||||||
|
|
||||||
<Aside type="caution">
|
<Aside type="caution">
|
||||||
Please note that not all models that you can download and use, whether in Jan or other tools, may be good at
|
Not all models excel at tool calling. For best results:
|
||||||
tool calling or compatible with MCP. Make sure that the model you choose is MCP-compliant before integrating
|
- **Cloud models:** GPT-4, Claude 3.5+, and Gemini Pro offer excellent tool capabilities
|
||||||
it into your workflows. This might be available in the model card or you may need to implement it yourself to
|
- **Local models:** Newer models like Qwen3, Gemma3, and function-calling variants work well
|
||||||
test the capabilities of the model.
|
- **Check compatibility:** Review model cards for tool-calling performance before setup
|
||||||
</Aside>
|
</Aside>
|
||||||
|
|
||||||
|
## Model Compatibility Requirements
|
||||||
|
|
||||||
<Aside type="note">
|
<Aside type="note">
|
||||||
To use MCP effectively, ensure your AI model supports tool calling capabilities:
|
To use MCP tools effectively, ensure your model supports tool calling:
|
||||||
- For cloud models (like Claude or GPT-4): Verify tool calling is enabled in your API settings
|
|
||||||
- For local models: Enable tool calling in the model parameters [click the edit button in Model Capabilities](/docs/model-parameters#model-capabilities-edit-button)
|
**For cloud models:**
|
||||||
- Check the model's documentation to confirm MCP compatibility
|
- Enable tool calling in provider settings (usually automatic)
|
||||||
|
- Verify API supports function calling endpoints
|
||||||
|
- Check model-specific documentation for tool capabilities
|
||||||
|
|
||||||
|
**For local models:**
|
||||||
|
- Enable tool calling in model settings (gear icon → capabilities)
|
||||||
|
- Choose models specifically trained for function calling
|
||||||
|
- Monitor performance as tool complexity increases
|
||||||
</Aside>
|
</Aside>
|
||||||
|
|
||||||
## Considerations and Risks
|
## Security and Considerations
|
||||||
|
|
||||||
While powerful, MCP is an evolving standard, and its use requires careful consideration of the following points:
|
MCP provides powerful capabilities that require careful security consideration:
|
||||||
|
|
||||||
* **Security:** Granting a model access to external tools is a significant security consideration. A compromised
|
**Security Model:**
|
||||||
tool or a malicious prompt could potentially lead to unintended actions or data exposure. Jan's implementation
|
- **Explicit permissions** for each tool and capability
|
||||||
focuses on user-managed permissions to mitigate this risk, meaning, you have to turn on the permission for each
|
- **Isolated execution** prevents cross-tool interference
|
||||||
tool individually.
|
- **User approval** required for sensitive operations
|
||||||
* **Standard Maturity:** As a relatively new protocol, best practices or sensible defaults are still being
|
- **Audit trails** track all tool usage and outputs
|
||||||
established. Users should be aware of potential issues like prompt injection, where an input could be crafted to
|
|
||||||
misuse a tool's capabilities.
|
|
||||||
* **Resource Management:** Active MCP connections may consume a portion of a model's context window, which could
|
|
||||||
affect performance (i.e., the more tools the model and the larger the context of the conversation has the longer
|
|
||||||
you will need to wait for a response). Efficient management of tools and their outputs is important.
|
|
||||||
|
|
||||||
|
**Performance Impact:**
|
||||||
|
- **Context usage:** Active tools consume model context window space
|
||||||
|
- **Response time:** More tools may slow generation slightly
|
||||||
|
- **Resource usage:** Some tools require additional system resources
|
||||||
|
|
||||||
## Configure and Use MCPs within Jan
|
**Best Practices:**
|
||||||
|
- Enable only tools you actively need
|
||||||
|
- Review tool permissions regularly
|
||||||
|
- Monitor system resource usage
|
||||||
|
- Keep MCP servers updated for security patches
|
||||||
|
|
||||||
To illustrate how MCPs can be used within Jan, we will walk through an example using the [Browser MCP](https://browsermcp.io/).
|
## Setting Up MCP in Jan
|
||||||
|
|
||||||
Before we begin, you will need to enable experimental features at `General` > `Advanced`. Next, go to `Settings` > `MCP Servers`, and toggle
|
### Prerequisites
|
||||||
the `Allow All MCP Tool Permission` switch ON.
|
|
||||||
|
|
||||||

|
Ensure you have the required runtime environments:
|
||||||
|
- **Node.js** - Download from [nodejs.org](https://nodejs.org/)
|
||||||
|
- **Python** - Download from [python.org](https://www.python.org/)
|
||||||
|
|
||||||
Please note that you will also need to have **NodeJS** and/or **Python** installed on your machine. In case you don't
|
Most MCP tools require one or both of these environments; a quick way to verify them from a terminal is shown below.
|
||||||
have either, you can download them from the official websites at the links below:
|
|
||||||
- [Node.js](https://nodejs.org/)
|
|
||||||
- [Python](https://www.python.org/)
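To confirm both runtimes are installed before adding a server, you can run a quick version check from a terminal. A minimal sketch, assuming `node`, `npx`, and `python` (or `python3`) are on your PATH:

```bash
# Each command should print a version number; an error means the runtime is missing
node --version
npx --version
python --version || python3 --version
```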
|
|
||||||
|
|
||||||
|
### Enable MCP Support
|
||||||
|
|
||||||
### Browser MCP
|
Navigate to **Settings → MCP Servers** and toggle **Allow All MCP Tool Permission** to ON.
|
||||||
|
|
||||||
- Click on the `+` sign on the upper right-hand corner of the MCP box.
|

|
||||||
|
|
||||||

|
This global setting allows Jan to connect to MCP servers. You'll still control individual tool permissions.
|
||||||
|
|
||||||
- Enter the following details to configure the BrowserMCP:
|
### Example: Browser MCP Setup
|
||||||
- **Server Name**: `browsermcp`
|
|
||||||
- **Command**: `npx`
|
|
||||||
- **Arguments**: `@browsermcp/mcp`
|
|
||||||
- **Environment Variables**: You can leave this field empty.
|
|
||||||
|
|
||||||

|
Let's configure Browser MCP for web automation as a practical example:
|
||||||
|
|
||||||
- Check that the server has been activated successfully.
|
#### Step 1: Add MCP Server
|
||||||
|
|
||||||

|
Click the `+` button in the MCP Servers section:
|
||||||
|
|
||||||
- Open your favorite Chromium-based browser (e.g., Google Chrome, Brave, Vivaldi, Microsoft Edge) and navigate to the
|

|
||||||
[Browser MCP Extension Page](https://chromewebstore.google.com/detail/browser-mcp-automate-your/bjfgambnhccakkhmkepdoekmckoijdlc).
|
|
||||||
|
|
||||||

|
#### Step 2: Configure Browser MCP
|
||||||
|
|
||||||
- Make sure to enable the extension to run on private windows. Since Browser MCP will have access to all sites you've
|
Enter these details:
|
||||||
already logged into in your regular browser session, it is best to give it a clean slate to start from.
|
- **Server Name:** `browsermcp`
|
||||||
|
- **Command:** `npx`
|
||||||
|
- **Arguments:** `@browsermcp/mcp`
|
||||||
|
- **Environment Variables:** Leave empty
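These fields simply tell Jan which command to launch for the server process. If you want to sanity-check the configuration outside of Jan, you can run the equivalent command manually; this is a sketch that assumes Node.js is installed (press Ctrl+C to stop it):

```bash
# Roughly the command Jan launches for the browsermcp server
npx @browsermcp/mcp
```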
|
||||||
|
|
||||||

|

|
||||||
|
|
||||||
- Enable the extension to run on private windows by clicking on it and connecting to the Browser MCP server.
|
#### Step 3: Verify Connection
|
||||||
|
|
||||||

|
Confirm the server shows as active:
|
||||||
|
|
||||||
- Go back to Jan and pick a model with good tool use capabilities, for example, Claude 3.7 and 4 Sonnet, or Claude 4 Opus,
|

|
||||||
and make sure to enable tool calling via the UI by going to **Model Providers > Anthropic** and, after you have entered your
|
|
||||||
API key, enable tools from the **+** button.
|
|
||||||
|
|
||||||

|
#### Step 4: Install Browser Extension
|
||||||
|
|
||||||
You can verify the result in the screenshot below.
|
Install the [Browser MCP Chrome Extension](https://chromewebstore.google.com/detail/browser-mcp-automate-your/bjfgambnhccakkhmkepdoekmckoijdlc) to enable browser control:
|
||||||
|
|
||||||

|

|
||||||
|
|
||||||
|
#### Step 5: Configure Extension
|
||||||
|
|
||||||
|
Enable the extension for private browsing (recommended for clean sessions):
|
||||||
|
|
||||||
|

|
||||||
|
|
||||||
|
Connect the extension to your MCP server:
|
||||||
|
|
||||||
|

|
||||||
|
|
||||||
|
#### Step 6: Enable Model Tools
|
||||||
|
|
||||||
|
Select a model with strong tool-calling capabilities and enable tools:
|
||||||
|
|
||||||
|

|
||||||
|
|
||||||
|
Verify tool calling is active:
|
||||||
|
|
||||||
|

|
||||||
|
|
||||||
|
## Available MCP Integrations
|
||||||
|
|
||||||
|
Jan supports a growing ecosystem of MCP tools:
|
||||||
|
|
||||||
|
### Web & Search
|
||||||
|
- **Browser Control** - Automate web browsing tasks
|
||||||
|
- **Web Search** - Real-time search with Serper, Exa
|
||||||
|
- **Screenshot** - Capture and analyze web content
|
||||||
|
|
||||||
|
### Development
|
||||||
|
- **Code Execution** - Run code in secure sandboxes
|
||||||
|
- **GitHub** - Repository management and analysis
|
||||||
|
- **Documentation** - Generate and maintain docs
|
||||||
|
|
||||||
|
### Productivity
|
||||||
|
- **Task Management** - Todoist, Linear integration
|
||||||
|
- **Calendar** - Schedule and meeting management
|
||||||
|
- **Note Taking** - Obsidian, Notion connectivity
|
||||||
|
|
||||||
|
### Creative
|
||||||
|
- **Design Tools** - Canva integration for graphics
|
||||||
|
- **Content Generation** - Blog posts, social media
|
||||||
|
- **Media Processing** - Image and video manipulation
|
||||||
|
|
||||||
|
Explore specific integrations in our [MCP Examples](./mcp-examples/browser/browserbase) section.
|
||||||
|
|
||||||
## Troubleshooting
|
## Troubleshooting
|
||||||
|
|
||||||
- The MCP server won't connect even though I've already added it to my list of MCP Servers:
|
### Connection Issues
|
||||||
- Make sure you have NodeJS and Python installed
|
|
||||||
- Make sure you typed the commands correctly in the MCP Server form
|
|
||||||
- Make sure the model you are using has tools enabled
|
|
||||||
- Restart Jan
|
|
||||||
- The open source model I picked won't use the MCPs I enabled.
|
|
||||||
- Make sure the model you are using has tools enabled
|
|
||||||
- Lots of open source models are not designed to use tools or simply don't work well with them, so you may need to try a different model
|
|
||||||
- The model you have selected might be good at tool calling but may not support images, which makes it unsuitable for tools that take screenshots of a website, such as the Browser MCP
|
|
||||||
|
|
||||||
## Future Potential
|
**MCP server won't connect:**
|
||||||
|
- Verify Node.js and Python are installed correctly
|
||||||
|
- Check command syntax in server configuration
|
||||||
|
- Restart Jan after adding new servers
|
||||||
|
- Review server logs for specific error messages
|
||||||
|
|
||||||
This integration is the foundation for creating more capable and context-aware AI assistants within Jan. The
|
**Tools not appearing:**
|
||||||
long-term goal is to enable more sophisticated workflows that make use of your local environment securely as
|
- Ensure model has tool calling enabled
|
||||||
well as your favorite tools.
|
- Verify MCP permissions are active
|
||||||
|
- Check that the server status shows as running
|
||||||
|
- Try with a different model known for good tool support
|
||||||
|
|
||||||
For example, an AI could cross-reference information between a local document and a remote API, or use a
|
### Performance Problems
|
||||||
local script to analyze data and then summarize the findings, all orchestrated through Jan's interface. As
|
|
||||||
the MCP ecosystem grows, so will the potential applications within Jan.
|
**Slow responses with tools:**
|
||||||
|
- Reduce number of active tools
|
||||||
|
- Use models with larger context windows
|
||||||
|
- Monitor system resource usage
|
||||||
|
- Consider using faster local models or cloud providers
|
||||||
|
|
||||||
|
**Model not using tools effectively:**
|
||||||
|
- Switch to models specifically trained for tool calling
|
||||||
|
- Provide more explicit instructions about tool usage
|
||||||
|
- Check model documentation for tool-calling examples
|
||||||
|
- Test with proven tool-compatible models first
|
||||||
|
|
||||||
|
### Model Compatibility
|
||||||
|
|
||||||
|
**Local models not calling tools:**
|
||||||
|
- Ensure the model supports function calling in its training
|
||||||
|
- Enable tool calling in model capabilities settings
|
||||||
|
- Try newer model versions with improved tool support
|
||||||
|
- Consider switching to cloud models for complex tool workflows
|
||||||
|
|
||||||
|
## Future Development
|
||||||
|
|
||||||
|
MCP integration in Jan continues evolving with new capabilities:
|
||||||
|
|
||||||
|
**Planned Features:**
|
||||||
|
- **Visual tool builder** for custom MCP servers
|
||||||
|
- **Tool marketplace** for easy discovery and installation
|
||||||
|
- **Enhanced security** with granular permission controls
|
||||||
|
- **Performance optimization** for faster tool execution
|
||||||
|
|
||||||
|
**Ecosystem Growth:**
|
||||||
|
- More professional tools (CRM, analytics, design)
|
||||||
|
- Better local model tool-calling performance
|
||||||
|
- Cross-platform mobile tool support
|
||||||
|
- Enterprise-grade security and compliance features
|
||||||
|
|
||||||
|
The MCP ecosystem enables increasingly sophisticated AI workflows. As more tools become available and models improve their tool-calling abilities, Jan becomes a more powerful platform for augmented productivity and creativity.
|
||||||
|
|
||||||
|
Start with simple tools like web search or code execution, then gradually expand your toolkit as you discover new use cases and workflows that benefit from AI-tool collaboration.
|
||||||
175
website/src/content/docs/jan/multi-modal.mdx
Normal file
175
website/src/content/docs/jan/multi-modal.mdx
Normal file
@ -0,0 +1,175 @@
|
|||||||
|
---
|
||||||
|
title: Multi-Modal Support
|
||||||
|
description: Use images with AI models in Jan - local vision models and cloud providers with image understanding.
|
||||||
|
keywords:
|
||||||
|
[
|
||||||
|
Jan,
|
||||||
|
multi-modal,
|
||||||
|
vision models,
|
||||||
|
image recognition,
|
||||||
|
Gemma3,
|
||||||
|
Qwen3,
|
||||||
|
Claude,
|
||||||
|
GPT-4V,
|
||||||
|
image attachment,
|
||||||
|
visual AI,
|
||||||
|
]
|
||||||
|
sidebar:
|
||||||
|
badge:
|
||||||
|
text: New
|
||||||
|
variant: tip
|
||||||
|
---
|
||||||
|
|
||||||
|
import { Aside } from '@astrojs/starlight/components';
|
||||||
|
|
||||||
|
Jan supports image attachments with both local and cloud AI models. Upload images directly in your chats and get visual understanding, analysis, and creative responses from compatible models.
|
||||||
|
|
||||||
|
## Local Vision Models
|
||||||
|
|
||||||
|
Local models with image support work immediately without configuration. Popular vision models include the latest Gemma3 and Qwen3 series, which excel at image understanding while running entirely on your device.
|
||||||
|
|
||||||
|
**Recommended Local Vision Models:**
|
||||||
|
- **Gemma3 4B** - Excellent balance of performance and resource usage
|
||||||
|
- **Qwen3 7B/14B** - Superior image analysis capabilities
|
||||||
|
- **LLaVA models** - Specialized for visual question answering
|
||||||
|
|
||||||
|
### Example: Image Analysis
|
||||||
|
|
||||||
|
Here's Gemma3 4B analyzing a meme with some personality:
|
||||||
|
|
||||||
|

|
||||||
|
|
||||||
|
Load a vision model like [Gemma3 4B](https://huggingface.co/unsloth/gemma-3-4b-it-GGUF) and attach your image:
|
||||||
|
|
||||||
|

|
||||||
|
|
||||||
|
**Prompt used:** "Describe what you see in the image please. Be a bit sarcastic."
|
||||||
|
|
||||||
|
The model delivers contextual analysis with the requested tone:
|
||||||
|
|
||||||
|

|
||||||
|
|
||||||
|
<Aside type="note">
|
||||||
|
Local vision models process images completely offline. Your images never leave your device.
|
||||||
|
</Aside>
|
||||||
|
|
||||||
|
## Cloud Vision Models
|
||||||
|
|
||||||
|
Cloud providers like OpenAI (GPT-4V), Anthropic (Claude), and Google (Gemini) offer powerful vision capabilities. However, image support must be manually enabled for each model.
|
||||||
|
|
||||||
|
### Enabling Vision for Cloud Models
|
||||||
|
|
||||||
|
Navigate to your model settings and enable vision support:
|
||||||
|
|
||||||
|

|
||||||
|
|
||||||
|
Toggle both **Tools** and **Vision** if you want to combine image understanding with web search or other MCP capabilities.
|
||||||
|
|
||||||
|
### Example: Creative Image Analysis
|
||||||
|
|
||||||
|
With Claude 3.5 Sonnet configured for vision, upload an image and get creative responses:
|
||||||
|
|
||||||
|

|
||||||
|
|
||||||
|
**Prompt used:** "Write an AI joke about the image attached please."
|
||||||
|
|
||||||
|
Claude combines image understanding with humor:
|
||||||
|
|
||||||
|

|
||||||
|
|
||||||
|
## Supported Use Cases
|
||||||
|
|
||||||
|
### Creative and Fun
|
||||||
|
- Meme analysis and creation
|
||||||
|
- Visual jokes and commentary
|
||||||
|
- Art critique and style analysis
|
||||||
|
- Creative writing from visual prompts
|
||||||
|
|
||||||
|
### Practical Applications
|
||||||
|
- Document analysis and OCR
|
||||||
|
- Chart and graph interpretation
|
||||||
|
- Product identification and comparison
|
||||||
|
- Technical diagram explanation
|
||||||
|
|
||||||
|
### Educational and Research
|
||||||
|
- Historical photo analysis
|
||||||
|
- Scientific image interpretation
|
||||||
|
- Visual learning assistance
|
||||||
|
- Research documentation
|
||||||
|
|
||||||
|
## Model Capabilities Comparison
|
||||||
|
|
||||||
|
| Model Type | Image Support | Setup Required | Privacy | Best For |
|
||||||
|
|------------|---------------|----------------|---------|----------|
|
||||||
|
| **Local (Gemma3, Qwen3)** | Automatic | None | Complete | Privacy, offline use |
|
||||||
|
| **GPT-4V** | Manual enable | API key + toggle | Cloud processed | Advanced analysis |
|
||||||
|
| **Claude 3.5 Sonnet** | Manual enable | API key + toggle | Cloud processed | Creative tasks |
|
||||||
|
| **Gemini Pro Vision** | Manual enable | API key + toggle | Cloud processed | Multi-language |
|
||||||
|
|
||||||
|
## Image Format Support
|
||||||
|
|
||||||
|
Jan accepts common image formats:
|
||||||
|
- **JPEG/JPG** - Most compatible
|
||||||
|
- **PNG** - Full transparency support
|
||||||
|
- **WebP** - Modern web format
|
||||||
|
- **GIF** - Static images only
|
||||||
|
|
||||||
|
<Aside type="tip">
|
||||||
|
For best results, use clear, well-lit images under 10MB. Higher resolution images provide more detail for analysis.
|
||||||
|
</Aside>
|
||||||
|
|
||||||
|
## Example Prompts
|
||||||
|
|
||||||
|
### Technical Analysis
|
||||||
|
```
|
||||||
|
Analyze this circuit diagram and explain how it works. Identify any potential issues or improvements.
|
||||||
|
```
|
||||||
|
|
||||||
|
### Creative Tasks
|
||||||
|
```
|
||||||
|
Look at this artwork and write a short story inspired by the mood and colors you see.
|
||||||
|
```
|
||||||
|
|
||||||
|
### Educational Support
|
||||||
|
```
|
||||||
|
Help me understand this math problem shown in the image. Walk through the solution step by step.
|
||||||
|
```
|
||||||
|
|
||||||
|
### Business Applications
|
||||||
|
```
|
||||||
|
Review this presentation slide and suggest improvements for clarity and visual impact.
|
||||||
|
```
|
||||||
|
|
||||||
|
### OCR and Document Processing
|
||||||
|
```
|
||||||
|
Extract all the text from this document and format it as a clean markdown list.
|
||||||
|
```
|
||||||
|
|
||||||
|
## Future Improvements
|
||||||
|
|
||||||
|
We're actively improving multi-modal support:
|
||||||
|
|
||||||
|
**Automatic Detection:** Models will show visual capabilities without manual configuration
|
||||||
|
**Batch Processing:** Upload multiple images for comparison and analysis
|
||||||
|
**Better Indicators:** Clear visual cues for vision-enabled models
|
||||||
|
**Enhanced Formats:** Support for more image types and sizes
|
||||||
|
|
||||||
|
## Performance Tips
|
||||||
|
|
||||||
|
**Local Models:**
|
||||||
|
- Ensure sufficient RAM (8GB+ recommended for vision models)
|
||||||
|
- Use GPU acceleration for faster image processing
|
||||||
|
- Start with smaller models if resources are limited
|
||||||
|
|
||||||
|
**Cloud Models:**
|
||||||
|
- Monitor API usage as vision requests typically cost more
|
||||||
|
- Resize large images before upload to save bandwidth (see the sketch after this list)
|
||||||
|
- Combine with tools for enhanced workflows
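For example, one way to shrink an image before attaching it is with ImageMagick. This is a sketch under the assumption that ImageMagick is installed (any image editor works just as well); the `>` flag only downsizes images larger than the target:

```bash
# Downscale to at most 1024px on the longest edge, keeping aspect ratio (older installs use `convert` instead of `magick`)
magick input.jpg -resize '1024x1024>' resized.jpg
```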
|
||||||
|
|
||||||
|
## Privacy Considerations
|
||||||
|
|
||||||
|
**Local Processing:** Images processed by local models never leave your device. Complete privacy for sensitive visual content.
|
||||||
|
|
||||||
|
**Cloud Processing:** Images sent to cloud providers are processed on their servers. Check provider privacy policies for data handling practices.
|
||||||
|
|
||||||
|
Multi-modal AI opens new possibilities for visual understanding and creative assistance. Whether you prefer local privacy or cloud capabilities, Jan makes it easy to work with images and text together.
|
||||||
@ -13,14 +13,15 @@ keywords:
|
|||||||
installation,
|
installation,
|
||||||
conversations,
|
conversations,
|
||||||
]
|
]
|
||||||
|
banner:
|
||||||
|
content: |
|
||||||
|
👋Jan now <a href="./multi-modal">supports image 🖼️ attachments</a> 🎉
|
||||||
---
|
---
|
||||||
|
|
||||||
import { Aside } from '@astrojs/starlight/components';
|
import { Aside } from '@astrojs/starlight/components';
|
||||||
|
|
||||||
Get up and running with Jan in minutes. This guide will help you install Jan, download a model, and start chatting immediately.
|
Get up and running with Jan in minutes. This guide will help you install Jan, download a model, and start chatting immediately.
|
||||||
|
|
||||||
<ol>
|
|
||||||
|
|
||||||
### Step 1: Install Jan
|
### Step 1: Install Jan
|
||||||
|
|
||||||
1. [Download Jan](/download)
|
1. [Download Jan](/download)
|
||||||
@ -61,8 +62,6 @@ Try asking Jan v1 questions like:
|
|||||||
**Want to give Jan v1 access to current web information?** Check out our [Serper MCP tutorial](/docs/mcp-examples/search/serper) to enable real-time web search with 2,500 free searches!
|
**Want to give Jan v1 access to current web information?** Check out our [Serper MCP tutorial](/docs/mcp-examples/search/serper) to enable real-time web search with 2,500 free searches!
|
||||||
</Aside>
|
</Aside>
|
||||||
|
|
||||||
</ol>
|
|
||||||
|
|
||||||
## Managing Conversations
|
## Managing Conversations
|
||||||
|
|
||||||
Jan organizes conversations into threads for easy tracking and revisiting.
|
Jan organizes conversations into threads for easy tracking and revisiting.
|
||||||
|
|||||||
136
website/src/content/docs/jan/remote-models/huggingface.mdx
Normal file
136
website/src/content/docs/jan/remote-models/huggingface.mdx
Normal file
@ -0,0 +1,136 @@
|
|||||||
|
---
|
||||||
|
title: Hugging Face
|
||||||
|
description: Learn how to integrate Hugging Face models with Jan using the Router or Inference Endpoints.
|
||||||
|
keywords:
|
||||||
|
[
|
||||||
|
Hugging Face,
|
||||||
|
Jan,
|
||||||
|
Jan AI,
|
||||||
|
Hugging Face Router,
|
||||||
|
Hugging Face Inference Endpoints,
|
||||||
|
Hugging Face API,
|
||||||
|
Hugging Face Integration,
|
||||||
|
Hugging Face API Integration
|
||||||
|
]
|
||||||
|
---
|
||||||
|
|
||||||
|
import { Aside } from '@astrojs/starlight/components';
|
||||||
|
|
||||||
|
|
||||||
|
Jan supports Hugging Face models through two methods: the new **HF Router** (recommended) and **Inference Endpoints**. Both methods require a Hugging Face token and **billing to be set up**.
|
||||||
|
|
||||||
|

|
||||||
|
|
||||||
|
## Option 1: HF Router (Recommended)
|
||||||
|
|
||||||
|
The HF Router provides access to models from multiple providers (Replicate, Together AI, SambaNova, Fireworks, Cohere, and more) through a single endpoint.
|
||||||
|
|
||||||
|
### Step 1: Get Your HF Token
|
||||||
|
|
||||||
|
Visit [Hugging Face Settings > Access Tokens](https://huggingface.co/settings/tokens) and create a token. Make sure you have billing set up on your account.
|
||||||
|
|
||||||
|
### Step 2: Configure Jan
|
||||||
|
|
||||||
|
1. Go to **Settings** > **Model Providers** > **HuggingFace**
|
||||||
|
2. Enter your HF token
|
||||||
|
3. Use this URL: `https://router.huggingface.co/v1`
|
||||||
|
|
||||||
|

|
||||||
|
|
||||||
|
You can find out more about the HF Router [here](https://huggingface.co/docs/inference-providers/index).
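Since the router exposes an OpenAI-compatible endpoint, you can also verify your token and the URL outside of Jan before adding them. A minimal sketch with curl, where `MODEL_ID` is a placeholder for any model ID the router serves and `HF_TOKEN` holds your token:

```bash
curl https://router.huggingface.co/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $HF_TOKEN" \
  -d '{
    "model": "MODEL_ID",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```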
|
||||||
|
|
||||||
|
### Step 3: Start Using Models
|
||||||
|
|
||||||
|
Jan comes with three HF Router models pre-configured. Select one and start chatting immediately.
|
||||||
|
|
||||||
|
<Aside type='note'>
|
||||||
|
The HF Router automatically routes your requests to the best available provider for each model, giving you access to a wide variety of models without managing individual endpoints.
|
||||||
|
</Aside>
|
||||||
|
|
||||||
|
## Option 2: HF Inference Endpoints
|
||||||
|
|
||||||
|
For more control over specific models and deployment configurations, you can use Hugging Face Inference Endpoints.
|
||||||
|
|
||||||
|
### Step 1: Navigate to the HuggingFace Model Hub
|
||||||
|
|
||||||
|
Visit the [Hugging Face Model Hub](https://huggingface.co/models) (make sure you are logged in) and pick the model you want to use.
|
||||||
|
|
||||||
|

|
||||||
|
|
||||||
|
### Step 2: Configure HF Inference Endpoint and Deploy
|
||||||
|
|
||||||
|
After you have selected the model you want to use, click on the **Deploy** button and select a deployment method. We will use HF Inference Endpoints for this example.
|
||||||
|
|
||||||
|

|
||||||
|
|
||||||
|
This will take you to the deployment setup page. For this example, we will leave the default settings as they are under the GPU tab and click on **Create Endpoint**.
|
||||||
|
|
||||||
|

|
||||||
|
|
||||||
|
Once your endpoint is ready, test that it works on the **Test your endpoint** tab.
|
||||||
|
|
||||||
|

|
||||||
|
|
||||||
|
If you get a response, you can click on **Copy** to copy the endpoint URL and API key.
|
||||||
|
|
||||||
|
<Aside type='note'>
|
||||||
|
You will need to be logged in to Hugging Face Inference Endpoints and have a credit card on file to deploy a model.
|
||||||
|
</Aside>
|
||||||
|
|
||||||
|
### Step 3: Configure Jan
|
||||||
|
|
||||||
|
If you do not have an API key, you can create one under **Settings** > **Access Tokens** [here](https://huggingface.co/settings/tokens). Once you finish, copy the token and add it to Jan alongside your endpoint URL at **Settings** > **Model Providers** > **HuggingFace**.
|
||||||
|
|
||||||
|
**3.1 HF Token**
|
||||||
|

|
||||||
|
|
||||||
|
**3.2 HF Endpoint URL**
|
||||||
|

|
||||||
|
|
||||||
|
**3.3 Jan Settings**
|
||||||
|

|
||||||
|
|
||||||
|
<Aside type='caution'>
|
||||||
|
Make sure to add `/v1/` to the end of your endpoint URL. This is required for OpenAI-compatible API requests.
|
||||||
|
</Aside>
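Before adding the URL to Jan, you can confirm the endpoint accepts OpenAI-style requests directly. A sketch, assuming your endpoint URL (ending in `/v1/`) and HF token are set as environment variables; `MODEL_ID` is a placeholder, since a dedicated endpoint serves the single model you deployed:

```bash
# ENDPOINT_URL should end in /v1/, e.g. https://<your-endpoint>/v1/
curl "${ENDPOINT_URL}chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $HF_TOKEN" \
  -d '{"model": "MODEL_ID", "messages": [{"role": "user", "content": "Hello!"}]}'
```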
|
||||||
|
|
||||||
|
**3.4 Add Model Details**
|
||||||
|

|
||||||
|
|
||||||
|
### Step 4: Start Using the Model
|
||||||
|
|
||||||
|
Now you can start using the model in any chat.
|
||||||
|
|
||||||
|

|
||||||
|
|
||||||
|
If you want to learn how to use Jan Nano with MCP, check out [the guide here](../jan-models/jan-nano-32).
|
||||||
|
|
||||||
|
## Available Hugging Face Models
|
||||||
|
|
||||||
|
**Option 1 (HF Router):** Access to models from multiple providers as shown in the providers image above.
|
||||||
|
|
||||||
|
**Option 2 (Inference Endpoints):** You can follow the steps above with a large number of models on Hugging Face and bring them to Jan. Check out other models in the [Hugging Face Model Hub](https://huggingface.co/models).
|
||||||
|
|
||||||
|
## Troubleshooting
|
||||||
|
|
||||||
|
Common issues and solutions:
|
||||||
|
|
||||||
|
**1. Started a chat but the model is not responding**
|
||||||
|
- Verify your API_KEY/HF_TOKEN is correct and not expired
|
||||||
|
- Ensure you have billing set up on your HF account
|
||||||
|
- For Inference Endpoints: Make sure the endpoint is running - endpoints go idle after a period of inactivity (so you aren't charged while not using them) and must be resumed before they respond again
|
||||||
|
|
||||||
|

|
||||||
|
|
||||||
|
**2. Connection Problems**
|
||||||
|
- Check your internet connection
|
||||||
|
- Verify Hugging Face's system status
|
||||||
|
- Look for error messages in [Jan's logs](/docs/troubleshooting#how-to-get-error-logs)
|
||||||
|
|
||||||
|
**3. Model Unavailable**
|
||||||
|
- Confirm your API key has access to the model
|
||||||
|
- Check if you're using the correct model ID
|
||||||
|
- Verify your Hugging Face account has the necessary permissions
|
||||||
|
|
||||||
|
Need more help? Join our [Discord community](https://discord.gg/FTk2MvZwJH) or check the
|
||||||
|
[Hugging Face's documentation](https://docs.huggingface.co/en/inference-endpoints/index).
|
||||||
@ -1,179 +0,0 @@
|
|||||||
---
|
|
||||||
title: Jan Data Folder
|
|
||||||
description: Understand where Jan stores your data and how to monitor server logs.
|
|
||||||
keywords:
|
|
||||||
[
|
|
||||||
Jan,
|
|
||||||
local AI,
|
|
||||||
data folder,
|
|
||||||
logs,
|
|
||||||
server logs,
|
|
||||||
troubleshooting,
|
|
||||||
privacy,
|
|
||||||
local storage,
|
|
||||||
file structure,
|
|
||||||
]
|
|
||||||
---
|
|
||||||
|
|
||||||
import { Aside, Tabs, TabItem } from '@astrojs/starlight/components';
|
|
||||||
|
|
||||||
Jan stores all your data locally on your computer. No cloud storage, no external servers -
|
|
||||||
everything stays on your machine.
|
|
||||||
|
|
||||||
## Quick Access
|
|
||||||
|
|
||||||
**Via Jan Interface:**
|
|
||||||
1. Go to Settings (⚙️) > Advanced Settings
|
|
||||||
2. Click the folder icon 📁
|
|
||||||
|
|
||||||

|
|
||||||
|
|
||||||
**Via File Explorer:**
|
|
||||||
|
|
||||||
<Tabs>
|
|
||||||
<TabItem label="Windows">
|
|
||||||
```cmd
|
|
||||||
%APPDATA%\Jan\data
|
|
||||||
```
|
|
||||||
</TabItem>
|
|
||||||
<TabItem label="macOS">
|
|
||||||
```bash
|
|
||||||
~/Library/Application Support/Jan/data
|
|
||||||
```
|
|
||||||
</TabItem>
|
|
||||||
<TabItem label="Linux">
|
|
||||||
|
|
||||||
```bash
|
|
||||||
# Default installation
|
|
||||||
~/.config/Jan/data
|
|
||||||
|
|
||||||
# Custom installation
|
|
||||||
$XDG_CONFIG_HOME/Jan/data
|
|
||||||
```
|
|
||||||
|
|
||||||
</TabItem>
|
|
||||||
|
|
||||||
</Tabs>
|
|
||||||
|
|
||||||
## Monitoring Server Logs
|
|
||||||
|
|
||||||
When Jan's local server is running, you can monitor real-time activity in the logs folder:
|
|
||||||
|
|
||||||

|
|
||||||
|
|
||||||
### Live Log Monitoring
|
|
||||||
|
|
||||||
**Real-time logs show:**
|
|
||||||
- API requests and responses
|
|
||||||
- Model loading and inference activity
|
|
||||||
- Error messages and warnings
|
|
||||||
- Performance metrics
|
|
||||||
- Connection attempts from external applications
|
|
||||||
|
|
||||||
**Accessing logs:**
|
|
||||||
- **In Jan**: System Monitor (footer) > App Log
|
|
||||||
- **File location**: `/logs/app.log`
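If you prefer a terminal, you can also follow the log file directly as requests come in. A sketch using the macOS data path from the Quick Access section above; adjust the path for your OS:

```bash
# Stream new log entries as they are written
tail -f "$HOME/Library/Application Support/Jan/data/logs/app.log"
```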
|
|
||||||
|
|
||||||
### Log Categories
|
|
||||||
|
|
||||||
| Log Type | What It Shows | When It's Useful |
|
|
||||||
|----------|---------------|------------------|
|
|
||||||
| **[APP]** | Core application events | Startup issues, crashes, general errors |
|
|
||||||
| **[SERVER]** | API server activity | Connection problems, request failures |
|
|
||||||
| **[SPECS]** | Hardware information | Performance issues, compatibility problems |
|
|
||||||
| **[MODEL]** | Model operations | Loading failures, inference errors |
|
|
||||||
|
|
||||||
## Data Structure
|
|
||||||
|
|
||||||
```
|
|
||||||
jan/
|
|
||||||
├── assistants/ # AI personality settings
|
|
||||||
│ └── jan/
|
|
||||||
│ └── assistant.json
|
|
||||||
├── engines/ # Engine configurations
|
|
||||||
│ └── llama.cpp/
|
|
||||||
├── extensions/ # Add-on modules
|
|
||||||
│ └── extensions.json
|
|
||||||
├── logs/ # Server and application logs
|
|
||||||
│ └── app.log # Main log file
|
|
||||||
├── models/ # Downloaded AI models
|
|
||||||
│ └── huggingface.co/
|
|
||||||
└── threads/ # Chat conversations
|
|
||||||
└── thread_id/
|
|
||||||
├── messages.jsonl
|
|
||||||
└── thread.json
|
|
||||||
```
|
|
||||||
|
|
||||||
## Key Folders Explained
|
|
||||||
|
|
||||||
### `/logs/` - Server Activity Hub
|
|
||||||
Contains all application and server logs. Essential for troubleshooting and monitoring API activity.
|
|
||||||
|
|
||||||
**What you'll find:**
|
|
||||||
- Real-time server requests
|
|
||||||
- Model loading status
|
|
||||||
- Error diagnostics
|
|
||||||
- Performance data
|
|
||||||
|
|
||||||
### `/models/` - AI Model Storage
|
|
||||||
Where your downloaded models live. Each model includes:
|
|
||||||
- `model.gguf` - The actual AI model file
|
|
||||||
- `model.json` - Configuration and metadata
|
|
||||||
|
|
||||||
### `/threads/` - Chat History
|
|
||||||
Every conversation gets its own folder with:
|
|
||||||
- `messages.jsonl` - Complete chat history
|
|
||||||
- `thread.json` - Thread metadata and settings
|
|
||||||
|
|
||||||
### `/assistants/` - AI Personalities
|
|
||||||
Configuration files that define how your AI assistants behave, including their instructions and available tools.
|
|
||||||
|
|
||||||
## Privacy & Security
|
|
||||||
|
|
||||||
**Your data stays local:**
|
|
||||||
- No cloud backups or syncing
|
|
||||||
- Files stored in standard JSON/JSONL formats
|
|
||||||
- Complete control over your data
|
|
||||||
- Easy to backup or migrate
|
|
||||||
|
|
||||||
**File permissions:**
|
|
||||||
- Only you and Jan can access these files
|
|
||||||
- Standard user-level permissions
|
|
||||||
- No elevated access required
|
|
||||||
|
|
||||||
<Aside type="note">
|
|
||||||
When using cloud AI services through Jan, those conversations follow the cloud provider's data policies. Local model conversations never leave your computer.
|
|
||||||
</Aside>
|
|
||||||
|
|
||||||
## Common Tasks
|
|
||||||
|
|
||||||
### Backup Your Data
|
|
||||||
Copy the entire Jan data folder to backup:
|
|
||||||
- All chat history
|
|
||||||
- Model configurations
|
|
||||||
- Assistant settings
|
|
||||||
- Extension data
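A simple way to do this from a terminal, sketched here with the macOS data path (swap in the path for your OS):

```bash
# Copy the whole Jan data folder into a dated backup directory
cp -R "$HOME/Library/Application Support/Jan/data" "$HOME/jan-backup-$(date +%Y%m%d)"
```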
|
|
||||||
|
|
||||||
### Clear Chat History
|
|
||||||
Delete individual thread folders in `/threads/` or use Jan's interface to delete conversations.
|
|
||||||
|
|
||||||
### Export Conversations
|
|
||||||
Thread files are stored in standard JSON and JSONL formats - readable by any text editor and usable by other applications.
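For example, since `messages.jsonl` stores one JSON object per line, you can pretty-print a whole thread from the terminal. A sketch assuming `jq` is installed, using the macOS data path; `<thread_id>` stands for the thread's folder name:

```bash
# Pretty-print every message object in a thread
jq . "$HOME/Library/Application Support/Jan/data/threads/<thread_id>/messages.jsonl"
```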
|
|
||||||
|
|
||||||
### Troubleshooting Data Issues
|
|
||||||
1. Check `/logs/app.log` for error messages
|
|
||||||
2. Verify folder permissions
|
|
||||||
3. Ensure sufficient disk space
|
|
||||||
4. Restart Jan if files appear corrupted
|
|
||||||
|
|
||||||
## Uninstalling Jan
|
|
||||||
|
|
||||||
If you need to completely remove Jan and all data:
|
|
||||||
|
|
||||||
**Keep data (reinstall later):** Just uninstall the application
|
|
||||||
**Remove everything:** Delete the Jan data folder after uninstalling
|
|
||||||
|
|
||||||
Detailed uninstall guides:
|
|
||||||
- [macOS](/docs/desktop/mac#step-2-clean-up-data-optional)
|
|
||||||
- [Windows](/docs/desktop/windows#step-2-handle-jan-data)
|
|
||||||
- [Linux](/docs/desktop/linux#uninstall-jan)
|
|
||||||
@ -1,195 +1,107 @@
|
|||||||
---
|
---
|
||||||
title: Jan Local Server
|
title: Local API Server
|
||||||
description: Run Jan as a local AI server with OpenAI-compatible API for building AI applications.
|
description: Build AI applications with Jan's OpenAI-compatible API server.
|
||||||
---
|
---
|
||||||
|
|
||||||
import { Aside } from '@astrojs/starlight/components';
|
import { Aside } from '@astrojs/starlight/components';
|
||||||
|
|
||||||

|
Jan provides an OpenAI-compatible API server that runs entirely on your computer. Use the same API patterns you know from OpenAI, but with complete control over your models and data.
|
||||||
|
|
||||||
Jan Local Server provides an OpenAI-compatible API that runs entirely on your computer. Build AI applications using familiar API patterns while keeping complete control over your data and models.
|
## Features
|
||||||
|
|
||||||
## How It Works
|
- **OpenAI-compatible** - Drop-in replacement for OpenAI API
|
||||||
|
- **Local models** - Run GGUF models via llama.cpp
|
||||||
|
- **Cloud models** - Proxy to OpenAI, Anthropic, and others
|
||||||
|
- **Privacy-first** - Local models never send data externally
|
||||||
|
- **No vendor lock-in** - Switch between providers seamlessly
|
||||||
|
|
||||||
Jan runs a local server powered by [llama.cpp](https://github.com/ggerganov/llama.cpp) that provides an OpenAI-compatible API. By default, it runs at `http://localhost:1337` and works completely offline.
|
## Quick Start
|
||||||
|
|
||||||
**What this enables:**
|
Start the server in **Settings > Local API Server** and make requests to `http://localhost:1337/v1`:
|
||||||
- Connect development tools like [Continue](./continue-dev) and [Cline](https://cline.bot/) to Jan
|
|
||||||
- Build AI applications without cloud dependencies
|
|
||||||
- Use both local and cloud models through the same API
|
|
||||||
- Maintain full privacy for local model interactions
|
|
||||||
|
|
||||||
## Key Features
|
```bash
|
||||||
|
curl http://localhost:1337/v1/chat/completions \
|
||||||
|
-H "Content-Type: application/json" \
|
||||||
|
-H "Authorization: Bearer YOUR_API_KEY" \
|
||||||
|
-d '{
|
||||||
|
"model": "MODEL_ID",
|
||||||
|
"messages": [{"role": "user", "content": "Hello!"}]
|
||||||
|
}'
|
||||||
|
```
|
||||||
|
|
||||||
**Local AI Models**
|
## Documentation
|
||||||
- Download popular open-source models (Llama, Gemma, Qwen) from Hugging Face
|
|
||||||
- Import any GGUF files from your computer
|
|
||||||
- Run models completely offline
|
|
||||||
|
|
||||||
**Cloud Integration**
|
- [**API Configuration**](./api-server) - Server settings, authentication, CORS
|
||||||
- Connect to cloud services (OpenAI, Anthropic, Mistral, Groq)
|
- [**Engine Settings**](./llama-cpp) - Configure llama.cpp for your hardware
|
||||||
- Use your own API keys
|
- [**Server Settings**](./settings) - Advanced configuration options
|
||||||
- Switch between local and cloud models seamlessly
|
|
||||||
|
|
||||||
**Developer-Friendly**
|
## Integration Examples
|
||||||
- OpenAI-compatible API for easy integration
|
|
||||||
- Chat interface for testing and configuration
|
|
||||||
- Model parameter customization
|
|
||||||
|
|
||||||
**Complete Privacy**
|
### Continue (VS Code)
|
||||||
- All data stored locally
|
```json
|
||||||
- No cloud dependencies for local models
|
{
|
||||||
- You control what data leaves your machine
|
"models": [{
|
||||||
|
"title": "Jan",
|
||||||
|
"provider": "openai",
|
||||||
|
"baseURL": "http://localhost:1337/v1",
|
||||||
|
"apiKey": "YOUR_API_KEY",
|
||||||
|
"model": "MODEL_ID"
|
||||||
|
}]
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
## Why Choose Jan?
|
### Python (OpenAI SDK)
|
||||||
|
```python
|
||||||
|
from openai import OpenAI
|
||||||
|
|
||||||
**Truly Open Source**
|
client = OpenAI(
|
||||||
- Apache 2.0 license - no restrictions
|
base_url="http://localhost:1337/v1",
|
||||||
- Community-driven development
|
api_key="YOUR_API_KEY"
|
||||||
- Full transparency
|
)
|
||||||
|
|
||||||
**Local-First Design**
|
response = client.chat.completions.create(
|
||||||
- Works 100% offline with local models
|
model="MODEL_ID",
|
||||||
- Data stays on your machine
|
messages=[{"role": "user", "content": "Hello!"}]
|
||||||
- No vendor lock-in
|
)
|
||||||
|
```
|
||||||
|
|
||||||
**Flexible Model Support**
|
### JavaScript/TypeScript
|
||||||
- Your choice of AI models
|
```javascript
|
||||||
- Both local and cloud options
|
const response = await fetch('http://localhost:1337/v1/chat/completions', {
|
||||||
- Easy model switching
|
method: 'POST',
|
||||||
|
headers: {
|
||||||
|
'Content-Type': 'application/json',
|
||||||
|
'Authorization': 'Bearer YOUR_API_KEY'
|
||||||
|
},
|
||||||
|
body: JSON.stringify({
|
||||||
|
model: 'MODEL_ID',
|
||||||
|
messages: [{ role: 'user', content: 'Hello!' }]
|
||||||
|
})
|
||||||
|
});
|
||||||
|
```
|
||||||
|
|
||||||
**No Data Collection**
|
## Supported Endpoints
|
||||||
- We don't collect or sell user data
|
|
||||||
- Local conversations stay local
|
| Endpoint | Description |
|
||||||
- [Read our Privacy Policy](./privacy)
|
|----------|-------------|
|
||||||
|
| `/v1/chat/completions` | Chat completions (streaming supported) |
|
||||||
|
| `/v1/models` | List available models |
|
||||||
|
| `/v1/models/{id}` | Get model information |
|
||||||
|
|
||||||
<Aside type="note">
|
<Aside type="note">
|
||||||
Jan follows [local-first principles](https://www.inkandswitch.com/local-first) - your data belongs to you and stays on your device.
|
Jan implements the core OpenAI API endpoints. Some advanced features like function calling depend on model capabilities.
|
||||||
</Aside>
|
</Aside>
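For example, listing the models the server currently exposes is a single request. A sketch, assuming the server is running on the default port and using the API key configured in Jan:

```bash
curl http://localhost:1337/v1/models \
  -H "Authorization: Bearer YOUR_API_KEY"
```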
|
||||||
|
|
||||||
## Philosophy
|
## Why Use Jan's API?
|
||||||
|
|
||||||
Jan is built to be **user-owned**. This means:
|
**Privacy** - Your data stays on your machine with local models
|
||||||
- **True open source** - Apache 2.0 license with no hidden restrictions
|
**Cost** - No API fees for local model usage
|
||||||
- **Local data storage** - following [local-first principles](https://www.inkandswitch.com/local-first)
|
**Control** - Choose your models, parameters, and hardware
|
||||||
- **Internet optional** - works completely offline
|
**Flexibility** - Mix local and cloud models as needed
|
||||||
- **Free choice** - use any AI models you want
|
|
||||||
- **No surveillance** - we don't collect or sell your data
|
|
||||||
|
|
||||||
Read more about our [philosophy](/about#philosophy).
|
## Related Resources
|
||||||
|
|
||||||
## Inspiration
|
- [Models Overview](/docs/jan/manage-models) - Available models
|
||||||
|
- [Data Storage](/docs/jan/data-folder) - Where Jan stores data
|
||||||
Jan draws inspiration from [Calm Computing](https://en.wikipedia.org/wiki/Calm_technology) and the Disappearing Computer - technology that works quietly in the background without demanding constant attention.
|
- [Troubleshooting](/docs/jan/troubleshooting) - Common issues
|
||||||
|
- [GitHub Repository](https://github.com/janhq/jan) - Source code
|
||||||
## Built With
|
|
||||||
|
|
||||||
Jan stands on the shoulders of excellent open-source projects:
|
|
||||||
- [llama.cpp](https://github.com/ggerganov/llama.cpp) - Local AI model inference
|
|
||||||
- [Scalar](https://github.com/scalar/scalar) - API documentation
|
|
||||||
|
|
||||||
## Frequently Asked Questions
|
|
||||||
|
|
||||||
## What is Jan?
|
|
||||||
|
|
||||||
Jan is a privacy-focused AI assistant that runs locally on your computer. It's an alternative to ChatGPT, Claude, and other cloud-based AI tools, with optional cloud AI support when you want it.
|
|
||||||
|
|
||||||
|
|
||||||
## How do I get started?
|
|
||||||
|
|
||||||
Download Jan, add a model (either download locally or add a cloud API key), and start chatting. Check our [Quick Start guide](/docs/quickstart) for detailed setup instructions.
|
|
||||||
|
|
||||||
|
|
||||||
## What systems does Jan support?
|
|
||||||
|
|
||||||
Jan works on all major operating systems:
|
|
||||||
- [macOS](/docs/desktop/mac#compatibility) - Intel and Apple Silicon
|
|
||||||
- [Windows](/docs/desktop/windows#compatibility) - x64 systems
|
|
||||||
- [Linux](/docs/desktop/linux) - Most distributions
|
|
||||||
|
|
||||||
Jan supports various hardware:
|
|
||||||
- NVIDIA GPUs (CUDA acceleration)
|
|
||||||
- AMD GPUs (Vulkan support)
|
|
||||||
- Intel Arc GPUs (Vulkan support)
|
|
||||||
- Any GPU with Vulkan support
|
|
||||||
- CPU-only operation
|
|
||||||
|
|
||||||
|
|
||||||
## How does Jan protect my privacy?
|
|
||||||
|
|
||||||
Jan prioritizes privacy through:
|
|
||||||
- **100% offline operation** with local models
|
|
||||||
- **Local data storage** - everything stays on your device
|
|
||||||
- **Open-source transparency** - you can verify what Jan does
|
|
||||||
- **No data collection** - we never see your conversations
|
|
||||||
|
|
||||||
<Aside type="caution">
|
|
||||||
When using cloud AI services, their privacy policies apply. Jan doesn't add any tracking.
|
|
||||||
</Aside>
|
|
||||||
|
|
||||||
All your files and chat history are stored locally in the [Jan Data Folder](./data-folder). See our complete [Privacy Policy](./privacy).
|
|
||||||
|
|
||||||
|
|
||||||
## What AI models can I use?
|
|
||||||
|
|
||||||
**Local models:**
|
|
||||||
- Download optimized models from the [Jan Hub](/docs/manage-models)
|
|
||||||
- Import GGUF models from Hugging Face
|
|
||||||
- Use any compatible local model files
|
|
||||||
|
|
||||||
**Cloud models:**
|
|
||||||
- OpenAI (GPT-4, ChatGPT)
|
|
||||||
- Anthropic (Claude)
|
|
||||||
- Mistral, Groq, and others
|
|
||||||
- Bring your own API keys
|
|
||||||
|
|
||||||
|
|
||||||
## Is Jan really free?
|
|
||||||
|
|
||||||
Yes! Jan is completely free and open-source with no subscription fees.
|
|
||||||
|
|
||||||
**What's free:**
|
|
||||||
- Jan application and all features
|
|
||||||
- Local model usage (once downloaded)
|
|
||||||
- Local server and API
|
|
||||||
|
|
||||||
**What costs money:**
|
|
||||||
- Cloud model usage (you pay providers directly)
|
|
||||||
- We add no markup to cloud service costs
|
|
||||||
|
|
||||||
|
|
||||||
## Can Jan work offline?
|
|
||||||
|
|
||||||
Absolutely! Once you download a local model, Jan works completely offline with no internet connection needed. This is one of Jan's core features.
|
|
||||||
|
|
||||||
|
|
||||||
## How can I get help or contribute?
|
|
||||||
|
|
||||||
**Get help:**
|
|
||||||
- Join our [Discord community](https://discord.gg/qSwXFx6Krr)
|
|
||||||
- Check the [Troubleshooting guide](./troubleshooting)
|
|
||||||
- Ask in [#🆘|jan-help](https://discord.com/channels/1107178041848909847/1192090449725358130)
|
|
||||||
|
|
||||||
**Contribute:**
|
|
||||||
- Contribute on [GitHub](https://github.com/menloresearch/jan)
|
|
||||||
- No permission needed to submit improvements
|
|
||||||
- Help other users in Discord
|
|
||||||
|
|
||||||
|
|
||||||
## Can I self-host Jan?
|
|
||||||
|
|
||||||
Yes! We fully support self-hosting. You can:
|
|
||||||
- Download Jan directly for personal use
|
|
||||||
- Fork the [GitHub repository](https://github.com/menloresearch/jan)
|
|
||||||
- Build from source
|
|
||||||
- Deploy on your own infrastructure
|
|
||||||
|
|
||||||
|
|
||||||
## What does 'Jan' stand for?
|
|
||||||
|
|
||||||
"Just a Name" - we admit we're not great at marketing! 😄
|
|
||||||
|
|
||||||
|
|
||||||
## Are you hiring?
|
|
||||||
|
|
||||||
Yes! We love hiring from our community. Check our open positions at [Careers](https://menlo.bamboohr.com/careers).
|
|
||||||
@ -1,6 +1,6 @@
|
|||||||
---
|
---
|
||||||
title: llama.cpp Engine
|
title: llama.cpp Engine
|
||||||
description: Configure Jan's local AI engine for optimal performance.
|
description: Configure Jan's local AI engine for optimal performance on your hardware.
|
||||||
keywords:
|
keywords:
|
||||||
[
|
[
|
||||||
Jan,
|
Jan,
|
||||||
@ -12,162 +12,377 @@ keywords:
|
|||||||
GPU acceleration,
|
GPU acceleration,
|
||||||
CPU processing,
|
CPU processing,
|
||||||
model optimization,
|
model optimization,
|
||||||
|
CUDA,
|
||||||
|
Metal,
|
||||||
|
Vulkan,
|
||||||
]
|
]
|
||||||
---
|
---
|
||||||
|
|
||||||
import { Aside, Tabs, TabItem } from '@astrojs/starlight/components'
|
import { Aside, Tabs, TabItem } from '@astrojs/starlight/components';
|
||||||
|
|
||||||
`llama.cpp` is the core **inference engine** Jan uses to run AI models locally on your computer. This section
|
## What is llama.cpp?
|
||||||
covers the settings for the engine itself, which control *how* a model processes information on your hardware.
|
|
||||||
|
|
||||||
<Aside>
|
llama.cpp is the core inference engine that powers Jan's ability to run AI models locally on your computer. Created by Georgi Gerganov, it's designed to run large language models efficiently on consumer hardware without requiring specialized AI accelerators or cloud connections.
|
||||||
Looking for API server settings (like port, host, CORS)? They have been moved to the dedicated
|
|
||||||
[**Local API Server**](/docs/local-server/api-server) page.
|
**Key benefits:**
|
||||||
</Aside>
|
- Run models entirely offline after download
|
||||||
|
- Use your existing hardware (CPU, GPU, or Apple Silicon)
|
||||||
|
- Complete privacy - conversations never leave your device
|
||||||
|
- No API costs or subscription fees
|
||||||
|
|
||||||
## Accessing Engine Settings
|
## Accessing Engine Settings
|
||||||
|
|
||||||
Find llama.cpp settings at **Settings** > **Local Engine** > **llama.cpp**:
|
Navigate to **Settings** > **Model Providers** > **Llama.cpp**:
|
||||||
|
|
||||||

|

|
||||||
|
|
||||||
<Aside type="note">
|
<Aside type="note">
|
||||||
Most users don't need to change these settings. Jan picks good defaults for your hardware automatically.
|
Most users don't need to change these settings. Jan automatically detects your hardware and picks optimal defaults.
|
||||||
</Aside>
|
</Aside>
|
||||||
|
|
||||||
## When to Adjust Settings
|
|
||||||
|
|
||||||
You might need to modify these settings if:
|
|
||||||
- Models load slowly or don't work
|
|
||||||
- You've installed new hardware (like a graphics card)
|
|
||||||
- You want to optimize performance for your specific setup
|
|
||||||
|
|
||||||
## Engine Management
|
## Engine Management
|
||||||
|
|
||||||
| Feature | What It Does | When You Need It |
|
| Feature | What It Does | When to Use |
|
||||||
|---------|-------------|------------------|
|
|---------|-------------|-------------|
|
||||||
| **Engine Version** | Shows current llama.cpp version | Check compatibility with newer models |
|
| **Engine Version** | Shows current llama.cpp version | Check when models require newer engine |
|
||||||
| **Check Updates** | Downloads engine updates | When new models require updated engine |
|
| **Check Updates** | Downloads latest engine | Update for new model support or bug fixes |
|
||||||
| **Backend Selection** | Choose hardware-optimized version | After hardware changes or performance issues |
|
| **Backend Selection** | Choose hardware-optimized version | After hardware changes or performance issues |
|
||||||
|
|
||||||
## Hardware Backends
|
## Selecting the Right Backend
|
||||||
|
|
||||||
Different backends are optimized for different hardware. Pick the one that matches your computer:
|
Different backends are optimized for specific hardware. Choose the one that matches your system:
|
||||||
|
|
||||||
<Tabs items={['Windows', 'Linux', 'macOS']}>
|
|
||||||
|
|
||||||
|
<Tabs>
|
||||||
<TabItem label="Windows">
|
<TabItem label="Windows">
|
||||||
|
|
||||||
### NVIDIA Graphics Cards (Fastest)
|
|
||||||
**For CUDA 12.0:**
|
|
||||||
- `llama.cpp-avx2-cuda-12-0` (most common)
|
|
||||||
- `llama.cpp-avx512-cuda-12-0` (newer Intel/AMD CPUs)
|
|
||||||
|
|
||||||
**For CUDA 11.7:**
|
|
||||||
- `llama.cpp-avx2-cuda-11-7` (older drivers)
|
|
||||||
|
|
||||||
### CPU Only
|
|
||||||
- `llama.cpp-avx2` (modern CPUs)
|
|
||||||
- `llama.cpp-avx` (older CPUs)
|
|
||||||
- `llama.cpp-noavx` (very old CPUs)
|
|
||||||
|
|
||||||
### Other Graphics Cards
|
|
||||||
- `llama.cpp-vulkan` (AMD, Intel Arc)
|
|
||||||
|
|
||||||
</TabItem>
|
|
||||||
|
|
||||||
<TabItem label="Linux">
|
|
||||||
|
|
||||||
### NVIDIA Graphics Cards
|
### NVIDIA Graphics Cards
|
||||||
- `llama.cpp-avx2-cuda-12-0` (recommended)
|
Check your CUDA version in NVIDIA Control Panel, then select:
|
||||||
- `llama.cpp-avx2-cuda-11-7` (older drivers)
|
|
||||||
|
**CUDA 12.0 (Most Common):**
|
||||||
|
- `llama.cpp-avx2-cuda-12-0` - Modern CPUs with AVX2
|
||||||
|
- `llama.cpp-avx512-cuda-12-0` - Newer Intel/AMD CPUs with AVX512
|
||||||
|
- `llama.cpp-avx-cuda-12-0` - Older CPUs without AVX2
|
||||||
|
|
||||||
|
**CUDA 11.7 (Older Drivers):**
|
||||||
|
- `llama.cpp-avx2-cuda-11-7` - Modern CPUs
|
||||||
|
- `llama.cpp-avx-cuda-11-7` - Older CPUs
|
||||||
|
|
||||||
### CPU Only
|
### CPU Only
|
||||||
- `llama.cpp-avx2` (modern CPUs)
|
- `llama.cpp-avx2` - Most modern CPUs (2013+)
|
||||||
- `llama.cpp-arm64` (ARM processors)
|
- `llama.cpp-avx512` - High-end Intel/AMD CPUs
|
||||||
|
- `llama.cpp-avx` - Older CPUs (2011-2013)
|
||||||
|
- `llama.cpp-noavx` - Very old CPUs (pre-2011)
|
||||||
|
|
||||||
### Other Graphics Cards
|
### AMD/Intel Graphics
|
||||||
- `llama.cpp-vulkan` (AMD, Intel graphics)
|
- `llama.cpp-vulkan` - AMD Radeon, Intel Arc, Intel integrated
|
||||||
|
|
||||||
|
<Aside type="tip">
|
||||||
|
Not sure? Start with `avx2` variants. If models fail to load, try `avx` versions.
|
||||||
|
</Aside>
|
||||||
|
|
||||||
</TabItem>
|
</TabItem>
|
||||||
|
|
||||||
<TabItem label="macOS">
|
<TabItem label="macOS">
|
||||||
|
|
||||||
### Apple Silicon (M1/M2/M3/M4)
|
### Apple Silicon (M1/M2/M3/M4)
|
||||||
- `llama.cpp-mac-arm64` (recommended)
|
- `llama.cpp-mac-arm64` - Automatically uses GPU acceleration via Metal
|
||||||
|
|
||||||
### Intel Macs
|
### Intel Macs
|
||||||
- `llama.cpp-mac-amd64`
|
- `llama.cpp-mac-amd64` - CPU-only processing
|
||||||
|
|
||||||
<Aside type="note">
|
<Aside type="note">
|
||||||
Apple Silicon automatically uses GPU acceleration through Metal.
|
Apple Silicon Macs get automatic GPU acceleration through Metal - no configuration needed.
|
||||||
</Aside>
|
</Aside>
|
||||||
|
|
||||||
</TabItem>
|
</TabItem>
|
||||||
|
|
||||||
|
<TabItem label="Linux">
|
||||||
|
|
||||||
|
### NVIDIA Graphics Cards
|
||||||
|
- `llama.cpp-avx2-cuda-12-0` - CUDA 12.0+ with modern CPU
|
||||||
|
- `llama.cpp-avx2-cuda-11-7` - CUDA 11.7+ with modern CPU
|
||||||
|
|
||||||
|
### CPU Only
|
||||||
|
- `llama.cpp-avx2` - x86_64 modern CPUs
|
||||||
|
- `llama.cpp-avx512` - High-end Intel/AMD CPUs
|
||||||
|
- `llama.cpp-arm64` - ARM processors (Raspberry Pi, etc.)
|
||||||
|
|
||||||
|
### AMD/Intel Graphics
|
||||||
|
- `llama.cpp-vulkan` - Open-source GPU acceleration
|
||||||
|
|
||||||
|
</TabItem>
|
||||||
</Tabs>
|
</Tabs>
|
||||||
|
|
||||||
## Performance Settings
|
## Performance Settings
|
||||||
|
|
||||||
| Setting | What It Does | Recommended | Impact |
|
Configure how the engine processes requests:
|
||||||
|---------|-------------|-------------|---------|
|
|
||||||
| **Continuous Batching** | Handle multiple requests simultaneously | Enabled | Faster when using tools or multiple chats |
|
|
||||||
| **Parallel Operations** | Number of concurrent requests | 4 | Higher = more multitasking, uses more memory |
|
|
||||||
| **CPU Threads** | Processor cores to use | Auto | More threads can speed up CPU processing |
|
|
||||||
|
|
||||||
## Memory Settings
|
### Core Performance
|
||||||
|
|
||||||
| Setting | What It Does | Recommended | When to Change |
|
| Setting | What It Does | Default | When to Adjust |
|
||||||
|---------|-------------|-------------|----------------|
|
|---------|-------------|---------|----------------|
|
||||||
| **Flash Attention** | Efficient memory usage | Enabled | Leave enabled unless problems occur |
|
| **Auto-update engine** | Automatically updates llama.cpp to latest version | Enabled | Disable if you need version stability |
|
||||||
| **Caching** | Remember recent conversations | Enabled | Speeds up follow-up questions |
|
| **Auto-Unload Old Models** | Frees memory by unloading unused models | Disabled | Enable if switching between many models |
|
||||||
| **KV Cache Type** | Memory vs quality trade-off | f16 | Change to q8_0 if low on memory |
|
| **Threads** | CPU cores for text generation (`-1` = all cores) | -1 | Reduce if you need CPU for other tasks |
|
||||||
| **mmap** | Efficient model loading | Enabled | Helps with large models |
|
| **Threads (Batch)** | CPU cores for batch processing | -1 | Usually matches Threads setting |
|
||||||
| **Context Shift** | Handle very long conversations | Disabled | Enable for very long chats |
|
| **Context Shift** | Removes old text to fit new text in memory | Disabled | Enable for very long conversations |
|
||||||
|
| **Max Tokens to Predict** | Maximum response length (`-1` = unlimited) | -1 | Set a limit to control response size |
|
||||||
|
|
||||||
### Memory Options Explained
|
**Simple Analogy:** Think of threads like workers in a factory. More workers (threads) means faster production, but if you need workers elsewhere (other programs), you might want to limit how many the factory uses.
|
||||||
- **f16**: Best quality, uses more memory
|
|
||||||
- **q8_0**: Balanced memory and quality
|
|
||||||
- **q4_0**: Least memory, slight quality reduction
|
|
||||||
|
|
||||||
## Quick Troubleshooting
|
### Batch Processing
|
||||||
|
|
||||||
**Models won't load:**
|
| Setting | What It Does | Default | When to Adjust |
|
||||||
- Try a different backend
|
|---------|-------------|---------|----------------|
|
||||||
- Check available RAM/VRAM
|
| **Batch Size** | Logical batch size for prompt processing | 2048 | Lower if you have memory issues |
|
||||||
- Update engine version
|
| **uBatch Size** | Physical batch size for hardware | 512 | Match your GPU's capabilities |
|
||||||
|
| **Continuous Batching** | Process multiple requests at once | Enabled | Keep enabled for efficiency |
|
||||||
|
|
||||||
**Slow performance:**
|
**Simple Analogy:** Batch size is like the size of a delivery truck. A bigger truck (batch) can carry more packages (tokens) at once, but needs a bigger garage (memory) and more fuel (processing power).
|
||||||
- Verify GPU acceleration is active
|
|
||||||
- Close memory-intensive applications
|
### Multi-GPU Settings
|
||||||
- Increase GPU Layers in model settings
|
|
||||||
|
| Setting | What It Does | Default | When to Adjust |
|
||||||
|
|---------|-------------|---------|----------------|
|
||||||
|
| **GPU Split Mode** | How to divide model across GPUs | Layer | Change only with multiple GPUs |
|
||||||
|
| **Main GPU Index** | Primary GPU for processing | 0 | Select different GPU if needed |
|
||||||
|
|
||||||
|
**When to tweak:** Only adjust if you have multiple GPUs and want to optimize how the model is distributed across them.
|
||||||
|
|
||||||
|
## Memory Configuration
|
||||||
|
|
||||||
|
Control how models use system and GPU memory:
|
||||||
|
|
||||||
|
### Memory Management
|
||||||
|
|
||||||
|
| Setting | What It Does | Default | When to Adjust |
|
||||||
|
|---------|-------------|---------|----------------|
|
||||||
|
| **Flash Attention** | Optimized memory usage for attention | Enabled | Disable only if having stability issues |
|
||||||
|
| **Disable mmap** | Turn off memory-mapped file loading | Disabled | Enable if experiencing crashes |
|
||||||
|
| **MLock** | Lock model in RAM (no swap to disk) | Disabled | Enable if you have plenty of RAM |
|
||||||
|
| **Disable KV Offload** | Keep conversation memory on CPU | Disabled | Enable if GPU memory is limited |
|
||||||
|
|
||||||
|
**Simple Analogy:** Think of your computer's memory like a desk workspace:
|
||||||
|
- **mmap** is like keeping reference books open to specific pages (efficient)
|
||||||
|
- **mlock** is like gluing papers to your desk so they can't fall off (uses more space but faster access)
|
||||||
|
- **Flash Attention** is like using sticky notes instead of full pages (saves space)
|
||||||
|
|
||||||
|
### KV Cache Configuration
|
||||||
|
|
||||||
|
| Setting | What It Does | Options | When to Adjust |
|
||||||
|
|---------|-------------|---------|----------------|
|
||||||
|
| **KV Cache K Type** | Precision for "keys" in memory | f16, q8_0, q4_0 | Lower precision saves memory |
|
||||||
|
| **KV Cache V Type** | Precision for "values" in memory | f16, q8_0, q4_0 | Lower precision saves memory |
|
||||||
|
| **KV Cache Defragmentation Threshold** | When to reorganize memory (0.1 = 10% fragmented) | 0.1 | Increase if seeing memory errors |
|
||||||
|
|
||||||
|
**Memory Precision Guide:**
|
||||||
|
- **f16** (default): Full quality, uses most memory - like HD video
|
||||||
|
- **q8_0**: Good quality, moderate memory - like standard video
|
||||||
|
- **q4_0**: Acceptable quality, least memory - like compressed video
|
||||||
|
|
||||||
|
**When to adjust:** Start with f16. If you run out of memory, try q8_0. Only use q4_0 if absolutely necessary.
|
||||||
|
|
||||||
|
## Advanced Settings
|
||||||
|
|
||||||
|
### RoPE (Rotary Position Embeddings)
|
||||||
|
|
||||||
|
| Setting | What It Does | Default | When to Adjust |
|
||||||
|
|---------|-------------|---------|----------------|
|
||||||
|
| **RoPE Scaling Method** | How to extend context length | None | For contexts beyond model's training |
|
||||||
|
| **RoPE Scale Factor** | Context extension multiplier | 1 | Increase for longer contexts |
|
||||||
|
| **RoPE Frequency Base** | Base frequency (0 = auto) | 0 | Leave at 0 unless specified |
|
||||||
|
| **RoPE Frequency Scale Factor** | Frequency adjustment | 1 | Advanced users only |
|
||||||
|
|
||||||
|
**Simple Analogy:** RoPE is like the model's sense of position in a conversation. Imagine reading a book:
|
||||||
|
- **Normal**: You remember where you are on the page
|
||||||
|
- **RoPE Scaling**: Like using a magnifying glass to fit more words on the same page
|
||||||
|
- Scaling too much can make the text (context) blurry (less accurate)
|
||||||
|
|
||||||
|
**When to use:** Only adjust if you need conversations longer than the model's default context length and understand the quality tradeoffs.
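For reference, a sketch of the assumed `llama-server` flags for doubling usable context with linear scaling (the 8192 context value is only an example, and flag names may differ between builds):

```bash
# Stretch a model trained on a 4k context to roughly 8k with linear RoPE scaling.
# --rope-scaling linear -> "RoPE Scaling Method"
# --rope-scale 2        -> "RoPE Scale Factor"
llama-server --model ./models/my-model.gguf --ctx-size 8192 \
  --rope-scaling linear --rope-scale 2
```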
|
||||||
|
|
||||||
|
### Mirostat Sampling
|
||||||
|
|
||||||
|
| Setting | What It Does | Default | When to Adjust |
|
||||||
|
|---------|-------------|---------|----------------|
|
||||||
|
| **Mirostat Mode** | Alternative text generation method | Disabled | Try for more consistent output |
|
||||||
|
| **Mirostat Learning Rate** | How quickly it adapts (eta) | 0.1 | Lower = more stable |
|
||||||
|
| **Mirostat Target Entropy** | Target randomness (tau) | 5 | Lower = more focused |
|
||||||
|
|
||||||
|
**Simple Analogy:** Mirostat is like cruise control for text generation:
|
||||||
|
- **Regular sampling**: You manually control speed (randomness) with temperature
|
||||||
|
- **Mirostat**: Automatically adjusts to maintain consistent "speed" (perplexity)
|
||||||
|
- **Target Entropy**: Your desired cruising speed
|
||||||
|
- **Learning Rate**: How quickly the cruise control adjusts
|
||||||
|
|
||||||
|
**When to use:** Enable Mirostat if you find regular temperature settings produce inconsistent results. Start with defaults and adjust tau (3-7 range) for different styles.
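Mirostat can also be requested per completion when talking to a llama.cpp server directly. A minimal sketch, assuming llama.cpp's native `/completion` endpoint on its default port 8080 (Jan's own OpenAI-compatible endpoint may not expose these fields):

```bash
# Hypothetical request against a locally running llama-server (native /completion endpoint).
# mirostat=2 selects Mirostat v2; mirostat_tau is the target entropy; mirostat_eta the learning rate.
curl http://127.0.0.1:8080/completion \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Write a haiku about local AI.",
    "mirostat": 2,
    "mirostat_tau": 5.0,
    "mirostat_eta": 0.1
  }'
```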
|
||||||
|
|
||||||
|
### Structured Output
|
||||||
|
|
||||||
|
| Setting | What It Does | Default | When to Adjust |
|
||||||
|
|---------|-------------|---------|----------------|
|
||||||
|
| **Grammar File** | BNF grammar to constrain output | None | For specific output formats |
|
||||||
|
| **JSON Schema File** | JSON schema to enforce structure | None | For JSON responses |
|
||||||
|
|
||||||
|
**Simple Analogy:** These are like templates or forms the model must fill out:
|
||||||
|
- **Grammar**: Like Mad Libs - the model can only put words in specific places
|
||||||
|
- **JSON Schema**: Like a tax form - specific fields must be filled with specific types of data
|
||||||
|
|
||||||
|
**When to use:** Only when you need guaranteed structured output (like JSON for an API). Most users won't need these.
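To make this concrete, the sketch below asks a llama.cpp server for output matching a tiny JSON schema. The `json_schema` field and the `/completion` endpoint are llama.cpp conventions assumed here; adjust to whatever your setup actually exposes:

```bash
# Hypothetical request: constrain the response to a JSON object with one string field, "answer".
curl http://127.0.0.1:8080/completion \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Reply with a JSON object containing a single field named answer.",
    "json_schema": {
      "type": "object",
      "properties": { "answer": { "type": "string" } },
      "required": ["answer"]
    }
  }'
```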
|
||||||
|
|
||||||
|
## Quick Optimization Guide
|
||||||
|
|
||||||
|
### For Best Performance
|
||||||
|
1. **Enable**: Flash Attention, Continuous Batching
|
||||||
|
2. **Set Threads**: -1 (use all CPU cores)
|
||||||
|
3. **Batch Size**: Keep defaults (2048/512)
|
||||||
|
|
||||||
|
### For Limited Memory
|
||||||
|
1. **Enable**: Auto-Unload Models, Flash Attention
|
||||||
|
2. **KV Cache**: Set both to q8_0 or q4_0
|
||||||
|
3. **Reduce**: Batch Size to 512/128
|
||||||
|
|
||||||
|
### For Long Conversations
|
||||||
|
1. **Enable**: Context Shift
|
||||||
|
2. **Consider**: RoPE scaling (with quality tradeoffs)
|
||||||
|
3. **Monitor**: Memory usage in System Monitor
|
||||||
|
|
||||||
|
### For Multiple Models
|
||||||
|
1. **Enable**: Auto-Unload Old Models
|
||||||
|
2. **Disable**: MLock (saves RAM)
|
||||||
|
3. **Use**: Default memory settings
|
||||||
|
|
||||||
|
## Troubleshooting Settings

**Model crashes or errors:**
- Disable mmap
- Reduce Batch Size
- Switch KV Cache to q8_0
- Switch to a more stable backend (avx instead of avx2)
- Update graphics drivers
- Check system temperature

**Out of memory:**
- Change KV Cache Type to q8_0
- Reduce KV Cache precision
- Enable Auto-Unload
- Reduce Context Size in model settings
- Lower Batch Size
- Try a smaller model

**Slow performance:**
- Check Threads = -1
- Enable Flash Attention
- Verify GPU backend is active

**Inconsistent output:**
- Try Mirostat mode
- Adjust temperature in model settings
- Check if Context Shift is needed

## Quick Setup Guide

**Most users:**
1. Use default settings
2. Only change if problems occur

**NVIDIA GPU users:**
1. Download CUDA backend
2. Ensure GPU Layers is set high
3. Enable Flash Attention

**Performance optimization:**
1. Enable Continuous Batching
2. Use appropriate backend for hardware
3. Monitor memory usage

## Model-Specific Settings

Each model can override engine defaults. Access via the gear icon next to any model:



| Setting | What It Controls | Impact |
|---------|-----------------|---------|
| **Context Length** | Conversation history size | Higher = more memory usage |
| **GPU Layers** | Model layers on GPU | Higher = faster but more VRAM |
| **Temperature** | Response randomness | 0.1 = focused, 1.0 = creative |
| **Top P** | Token selection pool | Lower = more focused responses |

<Aside type="tip">
Most users only need to adjust GPU Layers (for speed) and Temperature (for creativity). Leave other settings at defaults unless you have specific needs.
</Aside>
|
||||||
|
|
||||||
|
## Troubleshooting
|
||||||
|
|
||||||
|
### Models Won't Load
|
||||||
|
1. **Wrong backend:** Try CPU-only backend first (`avx2` or `avx`)
|
||||||
|
2. **Insufficient memory:** Check RAM/VRAM requirements
|
||||||
|
3. **Outdated engine:** Update to latest version
|
||||||
|
4. **Corrupted download:** Re-download the model
|
||||||
|
|
||||||
|
### Slow Performance
|
||||||
|
1. **No GPU acceleration:** Verify correct CUDA/Vulkan backend
|
||||||
|
2. **Too few GPU layers:** Increase in model settings
|
||||||
|
3. **CPU bottleneck:** Check thread count matches cores
|
||||||
|
4. **Memory swapping:** Reduce context size or use smaller model
|
||||||
|
|
||||||
|
### Out of Memory
|
||||||
|
1. **Reduce quality:** Switch KV Cache to q8_0 or q4_0
|
||||||
|
2. **Lower context:** Decrease context length in model settings
|
||||||
|
3. **Fewer layers:** Reduce GPU layers
|
||||||
|
4. **Smaller model:** Use quantized versions (Q4 vs Q8)
|
||||||
|
|
||||||
|
### Crashes or Instability
|
||||||
|
1. **Backend mismatch:** Use more stable variant (avx vs avx2)
|
||||||
|
2. **Driver issues:** Update GPU drivers
|
||||||
|
3. **Overheating:** Monitor temperatures, improve cooling
|
||||||
|
4. **Power limits:** Check PSU capacity for high-end GPUs
|
||||||
|
|
||||||
|
## Performance Benchmarks
|
||||||
|
|
||||||
|
Typical performance with different configurations:
|
||||||
|
|
||||||
|
| Hardware | Model Size | Backend | Tokens/sec |
|
||||||
|
|----------|------------|---------|------------|
|
||||||
|
| RTX 4090 | 7B Q4 | CUDA 12 | 80-120 |
|
||||||
|
| RTX 3070 | 7B Q4 | CUDA 12 | 40-60 |
|
||||||
|
| M2 Pro | 7B Q4 | Metal | 30-50 |
|
||||||
|
| Ryzen 9 | 7B Q4 | AVX2 | 10-20 |
|
||||||
|
|
||||||
<Aside type="note">
|
<Aside type="note">
|
||||||
The default settings work well for most hardware. Only adjust these if you're experiencing specific issues or want to optimize for your particular setup.
|
Performance varies based on model, quantization, context size, and system configuration.
|
||||||
</Aside>
|
</Aside>
|
||||||
|
|
||||||
|
## Advanced Configuration
|
||||||
|
|
||||||
|
### Custom Compilation
|
||||||
|
|
||||||
|
For maximum performance, compile llama.cpp for your specific hardware:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Clone and build with specific optimizations
|
||||||
|
git clone https://github.com/ggerganov/llama.cpp
|
||||||
|
cd llama.cpp
|
||||||
|
|
||||||
|
# Examples for different systems
|
||||||
|
make LLAMA_CUDA=1 # NVIDIA GPUs
|
||||||
|
make LLAMA_METAL=1 # Apple Silicon
|
||||||
|
make LLAMA_VULKAN=1 # AMD/Intel GPUs
|
||||||
|
```
|
||||||
|
|
||||||
|
### Environment Variables
|
||||||
|
|
||||||
|
Fine-tune behavior with environment variables:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Force specific GPU
|
||||||
|
export CUDA_VISIBLE_DEVICES=0
|
||||||
|
|
||||||
|
# Thread tuning
|
||||||
|
export OMP_NUM_THREADS=8
|
||||||
|
|
||||||
|
# Memory limits
|
||||||
|
export GGML_CUDA_NO_PINNED=1
|
||||||
|
```
|
||||||
|
|
||||||
|
## Best Practices
|
||||||
|
|
||||||
|
**For Beginners:**
|
||||||
|
1. Use default settings
|
||||||
|
2. Start with smaller models (3-7B parameters)
|
||||||
|
3. Enable GPU acceleration if available
|
||||||
|
|
||||||
|
**For Power Users:**
|
||||||
|
1. Match backend to hardware precisely
|
||||||
|
2. Tune memory settings for your VRAM
|
||||||
|
3. Experiment with parallel slots for multi-tasking
|
||||||
|
|
||||||
|
**For Developers:**
|
||||||
|
1. Enable verbose logging for debugging
|
||||||
|
2. Use consistent settings across deployments
|
||||||
|
3. Monitor resource usage during inference
|
||||||
|
|
||||||
|
## Related Resources
|
||||||
|
|
||||||
|
- [Model Parameters Guide](/docs/jan/explanation/model-parameters) - Fine-tune model behavior
|
||||||
|
- [Troubleshooting Guide](/docs/jan/troubleshooting) - Detailed problem-solving
|
||||||
|
- [Hardware Requirements](/docs/desktop/mac#compatibility) - System specifications
|
||||||
|
- [API Server Settings](./api-server) - Configure the local API
|
||||||
@ -1,220 +1,125 @@
|
|||||||
---
|
---
|
||||||
title: Settings
|
title: Server Settings
|
||||||
description: Configure Jan to work best for your needs and hardware.
|
description: Configure advanced server settings for Jan's local API.
|
||||||
keywords:
|
keywords:
|
||||||
[
|
[
|
||||||
Jan,
|
Jan,
|
||||||
|
local server,
|
||||||
settings,
|
settings,
|
||||||
configuration,
|
configuration,
|
||||||
model management,
|
API server,
|
||||||
privacy,
|
performance,
|
||||||
hardware settings,
|
logging,
|
||||||
local AI,
|
|
||||||
customization,
|
|
||||||
]
|
]
|
||||||
---
|
---
|
||||||
|
|
||||||
import { Aside, Steps } from '@astrojs/starlight/components'
|
import { Aside } from '@astrojs/starlight/components'
|
||||||
|
|
||||||
|
This page covers server-specific settings for Jan's local API. For general Jan settings, see the main [Settings Guide](/docs/jan/settings).
|
||||||
|
|
||||||
Access Jan's settings by clicking the Settings icon in the bottom left corner.
|
## Accessing Server Settings
|
||||||
|
|
||||||
## Managing AI Models
|
Navigate to **Settings** in Jan to configure server-related options.
|
||||||
|
|
||||||
Find all model options at **Settings** > **Model Providers**:
|
## Server Configuration
|
||||||
|
|
||||||
### Adding Models
|
### API Server Settings
|
||||||
|
|
||||||
**From Hugging Face:**
|
Configure the local API server at **Settings > Local API Server**:
|
||||||
- Enter a model's ID (like `microsoft/DialoGPT-medium`) in the search bar
|
|
||||||
- **Need authentication?** Some models require a Hugging Face token - add yours at **Settings > Model Providers > Hugging Face Access Token**
|
|
||||||
|
|
||||||
**From Your Computer:**
|
- **Host & Port** - Network binding configuration
|
||||||
- Click **Import Model** and select GGUF files from your computer
|
- **API Key** - Authentication for API requests
|
||||||
- Works with any compatible model files you've downloaded
|
- **CORS** - Cross-origin resource sharing
|
||||||
|
- **Verbose Logging** - Detailed request/response logs
|
||||||
|
|
||||||
### Managing Existing Models
|
See our [API Configuration Guide](./api-server) for complete details.
|
||||||
|
|
||||||
**Start a model:**
|
### Engine Configuration
|
||||||
1. Open a new chat and select the model you want
|
|
||||||
2. Or go to **Settings > Model Providers** and click the **Start** button
|
|
||||||
|
|
||||||
**Remove a model:**
|
Configure llama.cpp engine at **Settings > Model Providers > Llama.cpp**:
|
||||||
- Click the trash icon next to the **Start** button
|
|
||||||
- Confirm deletion when prompted
|
|
||||||
|
|
||||||
### Hugging Face Token Setup
|
- **Backend Selection** - Hardware-optimized versions
|
||||||
|
- **Performance Settings** - Batching, threading, memory
|
||||||
|
- **Model Defaults** - Context size, GPU layers
|
||||||
|
|
||||||
For restricted models (like Meta's Llama models):
|
See our [Engine Settings Guide](./llama-cpp) for optimization tips.
|
||||||
1. Get your token from [Hugging Face Tokens](https://huggingface.co/docs/hub/en/security-tokens)
|
|
||||||
2. Add it at **Settings > Model Providers > Hugging Face**
|
|
||||||
|
|
||||||
## Model Configuration (Gear Icon)
|
## Logging & Monitoring
|
||||||
|
|
||||||

|
### Server Logs
|
||||||
|
|
||||||
Click the gear icon next to any model to adjust how it behaves:
|
Monitor API activity in real-time:
|
||||||
|
|
||||||
**Basic Settings:**
|
1. Enable **Verbose Server Logs** in API settings
|
||||||
- **Context Size**: How much conversation history the model remembers
|
2. View logs at **System Monitor** > **App Log**
|
||||||
- **GPU Layers**: How much of the model runs on your graphics card (higher = faster, but uses more GPU memory)
|
3. Filter by `[SERVER]` tags for API-specific events
|
||||||
- **Temperature**: Controls creativity (0.1 = focused, 1.0 = creative)
|
|
||||||
|
|
||||||
**Advanced Controls:**
|
|
||||||
- **Top K & Top P**: Fine-tune how the model picks words (lower = more focused)
|
|
||||||
- **Min P**: Minimum probability threshold for word selection
|
|
||||||
- **Repeat Penalty**: Prevents the model from repeating itself too much
|
|
||||||
- **Presence Penalty**: Encourages the model to use varied vocabulary
|
|
||||||
|
|
||||||
<Aside type="note">
|
|
||||||
For detailed explanations of these parameters, see our [Model Parameters Guide](/docs/model-parameters).
|
|
||||||
</Aside>
|
|
||||||
|
|
||||||
## Hardware Monitoring
|
|
||||||
|
|
||||||
Check your computer's performance at **Settings** > **Hardware**:
|
|
||||||
|
|
||||||
- **CPU, RAM, GPU**: Real-time usage and specifications
|
|
||||||
- **GPU Acceleration**: Turn GPU acceleration on/off
|
|
||||||
- **Temperature monitoring**: Keep an eye on system heat
|
|
||||||
|
|
||||||

|
|
||||||
|
|
||||||
<Aside type="caution">
|
|
||||||
If your computer gets very hot, consider using smaller models or reducing GPU layers.
|
|
||||||
</Aside>
|
|
||||||
|
|
||||||
## Personalization
|
|
||||||
|
|
||||||
### Visual Appearance
|
|
||||||
|
|
||||||
Customize Jan's look at **Settings** > **Appearance**:
|
|
||||||
- **Theme**: Choose light or dark mode
|
|
||||||
- **Colors**: Pick your preferred color scheme
|
|
||||||
- **Code highlighting**: Adjust syntax colors for programming discussions
|
|
||||||
|
|
||||||

|
|
||||||
|
|
||||||
### Writing Assistance
|
|
||||||
|
|
||||||
**Spell Check:** Jan can help catch typing mistakes in your messages.
|
|
||||||
|
|
||||||

|
|
||||||
|
|
||||||
## Privacy & Data Control
|
|
||||||
|
|
||||||
Access privacy settings at **Settings** > **Privacy**:
|
|
||||||
|
|
||||||
### Usage Analytics
|
|
||||||
|
|
||||||
**Default: No data collection.** Everything stays on your computer.
|
|
||||||
|
|
||||||
**Optional: Help improve Jan**
|
|
||||||
- Toggle **Analytics** to share anonymous usage patterns
|
|
||||||
- No conversations or personal data ever shared
|
|
||||||
- Change this setting anytime
|
|
||||||
|
|
||||||
<Aside type="note">
|
|
||||||
See exactly what we collect (with your permission) in our [Privacy Policy](/docs/privacy).
|
|
||||||
</Aside>
|
|
||||||
|
|
||||||

|
|
||||||
|
|
||||||
### Log Management
|
### Log Management
|
||||||
|
|
||||||
**Viewing System Logs:**
|
- **Location**: Stored in [Jan Data Folder](/docs/jan/data-folder)
|
||||||
- Logs help troubleshoot problems
|
- **Retention**: Automatically cleared after 24 hours
|
||||||
- Click the folder icon to open App Logs and System Logs
|
- **Manual Clear**: Settings > Advanced > Clear Logs
|
||||||
- Logs are automatically deleted after 24 hours
|
|
||||||
|
|
||||||

|
|
||||||
|
|
||||||
**Clearing Logs:**
|
|
||||||
- Click **Clear** to remove all log files immediately
|
|
||||||
- Useful before sharing your computer or troubleshooting
|
|
||||||
|
|
||||||
<Aside type="caution">
|
|
||||||
Clearing logs cannot be undone.
|
|
||||||
</Aside>
|
|
||||||
|
|
||||||

|
|
||||||
|
|
||||||
### Data Folder Management
|
|
||||||
|
|
||||||
Jan stores everything locally on your computer in standard file formats.
|
|
||||||
|
|
||||||
**Access Your Data:**
|
|
||||||
- Click the folder icon to open Jan's data directory
|
|
||||||
- Find your chat history, models, and settings
|
|
||||||
- All files are yours to backup, move, or examine
|
|
||||||
|
|
||||||

|
|
||||||
|
|
||||||
**Change Storage Location:**
|
|
||||||
1. Click the pencil icon to edit the data folder location
|
|
||||||
2. Choose an empty directory
|
|
||||||
3. Confirm the move (original folder stays intact)
|
|
||||||
4. Restart Jan to complete the change
|
|
||||||
|
|
||||||

|
|
||||||
|
|
||||||
<Aside type="note">
|
<Aside type="note">
|
||||||
This duplicates your data to the new location - your original files stay safe.
|
Enable verbose logging when debugging integrations or tracking API usage.
|
||||||
</Aside>
|
</Aside>
|
||||||
|
|
||||||
## Local API Server

All settings for running Jan as a local, OpenAI-compatible server have been moved to their own dedicated page for clarity.

This includes configuration for:
- Server Host and Port
- API Keys
- CORS (Cross-Origin Resource Sharing)
- Verbose Logging

[**Go to Local API Server Settings →**](/docs/local-server/api-server)

## Emergency Options

### Factory Reset

**When to use:** Only as a last resort for serious problems that other solutions can't fix.

**What it does:** Returns Jan to its original state - deletes everything.

**Steps:**
1. Click **Reset** under "Reset to Factory Settings"
2. Type **RESET** to confirm you understand this deletes everything
3. Optionally keep your current data folder location
4. Click **Reset Now**
5. Restart Jan

![Factory Reset](../../../assets/settings-11.png)

![Reset Confirmation](../../../assets/settings-12.png)

<Aside type="danger">
**This cannot be undone.** All chat history, downloaded models, and settings will be permanently deleted.
</Aside>

**Try these first:**
- Restart Jan
- Check the [Troubleshooting Guide](./troubleshooting)
- Ask for help on [Discord](https://discord.gg/qSwXFx6Krr)

## Quick Tips

**For new users:**
- Start with default settings
- Try a few different models to find what works best
- Enable GPU acceleration if you have a graphics card

**For performance:**
- Monitor hardware usage in real-time
- Adjust GPU layers based on your graphics card memory
- Use smaller models on older hardware

**For privacy:**
- All data stays local by default
- Check the data folder to see exactly what's stored
- Analytics are opt-in only

## Performance Tuning

### Memory Management

For optimal server performance:

- **High Traffic**: Increase parallel slots in engine settings
- **Limited RAM**: Reduce KV cache quality (q8_0 or q4_0)
- **Multiple Models**: Enable model unloading after idle timeout

### Network Configuration

Advanced networking options:

- **Local Only**: Use `127.0.0.1` (default, most secure)
- **LAN Access**: Use `0.0.0.0` (allows network connections)
- **Custom Port**: Change from default `1337` if conflicts exist

## Security Considerations

### API Authentication

- Always set a strong API key
- Rotate keys regularly for production use
- Never expose keys in client-side code

### Network Security

- Keep server on `localhost` unless LAN access is required
- Use firewall rules to restrict access
- Consider VPN for remote access needs

## Troubleshooting Server Issues

### Common Problems

**Server won't start:**
- Check port availability (`netstat -an | grep 1337`)
- Verify no other instances running
- Try different port number

**Connection refused:**
- Ensure server is started
- Check host/port configuration
- Verify firewall settings

**Authentication failures:**
- Confirm API key matches configuration
- Check Authorization header format
- Ensure no extra spaces in key
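A quick way to rule out header problems is to call the server with an explicit Bearer token. A minimal sketch, assuming the default `127.0.0.1:1337` address and an OpenAI-compatible `/v1/models` route; replace `your-api-key` with the key you configured:

```bash
# Hypothetical check: list available models from Jan's local API with a Bearer token
curl http://127.0.0.1:1337/v1/models \
  -H "Authorization: Bearer your-api-key"
```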
|
||||||
|
|
||||||
|
For more issues, see our [Troubleshooting Guide](/docs/jan/troubleshooting).
|
||||||
|
|
||||||
|
## Related Resources
|
||||||
|
|
||||||
|
- [API Configuration](./api-server) - Detailed API settings
|
||||||
|
- [Engine Settings](./llama-cpp) - Hardware optimization
|
||||||
|
- [Data Folder](/docs/jan/data-folder) - Storage locations
|
||||||
|
- [Models Overview](/docs/jan/manage-models) - Model management
|
||||||
@ -1,323 +0,0 @@
|
|||||||
---
|
|
||||||
title: Troubleshooting
|
|
||||||
description: Fix common issues and optimize Jan's performance with this comprehensive guide.
|
|
||||||
keywords:
|
|
||||||
[
|
|
||||||
Jan,
|
|
||||||
troubleshooting,
|
|
||||||
error fixes,
|
|
||||||
performance issues,
|
|
||||||
GPU problems,
|
|
||||||
installation issues,
|
|
||||||
common errors,
|
|
||||||
local AI,
|
|
||||||
technical support,
|
|
||||||
]
|
|
||||||
---
|
|
||||||
|
|
||||||
import { Aside, Steps, Tabs, TabItem } from '@astrojs/starlight/components'
|
|
||||||
|
|
||||||
## Getting Help: Error Logs
|
|
||||||
|
|
||||||
When Jan isn't working properly, error logs help identify the problem. Here's how to get them:
|
|
||||||
|
|
||||||
### Quick Access to Logs
|
|
||||||
|
|
||||||
**In Jan Interface:**
|
|
||||||
1. Look for **System Monitor** in the footer
|
|
||||||
2. Click **App Log**
|
|
||||||
|
|
||||||

|
|
||||||
|
|
||||||
**Via Terminal:**
|
|
||||||
```bash
|
|
||||||
# macOS/Linux
|
|
||||||
tail -n 50 ~/Library/Application\ Support/Jan/data/logs/app.log
|
|
||||||
|
|
||||||
# Windows
|
|
||||||
type %APPDATA%\Jan\data\logs\app.log
|
|
||||||
```
|
|
||||||
|
|
||||||
<Aside type="caution">
|
|
||||||
Remove any personal information before sharing logs. We only keep logs for 24 hours.
|
|
||||||
</Aside>
|
|
||||||
|
|
||||||
## Common Issues & Solutions
|
|
||||||
|
|
||||||
### Jan Won't Start (Broken Installation)
|
|
||||||
|
|
||||||
If Jan gets stuck after installation or won't start properly:
|
|
||||||
|
|
||||||
<Tabs>
|
|
||||||
<TabItem label="macOS">
|
|
||||||
|
|
||||||
**Clean Reinstall Steps:**
|
|
||||||
|
|
||||||
1. **Uninstall Jan** from Applications folder
|
|
||||||
|
|
||||||
2. **Delete all Jan data:**
|
|
||||||
```bash
|
|
||||||
rm -rf ~/Library/Application\ Support/Jan
|
|
||||||
```
|
|
||||||
|
|
||||||
3. **Kill any background processes** (for versions before 0.4.2):
|
|
||||||
```bash
|
|
||||||
ps aux | grep nitro
|
|
||||||
# Find process IDs and kill them:
|
|
||||||
kill -9 <PID>
|
|
||||||
```
|
|
||||||
|
|
||||||
4. **Download fresh copy** from [jan.ai](/download)
|
|
||||||
|
|
||||||
</TabItem>
|
|
||||||
|
|
||||||
<TabItem label="Windows">
|
|
||||||
|
|
||||||
**Clean Reinstall Steps:**
|
|
||||||
|
|
||||||
1. **Uninstall Jan** via Control Panel
|
|
||||||
|
|
||||||
2. **Delete application data:**
|
|
||||||
```cmd
|
|
||||||
cd C:\Users\%USERNAME%\AppData\Roaming
|
|
||||||
rmdir /S Jan
|
|
||||||
```
|
|
||||||
|
|
||||||
3. **Kill background processes** (for versions before 0.4.2):
|
|
||||||
```cmd
|
|
||||||
# Find nitro processes
|
|
||||||
tasklist | findstr "nitro"
|
|
||||||
# Kill them by PID
|
|
||||||
taskkill /F /PID <PID>
|
|
||||||
```
|
|
||||||
|
|
||||||
4. **Download fresh copy** from [jan.ai](/download)
|
|
||||||
|
|
||||||
</TabItem>
|
|
||||||
|
|
||||||
<TabItem label="Linux">
|
|
||||||
|
|
||||||
**Clean Reinstall Steps:**
|
|
||||||
|
|
||||||
1. **Uninstall Jan:**
|
|
||||||
```bash
|
|
||||||
# For Debian/Ubuntu
|
|
||||||
sudo apt-get remove jan
|
|
||||||
|
|
||||||
# For AppImage - just delete the file
|
|
||||||
```
|
|
||||||
|
|
||||||
2. **Delete application data:**
|
|
||||||
```bash
|
|
||||||
# Default location
|
|
||||||
rm -rf ~/.config/Jan
|
|
||||||
|
|
||||||
# Or custom location
|
|
||||||
rm -rf $XDG_CONFIG_HOME/Jan
|
|
||||||
```
|
|
||||||
|
|
||||||
3. **Kill background processes** (for versions before 0.4.2):
|
|
||||||
```bash
|
|
||||||
ps aux | grep nitro
|
|
||||||
kill -9 <PID>
|
|
||||||
```
|
|
||||||
|
|
||||||
4. **Download fresh copy** from [jan.ai](/download)
|
|
||||||
|
|
||||||
</TabItem>
|
|
||||||
</Tabs>
|
|
||||||
|
|
||||||
<Aside type="note">
|
|
||||||
Make sure Jan is completely removed from all user accounts before reinstalling.
|
|
||||||
</Aside>
|
|
||||||
|
|
||||||
### NVIDIA GPU Not Working
|
|
||||||
|
|
||||||
If Jan isn't using your NVIDIA graphics card for acceleration:
|
|
||||||
|
|
||||||
|
|
||||||
### Step 1: Check Your Hardware Setup
|
|
||||||
|
|
||||||
**Verify GPU Detection:**
|
|
||||||
|
|
||||||
*Windows:* Right-click desktop → NVIDIA Control Panel, or check Device Manager → Display Adapters
|
|
||||||
|
|
||||||
*Linux:* Run `lspci | grep -i nvidia`
|
|
||||||
|
|
||||||
**Install Required Software:**
|
|
||||||
|
|
||||||
**NVIDIA Driver (470.63.01 or newer):**
|
|
||||||
1. Download from [nvidia.com/drivers](https://www.nvidia.com/drivers/)
|
|
||||||
2. Test: Run `nvidia-smi` in terminal
|
|
||||||
|
|
||||||
**CUDA Toolkit (11.7 or newer):**
|
|
||||||
1. Download from [CUDA Downloads](https://developer.nvidia.com/cuda-downloads)
|
|
||||||
2. Test: Run `nvcc --version`
|
|
||||||
|
|
||||||
**Linux Additional Requirements:**
|
|
||||||
```bash
|
|
||||||
# Install required packages
|
|
||||||
sudo apt update && sudo apt install gcc-11 g++-11 cpp-11
|
|
||||||
|
|
||||||
# Set CUDA environment
|
|
||||||
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64
|
|
||||||
```
|
|
||||||
|
|
||||||
### Step 2: Enable GPU Acceleration in Jan
|
|
||||||
|
|
||||||
1. Open **Settings** > **Hardware**
|
|
||||||
2. Turn on **GPU Acceleration**
|
|
||||||
3. Check **System Monitor** (footer) to verify GPU is detected
|
|
||||||
|
|
||||||

|
|
||||||
|
|
||||||
### Step 3: Verify Configuration
|
|
||||||
|
|
||||||
1. Go to **Settings** > **Advanced Settings** > **Data Folder**
|
|
||||||
2. Open `settings.json` file
|
|
||||||
3. Check these settings:
|
|
||||||
|
|
||||||
```json
|
|
||||||
{
|
|
||||||
"run_mode": "gpu", // Should be "gpu"
|
|
||||||
"nvidia_driver": {
|
|
||||||
"exist": true, // Should be true
|
|
||||||
"version": "531.18"
|
|
||||||
},
|
|
||||||
"cuda": {
|
|
||||||
"exist": true, // Should be true
|
|
||||||
"version": "12"
|
|
||||||
},
|
|
||||||
"gpus": [
|
|
||||||
{
|
|
||||||
"id": "0",
|
|
||||||
"vram": "12282" // Your GPU memory in MB
|
|
||||||
}
|
|
||||||
]
|
|
||||||
}
|
|
||||||
```
|
|
||||||
|
|
||||||
### Step 4: Restart Jan
|
|
||||||
|
|
||||||
Close and restart Jan to apply changes.
|
|
||||||
|
|
||||||
#### Tested Working Configurations
|
|
||||||
|
|
||||||
**Desktop Systems:**
|
|
||||||
- Windows 11 + RTX 4070Ti + CUDA 12.2 + Driver 531.18
|
|
||||||
- Ubuntu 22.04 + RTX 4070Ti + CUDA 12.2 + Driver 545
|
|
||||||
|
|
||||||
**Virtual Machines:**
|
|
||||||
- Ubuntu on Proxmox + GTX 1660Ti + CUDA 12.1 + Driver 535
|
|
||||||
|
|
||||||
<Aside type="note">
|
|
||||||
Desktop installations perform better than virtual machines. VMs need proper GPU passthrough setup.
|
|
||||||
</Aside>
|
|
||||||
|
|
||||||
### "Failed to Fetch" or "Something's Amiss" Errors
|
|
||||||
|
|
||||||
When models won't respond or show these errors:
|
|
||||||
|
|
||||||
**1. Check System Requirements**
|
|
||||||
- **RAM:** Use models under 80% of available memory
|
|
||||||
- 8GB system: Use models under 6GB
|
|
||||||
- 16GB system: Use models under 13GB
|
|
||||||
- **Hardware:** Verify your system meets [minimum requirements](/docs/troubleshooting#step-1-verify-hardware-and-system-requirements)
|
|
||||||
|
|
||||||
**2. Adjust Model Settings**
|
|
||||||
- Open model settings in the chat sidebar
|
|
||||||
- Lower the **GPU Layers (ngl)** setting
|
|
||||||
- Start low and increase gradually
|
|
||||||
|
|
||||||
**3. Check Port Conflicts**
|
|
||||||
If logs show "Bind address failed":
|
|
||||||
|
|
||||||
```bash
|
|
||||||
# Check if ports are in use
|
|
||||||
# macOS/Linux
|
|
||||||
netstat -an | grep 1337
|
|
||||||
|
|
||||||
# Windows
|
|
||||||
netstat -ano | find "1337"
|
|
||||||
```
|
|
||||||
|
|
||||||
**Default Jan ports:**
|
|
||||||
- API Server: `1337`
|
|
||||||
- Documentation: `3001`
|
|
||||||
|
|
||||||
**4. Try Factory Reset**
|
|
||||||
1. **Settings** > **Advanced Settings**
|
|
||||||
2. Click **Reset** under "Reset To Factory Settings"
|
|
||||||
|
|
||||||
<Aside type="caution">
|
|
||||||
This deletes all chat history, models, and settings.
|
|
||||||
</Aside>
|
|
||||||
|
|
||||||
**5. Clean Reinstall**
|
|
||||||
If problems persist, do a complete clean installation (see "Jan Won't Start" section above).
|
|
||||||
|
|
||||||
### Permission Denied Errors
|
|
||||||
|
|
||||||
If you see permission errors during installation:
|
|
||||||
|
|
||||||
```bash
|
|
||||||
# Fix npm permissions (macOS/Linux)
|
|
||||||
sudo chown -R $(whoami) ~/.npm
|
|
||||||
|
|
||||||
# Windows - run as administrator
|
|
||||||
```
|
|
||||||
|
|
||||||
### OpenAI API Issues ("Unexpected Token")
|
|
||||||
|
|
||||||
For OpenAI connection problems:
|
|
||||||
|
|
||||||
**1. Verify API Key**
|
|
||||||
- Get valid key from [OpenAI Platform](https://platform.openai.com/)
|
|
||||||
- Ensure sufficient credits and permissions
|
|
||||||
|
|
||||||
**2. Check Regional Access**
|
|
||||||
- Some regions have API restrictions
|
|
||||||
- Try using a VPN from a supported region
|
|
||||||
- Test network connectivity to OpenAI endpoints
|
|
||||||
|
|
||||||
### Performance Issues
|
|
||||||
|
|
||||||
**Models Running Slowly:**
|
|
||||||
- Enable GPU acceleration (see NVIDIA section)
|
|
||||||
- Use appropriate model size for your hardware
|
|
||||||
- Close other memory-intensive applications
|
|
||||||
- Check Task Manager/Activity Monitor for resource usage
|
|
||||||
|
|
||||||
**High Memory Usage:**
|
|
||||||
- Switch to smaller model variants
|
|
||||||
- Reduce context length in model settings
|
|
||||||
- Enable model offloading in engine settings
|
|
||||||
|
|
||||||
**Frequent Crashes:**
|
|
||||||
- Update graphics drivers
|
|
||||||
- Check system temperature
|
|
||||||
- Reduce GPU layers if using GPU acceleration
|
|
||||||
- Verify adequate power supply (desktop systems)
|
|
||||||
|
|
||||||
## Need More Help?
|
|
||||||
|
|
||||||
If these solutions don't work:
|
|
||||||
|
|
||||||
**1. Gather Information:**
|
|
||||||
- Copy your error logs (see top of this page)
|
|
||||||
- Note your system specifications
|
|
||||||
- Describe what you were trying to do when the problem occurred
|
|
||||||
|
|
||||||
**2. Get Community Support:**
|
|
||||||
- Join our [Discord](https://discord.com/invite/FTk2MvZwJH)
|
|
||||||
- Post in the **#🆘|jan-help** channel
|
|
||||||
- Include your logs and system info
|
|
||||||
|
|
||||||
**3. Check Resources:**
|
|
||||||
- [System requirements](/docs/troubleshooting#step-1-verify-hardware-and-system-requirements)
|
|
||||||
- [Model compatibility guides](/docs/manage-models)
|
|
||||||
- [Hardware setup guides](/docs/desktop/)
|
|
||||||
|
|
||||||
<Aside type="note">
|
|
||||||
When sharing logs, remove personal information first. We only keep logs for 24 hours, so report issues promptly.
|
|
||||||
</Aside>
|
|
||||||