diff --git a/docs/src/pages/docs/_assets/model-management-01.png b/docs/src/pages/docs/_assets/model-management-01.png
new file mode 100644
index 000000000..626322806
Binary files /dev/null and b/docs/src/pages/docs/_assets/model-management-01.png differ
diff --git a/docs/src/pages/docs/_assets/model-management-02.png b/docs/src/pages/docs/_assets/model-management-02.png
new file mode 100644
index 000000000..ca49ddda1
Binary files /dev/null and b/docs/src/pages/docs/_assets/model-management-02.png differ
diff --git a/docs/src/pages/docs/_assets/model-management-03.png b/docs/src/pages/docs/_assets/model-management-03.png
new file mode 100644
index 000000000..c31a04bd0
Binary files /dev/null and b/docs/src/pages/docs/_assets/model-management-03.png differ
diff --git a/docs/src/pages/docs/_assets/model-management-04.png b/docs/src/pages/docs/_assets/model-management-04.png
new file mode 100644
index 000000000..c3d5a58d4
Binary files /dev/null and b/docs/src/pages/docs/_assets/model-management-04.png differ
diff --git a/docs/src/pages/docs/_assets/model-management-05.png b/docs/src/pages/docs/_assets/model-management-05.png
new file mode 100644
index 000000000..07e89075b
Binary files /dev/null and b/docs/src/pages/docs/_assets/model-management-05.png differ
diff --git a/docs/src/pages/docs/_assets/model-management-06.png b/docs/src/pages/docs/_assets/model-management-06.png
new file mode 100644
index 000000000..20e6c4bc8
Binary files /dev/null and b/docs/src/pages/docs/_assets/model-management-06.png differ
diff --git a/docs/src/pages/docs/_assets/model-management-07.png b/docs/src/pages/docs/_assets/model-management-07.png
new file mode 100644
index 000000000..eee918837
Binary files /dev/null and b/docs/src/pages/docs/_assets/model-management-07.png differ
diff --git a/docs/src/pages/docs/models/manage-models.mdx b/docs/src/pages/docs/models/manage-models.mdx
index 63c005289..c5bfccee0 100644
--- a/docs/src/pages/docs/models/manage-models.mdx
+++ b/docs/src/pages/docs/models/manage-models.mdx
@@ -43,6 +43,10 @@ The Recommended List is a great starting point if you're looking for popular and
Jan will indicate if a model might be **Slow on your device** or if there is **Not enough RAM**, based on your system specifications.
+
+
+
+
#### 2. Import from [Hugging Face](https://huggingface.co/)
You can import GGUF models directly from [Hugging Face](https://huggingface.co/):
@@ -53,6 +57,10 @@ You can import GGUF models directly from [Hugging Face](https://huggingface.co/)
4. In Jan, paste the model ID/URL into the **Search** bar in **Hub** or in **Settings** > **My Models** (see the example below)
5. Select your preferred quantized version to download
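+
+For example, either the bare model ID or the full repository URL should work in the search bar (this uses the same repository as the deep-link example further down):
+
+```
+TheBloke/Mistral-7B-v0.1-GGUF
+https://huggingface.co/TheBloke/Mistral-7B-v0.1-GGUF
+```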
+
+
+
+
##### Option B: Use Deep Link
You can use Jan's deep link feature to quickly import models:
1. Visit [Hugging Face Models](https://huggingface.co/models).
@@ -69,11 +77,16 @@ jan://models/huggingface/TheBloke/Mistral-7B-v0.1-GGUF
6. A prompt will appear: `This site is trying to open Jan`. Click **Open** to open the Jan app.
7. Select your preferred quantized version to download
-
Deep linking won't work for models requiring API tokens or usage agreements. You'll need to download these models manually through the Hugging Face website.
+
+
+
+
+
+
#### 3. Import Local Files
If you already have GGUF model files on your computer:
1. In Jan, go to **Hub** or **Settings** > **My Models**
@@ -84,10 +97,14 @@ If you already have GGUF model files on your computer:
- **Duplicate:** Makes a copy of model files in Jan's directory
5. Click **Import** to complete
-
+
You are responsible for your own **model configurations**; use them at your own risk. Misconfigurations may result in lower-quality or unexpected outputs.
+
+
+
+
#### 4. Manual Setup
For advanced users who want to add a specific model that is not available in the Jan **Hub**:
1. Navigate to `~/jan/data/models/`
@@ -129,11 +146,29 @@ Key fields to configure:
### Delete Models
1. Go to **Settings** > **My Models**
2. Find the model you want to remove
-3. Select the three dots next to it and select **Delete model**.
+3. Click the three dots next to it and select **Delete Model**.
+
+
+
+
+
## Cloud model
+
+When using cloud models, be aware of any associated costs and rate limits from the providers.
+
+
Jan supports connecting to various AI cloud providers that are OpenAI API-compatible, including: OpenAI (GPT-4, o1,...), Anthropic (Claude), Groq, Mistral, and more.
1. Open **Settings**
2. Under **Model Provider** section in left sidebar (OpenAI, Anthropic, etc.), choose a provider
3. Enter your API key
-4. The activated cloud models will available in your model selection dropdown
\ No newline at end of file
+4. The activated cloud models will be available in your model selector in **Threads**
+
+
+
+
+
+Cloud models cannot be deleted, but you can hide them by disabling their respective provider extensions in **Settings** > **Engines**.
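+
+"OpenAI API-compatible" means these providers accept the same chat completion request shape, so Jan only needs your API key (and the provider's endpoint) to use them. As a rough illustration of that shared shape (not Jan's internal code; the endpoint, model name, and environment variable below are placeholders), a request looks like this:
+
+```typescript
+// Minimal sketch of an OpenAI-compatible chat completion request.
+// Endpoint, model name, and env var are placeholders; each provider
+// documents its own values. The API key is the one you enter in Jan.
+const response = await fetch("https://api.openai.com/v1/chat/completions", {
+  method: "POST",
+  headers: {
+    "Content-Type": "application/json",
+    Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
+  },
+  body: JSON.stringify({
+    model: "gpt-4o",
+    messages: [{ role: "user", content: "Hello!" }],
+  }),
+});
+
+const data = await response.json();
+console.log(data.choices[0].message.content);
+```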
+
+
+
\ No newline at end of file
diff --git a/docs/src/pages/docs/models/model-parameters.mdx b/docs/src/pages/docs/models/model-parameters.mdx
index 855cc990e..6e4dbba54 100644
--- a/docs/src/pages/docs/models/model-parameters.mdx
+++ b/docs/src/pages/docs/models/model-parameters.mdx
@@ -19,51 +19,45 @@ keywords:
---
import { Callout, Steps } from 'nextra/components'
-## Model Parameters
-A model has three main parameters to configure:
-- Inference Parameters
-- Model Parameters
-- Engine Parameters
+# Model Parameters
+To customize model settings for a conversation:
+1. In any **Thread**, click the **Model** tab in the **right sidebar**
+2. You can customize the following parameter types:
+- **Inference Parameters:** Control how the model generates responses
+- **Model Parameters:** Define the model's core properties and capabilities
+- **Engine Parameters:** Configure how the model runs on your hardware
+
+
### Inference Parameters
-Inference parameters are settings that control how an AI model generates outputs. These parameters include the following:
-| Parameter | Description |
+These settings determine how the model generates and formats its outputs.
+| Parameter | Description |
|---------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
-| **Temperature** | - Influences the randomness of the model's output. <br/> - A higher temperature leads to more random and diverse responses, while a lower temperature produces more predictable outputs. |
-| **Top P** | - Sets a probability threshold, allowing only the most likely tokens whose cumulative probability exceeds the threshold to be considered for generation. <br/> - A lower top-P value (e.g., 0.9) may be more suitable for focused, task-oriented applications, while a higher top-P value (e.g., 0.95 or 0.97) may be better for more open-ended, creative tasks. |
-| **Stream** | - Enables real-time data processing, which is useful for applications needing immediate responses, like live interactions. It accelerates predictions by processing data as it becomes available. <br/> - Turned on by default. |
-| **Max Tokens** | - Sets the upper limit on the number of tokens the model can generate in a single output. <br/> - A higher limit benefits detailed and complex responses, while a lower limit helps maintain conciseness. |
-| **Stop Sequences** | - Defines specific tokens or phrases that signal the model to stop producing further output. <br/> - Use common concluding phrases or tokens specific to your application’s domain to ensure outputs terminate appropriately. |
-| **Frequency Penalty** | - Modifies the likelihood of the model repeating the same words or phrases within a single output, reducing redundancy in the generated text. <br/> - Increase the penalty to avoid repetition in scenarios where varied language is preferred, such as creative writing or content generation. |
-| **Presence Penalty** | - Encourages the generation of new and varied concepts by penalizing tokens that have already appeared, promoting diversity and novelty in the output. <br/> - Use a higher penalty for tasks requiring high novelty and variety, such as brainstorming or ideation sessions. |
+| **Temperature** | - Controls response randomness. <br/> - Lower values (0.0-0.5) give focused, deterministic outputs. Higher values (0.8-2.0) produce more creative, varied responses. |
+| **Top P** | - Sets the cumulative probability threshold for token selection. <br/> - Lower values (0.1-0.7) make responses more focused and conservative. Higher values (0.8-1.0) allow more diverse word choices. |
+| **Stream** | - Enables real-time response streaming. |
+| **Max Tokens** | - Limits the length of the model's response. <br/> - A higher limit benefits detailed and complex responses, while a lower limit helps maintain conciseness. |
+| **Stop Sequences** | - Defines tokens or phrases that will end the model's response. <br/> - Use common concluding phrases or tokens specific to your application’s domain to ensure outputs terminate appropriately. |
+| **Frequency Penalty** | - Reduces word repetition. <br/> - Higher values (0.5-2.0) encourage more varied language. Useful for creative writing and content generation. |
+| **Presence Penalty** | - Encourages the model to explore new topics. <br/> - Higher values (0.5-2.0) help prevent the model from fixating on already-discussed subjects. |
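+
+These names roughly correspond to the fields used by OpenAI-compatible APIs, so the same values carry over if you ever call a model programmatically. A purely illustrative request body (the model ID and parameter values below are placeholders, not recommendations):
+
+```typescript
+// Illustrative only: the inference parameters above, expressed as an
+// OpenAI-style chat completion request body.
+const body = {
+  model: "mistral-7b-instruct",   // placeholder model ID
+  messages: [{ role: "user", content: "Summarize this article." }],
+  temperature: 0.7,               // response randomness
+  top_p: 0.95,                    // cumulative probability cutoff
+  stream: true,                   // stream tokens as they are generated
+  max_tokens: 512,                // cap on response length
+  stop: ["###"],                  // stop sequences
+  frequency_penalty: 0.3,         // discourage repeated wording
+  presence_penalty: 0.3,          // encourage new topics
+};
+```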
-### Model Parameter
-Model parameters are the settings that define and configure the model's behavior. These parameters include the following:
+
+
+
+### Model Parameters
+These settings define and configure the model's behavior.
| Parameter | Description |
|---------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
-| **Prompt Template** | - This predefined text or framework generates responses or predictions. It is a structured guide that the AI model fills in or expands upon during the generation process. <br/> - For example, a prompt template might include placeholders or specific instructions that direct how the model should formulate its outputs. |
+| **Prompt Template** | A structured format that guides how the model should respond. Contains **placeholders** and **instructions** that help shape the model's output in a consistent way.|
### Engine Parameters
-Engine parameters are the settings that define how the model processes input data and generates output. These parameters include the following:
+These settings control how the model runs on your hardware.
| Parameter | Description |
|---------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
-| **Number of GPU Layers (ngl)** | - This parameter specifies the number of transformer layers in the model that are offloaded to the GPU for accelerated computation. Utilizing the GPU for these layers can significantly reduce inference time due to the parallel processing capabilities of GPUs. <br/> - Adjusting this parameter can help balance between computational load on the GPU and CPU, potentially improving performance for different deployment scenarios. |
-| **Context Length** | - This parameter determines the maximum input amount the model can generate responses. The maximum context length varies with the model used. This setting is crucial for the model’s ability to produce coherent and contextually appropriate outputs. <br/> - For tasks like summarizing long documents that require extensive context, use a higher context length. A lower setting can quicken response times and lessen computational demand for simpler queries or brief interactions. |
+| **Number of GPU Layers (ngl)** | - Controls how many layers of the model run on your GPU. <br/> - More layers on GPU generally means faster processing, but requires more GPU memory. |
+| **Context Length** | - Controls how much text the model can consider at once. <br/> - Longer context allows the model to handle more input but uses more memory and runs slower. <br/> - The maximum context length varies with the model used. |
-By default, Jan sets the **Context Length** to the maximum supported by your model, which may slow down response times. For lower-spec devices, reduce **Context Length** to **1024** or **2048**, depending on your device's specifications, to improve speed.
+By default, Jan sets the **Context Length** to the smaller of **8192** and the model's maximum supported context length. You can adjust this based on your needs.
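+
+For intuition on why longer context costs memory: the KV cache grows linearly with context length. A back-of-the-envelope sketch (the architecture numbers are illustrative, roughly 7B-class, not exact figures for any particular model):
+
+```typescript
+// Rough KV-cache size estimate; memory grows linearly with context length.
+// Layer/head/dimension counts below are illustrative, not model-specific.
+const layers = 32;        // transformer layers
+const kvHeads = 8;        // key/value attention heads
+const headDim = 128;      // dimension per head
+const bytesPerValue = 2;  // fp16 cache
+const ctxLen = 8192;      // Context Length setting
+
+const bytesPerToken = 2 * layers * kvHeads * headDim * bytesPerValue; // keys + values
+const totalGiB = (bytesPerToken * ctxLen) / 1024 ** 3;
+console.log(`~${bytesPerToken / 1024} KiB per token, ~${totalGiB.toFixed(1)} GiB at ${ctxLen} tokens`);
+```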
-## Customize the Model Settings
-Adjust model settings for a specific conversation:
-
-1. Navigate to a **thread**.
-2. Click the **Model** tab.
-
-
-3. You can customize the following parameters:
- - Inference parameters
- - Model parameters
- - Engine parameters
-
-