Updated Model Management page
New image assets (binary files):
- docs/src/pages/docs/_assets/model-management-01.png (308 KiB)
- docs/src/pages/docs/_assets/model-management-02.png (306 KiB)
- docs/src/pages/docs/_assets/model-management-03.png (230 KiB)
- docs/src/pages/docs/_assets/model-management-04.png (191 KiB)
- docs/src/pages/docs/_assets/model-management-05.png (183 KiB)
- docs/src/pages/docs/_assets/model-management-06.png (160 KiB)
- docs/src/pages/docs/_assets/model-management-07.png (184 KiB)
@@ -43,6 +43,10 @@ The Recommended List is a great starting point if you're looking for popular and
Based on your system specifications, Jan will indicate if a model might be **Slow on your device** or if there is **Not enough RAM** to run it.
</Callout>

<br/>
![Download Model](./_assets/model-management-01.png)
<br/>
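The **Not enough RAM** indicator boils down to comparing the model's file size, plus working overhead, against your available memory. A rough back-of-the-envelope sketch using the `psutil` package (`pip install psutil`); the 1.2x overhead factor is an illustrative assumption, not Jan's actual heuristic:

```python
# Sketch: rough check of whether a GGUF model is likely to fit in RAM.
# The 1.2x overhead factor is an illustrative guess, not Jan's heuristic.
import os
import psutil

model_path = "mistral-7b-v0.1.Q4_K_M.gguf"   # example file
model_bytes = os.path.getsize(model_path)
needed = model_bytes * 1.2                   # weights + KV cache / runtime overhead (assumed)
available = psutil.virtual_memory().available

print(f"need ~{needed / 2**30:.1f} GiB, have {available / 2**30:.1f} GiB")
if needed > available:
    print("Likely 'Not enough RAM' for this model.")
```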
#### 2. Import from [Hugging Face](https://huggingface.co/)
You can import GGUF models directly from [Hugging Face](https://huggingface.co/):
@@ -53,6 +57,10 @@ You can import GGUF models directly from [Hugging Face](https://huggingface.co/)
4. In Jan, paste the model ID/URL into the **Search** bar in **Hub** or in **Settings** > **My Models**
5. Select your preferred quantized version to download

<br/>
![Hugging Face](./_assets/model-management-02.png)
<br/>
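Before picking a quantized version, you may want to see which GGUF files a repository actually offers. A minimal sketch using the `huggingface_hub` Python package (an assumption; browsing the repo's **Files** tab on the website works just as well):

```python
# Sketch: list the GGUF files in a Hugging Face repo to pick a quantization.
# Assumes `pip install huggingface_hub`; the repo ID is just an example.
from huggingface_hub import list_repo_files

repo_id = "TheBloke/Mistral-7B-v0.1-GGUF"
gguf_files = [f for f in list_repo_files(repo_id) if f.endswith(".gguf")]
for name in gguf_files:
    print(name)  # e.g. mistral-7b-v0.1.Q4_K_M.gguf
```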
##### Option B: Use Deep Link
You can use Jan's deep link feature to quickly import models:
1. Visit [Hugging Face Models](https://huggingface.co/models).
@@ -69,11 +77,16 @@ jan://models/huggingface/TheBloke/Mistral-7B-v0.1-GGUF
6. A prompt will appear: `This site is trying to open Jan`; click **Open** to open the Jan app.
7. Select your preferred quantized version to download

<Callout type="warning">
Deep linking won't work for models requiring API tokens or usage agreements. You'll need to download these models manually through the Hugging Face website.
</Callout>
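As the example above shows, the deep link is just the model's Hugging Face repo ID appended to the `jan://models/huggingface/` scheme, so you can also construct one yourself; a minimal sketch (the repo ID is the example from above):

```python
# Sketch: construct a Jan deep link from a Hugging Face repo ID.
repo_id = "TheBloke/Mistral-7B-v0.1-GGUF"
deep_link = f"jan://models/huggingface/{repo_id}"
print(deep_link)  # jan://models/huggingface/TheBloke/Mistral-7B-v0.1-GGUF
```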
<br/>
![Deep Link](./_assets/model-management-03.png)
<br/>

#### 3. Import Local Files
If you already have GGUF model files on your computer:
1. In Jan, go to **Hub** or **Settings** > **My Models**
@@ -84,10 +97,14 @@ If you already have GGUF model files on your computer:
- **Duplicate:** Makes a copy of the model files in Jan's directory
5. Click **Import** to complete

<Callout type="warning">
You need to own your **model configurations**; use them at your own risk. Misconfigurations may result in lower-quality or unexpected outputs.
</Callout>

<br/>
![Import Local Files](./_assets/model-management-04.png)
<br/>
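Conceptually, the **Link** and **Duplicate** options above differ the way a symlink differs from a copy; a minimal illustration (the paths are hypothetical, and this is an analogy rather than Jan's implementation):

```python
# Sketch: "Link" keeps one copy on disk; "Duplicate" makes an independent copy.
# Paths are hypothetical; this illustrates the trade-off, not Jan's internals.
import os
import shutil

src = "/downloads/mistral-7b-v0.1.Q4_K_M.gguf"
dst = os.path.expanduser("~/jan/data/models/mistral-7b-v0.1.Q4_K_M.gguf")

os.symlink(src, dst)      # "Link": no extra disk space, but breaks if src moves
# shutil.copy2(src, dst)  # "Duplicate": uses extra disk space, independent of src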
#### 4. Manual Setup
For advanced users who want to add a specific model that is not available in Jan **Hub**:
1. Navigate to `~/jan/data/models/`
@@ -129,11 +146,29 @@ Key fields to configure:
### Delete Models
1. Go to **Settings** > **My Models**
2. Find the model you want to remove
3. Select the three dots next to it and select **Delete Model**.

<br/>
![Delete Model](./_assets/model-management-05.png)
<br/>

## Cloud Models
<Callout type="info">
When using cloud models, be aware of any associated costs and rate limits from the providers.
</Callout>

Jan supports connecting to various cloud providers that are OpenAI API-compatible, including OpenAI (GPT-4, o1, ...), Anthropic (Claude), Groq, Mistral, and more.
1. Open **Settings**
2. Under the **Model Provider** section in the left sidebar, choose a provider (OpenAI, Anthropic, etc.)
3. Enter your API key
4. The activated cloud models will be available in your model selector in **Threads**

<br/>
![Remote APIs](./_assets/model-management-06.png)
<br/>

Cloud models cannot be deleted, but you can hide them by disabling their respective provider extensions in **Settings** > **Engines**.
<br/>
![Remote APIs](./_assets/model-management-07.png)
<br/>
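"OpenAI API-compatible" means the provider accepts the same request shape as OpenAI's API, which is what lets one client work across providers. A minimal sketch using the `openai` Python package; the base URL and model name are Groq's published values, used here only as an example:

```python
# Sketch: what an OpenAI-compatible provider looks like from code.
# Jan handles this for you once you enter an API key in Settings.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.groq.com/openai/v1",  # provider's OpenAI-compatible endpoint
    api_key="YOUR_API_KEY",                     # the key you paste into Jan
)
response = client.chat.completions.create(
    model="llama-3.1-8b-instant",               # example model on this provider
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```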
@@ -19,51 +19,45 @@ keywords:
---
import { Callout, Steps } from 'nextra/components'

# Model Parameters
To customize model settings for a conversation:
1. In any **Thread**, click the **Model** tab in the **right sidebar**
2. You can customize the following parameter types:
   - **Inference Parameters:** Control how the model generates responses
   - **Model Parameters:** Define the model's core properties and capabilities
   - **Engine Parameters:** Configure how the model runs on your hardware
### Inference Parameters
These settings determine how the model generates and formats its outputs.

| Parameter | Description |
|---------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| **Temperature** | - Controls response randomness.<br></br>- Lower values (0.0-0.5) give focused, deterministic outputs. Higher values (0.8-2.0) produce more creative, varied responses. |
| **Top P** | - Sets the cumulative probability threshold for token selection.<br></br>- Lower values (0.1-0.7) make responses more focused and conservative. Higher values (0.8-1.0) allow more diverse word choices.|
| **Stream** | - Enables real-time response streaming. |
| **Max Tokens** | - Limits the length of the model's response.<br></br>- A higher limit benefits detailed and complex responses, while a lower limit helps maintain conciseness.|
| **Stop Sequences** | - Defines tokens or phrases that will end the model's response.<br></br>- Use common concluding phrases or tokens specific to your application’s domain to ensure outputs terminate appropriately. |
| **Frequency Penalty** | - Reduces word repetition.<br></br>- Higher values (0.5-2.0) encourage more varied language. Useful for creative writing and content generation.|
| **Presence Penalty** | - Encourages the model to explore new topics.<br></br>- Higher values (0.5-2.0) help prevent the model from fixating on already-discussed subjects.|
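These parameters map directly onto the fields of an OpenAI-style completion request, which Jan fills in from the **Model** tab for you. A minimal sketch showing that mapping (the local endpoint, API key, and model name are placeholder assumptions):

```python
# Sketch: how the inference parameters above map onto an OpenAI-style request.
# The base URL, key, and model name are placeholders; Jan sets these via the UI.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1337/v1", api_key="not-needed")  # hypothetical local endpoint
response = client.chat.completions.create(
    model="mistral-7b-v0.1",          # example model name
    messages=[{"role": "user", "content": "Write a haiku about autumn."}],
    temperature=0.7,                  # response randomness
    top_p=0.95,                       # cumulative probability threshold
    stream=False,                     # set True for token-by-token streaming
    max_tokens=256,                   # cap on response length
    stop=["\n\n"],                    # stop sequences
    frequency_penalty=0.5,            # reduce word repetition
    presence_penalty=0.5,             # encourage new topics
)
print(response.choices[0].message.content)
```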
### Model Parameters
These settings define and configure the model's behavior.

| Parameter | Description |
|---------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| **Prompt Template** | A structured format that guides how the model should respond. Contains **placeholders** and **instructions** that help shape the model's output in a consistent way.|
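For example, a chat model trained on the ChatML format might ship with a template like the one below (illustrative only; the exact template depends on how the model was trained):

```python
# Sketch: an illustrative ChatML-style prompt template.
# {system_message} and {prompt} are placeholders filled in at runtime.
template = (
    "<|im_start|>system\n{system_message}<|im_end|>\n"
    "<|im_start|>user\n{prompt}<|im_end|>\n"
    "<|im_start|>assistant\n"
)
print(template.format(system_message="You are a helpful assistant.",
                      prompt="What is a GGUF file?"))
```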
### Engine Parameters
These settings control how the model runs on your hardware.

| Parameter | Description |
|---------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| **Number of GPU Layers (ngl)** | - Controls how many layers of the model run on your GPU.<br></br>- More layers on the GPU generally means faster processing, but requires more GPU memory.|
| **Context Length** | - Controls how much text the model can consider at once.<br></br>- A longer context allows the model to handle more input but uses more memory and runs slower.<br></br>- The maximum context length varies with the model used.|

<Callout type="info">
By default, Jan sets the **Context Length** to the minimum of **8192** and the model's maximum context length; you can adjust this based on your needs.
</Callout>