Updated Engines pages
To add a new remote engine:

1. Navigate to **Settings** (<Settings width={16} height={16} style={{display:"inline"}}/>) > **Engines**
2. At the **Remote Engine** category, click **+ Install Engine**

<br/>

<br/>

3. Fill in the following required information:

<br/>

<br/>
| Field | Description | Required |
|-------|-------------|----------|
| Engine Name | Name for your engine (e.g., "OpenAI", "Claude") | ✓ |
> - The conversion functions are only needed for providers that don't follow the OpenAI API format. For OpenAI-compatible APIs, you can leave these empty.
> - For OpenAI-compatible APIs such as OpenAI, Anthropic, or Groq, you only need to fill in the required fields and can leave the optional ones empty.

4. Click **Install**
5. Once completed, you should see your engine on the **Engines** page:
   - You can rename or uninstall your engine
   - You can navigate to its own settings page

<br/>

<br/>
### Examples

#### OpenAI-Compatible Setup

```
API Key: your_api_key_here
```
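
If you are not sure whether a provider is OpenAI-compatible, one quick check is to send it a plain OpenAI-style chat request before installing it as an engine. A minimal sketch, where the URL, key, and model name are placeholders to substitute with your own:

```javascript
// Minimal compatibility check: send one OpenAI-style chat request.
// API_URL, API_KEY, and the model name below are placeholders.
const API_URL = "https://api.openai.com/v1/chat/completions";
const API_KEY = "your_api_key_here";

async function checkCompatibility() {
  const res = await fetch(API_URL, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${API_KEY}`,
    },
    body: JSON.stringify({
      model: "gpt-4o-mini",
      messages: [{ role: "user", content: "Say hello" }],
    }),
  });
  const data = await res.json();
  // An OpenAI-compatible provider returns the reply at choices[0].message.content.
  console.log(data.choices?.[0]?.message?.content ?? data);
}

checkCompatibility();
```

If this request succeeds without any conversion, the provider can be installed by filling in the required fields only.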
**Conversion Functions:**

> - Request: Convert from Jan's OpenAI-style format to your API's format
> - Response: Convert from your API's format back to OpenAI-style format
1. Request Format Conversion:
```javascript
function convertRequest(janRequest) {
  // Map Jan's OpenAI-style request body to the schema your provider expects
}
```

2. Response Format Conversion:
```javascript
function convertResponse(apiResponse) {
  // Map your provider's response back to an OpenAI-style response
}
```
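
For instance, here is a minimal sketch for a hypothetical provider that expects a flat `prompt` string and a `max_length` field instead of OpenAI-style `messages` and `max_tokens` (all provider-side field names here are illustrative, not taken from a real API):

```javascript
// Illustrative only: `prompt`, `max_length`, and `output_text` belong to a
// hypothetical provider schema, not a specific real API.
function convertRequest(janRequest) {
  return {
    prompt: janRequest.messages
      .map((m) => `${m.role}: ${m.content}`)
      .join("\n"),
    max_length: janRequest.max_tokens,
    temperature: janRequest.temperature,
  };
}

function convertResponse(apiResponse) {
  return {
    choices: [
      { message: { role: "assistant", content: apiResponse.output_text } },
    ],
  };
}
```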
**Expected Formats:**
Jan offers different backend variants for **llama.cpp** based on your operating system and hardware.

<Callout type="warning">
Choose the backend that matches your hardware. Using the wrong variant may cause performance issues or prevent models from loading.
</Callout>
<Tabs items={['Windows', 'Linux', 'macOS']}>
<Tabs.Tab>

### CUDA Support (NVIDIA GPUs)
- `llama.cpp-avx-cuda-11-7`
- `llama.cpp-avx-cuda-12-0`
- `llama.cpp-avx2-cuda-11-7`
- `llama.cpp-avx2-cuda-12-0`
- `llama.cpp-avx512-cuda-11-7`
- `llama.cpp-avx512-cuda-12-0`
- `llama.cpp-noavx-cuda-11-7`
- `llama.cpp-noavx-cuda-12-0`

### CPU Only
- `llama.cpp-avx`
- `llama.cpp-avx2`
- `llama.cpp-avx512`
- `llama.cpp-noavx`

### Other Accelerators
- `llama.cpp-vulkan`
<Callout type="info">
- For detailed hardware compatibility, please visit our guide for [Windows](/docs/desktop/windows#compatibility).
- AVX, AVX2, and AVX-512 are CPU instruction sets. For best performance, use the most advanced instruction set your CPU supports.
- CUDA versions should match your installed NVIDIA drivers.
</Callout>
</Tabs.Tab>

<Tabs.Tab>

### CUDA Support (NVIDIA GPUs)
- `llama.cpp-avx-cuda-11-7`
- `llama.cpp-avx-cuda-12-0`
- `llama.cpp-avx2-cuda-11-7`
- `llama.cpp-avx2-cuda-12-0`
- `llama.cpp-avx512-cuda-11-7`
- `llama.cpp-avx512-cuda-12-0`
- `llama.cpp-noavx-cuda-11-7`
- `llama.cpp-noavx-cuda-12-0`

### CPU Only
- `llama.cpp-avx`
- `llama.cpp-avx2`
- `llama.cpp-avx512`
- `llama.cpp-noavx`

### Other Accelerators
- `llama.cpp-vulkan`
- `llama.cpp-arm64`

<Callout type="info">
- For detailed hardware compatibility, please visit our guide for [Linux](/docs/desktop/linux).
- AVX, AVX2, and AVX-512 are CPU instruction sets. For best performance, use the most advanced instruction set your CPU supports.
- CUDA versions should match your installed NVIDIA drivers.
</Callout>
</Tabs.Tab>

<Tabs.Tab>

### Apple Silicon
- `llama.cpp-mac-arm64`: For M1/M2/M3 Macs

### Intel
- `llama.cpp-mac-amd64`: For Intel-based Macs

<Callout type="info">
For detailed hardware compatibility, please visit our guide for [Mac](/docs/desktop/mac#compatibility).
</Callout>

</Tabs.Tab>

</Tabs>
Jan uses **TensorRT-LLM** as an optional engine for faster inference on NVIDIA GPUs. This engine uses [Cortex-TensorRT-LLM](https://github.com/janhq/cortex.tensorrt-llm), which includes an efficient C++ server that executes the [TRT-LLM C++ runtime](https://nvidia.github.io/TensorRT-LLM/gpt_runtime.html) natively. It also includes features and performance improvements like OpenAI compatibility, tokenizer improvements, and queues.
<Callout type="info">
The TensorRT-LLM engine is currently only available for **Windows** users; **Linux** support is coming soon!
</Callout>

You can find its settings in **Settings** (<Settings width={16} height={16} style={{display:"inline"}}/>) > **Local Engine** > **TensorRT-LLM**.
## Requirements
- NVIDIA GPU with Compute Capability 7.0 or higher (RTX 20xx series and above)
<Callout type="info">
For a detailed setup guide, please visit [Windows](/docs/desktop/windows#compatibility).
</Callout>
## Engine Version and Updates
- **Engine Version**: View the current version of the TensorRT-LLM engine
- **Check Updates**: Check whether a newer version is available and install it
## Available Backends

TensorRT-LLM is specifically designed for NVIDIA GPUs. Available backends include:

**Windows**
- `win-cuda`: For NVIDIA GPUs with CUDA support

<Callout type="warning">
TensorRT-LLM requires an NVIDIA GPU with CUDA support. It is not compatible with other GPU types or CPU-only systems.
</Callout>
## Enable TensorRT-LLM

<Steps>
### Step 1: Install Additional Dependencies
1. Navigate to **Settings** (<Settings width={16} height={16} style={{display:"inline"}}/>) > **Local Engine** > **TensorRT-LLM**
2. At **Additional Dependencies**, click **Install**

<br/>

<br/>

<br/>

<br/>

3. Verify that files are correctly downloaded:
```bash
ls ~/jan/data/extensions/@janhq/tensorrt-llm-extension/dist/bin
# Your Extension Folder should now include `cortex.exe`, among other artifacts needed to run TRT-LLM
```
4. Restart Jan
### Step 2: Download Compatible Models

TensorRT-LLM can only run models in `TensorRT` format. These models, also known as "TensorRT Engines", are prebuilt specifically for each operating system and GPU architecture.

We currently offer a selection of precompiled models optimized for NVIDIA Ampere and Ada GPUs that you can use right away:

1. Go to **Hub**
2. Look for models with the `TensorRT-LLM` label and make sure they are compatible with your hardware
3. Click **Download**

<Callout type="info">
This download might take some time, as TensorRT models are typically large files.
</Callout>

### Step 3: Start Threads
Once the model is downloaded, start using it in [Threads](/docs/threads).
</Steps>
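
Beyond Threads, the engine's OpenAI compatibility means a downloaded model can also be reached programmatically through Jan's Local API Server. A minimal sketch, assuming the Local API Server is enabled on its default address and that you substitute the model id shown in Jan:

```javascript
// Minimal sketch: query a TensorRT-LLM model through Jan's local,
// OpenAI-compatible API server. Assumes the Local API Server is enabled;
// the model id below is a placeholder for the id shown in Jan.
async function askLocalModel() {
  const res = await fetch("http://localhost:1337/v1/chat/completions", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "your-tensorrt-llm-model-id",
      messages: [{ role: "user", content: "Hello from TensorRT-LLM!" }],
    }),
  });
  const data = await res.json();
  console.log(data.choices?.[0]?.message?.content);
}

askLocalModel();
```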