Updated Engines pages

This commit is contained in:
Ashley 2025-01-08 14:24:58 +07:00
parent a197dc290d
commit 500529b410
9 changed files with 117 additions and 62 deletions

[Binary image diffs not shown: one screenshot updated (195 KiB → 192 KiB) and five screenshots added (188 KiB, 176 KiB, 198 KiB, 154 KiB, 156 KiB).]


@@ -32,8 +32,17 @@ To add a new remote engine:
1. Navigate to **Settings** (<Settings width={16} height={16} style={{display:"inline"}}/>) > **Engines**
2. At the **Remote Engine** category, click **+ Install Engine**
<br/>
![Install Remote Engines](./_assets/install-engines-01.png)
<br/>
3. Fill in the following required information:
<br/>
![Install Remote Engines](./_assets/install-engines-02.png)
<br/>
| Field | Description | Required |
|-------|-------------|----------|
| Engine Name | Name for your engine (e.g., "OpenAI", "Claude") | ✓ |
@@ -48,7 +57,14 @@ To add a new remote engine:
> - The conversion functions are only needed for providers that don't follow the OpenAI API format. For OpenAI-compatible APIs, you can leave these empty.
> - For OpenAI-compatible APIs like OpenAI, Anthropic, or Groq, you only need to fill in the required fields. Leave optional fields empty.
4. Click **Install**
5. Once completed, you should see your engine on the **Engines** page:
- You can rename or uninstall your engine
- You can navigate to its own settings page
<br/>
![Install Remote Engines](./_assets/install-engines-03.png)
<br/>
### Examples
#### OpenAI-Compatible Setup
@@ -89,6 +105,9 @@ API Key: your_api_key_here
```
**Conversion Functions:**
> - Request: Convert from Jan's OpenAI-style format to your API's format
> - Response: Convert from your API's format back to OpenAI-style format
1. Request Format Conversion:
```javascript
function convertRequest(janRequest) {
@@ -117,11 +136,6 @@ function convertResponse(apiResponse) {
}
```
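The diff shows only the edges of these functions. As a rough orientation, a complete pair might look like the sketch below, written for a hypothetical provider that accepts a flat `prompt` string and returns a `completion` field; every field name here is illustrative, not a real provider's API.
```javascript
// Illustrative sketch only: `prompt` and `completion` are hypothetical fields.
// Adjust to whatever your provider's API actually expects and returns.
function convertRequest(janRequest) {
  return {
    prompt: janRequest.messages.map(m => `${m.role}: ${m.content}`).join('\n'),
    max_tokens: janRequest.max_tokens,
    temperature: janRequest.temperature,
  };
}

function convertResponse(apiResponse) {
  // Wrap the provider's reply back into OpenAI-style `choices`.
  return {
    choices: [
      {
        index: 0,
        message: { role: 'assistant', content: apiResponse.completion },
        finish_reason: 'stop',
      },
    ],
  };
}
```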
<Callout type="info">
The conversion functions should:
- Request: Convert from Jan's OpenAI-style format to your API's format
- Response: Convert from your API's format back to OpenAI-style format
</Callout>
**Expected Formats:**


@@ -57,26 +57,83 @@ Jan offers different backend variants for **llama.cpp** based on your operating
Choose the backend that matches your hardware. Using the wrong variant may cause performance issues or prevent models from loading. A quick way to detect which CPU instruction set your machine supports is sketched below the tabs.
</Callout>
<Tabs items={['Windows', 'Linux', 'macOS']}>
<Tabs.Tab>
### CUDA Support (NVIDIA GPUs)
- `llama.cpp-avx-cuda-11-7`
- `llama.cpp-avx-cuda-12-0`
- `llama.cpp-avx2-cuda-11-7`
- `llama.cpp-avx2-cuda-12-0`
- `llama.cpp-avx512-cuda-11-7`
- `llama.cpp-avx512-cuda-12-0`
- `llama.cpp-noavx-cuda-11-7`
- `llama.cpp-noavx-cuda-12-0`
### CPU Only
- `llama.cpp-avx`
- `llama.cpp-avx2`
- `llama.cpp-avx512`
- `llama.cpp-noavx`
### Other Accelerators
- `llama.cpp-vulkan`
<Callout type="info">
- For detailed hardware compatibility, please visit our guide for [Windows](/docs/desktop/windows#compatibility).
- AVX, AVX2, and AVX-512 are CPU instruction sets. For best performance, use the most advanced instruction set your CPU supports.
- CUDA versions should match your installed NVIDIA drivers.
</Callout>
</Tabs.Tab>
<Tabs.Tab>
### CUDA Support (NVIDIA GPUs)
- `llama.cpp-avx-cuda-11-7`
- `llama.cpp-avx-cuda-12-0`
- `llama.cpp-avx2-cuda-11-7`
- `llama.cpp-avx2-cuda-12-0`
- `llama.cpp-avx512-cuda-11-7`
- `llama.cpp-avx512-cuda-12-0`
- `llama.cpp-noavx-cuda-11-7`
- `llama.cpp-noavx-cuda-12-0`
### CPU Only
- `llama.cpp-avx`
- `llama.cpp-avx2`
- `llama.cpp-avx512`
- `llama.cpp-noavx`
### Other Accelerators
- `llama.cpp-vulkan`
- `llama.cpp-arm64`
<Callout type="info">
- For detailed hardware compatibility, please visit our guide for [Linux](/docs/desktop/linux).
- AVX, AVX2, and AVX-512 are CPU instruction sets. For best performance, use the most advanced instruction set your CPU supports.
- CUDA versions should match your installed NVIDIA drivers.
</Callout>
</Tabs.Tab>
<Tabs.Tab>
### Apple Silicon
- `llama.cpp-mac-arm64`: For M1/M2/M3 Macs
### Intel
- `llama.cpp-mac-amd64`: For Intel-based Macs
<Callout type="info">
For detailed hardware compatibility, please visit our guide for [Mac](/docs/desktop/mac#compatibility).
</Callout>
</Tabs.Tab>
</Tabs>
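To choose among the CPU variants listed above, you need to know which instruction sets your processor supports. Below is a minimal Node.js sketch, assuming Linux (it parses `/proc/cpuinfo`); on Windows or macOS, consult your CPU's spec sheet instead.
```javascript
// Sketch: suggest the most capable llama.cpp CPU variant (Linux only,
// reads CPU flags from /proc/cpuinfo; `avx512f` is the AVX-512 base flag).
const fs = require('fs');

const cpuinfo = fs.readFileSync('/proc/cpuinfo', 'utf8');
const match = cpuinfo.match(/^flags\s*:\s*(.+)$/m);
const flags = match ? match[1].split(/\s+/) : [];

const variant =
  flags.includes('avx512f') ? 'llama.cpp-avx512' :
  flags.includes('avx2')    ? 'llama.cpp-avx2'   :
  flags.includes('avx')     ? 'llama.cpp-avx'    :
                              'llama.cpp-noavx';

console.log(`Suggested CPU variant: ${variant}`);
```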


@@ -28,12 +28,9 @@ import { Settings, EllipsisVertical, Plus, FolderOpen, Pencil } from 'lucide-rea
Jan uses **TensorRT-LLM** as an optional engine for faster inference on NVIDIA GPUs. This engine uses [Cortex-TensorRT-LLM](https://github.com/janhq/cortex.tensorrt-llm), which includes an efficient C++ server that executes the [TRT-LLM C++ runtime](https://nvidia.github.io/TensorRT-LLM/gpt_runtime.html) natively. It also includes features and performance improvements like OpenAI compatibility, tokenizer improvements, and queues.
<Callout type="info">
The TensorRT-LLM engine is currently only available for **Windows** users; **Linux** support is coming soon!
</Callout>
You can find its settings in **Settings** (<Settings width={16} height={16} style={{display:"inline"}}/>) > **Local Engine** > **TensorRT-LLM**.
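Since the engine is OpenAI-compatible, you can also reach it programmatically through Jan's local API server. A minimal sketch, assuming the server is enabled on its default address (`localhost:1337` in recent releases) and that a TensorRT-LLM model is already downloaded; the model id below is hypothetical:
```javascript
// Sketch: chat with a TensorRT-LLM model via Jan's OpenAI-compatible server.
// Assumes Node 18+ (global fetch, run as an ES module) and that the local
// API server is enabled in Jan's settings.
const response = await fetch('http://localhost:1337/v1/chat/completions', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    model: 'llama2-7b-tensorrt-llm', // hypothetical model id
    messages: [{ role: 'user', content: 'Hello from TensorRT-LLM!' }],
  }),
});
const data = await response.json();
console.log(data.choices[0].message.content);
```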
## Requirements
- NVIDIA GPU with Compute Capability 7.0 or higher (RTX 20xx series and above)
@@ -45,59 +42,46 @@ You can find its settings in **Settings** (<Settings width={16} height={16} styl
For a detailed setup guide, please visit [Windows](/docs/desktop/windows#compatibility).
</Callout>
## Engine Version and Updates
- **Engine Version**: View current version of TensorRT-LLM engine
- **Check Updates**: Check whether a newer engine version is available and install it
## Available Backends
TensorRT-LLM is specifically designed for NVIDIA GPUs. Available backends include:
**Windows**
- `win-cuda`: For NVIDIA GPUs with CUDA support
<Callout type="warning">
TensorRT-LLM requires an NVIDIA GPU with CUDA support. It is not compatible with other GPU types or CPU-only systems.
</Callout>
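If you're unsure whether your GPU qualifies, the Node.js sketch below checks the compute capability requirement, assuming a driver recent enough that `nvidia-smi` supports the `compute_cap` query field:
```javascript
// Sketch: check TensorRT-LLM's Compute Capability >= 7.0 requirement.
// Assumes `nvidia-smi` is on PATH and the driver supports `compute_cap`.
const { execSync } = require('child_process');

const output = execSync(
  'nvidia-smi --query-gpu=compute_cap --format=csv,noheader'
).toString().trim();
const capability = parseFloat(output);

console.log(capability >= 7.0
  ? `OK: compute capability ${capability} is supported`
  : `Unsupported: compute capability ${capability} is below 7.0`);
```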
## Enable TensorRT-LLM
<Steps>
### Step 1: Install Additional Dependencies
1. Click the **Gear Icon (⚙️)** on the bottom left of your screen.
2. Select **TensorRT-LLM** under the **Model Provider** section.
<br/>
![Select TensorRT-LLM](../_assets/tensorrt-llm-01.png)
<br/>
3. Click **Install** to install the required dependencies to use TensorRT-LLM.
4. Verify that the files downloaded correctly:
```bash
ls ~/jan/data/extensions/@janhq/tensorrt-llm-extension/dist/bin
# Your Extension Folder should now include `cortex.exe`, among other artifacts needed to run TRT-LLM
```
5. Restart Jan
### Step 2: Download Compatible Models
TensorRT-LLM can only run models in `TensorRT` format. These models, also known as "TensorRT Engines", are prebuilt specifically for each operating system and GPU architecture.
We currently offer a selection of precompiled models optimized for NVIDIA Ampere and Ada GPUs that you can use right away:
1. Go to **Hub**
2. Look for models with the `TensorRT-LLM` label and make sure they're compatible with your hardware
3. Click **Download**
<Callout type="info">
This download might take some time, as TensorRT models are typically large files.
</Callout>
<br/>
![Download TensorRT-LLM Model](../_assets/tensorrt-llm-02.png)
<br/>
### Step 3: Start Threads
Once your model is downloaded, start using it in [Threads](/docs/threads).
</Steps>