Updated Engines pages
To add a new remote engine:

1. Navigate to **Settings** (<Settings width={16} height={16} style={{display:"inline"}}/>) > **Engines**
2. At the **Remote Engine** category, click **+ Install Engine**

<br/>

<br/>

3. Fill in the following required information:

<br/>

<br/>
| Field | Description | Required |
|-------|-------------|----------|
| Engine Name | Name for your engine (e.g., "OpenAI", "Claude") | ✓ |
> - The conversion functions are only needed for providers that don't follow the OpenAI API format. For OpenAI-compatible APIs, you can leave these empty.
> - For OpenAI-compatible APIs such as OpenAI, Anthropic, or Groq, you only need to fill in the required fields and can leave the optional ones empty.

4. Click **Install**
5. Once completed, you should see your engine on the **Engines** page:
   - You can rename or uninstall your engine
   - You can navigate to its own settings page

<br/>

<br/>
### Examples

#### OpenAI-Compatible Setup

```
API Key: your_api_key_here
```
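
If you are not sure whether a provider is OpenAI-compatible, one quick check is to send it a plain OpenAI-style chat request before installing it as an engine. A minimal sketch, where the URL, key, and model name are placeholders to substitute with your own:

```javascript
// Minimal compatibility check: send one OpenAI-style chat request.
// API_URL, API_KEY, and the model name below are placeholders.
const API_URL = "https://api.openai.com/v1/chat/completions";
const API_KEY = "your_api_key_here";

async function checkCompatibility() {
  const res = await fetch(API_URL, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${API_KEY}`,
    },
    body: JSON.stringify({
      model: "gpt-4o-mini",
      messages: [{ role: "user", content: "Say hello" }],
    }),
  });
  const data = await res.json();
  // An OpenAI-compatible provider returns the reply at choices[0].message.content.
  console.log(data.choices?.[0]?.message?.content ?? data);
}

checkCompatibility();
```

If this request succeeds without any conversion, the provider can be installed by filling in the required fields only.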
**Conversion Functions:**

> - Request: Convert from Jan's OpenAI-style format to your API's format
> - Response: Convert from your API's format back to OpenAI-style format
1. Request Format Conversion:
```javascript
function convertRequest(janRequest) {
  // Map Jan's OpenAI-style request body to the schema your provider expects
}
```

2. Response Format Conversion:
```javascript
function convertResponse(apiResponse) {
  // Map your provider's response back to an OpenAI-style response
}
```
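
For instance, here is a minimal sketch for a hypothetical provider that expects a flat `prompt` string and a `max_length` field instead of OpenAI-style `messages` and `max_tokens` (all provider-side field names here are illustrative, not taken from a real API):

```javascript
// Illustrative only: `prompt`, `max_length`, and `output_text` belong to a
// hypothetical provider schema, not a specific real API.
function convertRequest(janRequest) {
  return {
    prompt: janRequest.messages
      .map((m) => `${m.role}: ${m.content}`)
      .join("\n"),
    max_length: janRequest.max_tokens,
    temperature: janRequest.temperature,
  };
}

function convertResponse(apiResponse) {
  return {
    choices: [
      { message: { role: "assistant", content: apiResponse.output_text } },
    ],
  };
}
```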
**Expected Formats:**
Jan offers different backend variants for **llama.cpp** based on your operating system and hardware.

<Callout type="warning">
Choose the backend that matches your hardware. Using the wrong variant may cause performance issues or prevent models from loading.
</Callout>
<Tabs items={['Windows', 'Linux', 'macOS']}>
<Tabs.Tab>

### CUDA Support (NVIDIA GPUs)
- `llama.cpp-avx-cuda-11-7`
- `llama.cpp-avx-cuda-12-0`
- `llama.cpp-avx2-cuda-11-7`
- `llama.cpp-avx2-cuda-12-0`
- `llama.cpp-avx512-cuda-11-7`
- `llama.cpp-avx512-cuda-12-0`
- `llama.cpp-noavx-cuda-11-7`
- `llama.cpp-noavx-cuda-12-0`

### CPU Only
- `llama.cpp-avx`
- `llama.cpp-avx2`
- `llama.cpp-avx512`
- `llama.cpp-noavx`

### Other Accelerators
- `llama.cpp-vulkan`
<Callout type="info">
- For detailed hardware compatibility, please visit our guide for [Windows](/docs/desktop/windows#compatibility).
- AVX, AVX2, and AVX-512 are CPU instruction sets. For best performance, use the most advanced instruction set your CPU supports.
- CUDA versions should match your installed NVIDIA drivers.
</Callout>
</Tabs.Tab>

<Tabs.Tab>

### CUDA Support (NVIDIA GPUs)
- `llama.cpp-avx-cuda-11-7`
- `llama.cpp-avx-cuda-12-0`
- `llama.cpp-avx2-cuda-11-7`
- `llama.cpp-avx2-cuda-12-0`
- `llama.cpp-avx512-cuda-11-7`
- `llama.cpp-avx512-cuda-12-0`
- `llama.cpp-noavx-cuda-11-7`
- `llama.cpp-noavx-cuda-12-0`

### CPU Only
- `llama.cpp-avx`
- `llama.cpp-avx2`
- `llama.cpp-avx512`
- `llama.cpp-noavx`

### Other Accelerators
- `llama.cpp-vulkan`
- `llama.cpp-arm64`

<Callout type="info">
- For detailed hardware compatibility, please visit our guide for [Linux](/docs/desktop/linux).
- AVX, AVX2, and AVX-512 are CPU instruction sets. For best performance, use the most advanced instruction set your CPU supports.
- CUDA versions should match your installed NVIDIA drivers.
</Callout>
</Tabs.Tab>

<Tabs.Tab>

### Apple Silicon
- `llama.cpp-mac-arm64`: For M1/M2/M3 Macs

### Intel
- `llama.cpp-mac-amd64`: For Intel-based Macs

<Callout type="info">
For detailed hardware compatibility, please visit our guide for [Mac](/docs/desktop/mac#compatibility).
</Callout>

</Tabs.Tab>

</Tabs>
Jan uses **TensorRT-LLM** as an optional engine for faster inference on NVIDIA GPUs. This engine uses [Cortex-TensorRT-LLM](https://github.com/janhq/cortex.tensorrt-llm), which includes an efficient C++ server that executes the [TRT-LLM C++ runtime](https://nvidia.github.io/TensorRT-LLM/gpt_runtime.html) natively. It also includes features and performance improvements like OpenAI compatibility, tokenizer improvements, and queues.
<Callout type="info">
The TensorRT-LLM engine is currently only available for **Windows** users; **Linux** support is coming soon!
</Callout>

You can find its settings in **Settings** (<Settings width={16} height={16} style={{display:"inline"}}/>) > **Local Engine** > **TensorRT-LLM**.
## Requirements
- NVIDIA GPU with Compute Capability 7.0 or higher (RTX 20xx series and above)
<Callout type="info">
For a detailed setup guide, please visit [Windows](/docs/desktop/windows#compatibility).
</Callout>
## Engine Version and Updates
- **Engine Version**: View the current version of the TensorRT-LLM engine
- **Check Updates**: Check whether a newer version is available and install it
## Available Backends

TensorRT-LLM is specifically designed for NVIDIA GPUs. Available backends include:

**Windows**
- `win-cuda`: For NVIDIA GPUs with CUDA support

<Callout type="warning">
TensorRT-LLM requires an NVIDIA GPU with CUDA support. It is not compatible with other GPU types or CPU-only systems.
</Callout>
## Enable TensorRT-LLM

<Steps>
### Step 1: Install Additional Dependencies
1. Navigate to **Settings** (<Settings width={16} height={16} style={{display:"inline"}}/>) > **Local Engine** > **TensorRT-LLM**
2. At **Additional Dependencies**, click **Install**

<br/>

<br/>

<br/>

<br/>

3. Verify that files are correctly downloaded:
```bash
ls ~/jan/data/extensions/@janhq/tensorrt-llm-extension/dist/bin
# Your Extension Folder should now include `cortex.exe`, among other artifacts needed to run TRT-LLM
```
4. Restart Jan
### Step 2: Download Compatible Models

TensorRT-LLM can only run models in `TensorRT` format. These models, also known as "TensorRT Engines", are prebuilt specifically for each operating system and GPU architecture.

We currently offer a selection of precompiled models optimized for NVIDIA Ampere and Ada GPUs that you can use right away:

1. Go to **Hub**
2. Look for models with the `TensorRT-LLM` label and make sure they are compatible with your hardware
3. Click **Download**

<Callout type="info">
This download might take some time, as TensorRT models are typically large files.
</Callout>

### Step 3: Start Threads
Once the model is downloaded, start using it in [Threads](/docs/threads).
</Steps>
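
Beyond Threads, the engine's OpenAI compatibility means a downloaded model can also be reached programmatically through Jan's Local API Server. A minimal sketch, assuming the Local API Server is enabled on its default address and that you substitute the model id shown in Jan:

```javascript
// Minimal sketch: query a TensorRT-LLM model through Jan's local,
// OpenAI-compatible API server. Assumes the Local API Server is enabled;
// the model id below is a placeholder for the id shown in Jan.
async function askLocalModel() {
  const res = await fetch("http://localhost:1337/v1/chat/completions", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "your-tensorrt-llm-model-id",
      messages: [{ role: "user", content: "Hello from TensorRT-LLM!" }],
    }),
  });
  const data = await res.json();
  console.log(data.choices?.[0]?.message?.content);
}

askLocalModel();
```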