Updated Engines pages
@ -93,7 +93,7 @@
<url><loc>https://jan.ai/docs/desktop/windows</loc><lastmod>2024-09-09T08:19:45.722Z</lastmod><changefreq>daily</changefreq><priority>1</priority></url>
<url><loc>https://jan.ai/docs/error-codes</loc><lastmod>2024-09-09T08:19:45.722Z</lastmod><changefreq>daily</changefreq><priority>1</priority></url>
<url><loc>https://jan.ai/docs/extensions</loc><lastmod>2024-09-09T08:19:45.722Z</lastmod><changefreq>daily</changefreq><priority>1</priority></url>
<url><loc>https://jan.ai/docs/installing-extension</loc><lastmod>2024-09-09T08:19:45.722Z</lastmod><changefreq>daily</changefreq><priority>1</priority></url>
<url><loc>https://jan.ai/docs/install-extensions</loc><lastmod>2024-09-09T08:19:45.722Z</lastmod><changefreq>daily</changefreq><priority>1</priority></url>
<url><loc>https://jan.ai/docs/models</loc><lastmod>2024-09-09T08:19:45.722Z</lastmod><changefreq>daily</changefreq><priority>1</priority></url>
<url><loc>https://jan.ai/docs/models/manage-models</loc><lastmod>2024-09-09T08:19:45.722Z</lastmod><changefreq>daily</changefreq><priority>1</priority></url>
<url><loc>https://jan.ai/docs/models/model-parameters</loc><lastmod>2024-09-09T08:19:45.722Z</lastmod><changefreq>daily</changefreq><priority>1</priority></url>
BIN docs/src/pages/docs/_assets/llama.cpp-01.png (new file, 159 KiB)
22 existing image assets updated with minor size changes.
@ -27,16 +27,17 @@
    "title": "ENGINES",
    "type": "separator"
  },
  "built-in": "Local Engines",
  "local-engines": "Local Engines",
  "remote-models": "Remote Engines",
  "install-engines": "Install Engines",
  "extensions-separator": {
    "title": "EXTENSIONS",
    "type": "separator"
  },
  "extensions": "Overview",
  "extensions-settings": "Extensions Settings",
  "extensions-settings": "Extension Settings",
  "configure-extensions": "Configure Extensions",
  "installing-extension": "Install Extensions",
  "install-extensions": "Install Extensions",
  "troubleshooting-separator": {
    "title": "TROUBLESHOOTING",
    "type": "separator"
@ -1,133 +0,0 @@
---
title: llama.cpp
description: A step-by-step guide on how to customize the llama.cpp engine.
keywords:
  [
    Jan,
    Customizable Intelligence, LLM,
    local AI,
    privacy focus,
    free and open source,
    private and offline,
    conversational AI,
    no-subscription fee,
    large language models,
    Llama CPP integration,
    llama.cpp Engine,
    Intel CPU,
    AMD CPU,
    NVIDIA GPU,
    AMD GPU Radeon,
    Apple Silicon,
    Intel Arc GPU,
  ]
---

import { Tabs } from 'nextra/components'
import { Callout, Steps } from 'nextra/components'

# llama.cpp (Cortex)

## Overview

Jan ships with [**Cortex**](https://github.com/janhq/cortex), a default C++ inference server built on top of [llama.cpp](https://github.com/ggerganov/llama.cpp). The server provides an OpenAI-compatible API, queues, scaling, and additional features on top of the wide capabilities of `llama.cpp`.

This guide shows you how to initialize `llama.cpp`, download and install the required dependencies, and start chatting with a model using the `llama.cpp` engine.

## Prerequisites
- Mac Intel:
  - Make sure you're using an Intel-based Mac. For a complete list of supported Intel CPUs, please see [here](https://en.wikipedia.org/wiki/MacBook_Pro_(Intel-based)).
  - Smaller models are recommended on Intel-based Macs.
- Mac Silicon:
  - Make sure you're using an Apple Silicon Mac. For a complete list of supported Apple Silicon CPUs, please see [here](https://en.wikipedia.org/wiki/Apple_Silicon).
  - Choose a model size appropriate for your hardware.
<Callout type="info">
Apple Silicon Macs use the Apple GPU with Metal by default for acceleration. The Apple Neural Engine (ANE) is not supported yet.
</Callout>
- Windows:
  - Ensure that you have **Windows with x86_64** architecture.
- Linux:
  - Ensure that you have **Linux with x86_64** architecture.

#### GPU Acceleration Options
Enable the GPU acceleration option within the Jan application by following the [Installation Setup](/docs/desktop-installation) guide.

## Step-by-step Guide
<Steps>
### Step 1: Open the `model.json`
1. Open the [Jan Data Folder](/docs/data-folder#open-jan-data-folder)

<br/>

<br/>

2. Open the **models** folder > click the **model folder** you want to modify > open `model.json`
3. Once open, the `model.json` file looks like the example below (using the "TinyLlama Chat 1.1B Q4" model):
```json
{
  "sources": [
    {
      "filename": "tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf",
      "url": "https://huggingface.co/TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF/resolve/main/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf"
    }
  ],
  "id": "tinyllama-1.1b",
  "object": "model",
  "name": "TinyLlama Chat 1.1B Q4",
  "version": "1.0",
  "description": "TinyLlama is a tiny model with only 1.1B. It's a good model for less powerful computers.",
  "format": "gguf",
  "settings": {
    "ctx_len": 4096,
    "prompt_template": "<|system|>\n{system_message}<|user|>\n{prompt}<|assistant|>",
    "llama_model_path": "tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf"
  },
  "parameters": {
    "temperature": 0.7,
    "top_p": 0.95,
    "stream": true,
    "max_tokens": 2048,
    "stop": [],
    "frequency_penalty": 0,
    "presence_penalty": 0
  },
  "metadata": {
    "author": "TinyLlama",
    "tags": [
      "Tiny",
      "Foundation Model"
    ],
    "size": 669000000
  },
  "engine": "nitro"
}
```
### Step 2: Modify the `model.json`
1. Modify the model's engine settings under the `settings` array. The following parameters are available:

| Parameter | Type | Description |
| ----------------- | ----------- | ----------------------------------------------------------------------------------------------------------- |
| `ctx_len` | **Integer** | Sets the context length for model operations. The default value is `2048` (_Maximum_: `4096`, _Minimum_: `1`). |
| `prompt_template` | **String** | Defines the template used to format prompts. |
| `model_path` | **String** | Specifies the path to the model `.GGUF` file. |
| `ngl` | **Integer** | Determines GPU layer usage. The default value is `100`. |
| `cpu_threads` | **Integer** | Determines CPU inference threads, limited by hardware and OS. (_Maximum_ determined by system) |
| `cont_batching` | **Integer** | Controls continuous batching, enhancing throughput for LLM inference. |
| `embedding` | **Integer** | Enables embedding utilization for tasks like document-enhanced chat in RAG-based applications. |
2. Save the `model.json` file.
<Callout type='info'>
These settings only affect the selected model. If you use a different model, you must set it up again.
</Callout>
### Step 3: Start the Model
1. Restart the Jan application to apply your settings.
2. Navigate to **Threads**.
3. Chat with your model.
</Steps>
<Callout type='info'>
- To use the embedding feature, include the JSON parameter `"embedding": true`. This enables Nitro to process inferences with embedding capabilities. Please refer to [Embedding in the Nitro documentation](https://nitro.jan.ai/features/embed) for a more detailed explanation.
- To use the continuous batching feature, which boosts throughput and minimizes latency in large language model (LLM) inference, include `"cont_batching": true`. For details, please refer to [Continuous Batching in the Nitro documentation](https://nitro.jan.ai/features/cont-batch).
</Callout>


<Callout type='info'>
If you have questions, please join our [Discord community](https://discord.gg/Dt7MxDyNNZ) for support, updates, and discussions.
</Callout>
docs/src/pages/docs/install-engines.mdx (new file, 166 lines)
@ -0,0 +1,166 @@
---
title: Install Engines
description: Learn how to install and configure local and remote engines in Jan.
keywords:
  [
    Jan,
    Customizable Intelligence, LLM,
    local AI,
    privacy focus,
    free and open source,
    private and offline,
    conversational AI,
    no-subscription fee,
    large language models,
    Jan Extensions,
    Extensions,
  ]
---

import { Callout } from 'nextra/components'
import { Settings, EllipsisVertical } from 'lucide-react'

# Install Engines

## Install Local Engines
Jan doesn't support installing additional local engines yet.

## Install Remote Engines

### Step-by-step Guide
You can add any OpenAI-API-compatible provider, such as OpenAI, Anthropic, or others.
To add a new remote engine:

1. Navigate to **Settings** (<Settings width={16} height={16} style={{display:"inline"}}/>) > **Engines**
2. Under the **Remote Engine** category, click **+ Install Engine**
3. Fill in the following information:

| Field | Description | Required |
|-------|-------------|----------|
| Engine Name | Name for your engine (e.g., "OpenAI", "Claude") | ✓ |
| API URL | The base URL of the provider's API | ✓ |
| API Key | Your authentication key from the provider | ✓ |
| Model List URL | URL for fetching available models | |
| API Key Template | Custom authorization header format | |
| Request Format Conversion | Function to convert Jan's request format to the provider's format | |
| Response Format Conversion | Function to convert the provider's response format to Jan's format | |

> - The conversion functions are only needed for providers that don't follow the OpenAI API format. For OpenAI-compatible APIs, you can leave these empty.
> - For OpenAI-compatible APIs such as OpenAI, Anthropic, or Groq, you only need to fill in the required fields. Leave the optional fields empty.

4. Click **Install** to complete the setup.

### Examples
#### OpenAI-Compatible Setup
Here's how to set up OpenAI as a remote engine:

1. Engine Name: `OpenAI`
2. API URL: `https://api.openai.com`
3. Model List URL: `https://api.openai.com/v1/models`
4. API Key: Your OpenAI API key
5. Leave the other fields at their defaults.
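
To confirm the API URL and key before installing, you can query the Model List URL directly. The snippet below is an illustrative sketch rather than something Jan runs for you; it assumes Node.js 18+ with built-in `fetch`, is run as an ES module (for example `check-openai.mjs`), and reads the key from an `OPENAI_API_KEY` environment variable:

```javascript
// Sketch only (not part of Jan): verify the API URL and key by querying the Model List URL.
// Run with: node check-openai.mjs   (requires OPENAI_API_KEY to be exported)
const apiKey = process.env.OPENAI_API_KEY;

const res = await fetch("https://api.openai.com/v1/models", {
  headers: { Authorization: `Bearer ${apiKey}` },
});

if (!res.ok) {
  console.error(`Request failed: ${res.status} ${res.statusText}`);
} else {
  const { data } = await res.json();
  console.log(`Key works: ${data.length} models available, e.g. ${data[0].id}`);
}
```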

#### Custom APIs Setup
If you're integrating an API that doesn't follow OpenAI's format, you'll need to use the conversion functions.
Let's say you have a custom API with this format:

```javascript
// Custom API Request Format
{
  "prompt": "What is AI?",
  "max_length": 100,
  "temperature": 0.7
}

// Custom API Response Format
{
  "generated_text": "AI is...",
  "tokens_used": 50,
  "status": "success"
}
```

Here's how to set it up in Jan:
```
Engine Name: Custom LLM
API URL: https://api.customllm.com
API Key: your_api_key_here
```

**Conversion Functions:**
1. Request Format Conversion:
```javascript
function convertRequest(janRequest) {
  return {
    prompt: janRequest.messages[janRequest.messages.length - 1].content,
    max_length: janRequest.max_tokens || 100,
    temperature: janRequest.temperature || 0.7
  }
}
```

2. Response Format Conversion:
```javascript
function convertResponse(apiResponse) {
  return {
    choices: [{
      message: {
        role: "assistant",
        content: apiResponse.generated_text
      }
    }],
    usage: {
      total_tokens: apiResponse.tokens_used
    }
  }
}
```

<Callout type="info">
The conversion functions should:
- Request: Convert from Jan's OpenAI-style format to your API's format
- Response: Convert from your API's format back to OpenAI-style format
</Callout>

**Expected Formats:**

1. Jan's Request Format
```javascript
{
  "messages": [
    {"role": "user", "content": "What is AI?"}
  ],
  "max_tokens": 100,
  "temperature": 0.7
}
```

2. Jan's Expected Response Format
```javascript
{
  "choices": [{
    "message": {
      "role": "assistant",
      "content": "AI is..."
    }
  }],
  "usage": {
    "total_tokens": 50
  }
}
```

<Callout type="warning">
Make sure to test your conversion functions thoroughly. Incorrect conversions may cause errors or unexpected behavior.
</Callout>
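
One lightweight way to test them is to run both functions against the sample payloads above before pasting them into Jan. The sketch below is illustrative only (the sample values and `assert` checks are assumptions, not Jan requirements); paste `convertRequest` and `convertResponse` above the checks before running it with Node.js:

```javascript
// Illustrative round-trip test for the conversion functions above.
// Paste convertRequest and convertResponse here, then run: node test-conversions.js
const assert = require("node:assert");

// Sample request in Jan's OpenAI-style format (from "Expected Formats" above)
const janRequest = {
  messages: [{ role: "user", content: "What is AI?" }],
  max_tokens: 100,
  temperature: 0.7,
};

// Sample response in the custom API's format (from "Custom APIs Setup" above)
const apiResponse = {
  generated_text: "AI is...",
  tokens_used: 50,
  status: "success",
};

const outgoing = convertRequest(janRequest);
assert.strictEqual(outgoing.prompt, "What is AI?");
assert.strictEqual(outgoing.max_length, 100);

const incoming = convertResponse(apiResponse);
assert.strictEqual(incoming.choices[0].message.content, "AI is...");
assert.strictEqual(incoming.usage.total_tokens, 50);

console.log("Both conversion functions handle the sample payloads correctly.");
```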
@ -24,7 +24,7 @@ import { Callout } from 'nextra/components'

Jan comes with several [pre-installed extensions](/docs/extensions#core-extensions) that provide core functionalities. You can manually add custom third-party extensions at your own risk.

## Creating Extensions
## Create Extensions

<Callout type="info">
Jan currently only accepts `.tgz` file format for extensions.
@ -33,7 +33,7 @@ Jan currently only accepts `.tgz` file format for extensions.

> **Heads Up:**
> - Please use the following structure and setup as a **reference** only.
> - You're free to develop extensions using any approach or structure that works for your needs. As long as your extension can be packaged as a `.tgz` file, it can be installed in Jan. Feel free to experiment and innovate!
> - If you already have your own `.tgz` extension file, please skip ahead to the [install extension](/docs/installing-extension#install-extensions) step.
> - If you already have your own `.tgz` extension file, please skip ahead to the [install extension](/docs/install-extensions#install-extensions) step.

#### Extension Structure
Your extension should follow this basic structure:
@ -1,10 +1,10 @@
{
  "llama-cpp": {
    "title": "llama.cpp",
    "href": "/docs/built-in/llama-cpp"
    "href": "/docs/local-engines/llama-cpp"
  },
  "tensorrt-llm": {
    "title": "TensorRT-LLM",
    "href": "/docs/built-in/tensorrt-llm"
    "href": "/docs/local-engines/tensorrt-llm"
  }
}
docs/src/pages/docs/local-engines/llama-cpp.mdx (new file, 82 lines)
@ -0,0 +1,82 @@
---
title: llama.cpp
description: A step-by-step guide on how to customize the llama.cpp engine.
keywords:
  [
    Jan,
    Customizable Intelligence, LLM,
    local AI,
    privacy focus,
    free and open source,
    private and offline,
    conversational AI,
    no-subscription fee,
    large language models,
    Llama CPP integration,
    llama.cpp Engine,
    Intel CPU,
    AMD CPU,
    NVIDIA GPU,
    AMD GPU Radeon,
    Apple Silicon,
    Intel Arc GPU,
  ]
---

import { Tabs } from 'nextra/components'
import { Callout, Steps } from 'nextra/components'
import { Settings, EllipsisVertical, Plus, FolderOpen, Pencil } from 'lucide-react'

# llama.cpp (Cortex)

## Overview
Jan uses **llama.cpp** for running local AI models. You can find its settings in **Settings** (<Settings width={16} height={16} style={{display:"inline"}}/>) > **Local Engine** > **llama.cpp**:

<br/>

<br/>

These settings are for advanced users. You would typically check them when:
- Your AI models are running slowly or not working
- You've installed new hardware (like a graphics card)
- You want to experiment and test performance with different [backends](/docs/local-engines/llama-cpp#available-backends)

## Engine Version and Updates
- **Engine Version**: View the current version of the llama.cpp engine
- **Check Updates**: Check whether a newer version is available and install it

## Available Backends

Jan offers different backend variants for **llama.cpp** based on your operating system. You can:
- Download different backends as needed
- Switch between backends for different hardware configurations
- View currently installed backends in the list

<Callout type="warning">
Choose the backend that matches your hardware. Using the wrong variant may cause performance issues or prevent models from loading.
</Callout>

### macOS
- `mac-arm64`: For Apple Silicon Macs (M1/M2/M3)
- `mac-amd64`: For Intel-based Macs

### Windows
- `win-cuda`: For NVIDIA GPUs using CUDA
- `win-cpu`: For CPU-only operation
- `win-directml`: For DirectML acceleration (AMD/Intel GPUs)
- `win-opengl`: For OpenGL acceleration

### Linux
- `linux-cuda`: For NVIDIA GPUs using CUDA
- `linux-cpu`: For CPU-only operation
- `linux-rocm`: For AMD GPUs using ROCm
- `linux-openvino`: For Intel GPUs/NPUs using OpenVINO
- `linux-vulkan`: For Vulkan acceleration

<Callout type="info">
For detailed hardware compatibility, please visit our guides for [Mac](/docs/desktop/mac#compatibility), [Windows](/docs/desktop/windows#compatibility), and [Linux](/docs/desktop/linux).
</Callout>
@ -19,25 +19,51 @@ keywords:
  ]
---

import { Tabs } from 'nextra/components'
import { Callout, Steps } from 'nextra/components'
import { Settings, EllipsisVertical, Plus, FolderOpen, Pencil } from 'lucide-react'

# TensorRT-LLM

## Overview
Jan uses **TensorRT-LLM** as an optional engine for faster inference on NVIDIA GPUs. This engine uses [Cortex-TensorRT-LLM](https://github.com/janhq/cortex.tensorrt-llm), which includes an efficient C++ server that executes the [TRT-LLM C++ runtime](https://nvidia.github.io/TensorRT-LLM/gpt_runtime.html) natively. It also includes features and performance improvements such as OpenAI compatibility, tokenizer improvements, and queues.

This guide walks you through installing Jan's official [TensorRT-LLM Engine](https://github.com/janhq/nitro-tensorrt-llm). This engine uses [Cortex-TensorRT-LLM](https://github.com/janhq/cortex.tensorrt-llm) as the AI engine instead of the default [Cortex-Llama-CPP](https://github.com/janhq/cortex). It includes an efficient C++ server that executes the [TRT-LLM C++ runtime](https://nvidia.github.io/TensorRT-LLM/gpt_runtime.html) natively. It also includes features and performance improvements such as OpenAI compatibility, tokenizer improvements, and queues.

<Callout type='warning' emoji="">
This feature is only available for Windows users. Linux support is coming soon.
<Callout type="info">
Currently only available for **Windows** users; **Linux** support is coming soon!
</Callout>

### Pre-requisites
You can find its settings in **Settings** (<Settings width={16} height={16} style={{display:"inline"}}/>) > **Local Engine** > **TensorRT-LLM**:

- A **Windows** PC.
- **NVIDIA GPU(s)**: Ada or Ampere series (i.e., RTX 4000s and 3000s). More will be supported soon.
- Sufficient disk space for the TensorRT-LLM models and data files (space requirements vary depending on the model size).

## Requirements
- NVIDIA GPU with Compute Capability 7.0 or higher (RTX 20xx series and above)
- Minimum 8GB VRAM (16GB+ recommended for larger models)
- Updated NVIDIA drivers
- CUDA Toolkit 11.8 or newer

<Callout type="info">
For a detailed setup guide, please visit [Windows](/docs/desktop/windows#compatibility).
</Callout>

## Engine Version and Updates
- **Engine Version**: View the current version of the TensorRT-LLM engine
- **Check Updates**: Check whether a newer version is available and install it

## Available Backends

TensorRT-LLM is specifically designed for NVIDIA GPUs. Available backends include:

**Windows**
- `win-cuda`: For NVIDIA GPUs with CUDA support

<Callout type="warning">
TensorRT-LLM requires an NVIDIA GPU with CUDA support. It is not compatible with other GPU types or CPU-only systems.
</Callout>

## Enable TensorRT-LLM

<Steps>
### Step 1: Install TensorRT-Extension
@ -74,15 +100,4 @@ We offer a handful of precompiled models for Ampere and Ada cards that you can i

3. Click **Download** to download the model.

### Step 3: Configure Settings

1. Navigate to the Thread section.
2. Select the model that you have downloaded.
3. Customize the default parameters of the model for how Jan runs TensorRT-LLM.
<Callout type='info'>
Please see [here](/docs/models/model-parameters) for more detailed model parameters.
</Callout>
<br/>


</Steps>
@ -108,10 +108,13 @@ You need to own your **model configurations**, use at your own risk. Misconfigur

#### 4. Manual Setup
For advanced users who add a specific model that is not available within Jan **Hub**:
1. Navigate to `~/jan/data/models/`
2. Create a new **Folder** for your model
3. Add a `model.json` file with your configuration:
```
<Steps>
##### Step 1: Create Model File
1. Navigate to the [Jan Data Folder](/docs/data-folder#open-jan-data-folder)
2. Open the `models` folder
3. Create a new **Folder** for your model
4. Add a `model.json` file with your configuration:
```bash
"id": "<unique_identifier_of_the_model>",
"object": "<type_of_object, e.g., model, tool>",
"name": "<name_of_the_model>",
@ -130,9 +133,50 @@ For advanced users who add a specific model that is not available within Jan **H
"engine": "<engine_or_platform_the_model_runs_on>",
"source": "<url_or_source_of_the_model_information>"
```
Key fields to configure:
Here's the "TinyLlama Chat 1.1B Q4" model as an example:
```json
{
  "sources": [
    {
      "filename": "tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf",
      "url": "https://huggingface.co/TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF/resolve/main/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf"
    }
  ],
  "id": "tinyllama-1.1b",
  "object": "model",
  "name": "TinyLlama Chat 1.1B Q4",
  "version": "1.0",
  "description": "TinyLlama is a tiny model with only 1.1B. It's a good model for less powerful computers.",
  "format": "gguf",
  "settings": {
    "ctx_len": 4096,
    "prompt_template": "<|system|>\n{system_message}<|user|>\n{prompt}<|assistant|>",
    "llama_model_path": "tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf"
  },
  "parameters": {
    "temperature": 0.7,
    "top_p": 0.95,
    "stream": true,
    "max_tokens": 2048,
    "stop": [],
    "frequency_penalty": 0,
    "presence_penalty": 0
  },
  "metadata": {
    "author": "TinyLlama",
    "tags": [
      "Tiny",
      "Foundation Model"
    ],
    "size": 669000000
  },
  "engine": "nitro"
}
```
##### Step 2: Modify Model Parameters
Modify the model parameters under the `settings` array. Key fields to configure:
1. **Settings** is where you can set your engine configurations.
2. [**Parameters**](/docs/models#model-parameters) are the adjustable settings that affect how your model operates or processes the data. The fields in parameters are typically general and can be the same across models. Here is an example of model parameters:
2. [**Parameters**](/docs/models/model-parameters) are the adjustable settings that affect how your model operates or processes the data. The fields in parameters are typically general and can be the same across models. Here is an example of model parameters:
```
"parameters":{
  "temperature": 0.7,
@ -142,7 +186,7 @@ Key fields to configure:
  "frequency_penalty": 0,
  "presence_penalty": 0
```
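
Since a malformed `model.json` will keep the model from loading, it can help to check the file before restarting Jan. The snippet below is a rough sketch (not a Jan tool); the field list simply mirrors the template and example above, and the script name and path are placeholders:

```javascript
// Rough sketch: sanity-check a hand-written model.json before restarting Jan.
// Usage: node validate-model.js /path/to/your-model/model.json
const fs = require("node:fs");

const filePath = process.argv[2];
const model = JSON.parse(fs.readFileSync(filePath, "utf8"));

// Fields used by the template and example in this guide
const requiredFields = ["id", "object", "name", "format", "settings", "parameters", "engine"];
const missing = requiredFields.filter((field) => !(field in model));

if (missing.length > 0) {
  console.error(`model.json is missing: ${missing.join(", ")}`);
  process.exit(1);
}

console.log(`"${model.name}" looks structurally valid for engine "${model.engine}".`);
```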

</Steps>

### Delete Models
1. Go to **Settings** (<Settings width={16} height={16} style={{display:"inline"}}/>) > **My Models**