docs: Initial commit of TensorRT-LLM Blog post (#2428)
commit dd09e622ac
docs/blog/2024-03-19-TensorRT-LLM.md (new file, 116 lines)
@@ -0,0 +1,116 @@
---
title: Jan now supports TensorRT-LLM
description: Jan has added support for Nvidia's TensorRT-LLM, a hardware-optimized LLM inference engine that runs very fast on Nvidia GPUs
tags: [Nvidia, TensorRT-LLM]
---

Jan now supports [TensorRT-LLM](https://github.com/NVIDIA/TensorRT-LLM) as an alternative inference engine. TensorRT-LLM is a hardware-optimized LLM inference engine that compiles models to [run extremely fast on Nvidia GPUs](https://blogs.nvidia.com/blog/tensorrt-llm-windows-stable-diffusion-rtx/).

- The [TensorRT-LLM Extension](/guides/providers/tensorrt-llm) is available in the [0.4.9 release](https://github.com/janhq/jan/releases/tag/v0.4.9)
- Currently available only for Windows

We've made a few TensorRT-LLM models available in the Jan Hub for download:

- TinyLlama-1.1b
- Mistral 7b
- TinyJensen-1.1b, which is trained on Jensen Huang's 👀

## What is TensorRT-LLM?

Please read our [TensorRT-LLM Guide](/guides/providers/tensorrt-llm).

TensorRT-LLM is mainly used on datacenter-grade GPUs, where it reaches speeds on the order of [10,000 tokens/s](https://nvidia.github.io/TensorRT-LLM/blogs/H100vsA100.html).

## Performance Benchmarks

We were curious to see how this would perform on consumer-grade GPUs, as most of Jan's users use consumer-grade GPUs.

- We've compared TensorRT-LLM against llama.cpp, our default inference engine.

| NVIDIA GPU | Architecture | VRAM (GB) | CUDA Cores | Tensor Cores | Memory Bus Width (bit) | Memory Bandwidth (GB/s) |
| ---------- | ------------ | --------- | ---------- | ------------ | ---------------------- | ----------------------- |
| RTX 4090 | Ada | 24 | 16,384 | 512 | 384 | ~1000 |
| RTX 3090 | Ampere | 24 | 10,496 | 328 | 384 | 935.8 |
| RTX 4060 | Ada | 8 | 3,072 | 96 | 128 | 272 |

> We test with batch_size 1, input length 2048, and output length 512, as this is the most common use case. Each test is run 5 times and the results are averaged.

> We use Windows Task Manager and Linux nvidia-smi/htop to collect CPU, memory, and NVIDIA GPU metrics per process.
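
As a rough illustration of the per-process GPU measurement mentioned above, the sketch below parses the CSV output of `nvidia-smi --query-compute-apps=pid,used_memory --format=csv,noheader,nounits`. The helper names and the sample output are hypothetical, and this is not the script used for these benchmarks. Any non-numeric memory value (some setups report `[N/A]`) is mapped to `None`.

```python
import subprocess

# Standard nvidia-smi query for per-process GPU memory, in MiB.
QUERY = [
    "nvidia-smi",
    "--query-compute-apps=pid,used_memory",
    "--format=csv,noheader,nounits",
]


def parse_compute_apps(csv_text):
    """Parse 'pid, used_memory' CSV rows into {pid: MiB or None}."""
    usage = {}
    for line in csv_text.strip().splitlines():
        if not line.strip():
            continue
        pid_str, mem_str = [field.strip() for field in line.split(",", 1)]
        # Non-numeric values such as "[N/A]" become None.
        usage[int(pid_str)] = int(mem_str) if mem_str.isdigit() else None
    return usage


def query_gpu_memory_per_process():
    """Run nvidia-smi and return per-process GPU memory (requires an NVIDIA driver)."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return parse_compute_apps(out)


# Hypothetical sample output, used here so the sketch runs without a GPU.
sample = "1234, 2150\n5678, [N/A]\n"
print(parse_compute_apps(sample))
```

Keeping the parser separate from the subprocess call makes it possible to check the parsing logic on a machine without an NVIDIA GPU.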

> We close all other user applications and run only the Jan app with Nitro tensorrt-llm, or the NVIDIA benchmark script in Python.
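
The averaging protocol in the notes above (5 runs, 2048 input tokens, 512 output tokens) could be sketched as follows. This is a minimal sketch, not the actual benchmark script: `generate` is a hypothetical callable standing in for the inference engine, and throughput is assumed to mean generated tokens divided by wall-clock time, which the post does not define.

```python
import time
from statistics import mean


def average_throughput(runs):
    """Mean tokens/s over a list of (tokens_generated, seconds) samples."""
    return mean([tokens / seconds for tokens, seconds in runs])


def benchmark(generate, input_tokens=2048, output_tokens=512, repeats=5):
    """Time a hypothetical generate(input_tokens, output_tokens) callable.

    generate must return the number of tokens it actually produced.
    """
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        produced = generate(input_tokens, output_tokens)
        samples.append((produced, time.perf_counter() - start))
    return average_throughput(samples)


def fake_generate(input_tokens, output_tokens):
    """Stand-in for a real engine call so the sketch runs anywhere."""
    time.sleep(0.01)
    return output_tokens


print(f"{benchmark(fake_generate):.1f} tokens/s")
```

Averaging per-run rates, rather than dividing total tokens by total time, weights each run equally; either choice is defensible as long as it is applied to both engines.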

### RTX 4090 on Windows PC

- CPU: Intel 13th series
- GPU: NVIDIA GPU 4090 (Ada - sm 89)
- RAM: 120GB
- OS: Windows

#### TinyLlama-1.1b q4

| Metrics | GGUF (using the GPU) | TensorRT-LLM |
| -------------------- | -------------------- | ------------ |
| Throughput (token/s) | 104 | ✅ 131 |
| VRAM Used (GB) | 2.1 | 😱 21.5 |
| RAM Used (GB) | 0.3 | 😱 15 |
| Disk Size (GB) | 4.07 | 4.07 |

#### Mistral-7b int4

| Metrics | GGUF (using the GPU) | TensorRT-LLM |
| -------------------- | -------------------- | ------------ |
| Throughput (token/s) | 80 | ✅ 97.9 |
| VRAM Used (GB) | 2.1 | 😱 23.5 |
| RAM Used (GB) | 0.3 | 😱 15 |
| Disk Size (GB) | 4.07 | 4.07 |

### RTX 3090 on Windows PC

- CPU: Intel 13th series
- GPU: NVIDIA GPU 3090 (Ampere - sm 86)
- RAM: 64GB
- OS: Windows

#### TinyLlama-1.1b q4

| Metrics | GGUF (using the GPU) | TensorRT-LLM |
| -------------------- | -------------------- | ------------ |
| Throughput (token/s) | 131.28 | ✅ 194 |
| VRAM Used (GB) | 2.1 | 😱 21.5 |
| RAM Used (GB) | 0.3 | 😱 15 |
| Disk Size (GB) | 4.07 | 4.07 |

#### Mistral-7b int4

| Metrics | GGUF (using the GPU) | TensorRT-LLM |
| -------------------- | -------------------- | ------------ |
| Throughput (token/s) | 88 | ✅ 137 |
| VRAM Used (GB) | 6.0 | 😱 23.8 |
| RAM Used (GB) | 0.3 | 😱 25 |
| Disk Size (GB) | 4.07 | 4.07 |

### RTX 4060 on Windows Laptop

- Manufacturer: Acer Nitro 16 Phoenix
- CPU: Ryzen 7000
- RAM: 16GB
- GPU: NVIDIA Laptop GPU 4060 (Ada)

#### TinyLlama-1.1b q4

| Metrics | GGUF (using the GPU) | TensorRT-LLM |
| -------------------- | -------------------- | ------------ |
| Throughput (token/s) | 65 | ❌ 41 |
| VRAM Used (GB) | 2.1 | 😱 7.6 |
| RAM Used (GB) | 0.3 | 😱 7.2 |
| Disk Size (GB) | 4.07 | 4.07 |

#### Mistral-7b int4

| Metrics | GGUF (using the GPU) | TensorRT-LLM |
| -------------------- | -------------------- | ------------ |
| Throughput (token/s) | 22 | ❌ 19 |
| VRAM Used (GB) | 2.1 | 😱 7.7 |
| RAM Used (GB) | 0.3 | 😱 13.5 |
| Disk Size (GB) | 4.07 | 4.07 |
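
To make the throughput comparison easier to read, the sketch below computes the TensorRT-LLM speedup over GGUF from the throughput rows of the tables above. The dictionary layout and the `speedup` helper are illustrative choices; the numbers are copied from the tables. On these figures TensorRT-LLM is roughly 1.2x to 1.6x faster on the desktop GPUs and slower (below 1.0x) on the laptop RTX 4060.

```python
# Throughput in tokens/s, copied from the tables above: (GGUF, TensorRT-LLM).
throughput = {
    ("RTX 4090", "TinyLlama-1.1b"): (104, 131),
    ("RTX 4090", "Mistral-7b"): (80, 97.9),
    ("RTX 3090", "TinyLlama-1.1b"): (131.28, 194),
    ("RTX 3090", "Mistral-7b"): (88, 137),
    ("RTX 4060 Laptop", "TinyLlama-1.1b"): (65, 41),
    ("RTX 4060 Laptop", "Mistral-7b"): (22, 19),
}


def speedup(gguf_tps, trt_tps):
    """Ratio of TensorRT-LLM throughput to GGUF throughput (>1 means faster)."""
    return trt_tps / gguf_tps


for (gpu, model), (gguf_tps, trt_tps) in throughput.items():
    print(f"{gpu:16} {model:15} {speedup(gguf_tps, trt_tps):.2f}x")
```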
@@ -64,7 +64,7 @@
 "content_type": "application/octet-stream",
 "state": "uploaded",
 "size": 118201794,
-"download_count": 89,
+"download_count": 94,
 "created_at": "2024-03-19T04:08:03Z",
 "updated_at": "2024-03-19T04:08:06Z",
 "browser_download_url": "https://github.com/janhq/jan/releases/download/v0.4.9/jan-linux-amd64-0.4.9.deb"
@@ -98,7 +98,7 @@
 "content_type": "application/octet-stream",
 "state": "uploaded",
 "size": 156697166,
-"download_count": 91,
+"download_count": 98,
 "created_at": "2024-03-19T04:06:51Z",
 "updated_at": "2024-03-19T04:06:55Z",
 "browser_download_url": "https://github.com/janhq/jan/releases/download/v0.4.9/jan-linux-x86_64-0.4.9.AppImage"
@@ -132,7 +132,7 @@
 "content_type": "application/octet-stream",
 "state": "uploaded",
 "size": 132665337,
-"download_count": 133,
+"download_count": 135,
 "created_at": "2024-03-19T04:10:15Z",
 "updated_at": "2024-03-19T04:10:26Z",
 "browser_download_url": "https://github.com/janhq/jan/releases/download/v0.4.9/jan-mac-arm64-0.4.9.dmg"
@@ -200,7 +200,7 @@
 "content_type": "application/zip",
 "state": "uploaded",
 "size": 128089843,
-"download_count": 212,
+"download_count": 222,
 "created_at": "2024-03-19T04:10:32Z",
 "updated_at": "2024-03-19T04:10:46Z",
 "browser_download_url": "https://github.com/janhq/jan/releases/download/v0.4.9/jan-mac-arm64-0.4.9.zip"
@@ -268,7 +268,7 @@
 "content_type": "application/octet-stream",
 "state": "uploaded",
 "size": 139245048,
-"download_count": 36,
+"download_count": 39,
 "created_at": "2024-03-19T04:17:33Z",
 "updated_at": "2024-03-19T04:17:37Z",
 "browser_download_url": "https://github.com/janhq/jan/releases/download/v0.4.9/jan-mac-x64-0.4.9.dmg"
@@ -336,7 +336,7 @@
 "content_type": "application/zip",
 "state": "uploaded",
 "size": 134752189,
-"download_count": 38,
+"download_count": 40,
 "created_at": "2024-03-19T04:17:52Z",
 "updated_at": "2024-03-19T04:17:56Z",
 "browser_download_url": "https://github.com/janhq/jan/releases/download/v0.4.9/jan-mac-x64-0.4.9.zip"
@@ -404,7 +404,7 @@
 "content_type": "application/octet-stream",
 "state": "uploaded",
 "size": 129440528,
-"download_count": 1044,
+"download_count": 1078,
 "created_at": "2024-03-19T04:18:43Z",
 "updated_at": "2024-03-19T04:18:46Z",
 "browser_download_url": "https://github.com/janhq/jan/releases/download/v0.4.9/jan-win-x64-0.4.9.exe"
@@ -438,7 +438,7 @@
 "content_type": "application/octet-stream",
 "state": "uploaded",
 "size": 136498,
-"download_count": 556,
+"download_count": 570,
 "created_at": "2024-03-19T04:18:43Z",
 "updated_at": "2024-03-19T04:18:43Z",
 "browser_download_url": "https://github.com/janhq/jan/releases/download/v0.4.9/jan-win-x64-0.4.9.exe.blockmap"
@@ -472,7 +472,7 @@
 "content_type": "text/yaml",
 "state": "uploaded",
 "size": 540,
-"download_count": 304,
+"download_count": 321,
 "created_at": "2024-03-19T04:08:06Z",
 "updated_at": "2024-03-19T04:08:06Z",
 "browser_download_url": "https://github.com/janhq/jan/releases/download/v0.4.9/latest-linux.yml"
@@ -506,7 +506,7 @@
 "content_type": "text/yaml",
 "state": "uploaded",
 "size": 842,
-"download_count": 712,
+"download_count": 743,
 "created_at": "2024-03-19T04:18:53Z",
 "updated_at": "2024-03-19T04:18:53Z",
 "browser_download_url": "https://github.com/janhq/jan/releases/download/v0.4.9/latest-mac.yml"
@@ -540,7 +540,7 @@
 "content_type": "text/yaml",
 "state": "uploaded",
 "size": 339,
-"download_count": 1835,
+"download_count": 1890,
 "created_at": "2024-03-19T04:18:46Z",
 "updated_at": "2024-03-19T04:18:46Z",
 "browser_download_url": "https://github.com/janhq/jan/releases/download/v0.4.9/latest.yml"
@@ -627,7 +627,7 @@
 "content_type": "application/octet-stream",
 "state": "uploaded",
 "size": 110060688,
-"download_count": 850,
+"download_count": 852,
 "created_at": "2024-03-11T06:08:19Z",
 "updated_at": "2024-03-11T06:08:21Z",
 "browser_download_url": "https://github.com/janhq/jan/releases/download/v0.4.8/jan-linux-amd64-0.4.8.deb"
@@ -1001,7 +1001,7 @@
 "content_type": "application/octet-stream",
 "state": "uploaded",
 "size": 127370,
-"download_count": 3188,
+"download_count": 3195,
 "created_at": "2024-03-11T06:15:48Z",
 "updated_at": "2024-03-11T06:15:48Z",
 "browser_download_url": "https://github.com/janhq/jan/releases/download/v0.4.8/jan-win-x64-0.4.8.exe.blockmap"
@@ -1564,7 +1564,7 @@
 "content_type": "application/octet-stream",
 "state": "uploaded",
 "size": 116340,
-"download_count": 6503,
+"download_count": 6509,
 "created_at": "2024-02-26T02:48:10Z",
 "updated_at": "2024-02-26T02:48:10Z",
 "browser_download_url": "https://github.com/janhq/jan/releases/download/v0.4.7/jan-win-x64-0.4.7.exe.blockmap"
@@ -4311,7 +4311,7 @@
 "content_type": "text/xml",
 "state": "uploaded",
 "size": 110511,
-"download_count": 220,
+"download_count": 221,
 "created_at": "2023-12-15T14:19:41Z",
 "updated_at": "2023-12-15T14:19:42Z",
 "browser_download_url": "https://github.com/janhq/jan/releases/download/v0.4.2/jan-win-x64-0.4.2.exe.blockmap"
@@ -1,5 +1,5 @@
 ---
-sidebar_position: 17
+sidebar_position: 18
 slug: /changelog/changelog-v0.2.0
 ---
 # v0.2.0
@@ -1,5 +1,5 @@
 ---
-sidebar_position: 16
+sidebar_position: 17
 slug: /changelog/changelog-v0.2.1
 ---
 # v0.2.1
@@ -1,5 +1,5 @@
 ---
-sidebar_position: 15
+sidebar_position: 16
 slug: /changelog/changelog-v0.2.2
 ---
 # v0.2.2
@@ -1,5 +1,5 @@
 ---
-sidebar_position: 14
+sidebar_position: 15
 slug: /changelog/changelog-v0.2.3
 ---
 # v0.2.3
@@ -1,5 +1,5 @@
 ---
-sidebar_position: 13
+sidebar_position: 14
 slug: /changelog/changelog-v0.3.0
 ---
 # v0.3.0
@@ -1,5 +1,5 @@
 ---
-sidebar_position: 12
+sidebar_position: 13
 slug: /changelog/changelog-v0.3.1
 ---
 # v0.3.1
@@ -1,5 +1,5 @@
 ---
-sidebar_position: 11
+sidebar_position: 12
 slug: /changelog/changelog-v0.3.2
 ---
 # v0.3.2
@@ -1,5 +1,5 @@
 ---
-sidebar_position: 10
+sidebar_position: 11
 slug: /changelog/changelog-v0.3.3
 ---
 # v0.3.3
@@ -1,5 +1,5 @@
 ---
-sidebar_position: 9
+sidebar_position: 10
 slug: /changelog/changelog-v0.4.0
 ---
 # v0.4.0
@@ -1,5 +1,5 @@
 ---
-sidebar_position: 8
+sidebar_position: 9
 slug: /changelog/changelog-v0.4.1
 ---
 # v0.4.1
@@ -1,5 +1,5 @@
 ---
-sidebar_position: 7
+sidebar_position: 8
 slug: /changelog/changelog-v0.4.2
 ---
 # v0.4.2
@@ -1,5 +1,5 @@
 ---
-sidebar_position: 6
+sidebar_position: 7
 slug: /changelog/changelog-v0.4.3
 ---
 # v0.4.3
@@ -1,5 +1,5 @@
 ---
-sidebar_position: 5
+sidebar_position: 6
 slug: /changelog/changelog-v0.4.4
 ---
 # v0.4.4
@@ -1,5 +1,5 @@
 ---
-sidebar_position: 4
+sidebar_position: 5
 slug: /changelog/changelog-v0.4.5
 ---
 # v0.4.5
@@ -1,5 +1,5 @@
 ---
-sidebar_position: 3
+sidebar_position: 4
 slug: /changelog/changelog-v0.4.6
 ---
 # v0.4.6
@@ -1,5 +1,5 @@
 ---
-sidebar_position: 2
+sidebar_position: 3
 slug: /changelog/changelog-v0.4.7
 ---
 # v0.4.7
@@ -1,5 +1,5 @@
 ---
-sidebar_position: 1
+sidebar_position: 2
 slug: /changelog/changelog-v0.4.8
 ---
 # v0.4.8