fix: Add latest results on 3090/4090

This commit is contained in:
hiro 2024-03-20 18:51:34 +07:00
parent 70e10fcc4a
commit c885d59c0b


@@ -42,26 +42,26 @@ TensorRT-LLM handily outperformed llama.cpp for the 4090s. Interestingly,
 - CPU: Intel 13th series
 - GPU: NVIDIA GPU 4090 (Ada - sm 89)
-- RAM: 120GB
-- OS: Windows
+- RAM: 32GB
+- OS: Windows 11 Pro
-#### TinyLlama-1.1b q4
+#### TinyLlama-1.1b FP16
 | Metrics              | GGUF (using the GPU) | TensorRT-LLM |
 | -------------------- | -------------------- | ------------ |
-| Throughput (token/s) | 104                  | ✅ 131       |
-| VRAM Used (GB)       | 2.1                  | 😱 21.5      |
-| RAM Used (GB)        | 0.3                  | 😱 15        |
-| Disk Size (GB)       | 4.07                 | 4.07         |
+| Throughput (token/s) | No support           | ✅ 257.76    |
+| VRAM Used (GB)       | No support           | 3.3          |
+| RAM Used (GB)        | No support           | 0.54         |
+| Disk Size (GB)       | No support           | 2            |
 #### Mistral-7b int4
 | Metrics              | GGUF (using the GPU) | TensorRT-LLM |
 | -------------------- | -------------------- | ------------ |
-| Throughput (token/s) | 80                   | ✅ 97.9      |
-| VRAM Used (GB)       | 2.1                  | 😱 23.5      |
-| RAM Used (GB)        | 0.3                  | 😱 15        |
-| Disk Size (GB)       | 4.07                 | 4.07         |
+| Throughput (token/s) | 101.3                | ✅ 159       |
+| VRAM Used (GB)       | 5.5                  | 6.3          |
+| RAM Used (GB)        | 0.54                 | 0.42         |
+| Disk Size (GB)       | 4.07                 | 3.66         |
 ### RTX 3090 on Windows PC
@@ -70,23 +70,23 @@ TensorRT-LLM handily outperformed llama.cpp for the 4090s. Interestingly,
 - RAM: 64GB
 - OS: Windows
-#### TinyLlama-1.1b q4
+#### TinyLlama-1.1b FP16
 | Metrics              | GGUF (using the GPU) | TensorRT-LLM |
 | -------------------- | -------------------- | ------------ |
-| Throughput (token/s) | 131.28               | ✅ 194       |
-| VRAM Used (GB)       | 2.1                  | 😱 21.5      |
-| RAM Used (GB)        | 0.3                  | 😱 15        |
-| Disk Size (GB)       | 4.07                 | 4.07         |
+| Throughput (token/s) | No support           | ✅ 203       |
+| VRAM Used (GB)       | No support           | 3.8          |
+| RAM Used (GB)        | No support           | 0.54         |
+| Disk Size (GB)       | No support           | 2            |
 #### Mistral-7b int4
 | Metrics              | GGUF (using the GPU) | TensorRT-LLM |
 | -------------------- | -------------------- | ------------ |
-| Throughput (token/s) | 88                   | ✅ 137       |
-| VRAM Used (GB)       | 6.0                  | 😱 23.8      |
-| RAM Used (GB)        | 0.3                  | 😱 25        |
-| Disk Size (GB)       | 4.07                 | 4.07         |
+| Throughput (token/s) | 90                   | ✅ 140.27    |
+| VRAM Used (GB)       | 6.0                  | 6.8          |
+| RAM Used (GB)        | 0.54                 | 0.42         |
+| Disk Size (GB)       | 4.07                 | 3.66         |
 ### RTX 4060 on Windows Laptop
@@ -95,7 +95,7 @@ TensorRT-LLM handily outperformed llama.cpp for the 4090s. Interestingly,
 - RAM: 16GB
 - GPU: NVIDIA Laptop GPU 4060 (Ada)
-#### TinyLlama-1.1b q4
+#### TinyLlama-1.1b FP16
 | Metrics              | GGUF (using the GPU) | TensorRT-LLM |
 | -------------------- | -------------------- | ------------ |
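For a quick sanity check of what the updated rows imply, the relative throughput gains can be computed with a short script. This is a sketch, not part of the benchmark harness: the numbers are copied from the `+` rows of the tables above, and the dictionary labels are just illustrative names.

```python
# Throughput (token/s) from the updated (+) rows above, as (GGUF, TensorRT-LLM).
# None marks configurations GGUF does not support (the FP16 rows).
results = {
    "4090 / Mistral-7b int4": (101.3, 159.0),
    "3090 / Mistral-7b int4": (90.0, 140.27),
    "4090 / TinyLlama-1.1b FP16": (None, 257.76),
    "3090 / TinyLlama-1.1b FP16": (None, 203.0),
}

def speedup(gguf, trt):
    """TensorRT-LLM throughput relative to GGUF; None when GGUF has no support."""
    return None if gguf is None else round(trt / gguf, 2)

speedups = {name: speedup(g, t) for name, (g, t) in results.items()}
for name, s in speedups.items():
    print(name, "->", "n/a (GGUF: no support)" if s is None else f"{s}x")
```

On these numbers TensorRT-LLM comes out roughly 1.5-1.6x faster than GGUF for Mistral-7b int4 on both cards, while the FP16 rows have no GGUF baseline to compare against.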