fix: Add latest results on 3090/4090

2024-03-20 18:51:34 +07:00
commit c885d59c0b
parent 70e10fcc4a
1 changed file with 27 additions and 27 deletions
--- a/docs/blog/2024-03-19-TensorRT-LLM.md
+++ b/docs/blog/2024-03-19-TensorRT-LLM.md
@@ -42,26 +42,26 @@ TensorRT-LLM handily outperformed llama.cpp in for the 4090s. Interestingly,

 - CPU: Intel 13th series
 - GPU: NVIDIA GPU 4090 (Ada - sm 89)
- RAM: 120GB
- OS: Windows
+- RAM: 32GB
+- OS: Windows 11 Pro

-#### TinyLlama-1.1b q4
+#### TinyLlama-1.1b FP16

 | Metrics              | GGUF (using the GPU) | TensorRT-LLM |
 | -------------------- | -------------------- | ------------ |
-| Throughput (token/s) | 104                  | ✅ 131       |
-| VRAM Used (GB)       | 2.1                  | 😱 21.5      |
-| RAM Used (GB)        | 0.3                  | 😱 15        |
-| Disk Size (GB)       | 4.07                 | 4.07         |
+| Throughput (token/s) | No support           | ✅ 257.76    |
+| VRAM Used (GB)       | No support           | 3.3          |
+| RAM Used (GB)        | No support           | 0.54         |
+| Disk Size (GB)       | No support           | 2            |

 #### Mistral-7b int4

 | Metrics              | GGUF (using the GPU) | TensorRT-LLM |
 | -------------------- | -------------------- | ------------ |
-| Throughput (token/s) | 80                   | ✅ 97.9      |
-| VRAM Used (GB)       | 2.1                  | 😱 23.5      |
-| RAM Used (GB)        | 0.3                  | 😱 15        |
-| Disk Size (GB)       | 4.07                 | 4.07         |
+| Throughput (token/s) | 101.3                | ✅ 159       |
+| VRAM Used (GB)       | 5.5                  | 6.3          |
+| RAM Used (GB)        | 0.54                 | 0.42         |
+| Disk Size (GB)       | 4.07                 | 3.66         |
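The updated Mistral-7b int4 rows for the RTX 4090 are easier to compare as ratios. The minimal Python sketch below copies the new `+` values from this table and computes TensorRT-LLM's throughput speedup over GGUF, plus the signed change in VRAM, RAM and disk use. The `relative_change` helper and the dictionary layout are hypothetical, for illustration only, and are not part of the benchmark tooling.

```python
# Updated (+) Mistral-7b int4 figures for the RTX 4090, copied from the table above.
gguf = {"throughput": 101.3, "vram_gb": 5.5, "ram_gb": 0.54, "disk_gb": 4.07}
trt = {"throughput": 159.0, "vram_gb": 6.3, "ram_gb": 0.42, "disk_gb": 3.66}


def relative_change(new: float, base: float) -> float:
    """Hypothetical helper: fractional change of `new` relative to `base`."""
    return (new - base) / base


# Throughput is higher-is-better, so report it as a multiplier.
speedup = trt["throughput"] / gguf["throughput"]

# Resource usage is lower-is-better, so report the signed fractional change.
resource_deltas = {
    key: relative_change(trt[key], gguf[key])
    for key in ("vram_gb", "ram_gb", "disk_gb")
}

print(f"TensorRT-LLM throughput vs GGUF: {speedup:.2f}x")
for key, delta in resource_deltas.items():
    print(f"{key}: {delta:+.1%}")
```

With these inputs, TensorRT-LLM comes out at roughly 1.57x the GGUF throughput, using about 14.5% more VRAM but about 22% less RAM and 10% less disk.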

 ### RTX 3090 on Windows PC

@@ -70,23 +70,23 @@ TensorRT-LLM handily outperformed llama.cpp in for the 4090s. Interestingly,
 - RAM: 64GB
 - OS: Windows

-#### TinyLlama-1.1b q4
+#### TinyLlama-1.1b FP16

 | Metrics              | GGUF (using the GPU) | TensorRT-LLM |
 | -------------------- | -------------------- | ------------ |
-| Throughput (token/s) | 131.28               | ✅ 194       |
-| VRAM Used (GB)       | 2.1                  | 😱 21.5      |
-| RAM Used (GB)        | 0.3                  | 😱 15        |
-| Disk Size (GB)       | 4.07                 | 4.07         |
+| Throughput (token/s) | No support           | ✅ 203       |
+| VRAM Used (GB)       | No support           | 3.8          |
+| RAM Used (GB)        | No support           | 0.54         |
+| Disk Size (GB)       | No support           | 2            |

 #### Mistral-7b int4

 | Metrics              | GGUF (using the GPU) | TensorRT-LLM |
 | -------------------- | -------------------- | ------------ |
-| Throughput (token/s) | 88                   | ✅ 137       |
-| VRAM Used (GB)       | 6.0                  | 😱 23.8      |
-| RAM Used (GB)        | 0.3                  | 😱 25        |
-| Disk Size (GB)       | 4.07                 | 4.07         |
+| Throughput (token/s) | 90                   | ✅ 140.27    |
+| VRAM Used (GB)       | 6.0                  | 6.8          |
+| RAM Used (GB)        | 0.54                 | 0.42         |
+| Disk Size (GB)       | 4.07                 | 3.66         |
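To compare the two desktop GPUs directly, the sketch below divides each updated RTX 4090 throughput by the matching RTX 3090 figure, using the `+` values from this commit. The `results` layout and the `gpu_ratio` helper are hypothetical, and `None` stands in for a "No support" cell. The two test machines also differ in other ways (for example, 32GB vs 64GB RAM), so the ratio is not a pure GPU-to-GPU comparison.

```python
from typing import Optional

# Updated (+) throughput figures in token/s, copied from the RTX 4090 and RTX 3090 tables.
# None marks a "No support" cell.
results = {
    ("TinyLlama-1.1b FP16", "TensorRT-LLM"): {"4090": 257.76, "3090": 203.0},
    ("TinyLlama-1.1b FP16", "GGUF"): {"4090": None, "3090": None},
    ("Mistral-7b int4", "TensorRT-LLM"): {"4090": 159.0, "3090": 140.27},
    ("Mistral-7b int4", "GGUF"): {"4090": 101.3, "3090": 90.0},
}


def gpu_ratio(row: dict) -> Optional[float]:
    """Return 4090 throughput divided by 3090 throughput, or None if either is unsupported."""
    if row["4090"] is None or row["3090"] is None:
        return None
    return row["4090"] / row["3090"]


ratios = {key: gpu_ratio(row) for key, row in results.items()}

for (model, backend), ratio in ratios.items():
    label = "No support" if ratio is None else f"{ratio:.2f}x"
    print(f"{model:<20} {backend:<13} 4090 vs 3090: {label}")
```

With these numbers, the 4090 runs about 1.27x faster than the 3090 on TinyLlama-1.1b FP16 with TensorRT-LLM, and about 1.13x faster on Mistral-7b int4 under both backends.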

 ### RTX 4060 on Windows Laptop

@@ -95,7 +95,7 @@ TensorRT-LLM handily outperformed llama.cpp in for the 4090s. Interestingly,
 - RAM: 16GB
 - GPU: NVIDIA Laptop GPU 4060 (Ada)

-#### TinyLlama-1.1b q4
+#### TinyLlama-1.1b FP16

 | Metrics              | GGUF (using the GPU) | TensorRT-LLM |
 | -------------------- | -------------------- | ------------ |