update the GPU installation

This commit is contained in:
hahuyhoang411 2023-11-01 11:35:54 +07:00
parent bec7b11cdc
commit dd1812807a
3 changed files with 138 additions and 0 deletions


@@ -0,0 +1,115 @@
## Connect to rigs
1. Download Pritunl: https://client.pritunl.com/#install
2. Import the `.ovpn` file
3. Use VS Code to connect over SSH

> Hint: You need to install the "Remote-SSH" extension.
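Before opening VS Code, it can help to confirm that the VPN tunnel and SSH access work from a plain terminal. A minimal sketch, assuming a placeholder user `ubuntu` and rig address `10.0.0.5` (substitute the values from your `.ovpn` setup):
```bash
# Check that the rig is reachable over the VPN (address is a placeholder)
ping -c 3 10.0.0.5

# Verify SSH access; VS Code's Remote-SSH uses the same credentials
ssh ubuntu@10.0.0.5 'nvidia-smi --query-gpu=name --format=csv,noheader'
```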
## Llama.cpp
### Get llama.cpp
```bash
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
```
### Build with CMake for faster inference
```bash
mkdir build
cd build
# Tune these flags to find what works best on your hardware
cmake .. -DLLAMA_CUBLAS=ON -DLLAMA_CUDA_F16=ON -DLLAMA_CUDA_MMV_Y=8
cmake --build . --config Release
```
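Before moving on, you can check that the build produced the binaries and that GPU offload support was compiled in. A quick sketch, assuming you are still in the `build` directory:
```bash
# The main inference binary should now exist under build/bin
ls bin/main

# The help text should list the GPU offload option (-ngl / --n-gpu-layers)
./bin/main --help | grep -i "gpu"
```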
### Download a model
```bash
# Back at the llama.cpp root
cd ..
cd models
# Fetch the Llama-2-7B Q8_0 GGUF
wget https://huggingface.co/TheBloke/Llama-2-7B-GGUF/resolve/main/llama-2-7b.Q8_0.gguf
```
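The Q8_0 file is large (roughly 7 GB), so it is worth a quick sanity check that the download completed before trying to load it:
```bash
# Confirm the file is present and roughly the expected size (~7 GB)
ls -lh llama-2-7b.Q8_0.gguf
```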
### Run inference
```bash
# From the llama.cpp root, point main at the model downloaded above
cd ..
./build/bin/main -m ./models/llama-2-7b.Q8_0.gguf \
  -p "Writing a thesis proposal can be done in 10 simple steps:\nStep 1:" \
  -n 2048 -e -ngl 100 -t 48
# -n 2048: generate up to 2048 tokens; -e: process prompt escapes like \n
# -ngl 100: offload up to 100 layers to the GPU; -t 48: CPU threads
```
## TensorRT-LLM
The following command creates a Docker image for development:
```bash
sudo make -C docker build
```
Check the result with the `docker images` command:
```bash
docker images
```
The image will be tagged locally as `tensorrt_llm/devel:latest`. To run the container, use the following command:
```bash
sudo make -C docker run
```
### Build TensorRT-LLM
Once in the container, TensorRT-LLM can be built from source using:
```bash
# Build the TensorRT-LLM code
python3 ./scripts/build_wheel.py --trt_root /usr/local/tensorrt
# Install TensorRT-LLM into your environment
pip install ./build/tensorrt_llm*.whl
```
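A quick way to confirm the wheel installed cleanly is to import the package and print its version (a sketch; run it inside the same environment you installed into):
```bash
# Import the freshly installed package and print its version
python3 -c "import tensorrt_llm; print(tensorrt_llm.__version__)"
```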
It is possible to restrict the compilation of TensorRT-LLM to specific CUDA architectures. For that purpose, the `build_wheel.py` script accepts a semicolon-separated list of CUDA architectures, as shown in the following example:
```bash
# Build TensorRT-LLM for Ada (e.g. RTX 4090, SM 89) and Hopper (SM 90)
python3 ./scripts/build_wheel.py --cuda_architectures "89-real;90-real"
```
The list of supported architectures can be found in the `CMakeLists.txt` file.
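If you are unsure which architecture your GPU is, recent NVIDIA drivers can report the compute capability directly (a sketch; the `compute_cap` query field assumes a reasonably new driver):
```bash
# Prints e.g. "8.9" for an RTX 4090, which maps to "89-real"
nvidia-smi --query-gpu=name,compute_cap --format=csv
```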
### Run TensorRT-LLM
```bash
pip install -r examples/llama/requirements.txt
git lfs install
```
Download the Llama weights
```bash
cd examples/llama
rm -rf ./llama/7B
mkdir -p ./llama/7B && git clone https://huggingface.co/NousResearch/Llama-2-7b-hf ./llama/7B
```
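Cloning an LFS repo can silently leave small pointer stubs instead of real weights, so check the shard sizes before building (the weight shards should be multi-GB files):
```bash
# Weight shards should be multi-GB files, not KB-sized LFS pointer stubs
ls -lh ./llama/7B
```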
Build the engine for Llama 7B on a single GPU
```bash
python build.py --model_dir ./llama/7B/ \
                --dtype float16 \
                --remove_input_padding \
                --use_gpt_attention_plugin float16 \
                --enable_context_fmha \
                --use_gemm_plugin float16 \
                --use_weight_only \
                --output_dir ./llama/7B/trt_engines/weight_only/1-gpu/
```
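`--use_weight_only` quantizes the weights (INT8 by default), trading a small amount of accuracy for lower memory use. TensorRT-LLM releases from around this time also accept a precision flag for INT4 weights; a hedged sketch, assuming the flag exists in your checkout:
```bash
# Same build, but with INT4 weight-only quantization
# (assumption: --weight_only_precision is available in this version)
python build.py --model_dir ./llama/7B/ \
                --dtype float16 \
                --use_weight_only \
                --weight_only_precision int4 \
                --output_dir ./llama/7B/trt_engines/int4_weight_only/1-gpu/
```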
Run inference. Use the custom `run.py` to measure tokens per second:
```bash
python3 run.py --max_output_len=2048 \
               --tokenizer_dir ./llama/7B/ \
               --engine_dir=./llama/7B/trt_engines/weight_only/1-gpu/ \
               --input_text "Writing a thesis proposal can be done in 10 simple steps:\nStep 1:"
```
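If your `run.py` does not print throughput itself, a crude upper-bound estimate is to time the whole run and divide the requested output length by the elapsed time (a rough sketch; it assumes the model generates all 2048 tokens, and it includes engine load time):
```bash
# tokens/s ≈ 2048 / elapsed seconds (rough: includes startup and engine load)
time python3 run.py --max_output_len=2048 \
                    --tokenizer_dir ./llama/7B/ \
                    --engine_dir=./llama/7B/trt_engines/weight_only/1-gpu/ \
                    --input_text "Writing a thesis proposal can be done in 10 simple steps:\nStep 1:"
```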


@@ -11,6 +11,19 @@ To begin using 👋Jan.ai on your Windows computer, follow these steps:
![Jan Installer](img/jan-download.png)
> Note: For faster results, you should enable your NVIDIA GPU. Make sure to have the CUDA toolkit installed. You can download it from your Linux distro's package manager or from here: [CUDA Toolkit](https://developer.nvidia.com/cuda-downloads).
```bash
apt install nvidia-cuda-toolkit
```
> Check the installation with:
```bash
nvidia-smi
```
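> `nvidia-smi` confirms the driver; to verify the CUDA toolkit itself, you can also check the compiler version (assuming `nvcc` is on your PATH):
```bash
nvcc --version
```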
> For AMD GPUs, install ROCm from your Linux distro's package manager or from here: [ROCm Quick Start (Linux)](https://rocm.docs.amd.com/en/latest/deploy/linux/quick_start.html).
## Step 2: Download your first model
Now, let's get your first model:


@@ -23,6 +23,16 @@ When you run the Jan Installer, Windows Defender may display a warning. Here's w
![Setting up](img/set-up.png)
> Note: For faster results, you should enable your NVIDIA GPU. Make sure to have the CUDA toolkit installed. You can download it from here: [CUDA Toolkit](https://developer.nvidia.com/cuda-downloads) or [CUDA Installation guide](https://docs.nvidia.com/cuda/cuda-installation-guide-microsoft-windows/index.html#verify-you-have-a-cuda-capable-gpu).
> Check the installation with:
```bash
nvidia-smi
```
> For AMD GPUs, you should use [WSL 2](https://learn.microsoft.com/en-us/windows/wsl/install). You can then install ROCm from here: [ROCm Quick Start (Linux)](https://rocm.docs.amd.com/en/latest/deploy/linux/quick_start.html).
## Step 3: Download your first model
Now, let's get your first model: