update the GPU installation

parent bec7b11cdc
commit dd1812807a

docs/docs/guides/internal.md (new file, +115 lines)

@@ -0,0 +1,115 @@
## Connect to rigs

### Download Pritunl

https://client.pritunl.com/#install

### Import the .ovpn file

### Use VS Code to connect

Hint: You need to install the "Remote-SSH" extension.
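
For reference, Remote-SSH picks up hosts from `~/.ssh/config`; a minimal entry might look like the sketch below. The alias, IP address, user, and key path are placeholders, not the actual rig credentials.

```bash
# Hypothetical ~/.ssh/config entry -- replace the alias, IP, user, and key
# with the details you were given (the rig is reachable once the VPN is up).
cat >> ~/.ssh/config <<'EOF'
Host my-rig
    HostName 10.0.0.42
    User ubuntu
    IdentityFile ~/.ssh/id_ed25519
EOF

# Quick connectivity check before opening VS Code
ssh my-rig echo ok
```
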
## Llama.cpp

### Get llama.cpp

```bash
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
```
### Build with CMake for faster results

```bash
mkdir build
cd build
# You can tune these flags to find the combination that works best on your GPU
cmake .. -DLLAMA_CUBLAS=ON -DLLAMA_CUDA_F16=ON -DLLAMA_CUDA_MMV_Y=8
cmake --build . --config Release
```
### Download a model

```bash
# Back to the llama.cpp root
cd ..
cd models
# This will get the Llama 2 7B Q8_0 GGUF
wget https://huggingface.co/TheBloke/Llama-2-7B-GGUF/resolve/main/llama-2-7b.Q8_0.gguf
```
### Run inference

```bash
# Back to the llama.cpp root
cd ..
./build/bin/main -m ./models/llama-2-7b.Q8_0.gguf -p "Writing a thesis proposal can be done in 10 simple steps:\nStep 1:" -n 2048 -e -ngl 100 -t 48
```
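
If you want a tokens/second figure rather than reading it off the `main` log, llama.cpp also builds a small benchmark tool; this is a sketch assuming the `llama-bench` binary was produced by the CMake build above.

```bash
# Reports prompt-processing and generation throughput for the model
# (assumes llama-bench landed in build/bin next to main)
./build/bin/llama-bench -m ./models/llama-2-7b.Q8_0.gguf -ngl 100 -t 48
```
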
## TensorRT-LLM

The following command creates a Docker image for development:

```bash
sudo make -C docker build
```

Check the result with the `docker images` command:

```bash
docker images
```
The image will be tagged locally with `tensorrt_llm/devel:latest`. To run the container, use the following command:

```bash
sudo make -C docker run
```
### Build TensorRT-LLM

Once in the container, TensorRT-LLM can be built from source using:

```bash
# Build the TensorRT-LLM code
python3 ./scripts/build_wheel.py --trt_root /usr/local/tensorrt

# Deploy TensorRT-LLM in your environment
pip install ./build/tensorrt_llm*.whl
```
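
As an optional sanity check, the wheel should be importable inside the container:

```bash
# Should print the installed TensorRT-LLM version without errors
python3 -c "import tensorrt_llm; print(tensorrt_llm.__version__)"
```
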
It is possible to restrict the compilation of TensorRT-LLM to specific CUDA architectures. For that purpose, the `build_wheel.py` script accepts a semicolon-separated list of CUDA architectures, as shown in the following example:

```bash
# Build TensorRT-LLM for Ada (4090)
python3 ./scripts/build_wheel.py --cuda_architectures "89-real;90-real"
```

The list of supported architectures can be found in the `CMakeLists.txt` file.
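
If you are not sure which architecture your card is, newer NVIDIA drivers can report the compute capability directly; treat the query field below as an assumption to verify against your driver version.

```bash
# Prints e.g. "8.9" for an RTX 4090 (Ada); map it to "<major><minor>-real"
# when passing --cuda_architectures
nvidia-smi --query-gpu=compute_cap --format=csv,noheader
```
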
### Run TensorRT-LLM

```bash
pip install -r examples/bloom/requirements.txt
git lfs install
```
### Download the Llama weights

```bash
cd examples/llama
rm -rf ./llama/7B
mkdir -p ./llama/7B && git clone https://huggingface.co/NousResearch/Llama-2-7b-hf ./llama/7B
```
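
Since the clone relies on Git LFS, it is worth confirming the weight shards actually downloaded rather than remaining as pointer stubs; a minimal check (the exact shard filenames are an assumption and may differ between repo revisions):

```bash
# The weight shards should be multi-GB files, not tiny LFS pointer files
du -sh ./llama/7B
ls -lh ./llama/7B/*.bin ./llama/7B/*.safetensors 2>/dev/null
```
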
### Build the engine for Llama 7B on a single GPU

```bash
python build.py --model_dir ./llama/7B/ \
                --dtype float16 \
                --remove_input_padding \
                --use_gpt_attention_plugin float16 \
                --enable_context_fmha \
                --use_gemm_plugin float16 \
                --use_weight_only \
                --output_dir ./llama/7B/trt_engines/weight_only/1-gpu/
```
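
The build can take a while; before running inference you can confirm the serialized engine landed in the output directory (the exact file names are version-dependent, so this is only a rough check):

```bash
# Expect an engine file plus a config.json in the output directory
ls -lh ./llama/7B/trt_engines/weight_only/1-gpu/
```
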
Run inference and use the custom `run.py` to check the tokens/second:

```bash
python3 run.py --max_output_len=2048 \
               --tokenizer_dir ./llama/7B/ \
               --engine_dir=./llama/7B/trt_engines/weight_only/1-gpu/ \
               --input_text "Writing a thesis proposal can be done in 10 simple steps:\nStep 1:"
```
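
If your `run.py` does not print throughput itself, a rough figure can be taken from wall-clock time; this sketch treats `--max_output_len` as the number of tokens generated, so it is only an estimate.

```bash
# Wall-clock the run and divide the token count by the elapsed seconds
start=$(date +%s)
python3 run.py --max_output_len=2048 \
               --tokenizer_dir ./llama/7B/ \
               --engine_dir=./llama/7B/trt_engines/weight_only/1-gpu/ \
               --input_text "Writing a thesis proposal can be done in 10 simple steps:\nStep 1:"
elapsed=$(( $(date +%s) - start ))
echo "up to 2048 tokens in ${elapsed}s (~$(( 2048 / (elapsed > 0 ? elapsed : 1) )) tokens/sec)"
```
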
@@ -11,6 +11,19 @@ To begin using 👋Jan.ai on your Windows computer, follow these steps:



> Note: For faster results, you should enable your NVIDIA GPU. Make sure the CUDA toolkit is installed; you can get it from your Linux distro's package manager or from here: [CUDA Toolkit](https://developer.nvidia.com/cuda-downloads).

```bash
apt install nvidia-cuda-toolkit
```

> Check the installation with:

```bash
nvidia-smi
```
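
> Optionally, confirm the compiler half of the toolkit (not just the driver) is on your PATH:

```bash
nvcc --version
```
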
> For AMD GPUs, install ROCm. You can get it from your Linux distro's package manager or follow the guide here: [ROCm Quick Start (Linux)](https://rocm.docs.amd.com/en/latest/deploy/linux/quick_start.html).
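
> Once ROCm is installed, the analogue of the `nvidia-smi` check above is the ROCm SMI tool; the exact utilities shipped can vary between ROCm releases, so treat this as a sketch.

```bash
# Lists detected AMD GPUs and their status on a ROCm install
rocm-smi
```
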
## Step 2: Download your first model

Now, let's get your first model:
@@ -23,6 +23,16 @@ When you run the Jan Installer, Windows Defender may display a warning. Here's w



> Note: For faster results, you should enable your NVIDIA GPU. Make sure the CUDA toolkit is installed; you can download it from here: [CUDA Toolkit](https://developer.nvidia.com/cuda-downloads), or see the [CUDA Installation Guide for Windows](https://docs.nvidia.com/cuda/cuda-installation-guide-microsoft-windows/index.html#verify-you-have-a-cuda-capable-gpu).

> Check the installation with:

```bash
nvidia-smi
```
> For AMD GPUs, you should use [WSL 2](https://learn.microsoft.com/en-us/windows/wsl/install) and then install ROCm inside the Linux distro: [ROCm Quick Start (Linux)](https://rocm.docs.amd.com/en/latest/deploy/linux/quick_start.html).
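
> A minimal path to getting WSL 2 ready is sketched below, run from an elevated PowerShell prompt; the default Ubuntu distro is an assumption, and the ROCm installation then happens inside that distro.

```bash
# Run from an elevated PowerShell / Windows Terminal session
wsl --install     # installs WSL 2 with the default Ubuntu distro
wsl --status      # confirm WSL 2 is the default version after a reboot
```
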
## Step 3: Download your first model

Now, let's get your first model: