update the GPU installation

parent bec7b11cdc
commit dd1812807a

docs/docs/guides/internal.md (new file, +115 lines)

@@ -0,0 +1,115 @@
## Connect to rigs

### Download Pritunl

https://client.pritunl.com/#install

### Import the .ovpn file

### Use VS Code to connect

Hint: You need to install the "Remote-SSH" extension.
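
For reference, Remote-SSH picks up hosts from `~/.ssh/config`; a minimal entry might look like the sketch below. The alias, IP address, user, and key path are placeholders, not the actual rig credentials.

```bash
# Hypothetical ~/.ssh/config entry -- replace the alias, IP, user, and key
# with the details you were given (the rig is reachable once the VPN is up).
cat >> ~/.ssh/config <<'EOF'
Host my-rig
    HostName 10.0.0.42
    User ubuntu
    IdentityFile ~/.ssh/id_ed25519
EOF

# Quick connectivity check before opening VS Code
ssh my-rig echo ok
```
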
## Llama.cpp

### Get llama.cpp

```bash
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
```
### Build with CMake for faster results

```bash
mkdir build
cd build
# You can tune these flags to find the combination that works best on your GPU
cmake .. -DLLAMA_CUBLAS=ON -DLLAMA_CUDA_F16=ON -DLLAMA_CUDA_MMV_Y=8
cmake --build . --config Release
```
### Download a model

```bash
# Back to the llama.cpp root
cd ..
cd models
# This will get the Llama 2 7B Q8_0 GGUF
wget https://huggingface.co/TheBloke/Llama-2-7B-GGUF/resolve/main/llama-2-7b.Q8_0.gguf
```
### Run inference

```bash
# Back to the llama.cpp root
cd ..
./build/bin/main -m ./models/llama-2-7b.Q8_0.gguf -p "Writing a thesis proposal can be done in 10 simple steps:\nStep 1:" -n 2048 -e -ngl 100 -t 48
```
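
If you want a tokens/second figure rather than reading it off the `main` log, llama.cpp also builds a small benchmark tool; this is a sketch assuming the `llama-bench` binary was produced by the CMake build above.

```bash
# Reports prompt-processing and generation throughput for the model
# (assumes llama-bench landed in build/bin next to main)
./build/bin/llama-bench -m ./models/llama-2-7b.Q8_0.gguf -ngl 100 -t 48
```
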
## TensorRT-LLM

The following command creates a Docker image for development:

```bash
sudo make -C docker build
```

Check the result with the `docker images` command:

```bash
docker images
```
The image will be tagged locally with `tensorrt_llm/devel:latest`. To run the container, use the following command:

```bash
sudo make -C docker run
```
### Build TensorRT-LLM

Once in the container, TensorRT-LLM can be built from source using:

```bash
# Build the TensorRT-LLM code
python3 ./scripts/build_wheel.py --trt_root /usr/local/tensorrt

# Deploy TensorRT-LLM in your environment
pip install ./build/tensorrt_llm*.whl
```
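
As an optional sanity check, the wheel should be importable inside the container:

```bash
# Should print the installed TensorRT-LLM version without errors
python3 -c "import tensorrt_llm; print(tensorrt_llm.__version__)"
```
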
It is possible to restrict the compilation of TensorRT-LLM to specific CUDA architectures. For that purpose, the `build_wheel.py` script accepts a semicolon-separated list of CUDA architectures, as shown in the following example:

```bash
# Build TensorRT-LLM for Ada (4090)
python3 ./scripts/build_wheel.py --cuda_architectures "89-real;90-real"
```

The list of supported architectures can be found in the `CMakeLists.txt` file.
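
If you are not sure which architecture your card is, newer NVIDIA drivers can report the compute capability directly; treat the query field below as an assumption to verify against your driver version.

```bash
# Prints e.g. "8.9" for an RTX 4090 (Ada); map it to "<major><minor>-real"
# when passing --cuda_architectures
nvidia-smi --query-gpu=compute_cap --format=csv,noheader
```
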
### Run TensorRT-LLM

```bash
pip install -r examples/bloom/requirements.txt
git lfs install
```
### Download the Llama weights

```bash
cd examples/llama
rm -rf ./llama/7B
mkdir -p ./llama/7B && git clone https://huggingface.co/NousResearch/Llama-2-7b-hf ./llama/7B
```
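
Since the clone relies on Git LFS, it is worth confirming the weight shards actually downloaded rather than remaining as pointer stubs; a minimal check (the exact shard filenames are an assumption and may differ between repo revisions):

```bash
# The weight shards should be multi-GB files, not tiny LFS pointer files
du -sh ./llama/7B
ls -lh ./llama/7B/*.bin ./llama/7B/*.safetensors 2>/dev/null
```
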
### Build the engine for Llama 7B on a single GPU

```bash
python build.py --model_dir ./llama/7B/ \
                --dtype float16 \
                --remove_input_padding \
                --use_gpt_attention_plugin float16 \
                --enable_context_fmha \
                --use_gemm_plugin float16 \
                --use_weight_only \
                --output_dir ./llama/7B/trt_engines/weight_only/1-gpu/
```
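
The build can take a while; before running inference you can confirm the serialized engine landed in the output directory (the exact file names are version-dependent, so this is only a rough check):

```bash
# Expect an engine file plus a config.json in the output directory
ls -lh ./llama/7B/trt_engines/weight_only/1-gpu/
```
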
Run inference and use the custom `run.py` to check the tokens/second:

```bash
python3 run.py --max_output_len=2048 \
               --tokenizer_dir ./llama/7B/ \
               --engine_dir=./llama/7B/trt_engines/weight_only/1-gpu/ \
               --input_text "Writing a thesis proposal can be done in 10 simple steps:\nStep 1:"
```
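
If your `run.py` does not print throughput itself, a rough figure can be taken from wall-clock time; this sketch treats `--max_output_len` as the number of tokens generated, so it is only an estimate.

```bash
# Wall-clock the run and divide the token count by the elapsed seconds
start=$(date +%s)
python3 run.py --max_output_len=2048 \
               --tokenizer_dir ./llama/7B/ \
               --engine_dir=./llama/7B/trt_engines/weight_only/1-gpu/ \
               --input_text "Writing a thesis proposal can be done in 10 simple steps:\nStep 1:"
elapsed=$(( $(date +%s) - start ))
echo "up to 2048 tokens in ${elapsed}s (~$(( 2048 / (elapsed > 0 ? elapsed : 1) )) tokens/sec)"
```
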
@@ -11,6 +11,19 @@ To begin using 👋Jan.ai on your Windows computer, follow these steps:



> Note: For faster results, you should enable your NVIDIA GPU. Make sure the CUDA toolkit is installed; you can get it from your Linux distro's package manager or from here: [CUDA Toolkit](https://developer.nvidia.com/cuda-downloads).

```bash
apt install nvidia-cuda-toolkit
```

> Check the installation with:

```bash
nvidia-smi
```
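
> Optionally, confirm the compiler half of the toolkit (not just the driver) is on your PATH:

```bash
nvcc --version
```
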
> For AMD GPUs, install ROCm. You can get it from your Linux distro's package manager or follow the guide here: [ROCm Quick Start (Linux)](https://rocm.docs.amd.com/en/latest/deploy/linux/quick_start.html).
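
> Once ROCm is installed, the analogue of the `nvidia-smi` check above is the ROCm SMI tool; the exact utilities shipped can vary between ROCm releases, so treat this as a sketch.

```bash
# Lists detected AMD GPUs and their status on a ROCm install
rocm-smi
```
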
## Step 2: Download your first model

Now, let's get your first model:
@@ -23,6 +23,16 @@ When you run the Jan Installer, Windows Defender may display a warning. Here's w



> Note: For faster results, you should enable your NVIDIA GPU. Make sure the CUDA toolkit is installed; you can download it from here: [CUDA Toolkit](https://developer.nvidia.com/cuda-downloads), or see the [CUDA Installation Guide for Windows](https://docs.nvidia.com/cuda/cuda-installation-guide-microsoft-windows/index.html#verify-you-have-a-cuda-capable-gpu).

> Check the installation with:

```bash
nvidia-smi
```
> For AMD GPUs, you should use [WSL 2](https://learn.microsoft.com/en-us/windows/wsl/install) and then install ROCm inside the Linux distro: [ROCm Quick Start (Linux)](https://rocm.docs.amd.com/en/latest/deploy/linux/quick_start.html).
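
> A minimal path to getting WSL 2 ready is sketched below, run from an elevated PowerShell prompt; the default Ubuntu distro is an assumption, and the ROCm installation then happens inside that distro.

```bash
# Run from an elevated PowerShell / Windows Terminal session
wsl --install     # installs WSL 2 with the default Ubuntu distro
wsl --status      # confirm WSL 2 is the default version after a reboot
```
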
## Step 3: Download your first model

Now, let's get your first model: