diff --git a/docs/docs/guides/internal.md b/docs/docs/guides/internal.md
new file mode 100644
index 000000000..095e2437f
--- /dev/null
+++ b/docs/docs/guides/internal.md
@@ -0,0 +1,115 @@
+## Connect to rigs
+
+Download [Pritunl](https://client.pritunl.com/#install) and import the `.ovpn` file.
+
+Use VS Code to connect.
+Hint: you need to install the "Remote - SSH" extension.
+
+## Llama.cpp
+
+Get llama.cpp:
+
+```bash
+git clone https://github.com/ggerganov/llama.cpp
+cd llama.cpp
+```
+
+Build with CMake for faster results:
+
+```bash
+mkdir build
+cd build
+# You can experiment with these flags to find the best configuration
+cmake .. -DLLAMA_CUBLAS=ON -DLLAMA_CUDA_F16=ON -DLLAMA_CUDA_MMV_Y=8
+cmake --build . --config Release
+```
+
+Download a model:
+
+```bash
+# Back to the llama.cpp root
+cd ..
+cd models
+# This downloads the Llama-2-7B Q8_0 GGUF
+wget https://huggingface.co/TheBloke/Llama-2-7B-GGUF/resolve/main/llama-2-7b.Q8_0.gguf
+```
+
+Run inference:
+
+```bash
+# Back to the llama.cpp root
+cd ..
+./build/bin/main -m ./models/llama-2-7b.Q8_0.gguf -p "Writing a thesis proposal can be done in 10 simple steps:\nStep 1:" -n 2048 -e -ngl 100 -t 48
+```
+
+## TensorRT-LLM
+
+The following command creates a Docker image for development:
+
+```bash
+sudo make -C docker build
+```
+
+Check that the image exists:
+
+```bash
+docker images
+```
+
+The image will be tagged locally as `tensorrt_llm/devel:latest`. To run the container, use the following command:
+
+```bash
+sudo make -C docker run
+```
+
+### Build TensorRT-LLM
+
+Once inside the container, TensorRT-LLM can be built from source with:
+
+```bash
+# Build the TensorRT-LLM wheel.
+python3 ./scripts/build_wheel.py --trt_root /usr/local/tensorrt
+
+# Install TensorRT-LLM into your environment.
+pip install ./build/tensorrt_llm*.whl
+```
+
+It is possible to restrict the compilation of TensorRT-LLM to specific CUDA architectures.
+For that purpose, the `build_wheel.py` script accepts a semicolon-separated list of CUDA architectures, as shown in the following example:
+
+```bash
+# Build TensorRT-LLM for Ada (compute capability 8.9, e.g. the 4090) and Hopper (9.0)
+python3 ./scripts/build_wheel.py --cuda_architectures "89-real;90-real"
+```
+
+The list of supported architectures can be found in the `CMakeLists.txt` file.
+
+### Run TensorRT-LLM
+
+```bash
+pip install -r examples/llama/requirements.txt
+git lfs install
+```
+
+Download the Llama weights:
+
+```bash
+cd examples/llama
+rm -rf ./llama/7B
+mkdir -p ./llama/7B && git clone https://huggingface.co/NousResearch/Llama-2-7b-hf ./llama/7B
+```
+
+Build the engine for Llama 7B on a single GPU:
+
+```bash
+python build.py --model_dir ./llama/7B/ \
+    --dtype float16 \
+    --remove_input_padding \
+    --use_gpt_attention_plugin float16 \
+    --enable_context_fmha \
+    --use_gemm_plugin float16 \
+    --use_weight_only \
+    --output_dir ./llama/7B/trt_engines/weight_only/1-gpu/
+```
+
+Run inference. Use the custom `run.py` to measure tokens per second:
+
+```bash
+python3 run.py --max_output_len=2048 \
+    --tokenizer_dir ./llama/7B/ \
+    --engine_dir=./llama/7B/trt_engines/weight_only/1-gpu/ \
+    --input_text "Writing a thesis proposal can be done in 10 simple steps:\nStep 1:"
+```
\ No newline at end of file
diff --git a/docs/docs/guides/linux.md b/docs/docs/guides/linux.md
index ba26bf6f4..99c3726bb 100644
--- a/docs/docs/guides/linux.md
+++ b/docs/docs/guides/linux.md
@@ -11,6 +11,19 @@ To begin using 👋Jan.ai on your Linux computer, follow these steps:
 
 ![Jan Installer](img/jan-download.png)
 
+> Note: For faster results, you should enable your NVIDIA GPU. Make sure the CUDA toolkit is installed; you can install it from your Linux distro's package manager or from the [CUDA Toolkit](https://developer.nvidia.com/cuda-downloads) page:
+
+```bash
+sudo apt install nvidia-cuda-toolkit
+```
+
+> Verify the installation with:
+
+```bash
+nvidia-smi
+```
+
+> For AMD GPUs, install ROCm.
+You can install it from your Linux distro's package manager or follow the [ROCm Quick Start (Linux)](https://rocm.docs.amd.com/en/latest/deploy/linux/quick_start.html) guide.
+
 ## Step 2: Download your first model
 
 Now, let's get your first model:
diff --git a/docs/docs/guides/windows.md b/docs/docs/guides/windows.md
index c7eb6b1a0..f143a4c3c 100644
--- a/docs/docs/guides/windows.md
+++ b/docs/docs/guides/windows.md
@@ -23,6 +23,16 @@ When you run the Jan Installer, Windows Defender may display a warning. Here's w
 
 ![Setting up](img/set-up.png)
 
+> Note: For faster results, you should enable your NVIDIA GPU. Make sure the CUDA toolkit is installed; you can download it from the [CUDA Toolkit](https://developer.nvidia.com/cuda-downloads) page or follow the [CUDA Installation guide](https://docs.nvidia.com/cuda/cuda-installation-guide-microsoft-windows/index.html#verify-you-have-a-cuda-capable-gpu).
+
+> Verify the installation with:
+
+```bash
+nvidia-smi
+```
+
+> For AMD GPUs, you should use [WSL 2](https://learn.microsoft.com/en-us/windows/wsl/install) and follow the [ROCm Quick Start (Linux)](https://rocm.docs.amd.com/en/latest/deploy/linux/quick_start.html) guide.
+
 ## Step 3: Download your first model
 
 Now, let's get your first model:
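The NVIDIA and AMD notes in both guides follow the same pattern: probe for a toolchain command (`nvidia-smi`, `nvcc`, `rocminfo`) and report whether it is present. A minimal sketch of that check as one script; the function name `check_gpu_tools` is illustrative and not part of Jan or these docs:

```bash
#!/usr/bin/env bash
# check_gpu_tools: report which GPU toolchain commands are on PATH.
check_gpu_tools() {
  local tool
  for tool in nvidia-smi nvcc rocminfo; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "$tool: found"
    else
      echo "$tool: not found"
    fi
  done
}

check_gpu_tools
```

On a machine with the CUDA toolkit installed the first two lines report `found`; on a CPU-only box, or one with only ROCm, the output differs accordingly.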