docs: finalize customize engine settings
This commit is contained in:
parent f467aea85c
commit 17172db6bc
@@ -71,7 +71,7 @@ In this guide, we will show you how to customize the engine settings.
:::tip
- By default, the value of `ngl` is set to 100, which indicates that it will offload all. If you wish to offload only 50% of the GPU, you can set `ngl` to 15. This is because most models on Mistral or Llama is around ~ 30 layers.
- By default, the value of `ngl` is set to 100, which offloads all layers to the GPU. If you wish to offload only about 50% of the layers, you can set `ngl` to 15, because the majority of Mistral or Llama models have around 30 layers.
- To utilize the embedding feature, include the JSON parameter `"embedding": true`. It will enable Nitro to process inferences with embedding capabilities. For a more detailed explanation, please refer to the [Embedding in the Nitro documentation](https://nitro.jan.ai/features/embed).
- To utilize the continuous batching feature to boost throughput and minimize latency in large language model (LLM) inference, please refer to the [Continuous Batching in the Nitro documentation](https://nitro.jan.ai/features/cont-batch).
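
The `ngl` arithmetic in the first tip can be sketched as a small helper. This is only an illustration: the function name `ngl_for_fraction` is hypothetical, the layer count of 30 is the rough figure quoted in the tip rather than a value read from a real model, and the resulting dictionary shows just the two parameters mentioned here (`ngl` and `embedding`), not a complete settings file.

```python
import json


def ngl_for_fraction(total_layers: int, fraction: float) -> int:
    """Return how many layers to offload for a fraction between 0.0 and 1.0."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0.0 and 1.0")
    return round(total_layers * fraction)


# Assumption: the model has roughly 30 layers, as the tip suggests.
settings = {
    "ngl": ngl_for_fraction(30, 0.5),  # half of 30 layers
    "embedding": True,                 # enables the embedding feature
}

print(json.dumps(settings, indent=2))
```

If the actual layer count of your model is known, pass it in place of 30 so the offloaded share matches what you intend.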