diff --git a/docs/docs/guides/04-using-models/04-customize-engine-settings.mdx b/docs/docs/guides/04-using-models/04-customize-engine-settings.mdx
new file mode 100644
index 000000000..8c3984a00
--- /dev/null
+++ b/docs/docs/guides/04-using-models/04-customize-engine-settings.mdx
@@ -0,0 +1,97 @@
+---
+title: Customize Engine Settings
+slug: /guides/using-models/customize-engine-settings
+description: A step-by-step guide to customizing the engine settings.
+keywords:
+  [
+    Jan AI,
+    Jan,
+    ChatGPT alternative,
+    local AI,
+    private AI,
+    conversational AI,
+    no-subscription fee,
+    large language model,
+    import-models-manually,
+    customize-engine-settings,
+  ]
+---
+
+{/* Imports */}
+import Tabs from "@theme/Tabs";
+import TabItem from "@theme/TabItem";
+
+In this guide, we will show you how to customize the engine settings.
+
+1. Navigate to the `~/jan/engines` folder. You can find this folder by going to `App Settings` > `Advanced` > `Open App Directory`.
+
+<Tabs groupId="operating-systems">
+  <TabItem value="mac" label="macOS">
+
+  ```sh
+  cd ~/jan/engines
+  ```
+
+  </TabItem>
+  <TabItem value="win" label="Windows">
+
+  ```sh
+  cd C:/Users/<username>/jan/engines
+  ```
+
+  </TabItem>
+  <TabItem value="linux" label="Linux">
+
+  ```sh
+  cd ~/jan/engines
+  ```
+
+  </TabItem>
+</Tabs>
+
+2. Modify the `nitro.json` file based on your needs. The default settings are shown below.
+
+```json title="~/jan/engines/nitro.json"
+{
+  "ctx_len": 2048,
+  "ngl": 100,
+  "cpu_threads": 1,
+  "cont_batching": false,
+  "embedding": false
+}
+```
+
+| Parameter       | Type    | Description                                                 |
+| --------------- | ------- | ----------------------------------------------------------- |
+| `ctx_len`       | Integer | The context length for model operations.                    |
+| `ngl`           | Integer | The number of model layers to offload to the GPU.           |
+| `cpu_threads`   | Integer | The number of threads to use for inference (CPU mode only). |
+| `cont_batching` | Boolean | Whether to use continuous batching.                         |
+| `embedding`     | Boolean | Whether to enable embedding in the model.                   |
+
+:::tip
+
+- By default, `ngl` is set to 100, which offloads all layers to the GPU. Since most Mistral or Llama models have around 30 layers, setting `ngl` to 15 offloads roughly 50% of the model.
+- To use the embedding feature, set the JSON parameter `"embedding": true`. This enables Nitro to process inference requests with embedding capabilities. For a more detailed explanation, see [Embedding in the Nitro documentation](https://nitro.jan.ai/features/embed).
+- To use continuous batching, which boosts throughput and minimizes latency in large language model (LLM) inference, see [Continuous Batching in the Nitro documentation](https://nitro.jan.ai/features/cont-batch).
+
+:::
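+
+For instance, here is a hypothetical `nitro.json` that applies the tips above: it raises the context length, offloads roughly half the layers of a ~30-layer model, turns on continuous batching, and enables embedding. The values are illustrative assumptions, not tuned recommendations.
+
+```json title="~/jan/engines/nitro.json"
+{
+  "ctx_len": 4096,
+  "ngl": 15,
+  "cpu_threads": 1,
+  "cont_batching": true,
+  "embedding": true
+}
+```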
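+
+After saving your changes, you can optionally check that the file is still valid JSON (this sketch assumes Python is available on your system):
+
+```sh
+# Pretty-prints the file on success; reports the location of any syntax error
+python -m json.tool ~/jan/engines/nitro.json
+```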