---
title: Customize Engine Settings
sidebar_position: 1
description: A step-by-step guide to change your engine's settings.
keywords:
  [
    Jan AI,
    Jan,
    ChatGPT alternative,
    local AI,
    private AI,
    conversational AI,
    no-subscription fee,
    large language model,
    import-models-manually,
    customize-engine-settings,
  ]
---

import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

In this guide, we'll walk you through the process of customizing your engine settings by configuring the `nitro.json` file.

1. Navigate to the `App Settings` > `Advanced` > `Open App Directory` > `~/jan/engines` folder.

<Tabs>
  <TabItem value="mac" label="Mac" default>

    ```sh
    cd ~/jan/engines
    ```

  </TabItem>
  <TabItem value="windows" label="Windows">

    ```sh
    cd C:/Users/<username>/jan/engines
    ```

  </TabItem>
  <TabItem value="linux" label="Linux">

    ```sh
    cd ~/jan/engines
    ```

  </TabItem>
</Tabs>

2. Modify the `nitro.json` file based on your needs. The default settings are shown below.

```json title="~/jan/engines/nitro.json"
{
  "ctx_len": 2048,
  "ngl": 100,
  "cpu_threads": 1,
  "cont_batching": false,
  "embedding": false
}
```

The table below describes the parameters in the `nitro.json` file.

| Parameter       | Type        | Description |
| --------------- | ----------- | ----------- |
| `ctx_len`       | **Integer** | Sets the context length for model operations. The default of `2048` provides ample context for models comparable to `GPT-3.5`. (*Minimum*: `1`, *Maximum*: `4096`) |
| `ngl`           | **Integer** | Sets the number of model layers offloaded to the GPU. The default of `100` offloads all layers. |
| `cpu_threads`   | **Integer** | Sets the number of CPU threads used for inference. The maximum is limited by your hardware and OS. |
| `cont_batching` | **Boolean** | Enables continuous batching, which improves throughput for LLM inference. |
| `embedding`     | **Boolean** | Enables embeddings, used for tasks like document-enhanced chat in RAG-based applications. |

:::tip

- By default, `ngl` is set to `100`, which offloads all model layers to the GPU. To offload only about 50% of the layers, set `ngl` to `15`, since most Mistral or Llama models have around 30 layers.
- To use the embedding feature, include the JSON parameter `"embedding": true`. This enables Nitro to process inferences with embedding capabilities. Please refer to [Embedding in the Nitro documentation](https://nitro.jan.ai/features/embed) for a more detailed explanation.
- To use continuous batching for boosting throughput and minimizing latency in large language model (LLM) inference, include `"cont_batching": true`. For details, please refer to [Continuous Batching in the Nitro documentation](https://nitro.jan.ai/features/cont-batch).

A complete example that combines these settings is shown at the end of this page.

:::

:::info[Assistance and Support]

If you have questions, please join our [Discord community](https://discord.gg/Dt7MxDyNNZ) for support, updates, and discussions.

:::
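For reference, below is a minimal sketch of a customized `nitro.json` that combines the tips above: it offloads roughly half the layers of a ~30-layer model to the GPU, raises the CPU thread count, and enables continuous batching and embedding. The specific values here are illustrative assumptions, not recommendations; tune them to your own hardware and model.

```json title="~/jan/engines/nitro.json"
{
  "ctx_len": 2048,
  "ngl": 15,
  "cpu_threads": 4,
  "cont_batching": true,
  "embedding": true
}
```

Note that `nitro.json` is plain JSON, so the file itself cannot contain comments. You may need to restart Jan after editing it for the new settings to take effect.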