jan/docs/docs/guides/04-using-models/05-customize-engine-settings.mdx

---
title: Customize Engine Settings
slug: /guides/using-models/customize-engine-settings
description: A guide to customizing engine settings.
keywords:
  [
    Jan AI,
    Jan,
    ChatGPT alternative,
    local AI,
    private AI,
    conversational AI,
    no-subscription fee,
    large language model,
    import-models-manually,
    customize-engine-settings,
  ]
---

{/* Imports */}
import Tabs from "@theme/Tabs";
import TabItem from "@theme/TabItem";

This guide shows you how to customize the engine settings in the `nitro.json` file.

1. Navigate to the `~/jan/engines` folder. You can find this folder by going to `App Settings` > `Advanced` > `Open App Directory`.

<Tabs groupId="operating-systems">
  <TabItem value="mac" label="macOS">

    ```sh
    cd ~/jan/engines
    ```

  </TabItem>
  <TabItem value="win" label="Windows">

    ```sh
    cd C:/Users/<your_user_name>/jan/engines
    ```

  </TabItem>
  <TabItem value="linux" label="Linux">

    ```sh
    cd ~/jan/engines
    ```

  </TabItem>
</Tabs>

2. Modify the `nitro.json` file based on your needs. The default settings are shown below.

```json title="~/jan/engines/nitro.json"
{
  "ctx_len": 2048,
  "ngl": 100,
  "cpu_threads": 1,
  "cont_batching": false,
  "embedding": false
}
```

The table below describes the parameters in the `nitro.json` file.

| Parameter       | Type    | Description                                                 |
| --------------- | ------- | ----------------------------------------------------------- |
| `ctx_len`       | Integer | The context length, in tokens, for model operations.        |
| `ngl`           | Integer | The number of model layers to offload to the GPU.           |
| `cpu_threads`   | Integer | The number of threads to use for inference (CPU mode only). |
| `cont_batching` | Boolean | Whether to use continuous batching.                         |
| `embedding`     | Boolean | Whether to enable the embedding feature.                    |
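
If you want to return to the default values after experimenting, the following is a minimal sketch of one way to do it. It assumes the file lives at `~/jan/engines/nitro.json`, as in the code block title above. The `ENGINES_DIR` and `SETTINGS` variables are illustrative names, not something Jan defines, and the snippet is intended for macOS and Linux shells.

```sh
# Assumption: Jan uses its default data folder; set ENGINES_DIR if yours differs.
ENGINES_DIR="${ENGINES_DIR:-$HOME/jan/engines}"
SETTINGS="$ENGINES_DIR/nitro.json"

# Create the folder if it does not exist yet (no effect otherwise).
mkdir -p "$ENGINES_DIR"

# Keep a copy of the current settings before overwriting them.
if [ -f "$SETTINGS" ]; then
  cp "$SETTINGS" "$SETTINGS.bak"
fi

# Write the default values listed in this guide.
cat > "$SETTINGS" <<'EOF'
{
  "ctx_len": 2048,
  "ngl": 100,
  "cpu_threads": 1,
  "cont_batching": false,
  "embedding": false
}
EOF
```

The previous settings, if any, remain available in `nitro.json.bak` next to the rewritten file.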

:::tip

- By default, `ngl` is set to 100, which offloads all of the model's layers to the GPU. To offload only about 50% of the layers, set `ngl` to 15, because most Mistral or Llama models have around 30 layers.
- To use the embedding feature, set `"embedding": true` in the JSON file. This enables Nitro to process inferences with embedding capabilities. For a more detailed explanation, refer to [Embedding in the Nitro documentation](https://nitro.jan.ai/features/embed).
- To use continuous batching to boost throughput and minimize latency in large language model (LLM) inference, refer to [Continuous Batching in the Nitro documentation](https://nitro.jan.ai/features/cont-batch).

:::
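
The `ngl` arithmetic from the first tip can be sketched as a small shell function. `ngl_for_percent` is a hypothetical helper, not part of Jan or Nitro, and the layer count is something you need to look up for your own model; 30 is only the approximate figure mentioned above.

```sh
# Hypothetical helper (not part of Jan or Nitro): prints the ngl value that
# offloads roughly PERCENT of a model's TOTAL_LAYERS layers to the GPU.
ngl_for_percent() {
  total_layers=$1
  percent=$2
  # Integer arithmetic, so the result is rounded down.
  echo $(( total_layers * percent / 100 ))
}

ngl_for_percent 30 50   # prints 15
```

You would then put the printed number in the `ngl` field of `nitro.json`.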