diff --git a/docs/src/pages/docs/local-engines/llama-cpp.mdx b/docs/src/pages/docs/local-engines/llama-cpp.mdx
index 345b6bc86..df11c48ca 100644
--- a/docs/src/pages/docs/local-engines/llama-cpp.mdx
+++ b/docs/src/pages/docs/local-engines/llama-cpp.mdx
@@ -150,7 +150,7 @@ For detailed hardware compatibility, please visit our guide for [Mac](/docs/desk
 | **Flash Attention** | - Optimizes attention computation <br/> - Reduces memory usage <br/> - Recommended for most cases | Enabled |
 | **Caching** | - Enable to store recent prompts and responses <br/> - Improves response time for repeated prompts | Enabled |
 | **KV Cache Type** | - KV cache implementation type; controls memory usage and precision trade-off <br/> - Options: <br/> • f16 (most stable) <br/> • q8_0 (balanced) <br/> • q4_0 (lowest memory) | f16 |
-| **MMAP** | - Enables memory-mapped model loading <br/> - Reduces memory usage <br/> - Recommended for large models | Enabled |
+| **mmap** | - Enables memory-mapped model loading <br/> - Reduces memory usage <br/> - Recommended for large models | Enabled |
 
 ## Best Practices
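For reference, these UI settings correspond to standard llama.cpp runtime options. A minimal sketch of an equivalent `llama-server` invocation, assuming a recent llama.cpp build (the model path is a placeholder, and exact flag spellings can vary between versions):

```shell
# Sketch only: maps the documented settings onto llama.cpp CLI flags.
llama-server \
  -m ./model.gguf \
  --flash-attn \
  --cache-type-k q8_0 \
  --cache-type-v q8_0
# mmap is the default loading mode; pass --no-mmap to disable
# memory-mapped loading. Omit --flash-attn to leave Flash Attention off,
# and use f16 or q4_0 as the cache type for higher precision or lower
# memory use, respectively.
```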