diff --git a/docs/docs/articles/nitro.md b/docs/docs/articles/nitro.md
deleted file mode 100644
index 1117aa75c..000000000
--- a/docs/docs/articles/nitro.md
+++ /dev/null
@@ -1,93 +0,0 @@
----
-sidebar_position: 1
-title: Why Nitro? The Inference Engine Behind Jan
----
-
-Delve into Nitro, a robust inference engine that powers Jan. Nitro is a dedicated "inference server" crafted in C++, optimized for edge deployment, ensuring reliability and performance.
-
-⚡ Explore Nitro's codebase: [GitHub](https://github.com/janhq/nitro)
-
-## Problems of AI services
-
-Everyone wants to build their AI app, but they have a few challenges below.
-
-1. **No privacy**
-
-If you use an AI API like OpenAI ChatGPT, just say goodbye to privacy already.
-
-2. **Cumbersome integration**
-
-Let say you already have some interesting libraries for local AI, it's still very cumbersome to integrate.
-
-If you want to use something cutting edge like [llama-cpp](https://github.com/ggerganov/llama.cpp) you need to know a bit of CPP.
-
-And many other reaons.
-
-3. **Not standardized interface**
-
-Let say you solved the above points, you will still have issues with non-standard interface, you cannot re-use other projects code.
-
-
-## Benefits of Using Nitro:
-
-1. **Compact Binary Size**:
-
-   Nitro's efficient binary size: ~3mb compressed on average
-   ![](https://hackmd.io/_uploads/Sy_IYU8GT.png)
-
-2. **Streamlined Deployment**:
-   Nitro is designed for ease of deployment. Simply download and execute. Note the requirement for specific hardware dependencies, like CUDA.
-
-3. **User-Friendly Interface**:
-   Nitro offers a straightforward HTTP interface. With compatibility for multiple standard APIs, including OpenAI formats, integration is hassle-free.
-
-   ```bash
-   curl --location 'http://localhost:3928/inferences/llamacpp/chat_completion' \
-     --header 'Content-Type: application/json' \
-     --header 'Accept: text/event-stream' \
-     --header 'Access-Control-Allow-Origin: *' \
-     --data '{
-       "messages": [
-         {"content": "Hello there 👋", "role": "assistant"},
-         {"content": "Can you write a long story", "role": "user"}
-       ],
-       "stream": true,
-       "model": "gpt-3.5-turbo",
-       "max_tokens": 2000
-     }'
-   ```
-
-4. **Seperated process**:
-   Nitro operates independently, ensuring no interference with your main application processes. It self-manages, allowing you a focused development environment.
-
-5. **Broad Compatibility**:
-   Nitro supports a variety of platforms including Windows, MacOS, and Linux, and is compatible with arm64, x86, and NVIDIA GPUs, ensuring a versatile deployment experience.
-
-6. **Multi-threaded and performant by default***
-   Built on DrogonCPP, a very fast CPP web framework in CPP.
-
-## Getting Started with Nitro:
-
-**Step 1: Obtain Nitro**:
-Access Nitro binaries from the release page to begin.
-🔗 [Download Nitro](https://github.com/janhq/nitro/releases)
-
-**Step 2: Source a Model**:
-For those interested in the llama C++ integration, obtain a "GGUF" model from The Bloke's repository.
-🔗 [Download Model](https://huggingface.co/TheBloke)
-
-**Step 3: Initialize Nitro**:
-Launch Nitro and position your model using the following API call:
-
-```bash
-curl -X POST 'http://localhost:3928/inferences/llamacpp/loadmodel' \
-  -H 'Content-Type: application/json' \
-  -d '{
-    "llama_model_path": "/path/to/your_model.gguf",
-    "ctx_len": 2048,
-    "ngl": 100,
-    "embedding": true
-  }'
-```
-
-**Step 4: Engage with Nitro**:
-Interact with Nitro to evaluate its capabilities. With its alignment to the OpenAI format, you can anticipate consistent and reliable output.
diff --git a/docs/docs/getting-started/nitro.md b/docs/docs/getting-started/nitro.md
new file mode 100644
index 000000000..2f44e1005
--- /dev/null
+++ b/docs/docs/getting-started/nitro.md
@@ -0,0 +1,71 @@
+---
+title: Nitro (C++ Inference Engine)
+---
+
+Nitro is the inference engine that powers Jan. It is written in C++ and optimized for edge deployment.
+
+⚡ Explore Nitro's codebase: [GitHub](https://github.com/janhq/nitro)
+
+## Dependencies and Acknowledgements:
+
+- [llama.cpp](https://github.com/ggerganov/llama.cpp): Nitro wraps llama.cpp, which runs Llama models in C++.
+- [drogon](https://github.com/drogonframework/drogon): Nitro uses Drogon, a fast C++17/20 HTTP application framework.
+- (Coming soon) tensorrt-llm support for CUDA acceleration
+
+## Features
+
+In addition to the dependencies above, Nitro provides:
+
+- OpenAI API compatibility
+- HTTP interface with no bindings needed
+- Runs as a separate process, not interfering with main app processes
+- Multi-threaded server supporting concurrent users
+- 1-click install
+- No hardware dependencies
+- Ships as a small binary (~3 MB compressed on average)
+- Runs on Windows, macOS, and Linux
+- Compatible with arm64, x86, and NVIDIA GPUs
+
+## HTTP Interface
+
+Nitro offers a straightforward HTTP interface that is compatible with multiple standard APIs, including the OpenAI format.
+
+```bash
+curl --location 'http://localhost:3928/inferences/llamacpp/chat_completion' \
+  --header 'Content-Type: application/json' \
+  --header 'Accept: text/event-stream' \
+  --header 'Access-Control-Allow-Origin: *' \
+  --data '{
+    "messages": [
+      {"content": "Hello there 👋", "role": "assistant"},
+      {"content": "Can you write a long story", "role": "user"}
+    ],
+    "stream": true,
+    "model": "gpt-3.5-turbo",
+    "max_tokens": 2000
+  }'
+```
+
+## Using Nitro
+
+**Step 1: Obtain Nitro**:
+Access Nitro binaries from the release page.
+🔗 [Download Nitro](https://github.com/janhq/nitro/releases)
+
+**Step 2: Source a Model**:
+For the llama.cpp integration, obtain a GGUF model from TheBloke's repository.
+🔗 [Download Model](https://huggingface.co/TheBloke)
+
+**Step 3: Initialize Nitro**:
+Launch Nitro and load your model with the following API call:
+
+```bash
+curl -X POST 'http://localhost:3928/inferences/llamacpp/loadmodel' \
+  -H 'Content-Type: application/json' \
+  -d '{
+    "llama_model_path": "/path/to/your_model.gguf",
+    "ctx_len": 2048,
+    "ngl": 100,
+    "embedding": true
+  }'
+```
diff --git a/docs/sidebars.js b/docs/sidebars.js
index 0c9586601..cd5f7721b 100644
--- a/docs/sidebars.js
+++ b/docs/sidebars.js
@@ -63,7 +63,7 @@ const sidebars = {
       type: "category",
       label: "Reference",
       collapsible: true,
-      collapsed: true,
+      collapsed: false,
       items: [
         {
           type: "autogenerated",
@@ -83,31 +83,31 @@
        },
      ],
    },
-    {
-      type: "category",
-      label: "Tutorials",
-      collapsible: true,
-      collapsed: false,
-      items: [
-        {
-          type: "autogenerated",
-          dirName: "tutorials",
-        },
-      ],
-    },
-    {
-      type: "category",
-      label: "Articles",
-      collapsible: true,
-      collapsed: false,
-      items: [
-        {
-          type: "doc",
-          label: "Nitro",
-          id: "articles/nitro",
-        },
-      ],
-    },
+    // {
+    //   type: "category",
+    //   label: "Tutorials",
+    //   collapsible: true,
+    //   collapsed: false,
+    //   items: [
+    //     {
+    //       type: "autogenerated",
+    //       dirName: "tutorials",
+    //     },
+    //   ],
+    // },
+    // {
+    //   type: "category",
+    //   label: "Articles",
+    //   collapsible: true,
+    //   collapsed: false,
+    //   items: [
+    //     {
+    //       type: "doc",
+    //       label: "Nitro",
+    //       id: "articles/nitro",
+    //     },
+    //   ],
+    // },
  ],
  companySidebar: [