Merge pull request #488 from janhq/add-guides

Add guides
This commit is contained in:
Daniel 2023-11-04 22:54:44 +08:00 committed by GitHub
commit 1b7c6447c9
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
56 changed files with 3973 additions and 227 deletions

docs/docs/about/about.md Normal file
View File

@ -0,0 +1,6 @@
---
title: About Jan
---
- Mission, Vision, etc
- COSS Model

View File

@ -0,0 +1,202 @@
---
title: Anatomy of 👋Jan
---
This page explains the architecture of [Jan](https://jan.ai/).
## Synchronous architecture
![Synchronous architecture](../img/arch-async.drawio.png)
### Overview
The architecture of the Jan application is designed to provide a seamless user experience while remaining modular and extensible.
### BackEnd and FrontEnd
**BackEnd:**
- The BackEnd serves as the brain of the application. It processes the information, performs computations, and manages the main logic of the system.
:::info
This is like an [OS (Operating System)](https://en.wikipedia.org/wiki/Operating_system) in the computer.
:::
**FrontEnd:**
- The FrontEnd is the interface that users interact with. It takes user inputs, displays results, and communicates bi-directionally with the BackEnd through inter-process communication.
:::info
This is like the [VSCode](https://code.visualstudio.com/) application.
:::
**Inter-process communication:**
- A mechanism that allows the BackEnd and FrontEnd to communicate in real time. It ensures that data flows smoothly between the two, facilitating rapid response and dynamic updates.
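Conceptually, the exchange looks like the sketch below. This is a minimal illustration assuming an Electron-style setup (Jan's low-level architecture is Electron-based); the `backend:invoke` channel name and payload shape are hypothetical.
```typescript
// BackEnd (Electron main process): handle requests and reply with results.
import { ipcMain } from "electron";

ipcMain.handle("backend:invoke", async (_event, payload: { task: string }) => {
  // ...perform computations and core logic here...
  return { task: payload.task, status: "done" };
});

// FrontEnd (Electron renderer process): send a request and await the result.
import { ipcRenderer } from "electron";

async function runTask(task: string) {
  const result = await ipcRenderer.invoke("backend:invoke", { task });
  console.log(result); // e.g. { task: "summarize", status: "done" }
}
```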
### Plugins and Apps
**Plugins:**
In Jan, Plugins contain all the core features. They can be Core Plugins or [Nitro](https://github.com/janhq/nitro). Each plugin moves through three lifecycle stages (sketched in code after the note below):
- **Load:** This denotes the initialization and activation of a plugin when the application starts or when a user activates it.
- **Implement:** This is where the main functionality of the plugin resides. Developers code the desired features and functionalities here. This is a "call to action" feature.
- **Dispose:** After the plugin's task is completed or deactivated, this function ensures that it releases any resources it uses, providing optimal performance and preventing memory leaks.
:::info
This is like [Extensions](https://marketplace.visualstudio.com/VSCode) in VSCode.
:::
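As a rough model of this lifecycle, a plugin could be shaped like the sketch below. The `JanPlugin` interface and hook names are illustrative only, not Jan's actual plugin API.
```typescript
// An illustrative plugin lifecycle; the interface is hypothetical.
interface JanPlugin {
  load(): void;      // initialize and activate the plugin
  implement(): void; // the plugin's main functionality
  dispose(): void;   // release resources to prevent memory leaks
}

class ExamplePlugin implements JanPlugin {
  private cache = new Map<string, string>();

  load() {
    this.cache.set("ready", "true"); // acquire resources on activation
  }

  implement() {
    console.log("doing the plugin's main work"); // the "call to action"
  }

  dispose() {
    this.cache.clear(); // free resources when deactivated
  }
}
```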
**Apps:**
Apps are similar to Plugins; however, Apps can be built by users for their own purposes.
> For example, users can build a `Personal Document RAG App` to chat with specific documents or articles.
With **Plugins and Apps**, users can build a broader ecosystem surrounding Jan.
## Asynchronous architecture
![Asynchronous architecture](../img/arch-async.drawio.png)
### Overview
The asynchronous architecture allows Jan to handle multiple operations simultaneously without waiting for one to complete before starting another. This results in a more efficient and responsive user experience. The provided diagram breaks down the primary components and their interactions.
### Components
#### Results
After processing certain tasks or upon specific triggers, the BackEnd can broadcast the results. This could be a processed data set, a calculated result, or any other output that needs to be shared.
#### Events
Similar to broadcasting results but oriented explicitly towards events. This could include user actions, system events, or notifications that other components should be aware of.
- **Notify:**
Upon the conclusion of specific tasks or when particular triggers are activated, the system uses the Notify action to send out notifications from the **Results**. The Notify action is the conduit through which results are broadcasted asynchronously, whether they concern task completions, errors, updates, or any processed data set.
- **Listen:**
Here, the BackEnd actively waits for incoming data or events. It is geared towards capturing inputs from users or updates from plugins.
#### Plugins
These are modular components or extensions designed to enhance the application's functionalities. Each plugin possesses a "Listen" action, enabling it to stand by for requests emanating from user inputs.
### Flow
1. Input is provided by the user or an external source.
2. The input is broadcast as an event through the **Broadcast event** channel.
3. The **BackEnd** processes the event. Depending on the event, it might interact with one or several Plugins.
4. Once processed, the **Broadcast result** is sent out asynchronously through multiple notifications via the Notify action (see the sketch below).
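A minimal sketch of this flow using Node's `EventEmitter` is shown below; the `user:input` and `result` event names are hypothetical stand-ins for Jan's actual channels.
```typescript
import { EventEmitter } from "events";

const bus = new EventEmitter();

// Plugin/BackEnd: Listen for broadcast events and process them asynchronously.
bus.on("user:input", async (input: string) => {
  const result = await Promise.resolve(`processed: ${input}`); // async work
  bus.emit("result", result); // Notify: broadcast the result
});

// FrontEnd: Listen for results without blocking on the original request.
bus.on("result", (result: string) => console.log(result));

// Steps 1-2: input is broadcast as an event; steps 3-4 happen asynchronously above.
bus.emit("user:input", "hello Jan");
```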
## Jan workflow
![Workflow](../img/arch-flow.drawio.png)
### Overview
The architecture of the Jan desktop application is structured into four primary modules: "Prompt Template," "Language Model," "Output Parser," and "Apps." Let's break down each module and understand its components.
### Prompt Template
This is where predefined templates are stored. It sets the format and structure for user interactions. It contains the following parts (a small assembly sketch follows this list):
- **Character's definition:**
Definitions of various characters or entities that may be interacted with or invoked during user requests (e.g., name, personality, and communication style).
- **Model's definition:**
Definitions related to the different language models (e.g., objectives, capabilities, and constraints)
- **Examples:**
Sample inputs and outputs for guidance. Given good examples, an LLM can exhibit basic reasoning or planning skills.
- **Input:**
The actual user query or request that is being processed.
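For instance, the four parts above might be concatenated into a single prompt as in the sketch below; the layout is illustrative, and Jan's actual template format may differ.
```typescript
// Hypothetical template parts; real definitions would come from the template store.
const character = "You are Jan, a concise and friendly assistant.";
const model = "Objective: answer user questions within your constraints.";
const examples = "Q: What is 2 + 2?\nA: 4";
const input = "What is the capital of France?";

// Assemble the final prompt sent to the language model.
const prompt = [character, model, examples, `Q: ${input}\nA:`].join("\n\n");
console.log(prompt);
```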
### Large Language Model
This processes the input provided.
- **Local models:**
These are the downloaded models that reside within the system. They can process requests without the need to connect to external resources.
- **OpenAI:**
This connects the application to the OpenAI API, allowing it to use powerful models such as GPT-3.5 and GPT-4 (a request sketch follows this list).
:::info
To use OpenAI models, you must have an OpenAI account and secret key. You can get your [OpenAI API key here](https://platform.openai.com/account/api-keys).
:::
- **Custom Agents:**
These are user-defined or third-party models that can be integrated into the Jan system for specific tasks.
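As an illustration of the OpenAI path, a minimal chat-completion request could look like the following. It assumes the standard OpenAI REST API and an `OPENAI_API_KEY` environment variable; it is not Jan's internal implementation.
```typescript
// Send a single prompt to OpenAI's Chat Completions endpoint (Node 18+ fetch).
async function askOpenAI(prompt: string): Promise<string> {
  const res = await fetch("https://api.openai.com/v1/chat/completions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`, // your secret key
    },
    body: JSON.stringify({
      model: "gpt-3.5-turbo",
      messages: [{ role: "user", content: prompt }],
    }),
  });
  const data = await res.json();
  return data.choices[0].message.content; // the model's reply text
}
```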
### Output Parser
Language models produce textual content; often, however, there is a need for structured data instead of plain text. This is achieved with output parsers.
- **Parser:**
This component ensures that the output conforms to the desired structure and format, removing unwanted information or errors.
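A parser along these lines might pull a JSON object out of raw model text and validate its shape; the `ParsedAnswer` structure below is hypothetical.
```typescript
interface ParsedAnswer {
  answer: string;
  confidence: number;
}

function parseModelOutput(raw: string): ParsedAnswer | null {
  // Extract the first JSON object embedded in the surrounding prose, if any.
  const match = raw.match(/\{[\s\S]*\}/);
  if (!match) return null;
  try {
    const obj = JSON.parse(match[0]);
    if (typeof obj.answer === "string" && typeof obj.confidence === "number") {
      return obj as ParsedAnswer; // conforms to the desired structure
    }
    return null; // wrong shape: reject rather than propagate bad data
  } catch {
    return null; // malformed JSON
  }
}
```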
### Apps
This represents applications or extensions that can be integrated with Jan.
- **Characters:** Characters or entities that can be utilized within the applications.
- **Models:** Different Large Language Models, Large Multimodal Models, and Stable Diffusion models that the apps might use.
- **RAG:** Represents a "Retrieval Augmented Generation" functionality, which helps in fetching relevant data and generating responses based on it (a conceptual sketch follows).
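Conceptually, RAG boils down to retrieve-then-generate, as in the sketch below; `retrieve` and `generate` are placeholders rather than Jan APIs.
```typescript
// Fetch relevant documents, then generate an answer grounded on them.
async function ragAnswer(
  query: string,
  retrieve: (q: string) => Promise<string[]>,
  generate: (prompt: string) => Promise<string>
): Promise<string> {
  const docs = await retrieve(query); // fetch relevant data
  const prompt = `Context:\n${docs.join("\n")}\n\nQuestion: ${query}\nAnswer:`;
  return generate(prompt); // respond based on the retrieved context
}
```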
## Jan Platform
![Platform](../img/arch-connection.drawio.png)
### Overview
The architecture of Jan can be thought of as a layered system comprising the FrontEnd, Middleware, and BackEnd. Each layer has distinct components and responsibilities, ensuring a modular and scalable approach.
#### FrontEnd
The **FrontEnd** is the visible part of Jan that interacts directly with the user.
- **Controller:** This is the main control unit of the FrontEnd. It processes the user's inputs, handles UI events, and communicates with other layers to fetch or send data.
- **Apps:** This represents applications or extensions that can be integrated with Jan.
- **Execute Requests:** These act as the initial triggers that initiate processes within the application.
#### Middleware
The Middleware is a bridge between the FrontEnd and BackEnd, responsible for translating requests and ensuring smooth data flow.
- **SDK:** Stands for Software Development Kit. It provides a set of tools, libraries, and guidelines for developers to build and integrate custom applications or features with Jan.
- **Core:** It's tasked with managing the connections between the FrontEnd and BackEnd. It ensures data is routed correctly and efficiently between the two.
- **Local Native:** Refers to the connectors that enable communication with local services or applications. This uses your own hardware to deploy models.
- **Cloud Native:** As the name suggests, these connectors are tailored for cloud-based interactions, allowing Jan to leverage cloud services or interact with other cloud-based applications.
:::info
The Middleware communicates with the BackEnd primarily through **IPC** for Local Native and **HTTP** for Cloud Native.
:::
#### BackEnd
It is responsible for data processing, storage, and other core functionalities.
- **Plugins:** Extendable modules that can be added to the Jan system to provide additional functionalities or integrations with third-party applications.
- **Nitro:** This is a high-performance engine or a set of services that power specific functionalities within Jan. Given its placement in the architecture, it's reasonable to assume that Nitro provides acceleration or optimization capabilities for tasks.

View File

@ -1,5 +1,4 @@
---
sidebar_position: 1
title: Build an app
---
@ -75,7 +74,7 @@ The [`src/`](./src/) directory is the heart of your app! You can replace the con
- `index.ts` is your app's main entry point. You can access the Web runtime and define UI in this file.
- `module.ts` is your Node runtime in which functions get executed. You should define core logic and compute-intensive workloads in this file.
- `index.ts` and `module.ts` interact with each other via RPC (See [Information flow](./app-anatomy.md#information-flow)) via [`invokePluginFunc`](../reference/01_init.md#invokepluginfunc)
- `index.ts` and `module.ts` interact with each other via RPC (See [Information flow](./app-anatomy.md#information-flow)) via [`invokePluginFunc`](../../reference/01_init.md#invokepluginfunc)
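A minimal sketch of this RPC flow is below; the exact `invokePluginFunc` signature is assumed here, so check the linked reference for the actual API.
```typescript
// index.ts (Web runtime): ask the Node runtime to run a function from module.ts.
import { invokePluginFunc } from "@janhq/core"; // assumed export

async function runHeavyTask(input: string): Promise<string> {
  // "module.ts" and "heavyTask" name the target module and function (assumed signature).
  return invokePluginFunc("module.ts", "heavyTask", input);
}

// module.ts (Node runtime): the function executed on the other side of the RPC call.
export function heavyTask(input: string): string {
  return input.toUpperCase(); // stand-in for compute-intensive work
}
```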
Import the Jan SDK
@ -145,7 +144,7 @@ module.exports = {
## App installation
![Manual installation](img/build-app-1.png)
![Manual installation](../img/build-app-1.png)
- `Select` the built `*.tar.gz` file
- Jan will reload after new apps get installed

View File

@ -1,5 +1,4 @@
---
sidebar_position: 3
title: Publishing an app
---

View File

@ -0,0 +1,16 @@
---
title: Overview
---
Jan's mission is to power the next-gen App with limitless extensibility by providing users with the following:
- A unified API and helpers, so builders only need to care about what matters.
- A wide range of optimized, state-of-the-art models that give your App thinking, hearing, and seeing capabilities, powered by our [Nitro](https://github.com/janhq/nitro).
- Strong support for an App marketplace and a Model marketplace that streamline value from end customers to builders at all layers.
- Most importantly: users of Jan can use Apps via both the UI and an API for integration.
At Jan, we strongly believe in `Portable AI` and `Personal AI`: created once, running anywhere.
## Downloads
[Jan.ai](https://jan.ai/) - Desktop app
[Jan GitHub](https://github.com/janhq/jan) - Open-source repository for developers

View File

Before

Width:  |  Height:  |  Size: 11 KiB

After

Width:  |  Height:  |  Size: 11 KiB

View File

Before

Width:  |  Height:  |  Size: 17 KiB

After

Width:  |  Height:  |  Size: 17 KiB

View File

Before

Width:  |  Height:  |  Size: 15 KiB

After

Width:  |  Height:  |  Size: 15 KiB

View File

Before

Width:  |  Height:  |  Size: 24 KiB

After

Width:  |  Height:  |  Size: 24 KiB

View File

Before

Width:  |  Height:  |  Size: 12 KiB

After

Width:  |  Height:  |  Size: 12 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 36 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 72 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 75 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 55 KiB

View File

Before

Width:  |  Height:  |  Size: 47 KiB

After

Width:  |  Height:  |  Size: 47 KiB

View File

Before

Width:  |  Height:  |  Size: 54 KiB

After

Width:  |  Height:  |  Size: 54 KiB

View File

Before

Width:  |  Height:  |  Size: 778 KiB

After

Width:  |  Height:  |  Size: 778 KiB

View File

Before

Width:  |  Height:  |  Size: 232 KiB

After

Width:  |  Height:  |  Size: 232 KiB

View File

Before

Width:  |  Height:  |  Size: 158 KiB

After

Width:  |  Height:  |  Size: 158 KiB

View File

@ -1,62 +0,0 @@
---
sidebar_position: 2
title: Anatomy of an app
---
### High level architecture
![High level architecture](img/architecture-0.drawio.png)
Jan's mission is to power the next-gen App with limitless extensibility by providing builders:
- Unified API/helpers so that they only need to care about what matters
- A wide range of optimized, state-of-the-art models that can give your App thinking/hearing/seeing capabilities. This is powered by our [Nitro](https://github.com/janhq/nitro)
- Strong support for an App marketplace and a Model marketplace that streamline value from end customers to builders at all layers
- Most importantly: the user of Jan can use the Apps via UI and API for integration
At Jan, we strongly believe in `portable AI` that is created once and runs anywhere.
### Low level architecture
![Low level architecture](img/architecture-1.drawio.png)
- The Jan platform includes the following components:
  - Processes:
    - UI process:
      - This is the Electron framework's `renderer` component (Web technology equivalent)
      - Jan provides a core platform UI that:
        - Allows an App to `register()` a function blueprint with a name and arguments
        - Runs `execute()` on registered App functions
    - Node process (NodeJS technology equivalent):
      - This is the Electron framework's `main process` component (NodeJS runtime)
      - Jan provides a core platform runtime that:
        - Allows an App to `register()` a function blueprint with a name and arguments
        - Runs `execute()` on registered App functions
  - The `@janhq/core` library, which exposes the Core API for Apps to reuse. Currently it only supports an App's `index.ts`
- Vertically, there are the `Platform Core` component and the `App` component. Each includes a UI and a Node process that work as a pair.
## Events
![Platform events](img/app-anatomy-4.drawio.jpg)
#### onLaunch()
![Platform onLaunch()](img/app-anatomy-1.drawio.jpg)
#### onStart()
![Platform onStart()](img/app-anatomy-2.drawio.jpg)
#### onDispose()
![Platform onDispose()](img/app-anatomy-3.drawio.jpg)
#### onAppInstallation() and onAppUninstallation()
The Platform simply restarts and triggers onDispose(), then onLaunch().
### Information flow
- When an App is being used, here is how information passes between the Platform and Apps:
![Communication](img/app-anatomy-5.drawio.png)

View File

@ -1,11 +0,0 @@
---
title: Concepts
---
## Concepts
- Jan Platform: A desktop app / cloud-native SaaS that can run on Linux, Windows, Mac, or even a server, and comes with extensibility, a toolbox, and state-of-the-art yet optimized models for next-gen Apps.
- Jan App: A next-gen App built on the Jan Platform as `portable intelligence` that can run everywhere.
- Models:
- LLM models
- Other models

Binary file not shown.

After

Width:  |  Height:  |  Size: 64 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 86 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 23 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 186 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 62 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 85 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 5.6 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 54 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 19 KiB

View File

@ -1,8 +1,9 @@
---
sidebar_position: 5
title: Cloud Native
---
# Installing Jan Cloud Native
Cloud Native is useful when you want to deploy Jan to a shared/remote/cloud server, rather than running it as a local Desktop app.
> This is an experimental feature - expect breaking changes!
@ -35,7 +36,7 @@ Open your browser at [http://localhost:4000](http://localhost:4000)
### Architecture
![cloudnative](img/cloudnative.png)
![cloudnative](../../developers/img/cloudnative.png)
### TODOs

View File

@ -0,0 +1,3 @@
---
title: Installation
---

View File

@ -0,0 +1,77 @@
---
title: Linux
---
# Installing Jan on Linux
## Installation
### Step 1: Download the Installer
To begin using 👋Jan.ai on your Linux computer, follow these steps:
1. Visit [Jan.ai](https://jan.ai/).
2. Click on the "Download for Linux" button to download the Jan Installer.
![Jan Installer](../img/jan-download.png)
:::tip
For faster results, you should enable your NVIDIA GPU. Make sure to have the CUDA toolkit installed. You can download it from your Linux distro's package manager or from here: [CUDA Toolkit](https://developer.nvidia.com/cuda-downloads).
:::
```bash
sudo apt install nvidia-cuda-toolkit
```
Verify the installation with:
```bash
nvidia-smi
```
:::tip
For AMD GPUs, install ROCm from your Linux distro's package manager or from here: [ROCm Quick Start (Linux)](https://rocm.docs.amd.com/en/latest/deploy/linux/quick_start.html).
:::
### Step 2: Download your first model
Now, let's get your first model:
1. After installation, you'll find the 👋Jan application icon on your desktop. Double-click to open it.
2. Welcome to the Jan homepage. Click on "Explore Models" to see the Model catalog.
![Explore models](../img/explore-model.png)
3. You can also see different quantized versions by clicking on "Show Available Versions."
![Model versions](../img/model-version.png)
> Note: Choose a model that matches your computer's memory and RAM.
4. Select your preferred model and click "Download."
![Downloading](../img/downloading.png)
### Step 3: Start the model
Once your model is downloaded, go to "My Models" and click "Start Model."
![Start model](../img/start-model.png)
### Step 4: Start the conversations
Now you're ready to start using 👋Jan.ai for conversations:
Click "Chat" and begin your first conversation by selecting "New conversation."
You can also check the CPU and Memory usage of the computer.
![Chat](../img/chat.png)
That's it! Enjoy using Large Language Models (LLMs) with 👋Jan.ai.
## Uninstallation
## Troubleshooting

View File

@ -0,0 +1,52 @@
---
title: Mac
---
## Installation
### Step 1: Download the Installer
To begin using 👋Jan.ai on your Mac, follow these steps:
1. Visit [Jan.ai](https://jan.ai/).
2. Click on the "Download for Mac" button to download the Jan Installer.
![Jan Installer](../img/jan-download.png)
### Step 2: Download your first model
Now, let's get your first model:
1. After installation, you'll find the 👋Jan application icon on your desktop. Open it.
2. Welcome to the Jan homepage. Click on "Explore Models" to see the Model catalog.
![Explore models](../img/explore-model.png)
3. You can also see different quantized versions by clicking on "Show Available Versions."
![Model versions](../img/model-version.png)
> Note: Choose a model that matches your computer's memory and RAM.
4. Select your preferred model and click "Download."
![Downloading](../img/downloading.png)
### Step 3: Start the model
Once your model is downloaded, go to "My Models" and click "Start Model."
![Start model](../img/start-model.png)
### Step 4: Start the conversations
Now you're ready to start using 👋Jan.ai for conversations:
Click "Chat" and begin your first conversation by selecting "New conversation."
You can also check the CPU and Memory usage of the computer.
![Chat](../img/chat.png)
That's it! Enjoy using Large Language Models (LLMs) with 👋Jan.ai.
## Uninstallation
## Troubleshooting

View File

@ -0,0 +1,77 @@
---
title: Windows
---
# Installing Jan on Windows
## Step 1: Download the Installer
To begin using 👋Jan.ai on your Windows computer, follow these steps:
1. Visit [Jan.ai](https://jan.ai/).
2. Click on the "Download for Windows" button to download the Jan Installer.
![Jan Installer](../img/jan-download.png)
## Step 2: Proceed past Windows Defender
When you run the Jan Installer, Windows Defender may display a warning. Here's what to do:
1. Click "Run away" to accept and install 👋Jan.ai.
![Accept Jan](../img/window-defender.png)
2. Wait for the 👋Jan.ai installation to complete.
![Setting up](../img/set-up.png)
:::tip
For faster results, you should enable your NVIDIA GPU. Make sure to have the CUDA toolkit installed. You can download it from here: [CUDA Toolkit](https://developer.nvidia.com/cuda-downloads) or [CUDA Installation guide](https://docs.nvidia.com/cuda/cuda-installation-guide-microsoft-windows/index.html#verify-you-have-a-cuda-capable-gpu).
:::
Verify the installation with:
```bash
nvidia-smi
```
:::tip
For AMD GPUs, you should use [WSL 2](https://learn.microsoft.com/en-us/windows/wsl/install). You can then install ROCm from here: [ROCm Quick Start (Linux)](https://rocm.docs.amd.com/en/latest/deploy/linux/quick_start.html).
:::
## Step 3: Download your first model
Now, let's get your first model:
1. After installation, you'll find the 👋Jan application icon on your desktop. Double-click to open it.
2. Welcome to the Jan homepage. Click on "Explore Models" to see the Model catalog.
![Explore models](../img/explore-model.png)
3. You can also see different quantized versions by clicking on "Show Available Versions."
![Model versions](../img/model-version.png)
> Note: Choose a model that matches your computer's memory and RAM.
4. Select your preferred model and click "Download."
![Downloading](../img/downloading.png)
## Step 4: Start the model
Once your model is downloaded, go to "My Models" and click "Start Model."
![Start model](../img/start-model.png)
## Step 5: Start the conversations
Now you're ready to start using 👋Jan.ai for conversations:
Click "Chat" and begin your first conversation by selecting "New conversation."
You can also check the CPU and Memory usage of the computer.
![Chat](../img/chat.png)
That's it! Enjoy using Large Language Models (LLMs) with 👋Jan.ai.

View File

@ -1,3 +0,0 @@
---
title: Installing Jan on Linux
---

View File

@ -1,3 +0,0 @@
---
title: Installing Jan on Mac
---

View File

@ -0,0 +1,9 @@
---
title: Overview
slug: /guides
---
- Jan Platform: Desktop app / cloud-native SaaS that can run on Linux, Windows, Mac, or even a server, and comes with extensibility, a toolbox, and state-of-the-art but optimized models for next-gen Apps.
- Jan App: Next-gen App built on the Jan Platform as `portable intelligence` that can run everywhere.
- Models:
- Large Language Models
- Stable Diffusion models

View File

@ -1,3 +1,34 @@
---
title: Troubleshooting
sidebar_position: 5
---
# Jan.ai Troubleshooting Guide
Please note that 👋Jan is in "development mode," and you might encounter issues. If you need to reset your installation, follow these steps:
## Issue 1: Broken Build
1. Delete the Jan Application from your computer.
2. Clear the cache by running one of the following commands:
```sh
rm -rf /Users/$(whoami)/Library/Application\ Support/jan-electron
```
or
```sh
rm -rf /Users/$(whoami)/Library/Application\ Support/jan
```
3. If the above steps fail, use the following commands to find and kill any problematic processes:
```sh
ps aux | grep nitro
```
Look for processes like "nitro" and "nitro_arm_64," and kill them one by one with:
```sh
kill -9 <PID>
```

View File

@ -1,3 +0,0 @@
---
title: Installing Jan on Windows
---

View File

@ -0,0 +1,120 @@
---
title: Engineering
---
## Connecting to Rigs
### Pritunl Setup
1. **Install Pritunl**: [Download here](https://client.pritunl.com/#install)
2. **Import .ovpn file**
3. **VSCode**: Install the "Remote-SSH" extension for connection
### Llama.cpp Setup
1. **Clone Repo**: `git clone https://github.com/ggerganov/llama.cpp && cd llama.cpp`
2. **Build**:
```bash
mkdir build && cd build
cmake .. -DLLAMA_CUBLAS=ON -DLLAMA_CUDA_F16=ON -DLLAMA_CUDA_MMV_Y=8
cmake --build . --config Release
```
3. **Download Model:**
```bash
cd ../models && wget https://huggingface.co/TheBloke/Llama-2-7B-GGUF/resolve/main/llama-2-7b.Q8_0.gguf
```
4. **Run:**
```bash
cd ../build/bin/
./main -m ../../models/llama-2-7b.Q8_0.gguf -p "Writing a thesis proposal can be done in 10 simple steps:\nStep 1:" -n 2048 -e -ngl 100 -t 48
```
The main llama.cpp CLI arguments are listed below:
| Short Option | Long Option | Param Value | Description |
|--------------|-----------------------|-------------|-------------|
| `-h` | `--help` | | Show this help message and exit |
| `-i` | `--interactive` | | Run in interactive mode |
| | `--interactive-first` | | Run in interactive mode and wait for input right away |
| | `-ins`, `--instruct` | | Run in instruction mode (use with Alpaca models) |
| `-r` | `--reverse-prompt` | `PROMPT` | Run in interactive mode and poll user input upon seeing `PROMPT` |
| | `--color` | | Colorise output to distinguish prompt and user input from generations |
| **Generations** | | | |
| `-s` | `--seed` | `SEED` | Seed for random number generator |
| `-t` | `--threads` | `N` | Number of threads to use during computation |
| `-p` | `--prompt` | `PROMPT` | Prompt to start generation with |
| | `--random-prompt` | | Start with a randomized prompt |
| | `--in-prefix` | `STRING` | String to prefix user inputs with |
| `-f` | `--file` | `FNAME` | Prompt file to start generation |
| `-n` | `--n_predict` | `N` | Number of tokens to predict |
| | `--top_k` | `N` | Top-k sampling |
| | `--top_p` | `N` | Top-p sampling |
| | `--repeat_last_n` | `N` | Last n tokens to consider for penalization |
| | `--repeat_penalty` | `N` | Penalize repeat sequence of tokens |
| `-c` | `--ctx_size` | `N` | Size of the prompt context |
| | `--ignore-eos` | | Ignore end of stream token and continue generating |
| | `--memory_f32` | | Use `f32` instead of `f16` for memory key+value |
| | `--temp` | `N` | Temperature |
| | `--n_parts` | `N` | Number of model parts |
| `-b` | `--batch_size` | `N` | Batch size for prompt processing |
| | `--perplexity` | | Compute perplexity over the prompt |
| | `--keep` | | Number of tokens to keep from the initial prompt |
| | `--mlock` | | Force system to keep model in RAM |
| | `--mtest` | | Determine the maximum memory usage |
| | `--verbose-prompt` | | Print prompt before generation |
| `-m` | `--model` | `FNAME` | Model path |
### TensorRT-LLM Setup
#### **Docker and TensorRT-LLM build**
> Note: You should run with admin permissions to make sure everything works correctly
1. **Docker Image:**
```bash
sudo make -C docker build
```
2. **Run Container:**
```bash
sudo make -C docker run
```
Once in the container, TensorRT-LLM can be built from source as follows:
3. **Build:**
```bash
# To build the TensorRT-LLM code.
python3 ./scripts/build_wheel.py --trt_root /usr/local/tensorrt
# Deploy TensorRT-LLM in your environment.
pip install ./build/tensorrt_llm*.whl
```
> Note: You can specify the GPU architecture (e.g., Ada for the RTX 4090) to reduce compilation time.
> The list of supported architectures can be found in the `CMakeLists.txt` file.
```bash
python3 ./scripts/build_wheel.py --cuda_architectures "89-real;90-real"
```
#### Running TensorRT-LLM
1. **Requirements:**
```bash
pip install -r examples/bloom/requirements.txt && git lfs install
```
2. **Download Weights:**
```bash
cd examples/llama && rm -rf ./llama/7B && mkdir -p ./llama/7B && git clone https://huggingface.co/NousResearch/Llama-2-7b-hf ./llama/7B
```
3. **Build Engine:**
```bash
python build.py --model_dir ./llama/7B/ --dtype float16 --remove_input_padding --use_gpt_attention_plugin float16 --enable_context_fmha --use_gemm_plugin float16 --use_weight_only --output_dir ./llama/7B/trt_engines/weight_only/1-gpu/
```
4. **Run Inference:**
```bash
python3 run.py --max_output_len=2048 --tokenizer_dir ./llama/7B/ --engine_dir=./llama/7B/trt_engines/weight_only/1-gpu/ --input_text "Writing a thesis proposal can be done in 10 simple steps:\nStep 1:"
```
For the TensorRT-LLM CLI arguments, see `run.py`.

View File

@ -1,4 +1,7 @@
---
title: Company Handbook
slug: /handbook
---
## Remote Work

View File

@ -1,3 +0,0 @@
---
title: How We Work
---

View File

@ -1,6 +1,6 @@
---
sidebar_position: 4
title: Nitro (C++ Inference Engine)
title: Nitro
slug: /nitro
---
Nitro is the inference engine that powers Jan. It is written in C++ and optimized for edge deployment.
@ -73,4 +73,4 @@ curl -X POST 'http://localhost:3928/inferences/llamacpp/loadmodel' \
## Architecture diagram
![Nitro Architecture](img/architecture.png)
![Nitro Architecture](../developers/img/architecture.png)

View File

@ -118,30 +118,24 @@ const config = {
},
items: [
// Navbar left
// {
// type: "docSidebar",
// sidebarId: "featuresSidebar",
// position: "left",
// label: "Platform",
// },
{
type: "docSidebar",
sidebarId: "companySidebar",
position: "left",
label: "Company",
},
// Navbar right
{
type: "docSidebar",
sidebarId: "guidesSidebar",
position: "right",
label: "User Guides",
position: "left",
label: "User Guide",
},
{
type: "docSidebar",
sidebarId: "devSidebar",
position: "left",
label: "Developers",
},
// Navbar right
{
type: "docSidebar",
sidebarId: "aboutSidebar",
position: "right",
label: "Developer",
label: "About",
},
],
},

View File

@ -17,6 +17,7 @@
"@docusaurus/core": "^2.4.3",
"@docusaurus/preset-classic": "^2.4.3",
"@docusaurus/theme-live-codeblock": "^2.4.3",
"@docusaurus/theme-mermaid": "^3.0.0",
"@headlessui/react": "^1.7.17",
"@heroicons/react": "^2.0.18",
"@mdx-js/react": "^1.6.22",

View File

@ -29,41 +29,56 @@ const sidebars = {
},
],
// Autogenerate docs from /guides
guidesSidebar: [
"guides/overview",
{
type: "category",
label: "Guides",
label: "Installation",
collapsible: true,
collapsed: true,
link: { type: "doc", id: "guides/install/install" },
items: [
"guides/install/linux",
"guides/install/windows",
"guides/install/mac",
"guides/install/cloud-native",
],
},
"guides/troubleshooting",
],
devSidebar: [
"developers/developers",
"nitro/nitro",
{
type: "category",
label: "Apps",
collapsible: true,
collapsed: true,
items: [
{
type: "autogenerated",
dirName: "guides",
dirName: "developers/apps",
},
],
},
],
//
devSidebar: [
{
type: "category",
label: "Getting Started",
label: "Plugins",
collapsible: true,
collapsed: false,
collapsed: true,
items: [
{
type: "autogenerated",
dirName: "getting-started",
dirName: "developers/plugins",
},
],
},
{
type: "category",
label: "Reference",
label: "API Reference",
collapsible: true,
collapsed: false,
collapsed: true,
items: [
{
type: "autogenerated",
@ -71,61 +86,19 @@ const sidebars = {
},
],
},
{
type: "category",
label: "Apps (Plugins)",
collapsible: true,
collapsed: true,
items: [
{
type: "autogenerated",
dirName: "apps",
},
],
},
// {
// type: "category",
// label: "Tutorials",
// collapsible: true,
// collapsed: false,
// items: [
// {
// type: "autogenerated",
// dirName: "tutorials",
// },
// ],
// },
// {
// type: "category",
// label: "Articles",
// collapsible: true,
// collapsed: false,
// items: [
// {
// type: "doc",
// label: "Nitro",
// id: "articles/nitro",
// },
// ],
// },
],
companySidebar: [
// {
// type: "category",
// label: "About Jan",
// collapsible: true,
// collapsed: true,
// link: { type: "doc", id: "about/about" },
// items: [
// "about/team",
// {
// type: "link",
// label: "Careers",
// href: "https://janai.bamboohr.com/careers",
// },
// ],
// },
aboutSidebar: [
{
type: "doc",
label: "About Jan",
id: "about/about",
},
{
type: "link",
label: "Careers",
href: "https://janai.bamboohr.com/careers",
},
{
type: "category",
label: "Events",
@ -139,14 +112,20 @@ const sidebars = {
},
],
},
// {
// type: "category",
// label: "Company Handbook",
// collapsible: true,
// collapsed: true,
// link: { type: "doc", id: "handbook/handbook" },
// items: ["handbook/remote-work"],
// },
{
type: "category",
label: "Company Handbook",
collapsible: true,
collapsed: true,
link: { type: "doc", id: "handbook/handbook" },
items: [
{
type: "doc",
label: "Engineering",
id: "handbook/engineering/engineering",
},
],
},
],
};

View File

@ -0,0 +1,5 @@
.custom-toc-title {
  font-weight: bold;
  margin-bottom: 16px;
  margin-top: -20px;
}

View File

@ -0,0 +1,9 @@
// Prepend an "On this page" title to the table of contents once the page loads.
document.addEventListener('DOMContentLoaded', function () {
  const toc = document.querySelector('.table-of-contents');
  if (toc) {
    const title = document.createElement('div');
    title.className = 'custom-toc-title';
    title.innerText = 'On this page';
    toc.insertBefore(title, toc.firstChild);
  }
});

File diff suppressed because it is too large Load Diff