Partial restructure of Models Spec

Daniel 2023-11-19 04:13:12 +08:00
parent 17258192e8
commit b321427b34
6 changed files with 237 additions and 139 deletions

View File

@ -1,3 +1,11 @@
---
title: How Jan Works
---
- Local Filesystem
  - Follow on from the Quickstart to show how things actually work
  - Write in a conversational style; show how things work under the hood
  - Check how the filesystem changes after each request
- Model loading into RAM/VRAM
  - Explain how the `.bin` file is loaded via llama.cpp
  - Explain how it consumes RAM and VRAM, and refer to the system monitor

View File

@ -1,4 +1,4 @@
---
title: "Fine tuning"
title: "Fine-tuning"
---
Todo: @hiro

View File

@ -10,56 +10,103 @@ Feedback: [HackMD: Models Spec](https://hackmd.io/ulO3uB1AQCqLa5SAAMFOQw)
:::
Models are AI models, such as Llama or Mistral.
## Overview
> OpenAI Equivalent: https://platform.openai.com/docs/api-reference/models
Jan's Model API aims to be as similar as possible to [OpenAI's Models API](https://platform.openai.com/docs/api-reference/models), with additional methods for managing and running models locally.
## User Stories
### User Objectives
- Users can start/stop models and use them in a thread (or via the Chat Completions API)
- Users can download, import, and delete models, e.g. by downloading via a web URL or importing from a local directory
- Users can configure model settings, like run parameters, at the model level or override them at the thread level
- Users can override run settings at runtime (see the Assistant and Thread specs)
- Users can use remote models (e.g. OpenAI, OpenRouter)
## Models Folder
Models in Jan are stored in the `/models` folder and declared by `<model-name>.json` files.
- Everything needed to represent a `model` is packaged into a `Model folder`.
- The `folder` is standalone and can be easily zipped, imported, and exported, e.g. to GitHub.
- The `folder` always contains at least one `Model Object`, declared in `json` format.
- The `folder` and `file` do not have to share the same name.
- The model `id` is made up of `folder_name/filename` and is thus always unique (see the sketch after the folder layout below).
```sh
/janroot
    /models
        azure-openai/                  # Folder name
            azure-openai-gpt3-5.json   # File name
        llama2-70b/
            model.json
            .gguf
```
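Because the model `id` is just `folder_name/filename`, it can be resolved to a location on disk without any registry lookup. Below is a minimal TypeScript sketch of that mapping; the `/janroot` root and the helper name are assumptions for illustration, not Jan's actual code.
```typescript
import * as path from "path";

const JAN_ROOT = "/janroot"; // assumed root folder, for illustration only

// e.g. "llama2-70b/model" -> "/janroot/models/llama2-70b/model.json"
function modelJsonPath(modelId: string): string {
  const [folderName, fileName] = modelId.split("/");
  return path.join(JAN_ROOT, "models", folderName, `${fileName}.json`);
}

console.log(modelJsonPath("llama2-70b/model"));
// -> /janroot/models/llama2-70b/model.json
```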
## Jan Model Object
Models in Jan are represented as `json` objects, colloquially known as `model.json` files, and follow a `<model_name>.json` naming convention.
Jan's `model.json` aims for rough equivalence with [OpenAI's Model Object](https://platform.openai.com/docs/api-reference/models/object), and adds properties needed to run models locally.
All object properties are optional, i.e. users should be able to run a model declared by an empty `json` file.
```json
// ./models/zephyr/zephyr-7b-beta-Q4_K_M.json
{
  "source_url": "https://huggingface.co/TheBloke/zephyr-7B-beta-GGUF/blob/main/zephyr-7b-beta.Q4_K_M.gguf",
  "parameters": {
    "init": {
      "ctx_len": "2048",
      "ngl": "100",
      "embedding": "true",
      "n_parallel": "4",
      "pre_prompt": "A chat between a curious user and an artificial intelligence",
      "user_prompt": "USER: ",
      "ai_prompt": "ASSISTANT: "
    },
    "runtime": {
      "temperature": "0.7",
      "token_limit": "2048",
      "top_k": "0",
      "top_p": "1",
      "stream": "true"
    }
  },
  "metadata": {
    "engine": "llamacpp",
    "quantization": "Q4_K_M",
    "size": "7B"
  }
}
```
| Property                | Type                                                    | Description                                                                | Validation                                            |
| ----------------------- | ------------------------------------------------------- | -------------------------------------------------------------------------- | ------------------------------------------------------ |
| `object`                | enum: `model`, `assistant`, `thread`, `message`         | Type of the Jan Object. Always `model`.                                     | Defaults to `model`                                     |
| `name`                  | string                                                  | A vanity name.                                                              | Defaults to filename                                    |
| `description`           | string                                                  | A vanity description of the model.                                          | Defaults to `""`                                        |
| `source_url`            | string                                                  | The model download source. It can be an external URL or a local filepath.  | Defaults to `pwd`. See [Model Source](#model-source)    |
| `state`                 | enum: `to_download`, `downloading`, `ready`, `running`  | The model's lifecycle state (needs more thought).                           | Defaults to `to_download`                               |
| `parameters`            | map                                                     | Defines default model run parameters used by any assistant.                 | Defaults to `{}`                                        |
| `metadata`              | map                                                     | Stores additional structured information about the model.                   | Defaults to `{}`                                        |
| `metadata.engine`       | enum: `llamacpp`, `api`, `tensorrt`                     | The backend used to run the model.                                          | Defaults to `llamacpp`                                  |
| `metadata.quantization` | string                                                  | Supported formats only.                                                     | See [Model Formats](#model-formats)                     |
| `metadata.binaries`     | array                                                   | Supported formats only.                                                     | See [Model Formats](#model-formats)                     |
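For illustration, the table above can be mirrored as a TypeScript shape. This is a hypothetical sketch, not a type that Jan exports; every field is optional to match the rule that an empty `json` file is a valid model.
```typescript
// Hypothetical mirror of the model.json properties listed above.
type ModelState = "to_download" | "downloading" | "ready" | "running";

interface JanModel {
  object?: "model";
  name?: string;
  description?: string;
  source_url?: string;
  state?: ModelState;
  parameters?: {
    init?: Record<string, string>;    // e.g. ctx_len, ngl, pre_prompt
    runtime?: Record<string, string>; // e.g. temperature, top_p, stream
  };
  metadata?: {
    engine?: "llamacpp" | "api" | "tensorrt";
    quantization?: string;
    binaries?: string[];
    size?: string; // e.g. "7B", as in the example above
  };
}
```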
### Model Source
There are 3 types of model sources:
- Local model
- Remote source
- Cloud API

Users can download models from a `remote` source or reference an existing `local` model. If `source_url` is not specified in the Model Object file, the default behavior is to look in the current directory.
#### Local source_url
- Users can import a local model by providing the filepath to the model
```json
"source_url": "./",
```
#### Remote source_url
- Users can download a model by remote URL.
- Supported URL formats:
  - `https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGUF/blob/main/llama-2-7b-chat.Q3_K_L.gguf`
  - `https://any-source.com/.../model-binary.bin`
#### Cloud API source_url
- Users can use a remote API to access a model, e.g. `model-azure-openai-gpt4-turbo.json`
- See [source](https://learn.microsoft.com/en-us/azure/ai-services/openai/quickstart?tabs=command-line%2Cpython&pivots=rest-api)
```json
"source_url": "https://docs-test-001.openai.azure.com/openai.azure.com/docs-test-001/gpt4-turbo",
"parameters": {
  "init": {
    "API-KEY": "",
    "DEPLOYMENT-NAME": "",
    "api-version": "2023-05-15"
  },
  "runtime": {
    "temperature": "0.7",
    "max_tokens": "2048",
    "presence_penalty": "0",
    "top_p": "1",
    "stream": "true"
  }
},
"metadata": {
  "engine": "api"
}
```
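To make the Cloud API case concrete, here is a hedged sketch of how the `init` and `runtime` parameters above could translate into an Azure OpenAI REST call. The endpoint shape follows Microsoft's documented Chat Completions REST API; the function and constant names are illustrative and not part of Jan.
```typescript
// Illustrative only: mapping the model.json parameters onto an Azure OpenAI call.
const endpoint = "https://docs-test-001.openai.azure.com"; // derived from source_url
const deployment = "gpt4-turbo";                           // init.DEPLOYMENT-NAME
const apiVersion = "2023-05-15";                           // init.api-version
const apiKey = "<API-KEY>";                                // init.API-KEY

async function callAzureModel(prompt: string): Promise<string> {
  const res = await fetch(
    `${endpoint}/openai/deployments/${deployment}/chat/completions?api-version=${apiVersion}`,
    {
      method: "POST",
      headers: { "api-key": apiKey, "Content-Type": "application/json" },
      body: JSON.stringify({
        messages: [{ role: "user", content: prompt }],
        temperature: 0.7, // runtime.temperature
        max_tokens: 2048, // runtime.max_tokens
        top_p: 1,         // runtime.top_p
      }),
    }
  );
  const data = await res.json();
  return data.choices[0].message.content;
}
```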
### Model Formats
Additionally, Jan supports importing popular formats. For example, if you provide a HuggingFace URL for a `TheBloke` model, Jan automatically downloads and catalogs all quantizations. Custom importers autofill properties like `metadata.quantization` and `metadata.size` (see the sketch after the URL list below).
@ -89,7 +158,8 @@ Supported URL formats with custom importers:
- `azure_openai`: `https://docs-test-001.openai.azure.com/openai.azure.com/docs-test-001/gpt4-turbo`
- `openai`: `api.openai.com`
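As a rough sketch of what a custom importer might do with such a URL, the quantization label can be parsed from the GGUF filename itself. The helper below is hypothetical and not Jan's implementation.
```typescript
// Hypothetical custom-importer helper: derive model.json fields from a GGUF URL.
interface ImportedModel {
  source_url: string;
  name: string;
  metadata: { engine: string; quantization?: string };
}

function importFromHuggingFaceUrl(url: string): ImportedModel {
  // e.g. https://huggingface.co/TheBloke/zephyr-7B-beta-GGUF/blob/main/zephyr-7b-beta.Q4_K_M.gguf
  const filename = url.split("/").pop() ?? "";
  // Quantization labels follow the llama.cpp convention, e.g. Q4_K_M, Q3_K_L, Q8_0.
  const quantMatch = filename.match(/\.(Q\d[\w_]*)\.gguf$/i);
  return {
    source_url: url,
    name: filename.replace(/\.gguf$/i, ""),
    metadata: {
      engine: "llamacpp",
      ...(quantMatch ? { quantization: quantMatch[1] } : {}),
    },
  };
}
```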
### Generic Example
<details>
<summary>Example: Zephyr 7B</summary>
- Model has 1 binary `model-zephyr-7B.json`
- See [source](https://huggingface.co/TheBloke/zephyr-7B-beta-GGUF/)
@ -122,8 +192,9 @@ Supported URL formats with custom importers:
"size": "7B",
}
```
</details>
### Multiple binaries
- Model has multiple binaries `model-llava-1.5-ggml.json`
- See [source](https://huggingface.co/mys/ggml_llava-v1.5-13b)
@ -139,112 +210,24 @@ Supported URL formats with custom importers:
}
```
## Models API
### Model API Object
> OpenAI Equivalent: https://platform.openai.com/docs/api-reference/models/object
- The `Jan Model Object` maps into the `OpenAI Model Object`.
- Several properties are directly compatible with the [OpenAI `model` object](https://platform.openai.com/docs/api-reference/models).
- Note: The `Jan Model Object` has additional properties when retrieved via its API endpoint.
### Model lifecycle
A model has 4 states (enum):
- `to_download`
- `downloading`
- `ready`
- `running`
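A hypothetical sketch of the transitions implied by the Download, Start, and Stop endpoints below; this is illustrative, not an API definition.
```typescript
type ModelState = "to_download" | "downloading" | "ready" | "running";

interface Transition { via: string; from: ModelState; to: ModelState }

// Transitions implied by the endpoints documented below.
const transitions: Transition[] = [
  { via: "POST /v1/models (download)", from: "to_download", to: "downloading" }, // then "ready" when done
  { via: "PUT /v1/models/{model_id}/start", from: "ready", to: "running" },
  { via: "PUT /v1/models/{model_id}/stop", from: "running", to: "ready" },
];
```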
### Get Model
> OpenAI Equivalent: https://platform.openai.com/docs/api-reference/models/retrieve
#### Request
```shell
curl {JAN_URL}/v1/models/{model_id}
```
#### Response
```json
{
"id": "model-zephyr-7B",
@ -273,14 +256,19 @@ curl {JAN_URL}/v1/models/{model_id}
}
}
```
### List models
Lists the currently available models, and provides basic information about each one such as the owner and availability.
> OpenAI Equivalent: https://platform.openai.com/docs/api-reference/models/list
#### Request
```shell
curl {JAN_URL}/v1/models
```
#### Response
```json
{
"object": "list",
@ -310,13 +298,18 @@ curl {JAN_URL}/v1/models
"object": "list"
}
```
### Delete Model
> OpenAI Equivalent: https://platform.openai.com/docs/api-reference/models/delete
#### Request
```shell
curl -X DELETE {JAN_URL}/v1/models/{model_id}
```
#### Response
```json
{
"id": "model-zephyr-7B",
@ -325,14 +318,19 @@ curl -X DELETE {JAN_URL}/v1/models/{model_id}
"state": "to_download"
}
```
### Start Model
> Jan-only endpoint
Starts a `model`, changing its state from `ready` to `running`.
#### Request
```shell
curl -X PUT {JAN_URL}/v1/models/{model_id}/start
```
#### Response
```json
{
"id": "model-zephyr-7B",
@ -340,14 +338,19 @@ curl -X PUT {JAN_URL}/v1/models{model_id}/start
"state": "running"
}
```
### Stop Model
> Jan-only endpoint
Stops a `model`, changing its state from `running` to `ready`.
#### Request
```shell
curl -X PUT {JAN_URL}/v1/models/{model_id}/stop
```
#### Response
```json
{
"id": "model-zephyr-7B",
@ -355,18 +358,95 @@ curl -X PUT {JAN_URL}/v1/models/{model_id}/stop
"state": "ready"
}
```
### Download Model
> Jan-only endpoint
Downloads a `model`, changing its state from `to_download` to `downloading`, and then to `ready` once the download completes.
#### Request
```shell
curl -X POST {JAN_URL}/v1/models/
```
#### Response
```json
{
"id": "model-zephyr-7B",
"object": "model",
"state": "downloading"
}
```
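Putting the endpoints together, here is a hedged usage sketch in TypeScript. The base URL is a placeholder, and the body of the download request is not specified by this spec.
```typescript
const JAN_URL = "http://localhost:1337"; // placeholder base URL, not a confirmed default

async function prepareAndStart(modelId: string): Promise<void> {
  // Download Model: to_download -> downloading (-> ready once finished).
  // Note: how the target model is identified in this request is not specified above.
  await fetch(`${JAN_URL}/v1/models/`, { method: "POST" });

  // Poll Get Model until the state reaches "ready".
  let state = "";
  while (state !== "ready") {
    await new Promise((resolve) => setTimeout(resolve, 1000));
    const res = await fetch(`${JAN_URL}/v1/models/${modelId}`);
    state = (await res.json()).state;
  }

  // Start Model: ready -> running.
  await fetch(`${JAN_URL}/v1/models/${modelId}/start`, { method: "PUT" });
}
```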
## Examples
### Pre-loaded Models
- Jan ships with a default `models` folder containing recommended models
- Only the Model Object `json` files are included
- Users must later explicitly download the model binaries
```sh
models/
    mistral-7b/
        mistral-7b.json
    hermes-7b/
        hermes-7b.json
```
### Azure OpenAI
- Uses a remote API to access the model, e.g. `model-azure-openai-gpt4-turbo.json`
- See [source](https://learn.microsoft.com/en-us/azure/ai-services/openai/quickstart?tabs=command-line%2Cpython&pivots=rest-api)
```json
"source_url": "https://docs-test-001.openai.azure.com/openai.azure.com/docs-test-001/gpt4-turbo",
"parameters": {
  "init": {
    "API-KEY": "",
    "DEPLOYMENT-NAME": "",
    "api-version": "2023-05-15"
  },
  "runtime": {
    "temperature": "0.7",
    "max_tokens": "2048",
    "presence_penalty": "0",
    "top_p": "1",
    "stream": "true"
  }
},
"metadata": {
  "engine": "api"
}
```
### Multiple quantizations
- Each quantization has its own `Jan Model Object` file
```sh
llama2-7b-gguf/
    llama2-7b-gguf-Q2.json
    llama2-7b-gguf-Q3_K_L.json
    .bin
```
### Multiple model partitions
- A model that is partitioned into several binaries still uses just one `json` file
```sh
llava-ggml/
    llava-ggml-Q5.json
    .proj
    ggml
```
### Your locally fine-tuned model
- ??
```sh
llama-70b-finetune/
    llama-70b-finetune-q5.json
    .bin
```

View File

@ -0,0 +1,7 @@
---
title: Prompts
---
- [ ] /prompts folder
- [ ] How to add to prompts
- [ ] Assistants can have suggested Prompts

View File

@ -1,3 +1,5 @@
---
title: Settings
---
- [ ] .jan folder in jan root

View File

@ -68,6 +68,7 @@ const sidebars = {
"specs/jan",
"specs/fine-tuning",
"specs/settings",
"specs/prompts",
],
},
],