Partial restructure of Models Spec

2023-11-19 04:13:12 +08:00 · 2023-11-19 04:13:12 +08:00 · b321427b34
commit b321427b34
parent 17258192e8
6 changed files with 237 additions and 139 deletions
--- a/docs/docs/intro/how-jan-works.md
+++ b/docs/docs/intro/how-jan-works.md
@ -1,3 +1,11 @@
 ---
 title: How Jan Works
 ---
 - Local Filesystem
 Follow-on from Quickstart to show how things actually worked
 Write in a conversational style, show how things work under the hood
 Check how filesystem changed after each request
 - Model loading into RAM/VRAM
 Explain how the .bin file is loaded via Llama.cpp
 Explain how it consumes RAM and VRAM, and refer to system monitor
--- a/docs/docs/specs/fine-tuning.md
+++ b/docs/docs/specs/fine-tuning.md
@ -1,4 +1,4 @@
 ---
-title: "Fine tuning"
+title: "Fine-tuning"
 ---
 Todo: @hiro
--- a/docs/docs/specs/models.md
+++ b/docs/docs/specs/models.md
@ -10,56 +10,103 @@ Feedback: [HackMD: Models Spec](https://hackmd.io/ulO3uB1AQCqLa5SAAMFOQw)
 :::
-Models are AI models like Llama and Mistral
+## Overview
-> OpenAI Equivalent: https://platform.openai.com/docs/api-reference/models
+Jan's Model API aims to be as similar as possible to [OpenAI's Models API](https://platform.openai.com/docs/api-reference/models), with additional methods for managing and running models locally. 
-## User Stories
+### User Objectives
-_Users can download a model via a web URL_
+- Users can start/stop models and use them in a thread (or via Chat Completions API)
 - Users can download, import and delete models  
 - Users can configure model settings at the model level or override them at the thread level
 - Users can use remote models (e.g. OpenAI, OpenRouter)
- Wireframes here
+## Models Folder
-_Users can import a model from local directory_
+Models in Jan are stored in the `/models` folder. 
- Wireframes here
+ `<model-name>.json` files. 
-_Users can configure model settings, like run parameters_
+- Everything needed to represent a `model` is packaged into a `Model folder`.
 - The `folder` is standalone and can be easily zipped, imported, and exported, e.g. to GitHub.
 - The `folder` always contains at least one `Model Object`, declared in a `json` format.
 - The `folder` and `file` do not have to share the same name
 - The model `id` is made up of `folder_name/filename` and is thus always unique.
- Wireframes here
+```sh
 /janroot
    /models
        azure-openai/                       # Folder name
            azure-openai-gpt3-5.json        # File name
-_Users can override run settings at runtime_
+        llama2-70b/
            model.json
            .gguf
 ```
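The `folder_name/filename` rule for the model `id` can be sketched in Python. This is a minimal illustration, not Jan's implementation: the `model_id` helper is hypothetical, and dropping the `.json` extension from the filename is an assumption the spec does not state.

```python
from pathlib import Path


def model_id(json_path: Path, models_root: Path) -> str:
    """Build a model id of the form folder_name/filename (hypothetical helper)."""
    rel = json_path.relative_to(models_root)
    folder = rel.parts[0]   # e.g. "azure-openai"
    filename = rel.stem     # assumption: the ".json" extension is dropped
    return f"{folder}/{filename}"


print(model_id(Path("/janroot/models/azure-openai/azure-openai-gpt3-5.json"),
               Path("/janroot/models")))
```

Because the folder name is part of the id, two folders can each hold a `model.json` without colliding.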
- See Assistant Spec and Thread
+## Model Object
-## Jan Model Object
+Models in Jan are represented as `json` objects, and are colloquially known as `model.jsons`. 
- A `Jan Model Object` is a “representation" of a model
+Jan's models follow a `<model_name>.json` naming convention. 
- Objects are defined by `model-name.json` files in `json` format
+
- Objects are identified by `folder-name/model-name`, where its `id` is indicative of its file location.
+Jan's `model.json` aims for rough equivalence with [OpenAI's Model Object](https://platform.openai.com/docs/api-reference/models/object), and adds properties to support local models.
- Objects are designed to be compatible with `OpenAI Model Objects`, with additional properties needed to run on our infrastructure.
+
- ALL object properties are optional, i.e. users should be able to run a model declared by an empty `json` file.
+Jan's `model.json` object properties are optional, i.e. users should be able to run a model declared by an empty `json` file.
 ```json
 // ./models/zephyr/zephyr-7b-beta-Q4_K_M.json
 {
    "source_url": "https://huggingface.co/TheBloke/zephyr-7B-beta-GGUF/blob/main/zephyr-7b-beta.Q4_K_M.gguf",
    "parameters": {
        "init": {
            "ctx_len": "2048",
            "ngl": "100",
            "embedding": "true",
            "n_parallel": "4",
            "pre_prompt": "A chat between a curious user and an artificial intelligence",
            "user_prompt": "USER: ",
            "ai_prompt": "ASSISTANT: "
        },
        "runtime": {
            "temperature": "0.7",
            "token_limit": "2048",
            "top_k": "0",
            "top_p": "1",
            "stream": "true"
        }
    },
    "metadata": {
        "engine": "llamacpp",
        "quantization": "Q4_K_M",
        "size": "7B"
    }
 }
 ```
 | Property                | Type                                                          | Description                                                               | Validation                                       |
 | ----------------------- | ------------------------------------------------------------- | ------------------------------------------------------------------------- | ------------------------------------------------ |
 | `source_url`            | string                                                        | The model download source. It can be an external url or a local filepath. | Defaults to `pwd`. See [Source_url](#Source_url) |
 | `object`                | enum: `model`, `assistant`, `thread`, `message`               | Type of the Jan Object. Always `model`                                    | Defaults to "model"                              |
-| `name`                  | string                                                        | A vanity name                                                             | Defaults to filename                             |
+| `source_url`            | string                                                        | The model download source. It can be an external url or a local filepath. | Defaults to `pwd`. See [Source_url](#Source_url) |
 | `description`           | string                                                        | A vanity description of the model                                         | Defaults to ""                                   |
 | `state`                 | enum[`to_download` , `downloading`, `ready` , `running`] | Needs more thought                                                        | Defaults to `to_download`                     |
 | `parameters`            | map                                                           | Defines default model run parameters used by any assistant.               | Defaults to `{}`                                 |
 | `description`           | string                                                        | A vanity description of the model                                         | Defaults to ""                                   |
 | `metadata`              | map                                                           | Stores additional structured information about the model.                 | Defaults to `{}`                                 |
 | `metadata.engine`       | enum: `llamacpp`, `api`, `tensorrt`                           | The model backend used to run model.                                      | Defaults to "llamacpp"                           |
 | `metadata.quantization` | string                                                        | Supported formats only                                                    | See [Custom importers](#Custom-importers)        |
 | `metadata.binaries`     | array                                                         | Supported formats only.                                                   | See [Custom importers](#Custom-importers)        |
 | `state`                 | enum[`to_download` , `downloading`, `ready` , `running`] | Needs more thought                                                        | Defaults to `to_download`                     |
 | `name`                  | string                                                        | A vanity name                                                             | Defaults to filename                             |
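Since every property is optional, a loader has to fill in the defaults listed in the table. The sketch below shows one way that could look; `load_model_object` is a hypothetical name, and representing the `pwd` default for `source_url` as `"./"` is an assumption.

```python
import json


def load_model_object(raw_json: str, filename: str) -> dict:
    """Apply the table's defaults to a (possibly empty) model.json body."""
    declared = json.loads(raw_json.strip() or "{}")
    model = {
        "object": "model",
        "name": filename,          # defaults to the filename
        "description": "",
        "state": "to_download",
        "source_url": "./",        # assumption: "pwd" written as "./"
        "parameters": {},
        "metadata": {},
    }
    model.update(declared)
    # metadata.engine falls back to "llamacpp" unless declared
    model["metadata"] = {"engine": "llamacpp", **model["metadata"]}
    return model


print(load_model_object("{}", "zephyr-7b-beta-Q4_K_M"))
```

Declared properties always win over defaults, so an empty file and a fully specified file go through the same path.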
-### Source_url
+### Model Source
 There are 3 types of model sources
 - Local model
 - Remote source
 - Cloud API
 - Users can download models from a `remote` source or reference an existing `local` model.
 - If this property is not specified in the Model Object file, then the default behavior is to look in the current directory.
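The three source types could be told apart roughly as follows. This is a sketch under assumptions: the spec does not say how a Cloud API source is distinguished from a plain remote download, so the example treats `metadata.engine == "api"` as the signal, and `classify_source` is a hypothetical name.

```python
from urllib.parse import urlparse


def classify_source(model: dict) -> str:
    """Return "cloud_api", "remote", or "local" for a Model Object."""
    # Assumption: an "api" engine marks a Cloud API model.
    if model.get("metadata", {}).get("engine") == "api":
        return "cloud_api"
    # A missing source_url falls back to the current directory.
    source_url = model.get("source_url", "./")
    if urlparse(source_url).scheme in ("http", "https"):
        return "remote"
    return "local"


print(classify_source({"source_url": "./"}))
```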
 #### Local source_url
 - Users can import a local model by providing the filepath to the model
 ```json
@ -70,14 +117,36 @@ _Users can override run settings at runtime_
 "source_url": "./",
 ```
 #### Remote source_url
 - Users can download a model by remote URL.
 - Supported url formats:
  - `https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGUF/blob/main/llama-2-7b-chat.Q3_K_L.gguf`
  - `https://any-source.com/.../model-binary.bin`
-#### Custom importers
+- Using a remote API to access model `model-azure-openai-gpt4-turbo.json`
 - See [source](https://learn.microsoft.com/en-us/azure/ai-services/openai/quickstart?tabs=command-line%2Cpython&pivots=rest-api)
 ```json
 "source_url": "https://docs-test-001.openai.azure.com/openai.azure.com/docs-test-001/gpt4-turbo",
 "parameters": {
  "init": {
    "API-KEY": "",
    "DEPLOYMENT-NAME": "",
    "api-version": "2023-05-15"
  },
  "runtime": {
    "temperature": "0.7",
    "max_tokens": "2048",
    "presence_penalty": "0",
    "top_p": "1",
    "stream": "true"
  }
 },
 "metadata": {
    "engine": "api"
 }
 ```
 ### Model Formats
 Additionally, Jan supports importing popular formats. For example, if you provide a HuggingFace URL for a `TheBloke` model, Jan automatically downloads and catalogs all quantizations. Custom importers autofill properties like `metadata.quantization` and `metadata.size`.
@ -89,7 +158,8 @@ Supported URL formats with custom importers:
 - `azure_openai`: `https://docs-test-001.openai.azure.com/openai.azure.com/docs-test-001/gpt4-turbo`
 - `openai`: `api.openai.com`
-### Generic Example
+<details>
    <summary>Example: Zephyr 7B</summary>
 - Model has 1 binary `model-zephyr-7B.json`
 - See [source](https://huggingface.co/TheBloke/zephyr-7B-beta-GGUF/)
@ -122,8 +192,9 @@ Supported URL formats with custom importers:
    "size": "7B"
 }
 ```
 </details>
-### Example: multiple binaries
+### Multiple binaries
 - Model has multiple binaries `model-llava-1.5-ggml.json`
 - See [source](https://huggingface.co/mys/ggml_llava-v1.5-13b)
@ -139,112 +210,24 @@ Supported URL formats with custom importers:
 }
 ```
-### Example: Azure API
+## Models API
- Using a remote API to access model `model-azure-openai-gpt4-turbo.json`
+### Get Model
 - See [source](https://learn.microsoft.com/en-us/azure/ai-services/openai/quickstart?tabs=command-line%2Cpython&pivots=rest-api)
-```json
+- OpenAI Equivalent: https://platform.openai.com/docs/api-reference/models/retrieve
-"source_url": "https://docs-test-001.openai.azure.com/openai.azure.com/docs-test-001/gpt4-turbo",
+- OpenAI Equivalent: https://platform.openai.com/docs/api-reference/models/object
 "parameters": {
  "init": {
    "API-KEY": "",
    "DEPLOYMENT-NAME": "",
    "api-version": "2023-05-15"
  },
  "runtime": {
    "temperature": "0.7",
    "max_tokens": "2048",
    "presence_penalty": "0",
    "top_p": "1",
    "stream": "true"
  }
 },
 "metadata": {
    "engine": "api"
 }
 ```
 ## Filesystem
 - Everything needed to represent a `model` is packaged into a `Model folder`.
 - The `folder` is standalone and can be easily zipped, imported, and exported, e.g. to GitHub.
 - The `folder` always contains at least one `Model Object`, declared in a `json` format.
 - The `folder` and `file` do not have to share the same name
 - The model `id` is made up of `folder_name/filename` and is thus always unique.
 ```sh
 /janroot
    /models
        azure-openai/                       # Folder name
            azure-openai-gpt3-5.json        # File name
        llama2-70b/
            model.json
            .gguf
 ```
 ### Default ./model folder
 - Jan ships with a default model folder containing recommended models
 - Only the Model Object `json` files are included
 - Users must later explicitly download the model binaries
 ```sh
 models/
    mistral-7b/
        mistral-7b.json
    hermes-7b/
        hermes-7b.json
 ```
 ### Multiple quantizations
 - Each quantization has its own `Jan Model Object` file
 ```sh
 llama2-7b-gguf/
    llama2-7b-gguf-Q2.json
    llama2-7b-gguf-Q3_K_L.json
    .bin
 ```
 ### Multiple model partitions
 - A Model that is partitioned into several binaries uses just 1 file
 ```sh
 llava-ggml/
    llava-ggml-Q5.json
    .proj
    ggml
 ```
 ### Your locally fine-tuned model
 - ??
 ```sh
 llama-70b-finetune/
    llama-70b-finetune-q5.json
    .bin
 ```
 ## Jan API
 ### Model API Object
 - The `Jan Model Object` maps into the `OpenAI Model Object`.
 - Properties marked with `*` are compatible with the [OpenAI `model` object](https://platform.openai.com/docs/api-reference/models)
 - Note: The `Jan Model Object` has additional properties when retrieved via its API endpoint.
 > OpenAI Equivalent: https://platform.openai.com/docs/api-reference/models/object
-### Model lifecycle
+#### Request
 Model has 4 states (enum)
 - `to_download`
 - `downloading`
 - `ready`
 - `running`
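The four states and the transitions named by the Start, Stop, and Download endpoints in this spec can be modelled as a small lookup table. This is an illustrative sketch only: the action names (including `download_complete`) are hypothetical, and the `delete` transition from `ready` is an assumption based on the Delete Model example response showing `to_download`.

```python
from enum import Enum


class ModelState(str, Enum):
    TO_DOWNLOAD = "to_download"
    DOWNLOADING = "downloading"
    READY = "ready"
    RUNNING = "running"


# (action, current state) -> next state
TRANSITIONS = {
    ("download", ModelState.TO_DOWNLOAD): ModelState.DOWNLOADING,
    ("download_complete", ModelState.DOWNLOADING): ModelState.READY,
    ("start", ModelState.READY): ModelState.RUNNING,
    ("stop", ModelState.RUNNING): ModelState.READY,
    ("delete", ModelState.READY): ModelState.TO_DOWNLOAD,  # assumption
}


def next_state(action: str, current: ModelState) -> ModelState:
    """Return the state after `action`, or raise if the move is not allowed."""
    try:
        return TRANSITIONS[(action, current)]
    except KeyError:
        raise ValueError(f"cannot {action} a model in state {current.value}")


print(next_state("start", ModelState.READY).value)
```

Keeping the transitions in one table makes invalid moves, such as starting a model that has not been downloaded, fail loudly.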
 ### Get Model
 > OpenAI Equivalent: https://platform.openai.com/docs/api-reference/models/retrieve
 - Example request
 ```shell
 curl {JAN_URL}/v1/models/{model_id}
 ```
- Example response
+
 #### Response
 ```json
 {
  "id": "model-zephyr-7B",
@ -273,14 +256,19 @@ curl {JAN_URL}/v1/models/{model_id}
  }
 }
 ```
 ### List models
 Lists the currently available models, and provides basic information about each one such as the owner and availability.
 > OpenAI Equivalent: https://platform.openai.com/docs/api-reference/models/list
- Example request
+
 #### Request
 ```shell
 curl {JAN_URL}/v1/models
 ```
- Example response
+
 #### Response
 ```json
 {
  "object": "list",
@ -310,13 +298,18 @@ curl {JAN_URL}/v1/models
  "object": "list"
 }
 ```
 ### Delete Model
 > OpenAI Equivalent: https://platform.openai.com/docs/api-reference/models/delete
-`- Example request
+
 #### Request
 ```shell
 curl -X DELETE {JAN_URL}/v1/models/{model_id}
 ```
- Example response
+
 #### Response
 ```json
 {
  "id": "model-zephyr-7B",
@ -325,14 +318,19 @@ curl -X DELETE {JAN_URL}/v1/models/{model_id}
  "state": "to_download"
 }
 ```
 ### Start Model
 > Jan-only endpoint
 Starts a `model` by changing its state from `ready` to `running`.
- Example request
+
 #### Request
 ```shell
 curl -X PUT {JAN_URL}/v1/models/{model_id}/start
 ```
- Example response
+
 #### Response
 ```json
 {
  "id": "model-zephyr-7B",
@ -340,14 +338,19 @@ curl -X PUT {JAN_URL}/v1/models{model_id}/start
  "state": "running"
 }
 ```
 ### Stop Model
 > Jan-only endpoint
 Stops a `model` by changing its state from `running` to `ready`.
- Example request
+
 #### Request
 ```shell
 curl -X PUT {JAN_URL}/v1/models/{model_id}/stop
 ```
- Example response
+
 #### Response
 ```json
 {
  "id": "model-zephyr-7B",
@ -355,14 +358,17 @@ curl -X PUT {JAN_URL}/v1/models/{model_id}/stop
  "state": "ready"
 }
 ```
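For clients that prefer Python over `curl`, the Start and Stop calls could be built with the standard library as below. This is a sketch, not an official client: the `JAN_URL` value is a placeholder assumption and `lifecycle_request` is a hypothetical helper. The request is only constructed here, not sent.

```python
from urllib.request import Request

JAN_URL = "http://localhost:1337"  # assumption: replace with your Jan server URL


def lifecycle_request(model_id: str, action: str) -> Request:
    """Build the PUT request for the Jan-only start/stop endpoints."""
    if action not in ("start", "stop"):
        raise ValueError(f"unsupported action: {action}")
    return Request(f"{JAN_URL}/v1/models/{model_id}/{action}", method="PUT")


req = lifecycle_request("model-zephyr-7B", "start")
print(req.get_method(), req.full_url)
```

Passing the returned object to `urllib.request.urlopen` would send it; the response body is expected to carry the updated `state`, as in the examples above.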
 ### Download Model
 > Jan-only endpoint
 Downloads a `model`, changing its state from `to_download` to `downloading`, then to `ready` once the download completes.
- Example request
+
 #### Request
 ```shell
 curl -X POST {JAN_URL}/v1/models/
 ```
- Example response
+
 #### Response
 ```json
 {
  "id": "model-zephyr-7B",
@ -370,3 +376,77 @@ curl -X POST {JAN_URL}/v1/models/
  "state": "downloading"
 }
 ```
 ## Examples
 ### Pre-loaded Models
 - Jan ships with a default model folder containing recommended models
 - Only the Model Object `json` files are included
 - Users must later explicitly download the model binaries
 ```sh
 models/
    mistral-7b/
        mistral-7b.json
    hermes-7b/
        hermes-7b.json
 ```
 ### Azure OpenAI
 - Using a remote API to access model `model-azure-openai-gpt4-turbo.json`
 - See [source](https://learn.microsoft.com/en-us/azure/ai-services/openai/quickstart?tabs=command-line%2Cpython&pivots=rest-api)
 ```json
 "source_url": "https://docs-test-001.openai.azure.com/openai.azure.com/docs-test-001/gpt4-turbo",
 "parameters": {
  "init": {
    "API-KEY": "",
    "DEPLOYMENT-NAME": "",
    "api-version": "2023-05-15"
  },
  "runtime": {
    "temperature": "0.7",
    "max_tokens": "2048",
    "presence_penalty": "0",
    "top_p": "1",
    "stream": "true"
  }
 },
 "metadata": {
    "engine": "api"
 }
 ```
 ### Multiple quantizations
 - Each quantization has its own `Jan Model Object` file
 ```sh
 llama2-7b-gguf/
    llama2-7b-gguf-Q2.json
    llama2-7b-gguf-Q3_K_L.json
    .bin
 ```
 ### Multiple model partitions
 - A Model that is partitioned into several binaries uses just 1 file
 ```sh
 llava-ggml/
    llava-ggml-Q5.json
    .proj
    ggml
 ```
 ### Your locally fine-tuned model
 - ??
 ```sh
 llama-70b-finetune/
    llama-70b-finetune-q5.json
    .bin
 ```
--- a/docs/docs/specs/prompts.md
+++ b/docs/docs/specs/prompts.md
@ -0,0 +1,7 @@
 ---
 title: Prompts
 ---
 - [ ] /prompts folder
 - [ ] How to add to prompts
 - [ ] Assistants can have suggested Prompts
--- a/docs/docs/specs/settings.md
+++ b/docs/docs/specs/settings.md
@ -1,3 +1,5 @@
 ---
 title: Settings
 ---
 - [ ] .jan folder in jan root
--- a/docs/sidebars.js
+++ b/docs/sidebars.js
@ -68,6 +68,7 @@ const sidebars = {
            "specs/jan",
            "specs/fine-tuning",
            "specs/settings",
            "specs/prompts",
          ],
        },
      ],