feat(spec-model): Update based on team discussion on Nov 20
---
title: Models
---

import ApiSchema from '@theme/ApiSchema';

# Models Spec v1

:::warning

Draft Specification: functionality has not been implemented yet.

:::

Jan's Model API aims to be as similar as possible to [OpenAI's Models API](https://platform.openai.com/docs/api-reference/models).
- Users can download, import and delete models
- Users can use remote models (e.g. OpenAI, OpenRouter)
- Users can start/stop models and use them in a thread (or via the Chat Completions API)
- Users can configure default model parameters at the model level (to be overridden later at the `chat/completions` or `assistant`/`thread` level, as sketched below)
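To make the override order concrete, here is a minimal TypeScript sketch of how model-level defaults might be merged with thread- and request-level overrides. The type and function names are illustrative, not part of the spec:

```typescript
// Illustrative precedence, lowest to highest:
// model-level defaults < assistant/thread-level < chat/completions request.
type ModelParams = {
  temperature?: number;
  token_limit?: number;
  top_k?: number;
  top_p?: number;
  stream?: boolean;
};

function resolveParams(
  modelDefaults: ModelParams,
  threadOverrides: ModelParams = {},
  requestOverrides: ModelParams = {}
): ModelParams {
  // Later spreads win, so request-level values take highest precedence.
  return { ...modelDefaults, ...threadOverrides, ...requestOverrides };
}

// e.g. a model default of temperature 0.7, overridden to 0.2 for one request:
const effective = resolveParams(
  { temperature: 0.7, token_limit: 2048 },
  {},
  { temperature: 0.2 }
);
console.log(effective.temperature); // 0.2
```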
## Design Principle

- Don't go for simplicity yet
- Underlying abstractions are changing very frequently (e.g. ggufv3)
- Provide a minimalist framework over the abstractions that takes care of coordination between tools
- Show direct system state for now

## KIVs to Model Spec v2

- OpenAI and Azure OpenAI
- Importing via URL
- Multiple Partitions
## Models folder structure

- Models in Jan are stored in the `/models` folder.
- Models are stored and organized by folders, which are atomic representations of a model for easy packaging and version control.
- A model's folder name is its `model.id`, and the folder contains the Model Object `json` file and any model binaries (see the path helpers sketched after the layout below).

```sh
/jan/                              # Jan root folder
  /models/
    llama2-70b-q4_k_m/
      model-binary-1.gguf
      model.json
    mistral-7b-gguf-q3_k_l/
      model.json
      mistral-7b-q3-K-L.gguf
    mistral-7b-gguf-q8_k_m/
      model.json
      mistral-7b-q8_k_k.gguf
    random-model-q4_k_m/
      random-model-q4_k_m.bin
      # Note: will be moved into an autogenerated folder
      # /random-model-q4_k_m/
      #   random-model-q4_k_m.bin
      #   random-model-q4_k_m.json (autogenerated)
      random-model-q4_k_m.json     # (autogenerated)
```
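As a sketch of what this convention implies for tooling, resolving a model's folder and Model Object file is simple string concatenation on `model.id`. The helper names and the hard-coded root are assumptions, not a defined Jan API:

```typescript
import * as fs from "fs";
import * as path from "path";

// Assumed Jan root; the spec only fixes the /models folder beneath it.
const JAN_ROOT = "/jan";

// A model's folder name is its model.id.
function modelFolder(modelId: string): string {
  return path.join(JAN_ROOT, "models", modelId);
}

// Each model folder contains a model.json (the Model Object).
function modelObjectPath(modelId: string): string {
  return path.join(modelFolder(modelId), "model.json");
}

// List every model.id by reading the folder names under /models.
function listModelIds(): string[] {
  return fs
    .readdirSync(path.join(JAN_ROOT, "models"), { withFileTypes: true })
    .filter((entry) => entry.isDirectory())
    .map((entry) => entry.name);
}
```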
## Examples

### Local Model

- Model has 1 binary; its Model Object file is `model-zephyr-7B.json`
- See [source](https://huggingface.co/TheBloke/zephyr-7B-beta-GGUF/)

#### `model.json`
```json
{
  "type": "model",
  "version": "1",
  "id": "zephyr-7b",  // Used as chat-completions model_name; matches folder name
  "name": "Zephyr 7B",
  "owned_by": "",     // OpenAI compatibility
  "created": 1231231, // Unix timestamp
  "description": "...",
  "state": enum[null, "downloading", "available"],
  // KIV: remote:     // Subsequent
  // KIV: type: "llm" // For a future where there are different types
  "format": "ggufv3", // States format, rather than engine
  "source_url": "https://huggingface.co/TheBloke/zephyr-7B-beta-GGUF/blob/main/zephyr-7b-beta.Q4_K_M.gguf",
  "settings": {
    "ctx_len": "2048",
    "ngl": "100",
    "embedding": "true",
    "n_parallel": "4"
    // KIV: "pre_prompt": "A chat between a curious user and an artificial intelligence",
    // KIV: "user_prompt": "USER: ",
    // KIV: "ai_prompt": "ASSISTANT: "
  },
  "parameters": {
    "temperature": "0.7",
    "token_limit": "2048",
    "top_k": "0",
    "top_p": "1",
    "stream": "true"
  },
  "metadata": {},
  "assets": [
    "file://.../zephyr-7b-q4_k_m.bin",
    "https://huggin"
  ]
}
```
### Deferred Download

```sh
models/
  mistral-7b/
    model.json
  hermes-7b/
    model.json
```

- Jan ships with default model folders containing recommended models
- Only the Model Object `json` files are included
- Users must later explicitly download the model binaries (a download sketch follows)
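A minimal sketch of that explicit download step, assuming a helper that streams `source_url` into the model folder and moves `state` from `null` through `"downloading"` to `"available"`. The helper name and the state handling are assumptions, not spec'd behavior:

```typescript
import * as fs from "fs";
import * as path from "path";

type ModelState = null | "downloading" | "available";

// Assumed helper: download a deferred model's binary next to its model.json.
async function downloadModel(modelDir: string): Promise<void> {
  const modelJsonPath = path.join(modelDir, "model.json");
  const model = JSON.parse(fs.readFileSync(modelJsonPath, "utf8"));

  setState(model, modelJsonPath, "downloading");
  const res = await fetch(model.source_url);
  if (!res.ok) throw new Error(`Download failed: HTTP ${res.status}`);

  const binaryName = path.basename(new URL(model.source_url).pathname);
  fs.writeFileSync(
    path.join(modelDir, binaryName),
    Buffer.from(await res.arrayBuffer())
  );
  setState(model, modelJsonPath, "available");
}

function setState(model: any, file: string, state: ModelState): void {
  model.state = state;
  fs.writeFileSync(file, JSON.stringify(model, null, 2));
}
```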
### Multiple model partitions

- A model that is partitioned into several binaries still uses a single model folder and one Model Object file

```sh
llava-ggml-Q5/
  model.json
  mmprj.bin
  model_q5.ggml
```
### Locally fine-tuned / custom imported model

```sh
llama-70b-finetune/
  llama-70b-finetune-q5.json
  .bin
```
## Importing Models

:::warning

- This has not been confirmed
- Jan should auto-detect and create folders automatically
- Jan's UI will allow users to rename folders and add metadata

:::

You can import a model by simply dragging it into the `/models` folder, similar to Oobabooga.

- Jan will detect and generate a corresponding `model.json` file based on the model asset's filename (see the sketch below)
- Jan will move it into its own `/model-id` folder once you define a `model-id` via the UI
- Jan will populate the model's `/model-id/model.json` as you add metadata through the UI
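A sketch of that auto-detection step, assuming the model id is derived from the dropped binary's filename and a minimal `model.json` is written beside it. Every field default here is an assumption:

```typescript
import * as fs from "fs";
import * as path from "path";

// e.g. "random-model-q4_k_m.bin" -> "random-model-q4_k_m.json" alongside it
function importDroppedBinary(binaryPath: string): void {
  const id = path.basename(binaryPath, path.extname(binaryPath));
  const modelJson = {
    id,                                  // can be renamed later via the UI
    type: "model",
    source_url: `file://${binaryPath}`,
    state: "available",
    metadata: {},                        // populated via the UI
  };
  fs.writeFileSync(
    path.join(path.dirname(binaryPath), `${id}.json`),
    JSON.stringify(modelJson, null, 2)
  );
}
```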
## Model Object

:::warning

- This is currently not finalized
- Dan's view: the current JSON is extremely clunky
  - We should move `init` to top-level (e.g. "settings"?)
  - We should move `runtime` to top-level (e.g. "parameters"?)
  - `metadata` is extremely overloaded and should be refactored
- Dan's view: we should make the model object very extensible
  - A `GGUF` model would "extend" a common model object with extra fields (at top level)
- Dan's view: `state` is extremely badly named
  - Recommended: `downloaded`, `started`, `stopped`, null (for yet-to-download)
  - We should also note that this is only for local models (not remote)

:::

Jan represents models as `json`-based Model Object files, known colloquially as `model.json` files. Jan aims for rough equivalence with [OpenAI's Model Object](https://platform.openai.com/docs/api-reference/models/object), with additional properties to support local models.

Jan's models follow a `model.json` naming convention and are built to be extremely lightweight, with the only mandatory field being a `source_url` to download the model binaries.

<ApiSchema example pointer="#/components/schemas/Model" />
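For orientation, a rough TypeScript shape assembled from the examples in this spec. The ApiSchema above is the source of truth, and several of these fields are still under discussion per the warning:

```typescript
// Sketch only; not a finalized schema.
interface ModelObject {
  type: "model";
  version: string;
  id: string;                          // matches the folder name
  name?: string;
  owned_by?: string;                   // OpenAI compatibility
  created?: number;                    // Unix timestamp
  description?: string;
  state?: null | "downloading" | "available";
  format?: string;                     // e.g. "ggufv3"
  source_url: string;                  // the only mandatory field
  settings?: Record<string, string>;   // e.g. ctx_len, ngl
  parameters?: Record<string, string>; // e.g. temperature, top_p
  metadata?: Record<string, unknown>;
  assets?: string[];                   // local and/or remote binary locations
}
```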
### Types of Models

:::warning

- This is currently not in the Model Object, and requires further discussion.
- Dan's view: we should have a field to differentiate between `local` and `remote` models

:::

There are 3 types of models:

- Local model (downloaded)
- Local model, yet-to-be downloaded (we have the URL)
- Remote model (i.e. OpenAI API)
#### Local Models

:::warning

- This is currently not finalized
- Dan's view: we should have `download_url` and `local_url` for local models (and possibly more)

:::

A `model.json` for a local model should always reference the following fields:

- `download_url`: the original download source of the model
- `local_url`: the current location of the model binaries (may be an array of multiple binaries)

```json
// ./models/llama2/llama2-7bn-gguf.json
"local_url": "~/Downloads/llama-2-7bn-q5-k-l.gguf",
```
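A sketch of how the two fields might work together (the logic is assumed, not spec'd): treat `local_url` as authoritative when the binaries exist on disk, and fall back to `download_url` for anything missing:

```typescript
import * as fs from "fs";

interface LocalModelRef {
  download_url: string;            // original download source
  local_url: string | string[];    // current location(s) of the binaries
}

// Returns the local paths that still need to be fetched from download_url.
function missingBinaries(model: LocalModelRef): string[] {
  const locals = Array.isArray(model.local_url)
    ? model.local_url
    : [model.local_url];
  return locals.filter((p) => !fs.existsSync(p));
}
```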
#### Remote Models

:::warning

- This is currently not finalized
- Dan's view: each cloud model should be provided via a system module, or define its own params field on the `model` or `model.init` object

:::

A `model.json` for a remote model should always reference the following fields:

- `api_url`: the API endpoint of the model
- Any authentication parameters

For example, `model-azure-openai-gpt4-turbo.json` uses a remote API to access an Azure OpenAI deployment (see [source](https://learn.microsoft.com/en-us/azure/ai-services/openai/quickstart?tabs=command-line%2Cpython&pivots=rest-api)):

```json
// Dan's view: This needs to be refactored pretty significantly
"source_url": "https://docs-test-001.openai.azure.com/openai.azure.com/docs-test-001/gpt4-turbo",
"parameters": {
  "init": {
    "API-KEY": "",
    "DEPLOYMENT-NAME": "",
    "api-version": "2023-05-15"
  },
  "runtime": {
    "temperature": "0.7",
    "max_tokens": "2048",
    "presence_penalty": "0",
    "top_p": "1",
    "stream": "true"
  }
},
"metadata": {
  "engine": "api" // Dan's view: this should be a `type` field
}
```
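As a sketch of how the `init` block could map onto an actual request: the api-key header and api-version query parameter follow Microsoft's Azure OpenAI quickstart linked above; the function shape and how Jan would wire it up are assumptions:

```typescript
// Assumed: `init` comes from the model.json parameters.init block above.
async function azureChatCompletion(
  apiUrl: string, // e.g. https://<resource>.openai.azure.com/openai/deployments/<name>
  init: { "API-KEY": string; "api-version": string },
  messages: { role: string; content: string }[]
) {
  const url = `${apiUrl}/chat/completions?api-version=${init["api-version"]}`;
  const res = await fetch(url, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "api-key": init["API-KEY"], // Azure uses an api-key header, not a Bearer token
    },
    body: JSON.stringify({ messages }),
  });
  if (!res.ok) throw new Error(`Azure request failed: HTTP ${res.status}`);
  return res.json();
}
```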
### Jan Model Importers extension

:::caution

Currently, pasting a TheBloke Huggingface link in the Explore Models page will f…

- Nicely-formatted model card
- Fully-annotated `model.json` file

:::
### Multiple Binaries

:::warning

- This is currently not finalized
- Dan's view: having these fields under `model.metadata` is not maintainable
  - We should explore some sort of `local_url` structure

:::

- Model has multiple binaries; its Model Object file is `model-llava-1.5-ggml.json`
- See [source](https://huggingface.co/mys/ggml_llava-v1.5-13b)

```json
"source_url": "https://huggingface.co/mys/ggml_llava-v1.5-13b",
"parameters": {"init": {}, "runtime": {}},
"metadata": {
  "mmproj_binary": "https://huggingface.co/mys/ggml_llava-v1.5-13b/blob/main/mmproj-model-f16.gguf",
  "ggml_binary": "https://huggingface.co/mys/ggml_llava-v1.5-13b/blob/main/ggml-model-q5_k.gguf",
  "engine": "llamacpp",
  "quantization": "Q5_K"
}
```

### Multiple quantizations

- Each quantization has its own `Jan Model Object` file
- TODO: `model.json`?

```sh
llama2-7b-gguf/
  llama2-7b-gguf-Q2.json
  llama2-7b-gguf-Q3_K_L.json
  .bin
```
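For the Multiple Binaries example above, a sketch (keeping the binary URLs under `metadata` for now, which the warning flags as unmaintainable) that collects every `*_binary` field and downloads each one. The helper is an assumption:

```typescript
import * as fs from "fs";
import * as path from "path";

// Collect every metadata field ending in "_binary" (e.g. mmproj_binary,
// ggml_binary) and download each into the model's folder.
async function downloadAllBinaries(
  metadata: Record<string, string>,
  modelDir: string
): Promise<void> {
  for (const [key, url] of Object.entries(metadata)) {
    if (!key.endsWith("_binary")) continue;
    const res = await fetch(url);
    if (!res.ok) throw new Error(`Failed to fetch ${url}: HTTP ${res.status}`);
    const name = path.basename(new URL(url).pathname);
    fs.writeFileSync(path.join(modelDir, name), Buffer.from(await res.arrayBuffer()));
  }
}
```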
## Models API

:::warning

- We should use the OpenAPI spec to discuss APIs
- Dan's view: This needs @louis and App Pod to review, as they are more familiar with this
- Dan's view: Start/Stop model should have some UI indicator (show state, block input)

:::

See http://localhost:3001/api-reference#tag/Models.

| Method         | API Call                        | OpenAI-equivalent |
| -------------- | ------------------------------- | ----------------- |
| List Models    | GET /v1/models                  | true              |
| Get Model      | GET /v1/models/{model_id}       | true              |
| Delete Model   | DELETE /v1/models/{model_id}    | true              |
| Start Model    | PUT /v1/models/{model_id}/start | no                |
| Stop Model     | PUT /v1/models/{model_id}/stop  | no                |
| Download Model | POST /v1/models/                | no                |
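Assuming the API is served from the same local port as the reference above (the base URL is an assumption; the endpoint shapes come straight from the table), starting and stopping a model might look like:

```typescript
const BASE = "http://localhost:3001/v1";

async function startModel(modelId: string): Promise<void> {
  const res = await fetch(`${BASE}/models/${modelId}/start`, { method: "PUT" });
  if (!res.ok) throw new Error(`Start failed: HTTP ${res.status}`);
}

async function stopModel(modelId: string): Promise<void> {
  const res = await fetch(`${BASE}/models/${modelId}/stop`, { method: "PUT" });
  if (!res.ok) throw new Error(`Stop failed: HTTP ${res.status}`);
}
```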
### ADR

- `<model-id>.json`, i.e. the [Model Object](#model-object)
- Why multiple folders?
  - Model Partitions (e.g. Llava in the future)
- Why a folder and config file for each quantization?
  - Differently quantized models are completely different models
- Milestone - 1st December:
  - Catalogue of recommended models; anything else = mutate the filesystem
- [@linh] Should we have an API to help quantize models?
  - Could be a really cool feature to have (i.e. import from HF, quantize the model, run on CPU)
  - We should have a helper function to handle hardware compatibility: `POST model/{model-id}/compatibility` (see the sketch below)
- [louis] We are combining states & manifest
  - Need to think through
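A sketch of that compatibility helper. The endpoint, request, and response shape are entirely hypothetical; the spec only floats the idea:

```typescript
// Hypothetical response for POST model/{model-id}/compatibility
interface CompatibilityReport {
  compatible: boolean;
  reasons: string[]; // e.g. "requires 16 GB RAM, 8 GB detected"
}

async function checkCompatibility(modelId: string): Promise<CompatibilityReport> {
  const res = await fetch(
    `http://localhost:3001/v1/models/${modelId}/compatibility`,
    { method: "POST" }
  );
  if (!res.ok) throw new Error(`Compatibility check failed: HTTP ${res.status}`);
  return (await res.json()) as CompatibilityReport;
}
```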