Partial restructure of Models Spec

This commit is contained in:
Daniel 2023-11-19 04:13:12 +08:00
parent 17258192e8
commit b321427b34
6 changed files with 237 additions and 139 deletions

View File

@@ -1,3 +1,11 @@
---
title: How Jan Works
---
- Local Filesystem
  - Follow on from the Quickstart to show how things actually work
  - Write in a conversational style; show how things work under the hood
  - Check how the filesystem changes after each request
- Model loading into RAM/VRAM
  - Explain how the .bin file is loaded via Llama.cpp
  - Explain how it consumes RAM and VRAM, and refer to the system monitor

View File

@@ -1,4 +1,4 @@
---
title: "Fine-tuning"
---
Todo: @hiro

View File

@@ -10,56 +10,103 @@ Feedback: [HackMD: Models Spec](https://hackmd.io/ulO3uB1AQCqLa5SAAMFOQw)
:::
## Overview
Jan's Model API aims to be as similar as possible to [OpenAI's Models API](https://platform.openai.com/docs/api-reference/models), with additional methods for managing and running models locally.
### User Objectives
- Users can start/stop models and use them in a thread (or via the Chat Completions API)
- Users can download, import, and delete models
- Users can configure model settings at the model level or override them at the thread level
- Users can use remote models (e.g. OpenAI, OpenRouter)
## Models Folder
Models in Jan are stored in the `/models` folder and declared via `<model-name>.json` files.
- Everything needed to represent a `model` is packaged into a `Model folder`.
- The `folder` is standalone and can be easily zipped, imported, and exported, e.g. to GitHub.
- The `folder` always contains at least one `Model Object`, declared in `json` format.
- The `folder` and `file` do not have to share the same name.
- The model `id` is made up of `folder_name/filename` and is thus always unique.
```sh
/janroot
  /models
    azure-openai/              # Folder name
      azure-openai-gpt3-5.json # File name
    llama2-70b/
      model.json
      .gguf
```
## Model Object
Models in Jan are represented as `json` objects, colloquially known as `model.json` files.
Jan's models follow a `<model_name>.json` naming convention.
Jan's `model.json` aims for rough equivalence with [OpenAI's Model Object](https://platform.openai.com/docs/api-reference/models/object), and adds additional properties to support local models.
Jan's `model.json` object properties are all optional, i.e. users should be able to run a model declared by an empty `json` file (see the minimal example after the block below).
```json
// ./models/zephyr/zephyr-7b-beta-Q4_K_M.json
{
  "source_url": "https://huggingface.co/TheBloke/zephyr-7B-beta-GGUF/blob/main/zephyr-7b-beta.Q4_K_M.gguf",
  "parameters": {
    "init": {
      "ctx_len": "2048",
      "ngl": "100",
      "embedding": "true",
      "n_parallel": "4",
      "pre_prompt": "A chat between a curious user and an artificial intelligence",
      "user_prompt": "USER: ",
      "ai_prompt": "ASSISTANT: "
    },
    "runtime": {
      "temperature": "0.7",
      "token_limit": "2048",
      "top_k": "0",
      "top_p": "1",
      "stream": "true"
    }
  },
  "metadata": {
    "engine": "llamacpp",
    "quantization": "Q4_K_M",
    "size": "7B"
  }
}
```
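To illustrate the optional-properties rule above, a bare declaration (folder and file names are hypothetical) is still a valid model:
```json
// ./models/my-model/my-model.json — hypothetical; every property falls back to its default
{}
```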
| Property                | Type                                                     | Description                                                                | Validation                                        |
| ----------------------- | -------------------------------------------------------- | -------------------------------------------------------------------------- | ------------------------------------------------- |
| `object`                | enum: `model`, `assistant`, `thread`, `message`           | Type of the Jan Object. Always `model`.                                     | Defaults to "model"                                |
| `source_url`            | string                                                    | The model download source. It can be an external URL or a local filepath.   | Defaults to `pwd`. See [Model Source](#Model-Source) |
| `parameters`            | map                                                       | Defines default model run parameters used by any assistant.                 | Defaults to `{}`                                   |
| `description`           | string                                                    | A vanity description of the model.                                          | Defaults to ""                                     |
| `metadata`              | map                                                       | Stores additional structured information about the model.                   | Defaults to `{}`                                   |
| `metadata.engine`       | enum: `llamacpp`, `api`, `tensorrt`                       | The model backend used to run the model.                                    | Defaults to "llamacpp"                             |
| `metadata.quantization` | string                                                    | Supported formats only.                                                     | See [Model Formats](#Model-Formats)                |
| `metadata.binaries`     | array                                                     | Supported formats only.                                                     | See [Model Formats](#Model-Formats)                |
| `state`                 | enum: `to_download`, `downloading`, `ready`, `running`    | The model's lifecycle state (needs more thought).                           | Defaults to `to_download`                          |
| `name`                  | string                                                    | A vanity name.                                                              | Defaults to filename                               |
### Model Source
There are 3 types of model sources:
- Local model
- Remote source
- Cloud API
- Users can download models from a `remote` source or reference an existing `local` model.
- If `source_url` is not specified in the Model Object file, the default behavior is to look in the current directory.
#### Local source_url
- Users can import a local model by providing the filepath to the model.
```json
@@ -70,14 +117,36 @@ _Users can override run settings at runtime_
"source_url": "./", "source_url": "./",
``` ```
#### Remote source_url
- Users can download a model by remote URL. - Users can download a model by remote URL.
- Supported url formats: - Supported url formats:
- `https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGUF/blob/main/llama-2-7b-chat.Q3_K_L.gguf` - `https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGUF/blob/main/llama-2-7b-chat.Q3_K_L.gguf`
- `https://any-source.com/.../model-binary.bin` - `https://any-source.com/.../model-binary.bin`
- Using a remote API to access a model, e.g. `model-azure-openai-gpt4-turbo.json`
- See [source](https://learn.microsoft.com/en-us/azure/ai-services/openai/quickstart?tabs=command-line%2Cpython&pivots=rest-api)
```json
"source_url": "https://docs-test-001.openai.azure.com/openai.azure.com/docs-test-001/gpt4-turbo",
"parameters": {
  "init": {
    "API-KEY": "",
    "DEPLOYMENT-NAME": "",
    "api-version": "2023-05-15"
  },
  "runtime": {
    "temperature": "0.7",
    "max_tokens": "2048",
    "presence_penalty": "0",
    "top_p": "1",
    "stream": "true"
  }
},
"metadata": {
  "engine": "api"
}
```
### Model Formats
Additionally, Jan supports importing popular formats. For example, if you provide a HuggingFace URL for a `TheBloke` model, Jan automatically downloads and catalogs all quantizations. Custom importers autofill properties like `metadata.quantization` and `metadata.size` (see the sketch after the list below).
@@ -89,7 +158,8 @@ Supported URL formats with custom importers:
- `azure_openai`: `https://docs-test-001.openai.azure.com/openai.azure.com/docs-test-001/gpt4-turbo`
- `openai`: `api.openai.com`
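As a sketch of the importer behavior described above (repo and file names are illustrative, not prescribed by this spec), a single TheBloke repo URL could expand into one `model.json` per quantization:
```sh
models/
  llama2-7b-chat-gguf/
    llama2-7b-chat-gguf-Q3_K_L.json   # importer autofills metadata.quantization: "Q3_K_L"
    llama2-7b-chat-gguf-Q4_K_M.json   # importer autofills metadata.quantization: "Q4_K_M"
```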
<details>
<summary>Example: Zephyr 7B</summary>
- Model has 1 binary: `model-zephyr-7B.json`
- See [source](https://huggingface.co/TheBloke/zephyr-7B-beta-GGUF/)
@@ -122,8 +192,9 @@ Supported URL formats with custom importers:
"size": "7B", "size": "7B",
} }
``` ```
</details>
### Multiple binaries
- Model has multiple binaries: `model-llava-1.5-ggml.json`
- See [source](https://huggingface.co/mys/ggml_llava-v1.5-13b)
@@ -139,112 +210,24 @@ Supported URL formats with custom importers:
}
```
## Models API
### Get Model
- OpenAI Equivalent: https://platform.openai.com/docs/api-reference/models/retrieve
- OpenAI Equivalent: https://platform.openai.com/docs/api-reference/models/object
"parameters": {
"init" {
"API-KEY": "",
"DEPLOYMENT-NAME": "",
"api-version": "2023-05-15"
},
"runtime": {
"temperature": "0.7",
"max_tokens": "2048",
"presence_penalty": "0",
"top_p": "1",
"stream": "true"
}
}
"metadata": {
"engine": "api",
}
```
## Filesystem
- Everything needed to represent a `model` is packaged into an `Model folder`.
- The `folder` is standalone and can be easily zipped, imported, and exported, e.g. to Github.
- The `folder` always contains at least one `Model Object`, declared in a `json` format.
- The `folder` and `file` do not have to share the same name
- The model `id` is made up of `folder_name/filename` and is thus always unique.
```sh
/janroot
/models
azure-openai/ # Folder name
azure-openai-gpt3-5.json # File name
llama2-70b/
model.json
.gguf
```
### Default ./model folder
- Jan ships with a default model folders containing recommended models
- Only the Model Object `json` files are included
- Users must later explicitly download the model binaries
```sh
models/
mistral-7b/
mistral-7b.json
hermes-7b/
hermes-7b.json
```
### Multiple quantizations
- Each quantization has its own `Jan Model Object` file
```sh
llama2-7b-gguf/
llama2-7b-gguf-Q2.json
llama2-7b-gguf-Q3_K_L.json
.bin
```
### Multiple model partitions
- A Model that is partitioned into several binaries use just 1 file
```sh
llava-ggml/
llava-ggml-Q5.json
.proj
ggml
```
### Your locally fine-tuned model
- ??
```sh
llama-70b-finetune/
llama-70b-finetune-q5.json
.bin
```
## Jan API
### Model API Object
- The `Jan Model Object` maps into the `OpenAI Model Object`.
- Properties marked with `*` are compatible with the [OpenAI `model` object](https://platform.openai.com/docs/api-reference/models)
- Note: The `Jan Model Object` has additional properties when retrieved via its API endpoint.
#### Request
```shell
curl {JAN_URL}/v1/models/{model_id}
```
#### Response
```json
{
  "id": "model-zephyr-7B",
@@ -273,14 +256,19 @@ curl {JAN_URL}/v1/models/{model_id}
  }
}
```
### List models
Lists the currently available models and provides basic information about each one, such as the owner and availability.
> OpenAI Equivalent: https://platform.openai.com/docs/api-reference/models/list
#### Request
```shell
curl {JAN_URL}/v1/models
```
#### Response
```json
{
  "object": "list",
@@ -310,13 +298,18 @@ curl {JAN_URL}/v1/models
"object": "list" "object": "list"
} }
``` ```
### Delete Model
> OpenAI Equivalent: https://platform.openai.com/docs/api-reference/models/delete
#### Request
```shell
curl -X DELETE {JAN_URL}/v1/models/{model_id}
```
#### Response
```json
{
  "id": "model-zephyr-7B",
@@ -325,14 +318,19 @@ curl -X DELETE {JAN_URL}/v1/models/{model_id}
"state": "to_download" "state": "to_download"
} }
``` ```
### Start Model
> Jan-only endpoint
The request starts a `model` by changing the model state from `ready` to `running`.
#### Request
```shell
curl -X PUT {JAN_URL}/v1/models/{model_id}/start
```
#### Response
```json
{
  "id": "model-zephyr-7B",
@@ -340,14 +338,19 @@ curl -X PUT {JAN_URL}/v1/models{model_id}/start
"state": "running" "state": "running"
} }
``` ```
### Stop Model
> Jan-only endpoint
The request stops a `model` by changing the model state from `running` to `ready`.
#### Request
```shell
curl -X PUT {JAN_URL}/v1/models/{model_id}/stop
```
#### Response
```json
{
  "id": "model-zephyr-7B",
@@ -355,18 +358,95 @@ curl -X PUT {JAN_URL}/v1/models/{model_id}/stop
"state": "ready" "state": "ready"
} }
``` ```
### Download Model
> Jan-only endpoint
The request downloads a `model`, changing the model state from `to_download` to `downloading`, then to `ready` once it's done.
#### Request
```shell
curl -X POST {JAN_URL}/v1/models/
```
#### Response
```json
{
  "id": "model-zephyr-7B",
"object": "model", "object": "model",
"state": "downloading" "state": "downloading"
} }
```
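Since downloading is asynchronous, a client might poll Get Model until the state flips to `ready`; a minimal sketch, assuming the endpoints above and `jq` on the client:
```shell
# Poll every 5 seconds until the model reports state "ready"
while [ "$(curl -s {JAN_URL}/v1/models/model-zephyr-7B | jq -r .state)" != "ready" ]; do
  sleep 5
done
```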
## Examples
### Pre-loaded Models
- Jan ships with a default models folder containing recommended models
- Only the Model Object `json` files are included
- Users must later explicitly download the model binaries (a sketch follows the folder tree below)
```sh
models/
  mistral-7b/
    mistral-7b.json
  hermes-7b/
    hermes-7b.json
```
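To make a pre-loaded declaration usable, the binary still has to be fetched via Download Model; the request body below is an assumption, since the spec above does not define how the target model is specified:
```shell
# Assumption: the model id is passed in the request body
curl -X POST {JAN_URL}/v1/models/ \
  -H "Content-Type: application/json" \
  -d '{"id": "mistral-7b"}'
```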
### Azure OpenAI
- Using a remote API to access a model, e.g. `model-azure-openai-gpt4-turbo.json`
- See [source](https://learn.microsoft.com/en-us/azure/ai-services/openai/quickstart?tabs=command-line%2Cpython&pivots=rest-api)
```json
"source_url": "https://docs-test-001.openai.azure.com/openai.azure.com/docs-test-001/gpt4-turbo",
"parameters": {
  "init": {
    "API-KEY": "",
    "DEPLOYMENT-NAME": "",
    "api-version": "2023-05-15"
  },
  "runtime": {
    "temperature": "0.7",
    "max_tokens": "2048",
    "presence_penalty": "0",
    "top_p": "1",
    "stream": "true"
  }
},
"metadata": {
  "engine": "api"
}
```
### Multiple quantizations
- Each quantization has its own `Jan Model Object` file
```sh
llama2-7b-gguf/
  llama2-7b-gguf-Q2.json
  llama2-7b-gguf-Q3_K_L.json
  .bin
```
### Multiple model partitions
- A model that is partitioned into several binaries uses just 1 file (a sketch follows the tree below)
```sh
llava-ggml/
  llava-ggml-Q5.json
  .proj
  ggml
```
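A sketch of what that single declaration might contain (binary filenames are illustrative), using the `metadata.binaries` array from the Model Object table:
```json
// ./models/llava-ggml/llava-ggml-Q5.json — illustrative, not prescribed by this spec
{
  "source_url": "https://huggingface.co/mys/ggml_llava-v1.5-13b",
  "metadata": {
    "engine": "llamacpp",
    "binaries": ["ggml-model-q5_k.gguf", "mmproj-model-f16.gguf"]
  }
}
```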
### Your locally fine-tuned model
- ??
```sh
llama-70b-finetune/
  llama-70b-finetune-q5.json
  .bin
```

View File

@@ -0,0 +1,7 @@
---
title: Prompts
---
- [ ] /prompts folder
- [ ] How to add to prompts
- [ ] Assistants can have suggested Prompts

View File

@@ -1,3 +1,5 @@
---
title: Settings
---
- [ ] .jan folder in jan root

View File

@@ -68,6 +68,7 @@ const sidebars = {
"specs/jan", "specs/jan",
"specs/fine-tuning", "specs/fine-tuning",
"specs/settings", "specs/settings",
"specs/prompts",
], ],
}, },
], ],