diff --git a/docs/docs/intro/how-jan-works.md b/docs/docs/intro/how-jan-works.md
index 86fe39b32..54122356d 100644
--- a/docs/docs/intro/how-jan-works.md
+++ b/docs/docs/intro/how-jan-works.md
@@ -1,3 +1,11 @@
 ---
 title: How Jan Works
----
\ No newline at end of file
+---
+
+- Local Filesystem
+Follow on from the Quickstart to show how things actually work
+Write in a conversational style, show how things work under the hood
+Check how the filesystem changes after each request
+- Model loading into RAM/VRAM
+Explain how the .bin file is loaded via llama.cpp
+Explain how it consumes RAM and VRAM, and refer to the system monitor
diff --git a/docs/docs/specs/fine-tuning.md b/docs/docs/specs/fine-tuning.md
index df6723ff1..281a065b2 100644
--- a/docs/docs/specs/fine-tuning.md
+++ b/docs/docs/specs/fine-tuning.md
@@ -1,4 +1,4 @@
 ---
-title: "Fine tuning"
+title: "Fine-tuning"
 ---
 Todo: @hiro
\ No newline at end of file
diff --git a/docs/docs/specs/models.md b/docs/docs/specs/models.md
index d7e19be0a..f5fd26e3f 100644
--- a/docs/docs/specs/models.md
+++ b/docs/docs/specs/models.md
@@ -10,56 +10,103 @@
 Feedback: [HackMD: Models Spec](https://hackmd.io/ulO3uB1AQCqLa5SAAMFOQw)
 :::
-Models are AI models like Llama and Mistral
+## Overview
-> OpenAI Equivalent: https://platform.openai.com/docs/api-reference/models
+Jan's Model API aims to be as similar as possible to [OpenAI's Models API](https://platform.openai.com/docs/api-reference/models), with additional methods for managing and running models locally.
-## User Stories
+### User Objectives
-_Users can download a model via a web URL_
+- Users can start/stop models and use them in a thread (or via the Chat Completions API)
+- Users can download, import, and delete models
+- Users can configure model settings at the model level or override them at the thread level
+- Users can use remote models (e.g. OpenAI, OpenRouter)
-- Wireframes here
+## Models Folder
-_Users can import a model from local directory_
+Models in Jan are stored in the `/models` folder and declared via `.json` files.
-- Wireframes here
-_Users can configure model settings, like run parameters_
+- Everything needed to represent a `model` is packaged into a `Model folder`.
+- The `folder` is standalone and can be easily zipped, imported, and exported, e.g. to GitHub.
+- The `folder` always contains at least one `Model Object`, declared in `json` format.
+- The `folder` and `file` do not have to share the same name.
+- The model `id` is made up of `folder_name/filename` and is thus always unique.
-- Wireframes here
+```sh
+/janroot
+  /models
+    azure-openai/              # Folder name
+      azure-openai-gpt3-5.json # File name
-_Users can override run settings at runtime_
+
+    llama2-70b/
+      model.json
+      .gguf
+```
-- See Assistant Spec and Thread
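+Since the folder is standalone, exporting a model can be as simple as zipping its folder (a sketch; paths assume the default `/janroot` layout above):
+
+```sh
+# Archive the llama2-70b model folder for sharing or backup
+cd /janroot/models
+zip -r llama2-70b.zip llama2-70b/
+```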
+## Model Object
-## Jan Model Object
+Models in Jan are represented as `json` objects, and are colloquially known as `model.jsons`.
-- A `Jan Model Object` is a “representation" of a model
-- Objects are defined by `model-name.json` files in `json` format
-- Objects are identified by `folder-name/model-name`, where its `id` is indicative of its file location.
-- Objects are designed to be compatible with `OpenAI Model Objects`, with additional properties needed to run on our infrastructure.
-- ALL object properties are optional, i.e. users should be able to run a model declared by an empty `json` file.
+Jan's models follow a `model-name.json` naming convention.
+
+Jan's `model.json` aims for rough equivalence with [OpenAI's Model Object](https://platform.openai.com/docs/api-reference/models/object), and adds additional properties to support local models.
+
+All of Jan's `model.json` properties are optional, i.e. users should be able to run a model declared by an empty `json` file.
+
+```json
+// ./models/zephyr/zephyr-7b-beta-Q4_K_M.json
+{
+  "source_url": "https://huggingface.co/TheBloke/zephyr-7B-beta-GGUF/blob/main/zephyr-7b-beta.Q4_K_M.gguf",
+  "parameters": {
+    "init": {
+      "ctx_len": "2048",
+      "ngl": "100",
+      "embedding": "true",
+      "n_parallel": "4",
+      "pre_prompt": "A chat between a curious user and an artificial intelligence",
+      "user_prompt": "USER: ",
+      "ai_prompt": "ASSISTANT: "
+    },
+    "runtime": {
+      "temperature": "0.7",
+      "token_limit": "2048",
+      "top_k": "0",
+      "top_p": "1",
+      "stream": "true"
+    }
+  },
+  "metadata": {
+    "engine": "llamacpp",
+    "quantization": "Q4_K_M",
+    "size": "7B"
+  }
+}
+```
 
 | Property | Type | Description | Validation |
 | -------- | ---- | ----------- | ---------- |
-| `source_url` | string | The model download source. It can be an external url or a local filepath. | Defaults to `pwd`. See [Source_url](#Source_url) |
 | `object` | enum: `model`, `assistant`, `thread`, `message` | Type of the Jan Object. Always `model` | Defaults to "model" |
-| `name` | string | A vanity name | Defaults to filename |
-| `description` | string | A vanity description of the model | Defaults to "" |
-| `state` | enum[`to_download` , `downloading`, `ready` , `running`] | Needs more thought | Defaults to `to_download` |
+| `source_url` | string | The model download source. It can be an external url or a local filepath. | Defaults to `pwd`. See [Source_url](#Source_url) |
 | `parameters` | map | Defines default model run parameters used by any assistant. | Defaults to `{}` |
+| `description` | string | A vanity description of the model | Defaults to "" |
 | `metadata` | map | Stores additional structured information about the model. | Defaults to `{}` |
 | `metadata.engine` | enum: `llamacpp`, `api`, `tensorrt` | The model backend used to run the model. | Defaults to "llamacpp" |
 | `metadata.quantization` | string | Supported formats only | See [Custom importers](#Custom-importers) |
 | `metadata.binaries` | array | Supported formats only. | See [Custom importers](#Custom-importers) |
+| `state` | enum[`to_download`, `downloading`, `ready`, `running`] | Needs more thought | Defaults to `to_download` |
+| `name` | string | A vanity name | Defaults to filename |
-### Source_url
+
+### Model Source
+
+There are 3 types of model sources:
+
+- Local model
+- Remote source
+- Cloud API
 
 - Users can download models from a `remote` source or reference an existing `local` model.
 - If this property is not specified in the Model Object file, then the default behavior is to look in the current directory.
-
-#### Local source_url
-
 - Users can import a local model by providing the filepath to the model
 
 ```json
@@ -70,14 +117,36 @@ _Users can override run settings at runtime_
 "source_url": "./",
 ```
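+For example, a manual local import might look like this (a sketch; the folder and file names are illustrative):
+
+```sh
+# Create a model folder and point a minimal model.json at the local binary
+mkdir -p /janroot/models/llama-v2-7b
+cp ~/Downloads/llama-v2-7b.gguf /janroot/models/llama-v2-7b/
+echo '{ "source_url": "./llama-v2-7b.gguf" }' > /janroot/models/llama-v2-7b/llama-v2-7b.json
+```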
-#### Remote source_url
-
 - Users can download a model by remote URL.
 - Supported url formats:
   - `https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGUF/blob/main/llama-2-7b-chat.Q3_K_L.gguf`
   - `https://any-source.com/.../model-binary.bin`
-#### Custom importers
+- Using a remote API to access a model, e.g. `model-azure-openai-gpt4-turbo.json`
+- See [source](https://learn.microsoft.com/en-us/azure/ai-services/openai/quickstart?tabs=command-line%2Cpython&pivots=rest-api)
+
+```json
+"source_url": "https://docs-test-001.openai.azure.com/openai.azure.com/docs-test-001/gpt4-turbo",
+"parameters": {
+  "init": {
+    "API-KEY": "",
+    "DEPLOYMENT-NAME": "",
+    "api-version": "2023-05-15"
+  },
+  "runtime": {
+    "temperature": "0.7",
+    "max_tokens": "2048",
+    "presence_penalty": "0",
+    "top_p": "1",
+    "stream": "true"
+  }
+},
+"metadata": {
+  "engine": "api"
+}
+```
+
+### Model Formats
 
 Additionally, Jan supports importing popular formats. For example, if you provide a HuggingFace URL for a `TheBloke` model, Jan automatically downloads and catalogs all quantizations. Custom importers autofill properties like `metadata.quantization` and `metadata.size`.
 
 Supported URL formats with custom importers:
 - `azure_openai`: `https://docs-test-001.openai.azure.com/openai.azure.com/docs-test-001/gpt4-turbo`
 - `openai`: `api.openai.com`
-### Generic Example
+
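+For instance, importing a `TheBloke` HuggingFace repo might yield one `model.json` per quantization, with `metadata.quantization` autofilled (a sketch; the folder layout is illustrative):
+
+```sh
+llama-2-7b-chat-gguf/
+  llama-2-7b-chat-gguf-Q3_K_L.json   # metadata.quantization: "Q3_K_L"
+  llama-2-7b-chat-gguf-Q4_K_M.json   # metadata.quantization: "Q4_K_M"
+```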
+### Example: Zephyr 7B
 
 - Model has 1 binary `model-zephyr-7B.json`
 - See [source](https://huggingface.co/TheBloke/zephyr-7B-beta-GGUF/)
@@ -122,8 +192,9 @@ Supported URL formats with custom importers:
     "size": "7B",
   }
 ```
+
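+Once downloaded, the model can be started through the Models API described below (a sketch; reuses the `model-zephyr-7B` id from the responses in this spec):
+
+```shell
+curl -X PUT {JAN_URL}/v1/models/model-zephyr-7B/start
+```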
-### Example: multiple binaries
+### Multiple binaries
 
 - Model has multiple binaries `model-llava-1.5-ggml.json`
 - See [source](https://huggingface.co/mys/ggml_llava-v1.5-13b)
@@ -139,112 +210,24 @@
 }
 ```
-### Example: Azure API
+## Models API
-
-- Using a remote API to access model `model-azure-openai-gpt4-turbo.json`
-- See [source](https://learn.microsoft.com/en-us/azure/ai-services/openai/quickstart?tabs=command-line%2Cpython&pivots=rest-api)
+### Get Model
-```json
-"source_url": "https://docs-test-001.openai.azure.com/openai.azure.com/docs-test-001/gpt4-turbo",
-"parameters": {
-  "init" {
-    "API-KEY": "",
-    "DEPLOYMENT-NAME": "",
-    "api-version": "2023-05-15"
-  },
-  "runtime": {
-    "temperature": "0.7",
-    "max_tokens": "2048",
-    "presence_penalty": "0",
-    "top_p": "1",
-    "stream": "true"
-  }
-}
-"metadata": {
-  "engine": "api",
-}
-```
-
-## Filesystem
-
-- Everything needed to represent a `model` is packaged into an `Model folder`.
-- The `folder` is standalone and can be easily zipped, imported, and exported, e.g. to Github.
-- The `folder` always contains at least one `Model Object`, declared in a `json` format.
-- The `folder` and `file` do not have to share the same name
-- The model `id` is made up of `folder_name/filename` and is thus always unique.
-
-```sh
-/janroot
-  /models
-    azure-openai/ # Folder name
-      azure-openai-gpt3-5.json # File name
-
-    llama2-70b/
-      model.json
-      .gguf
-```
-
-### Default ./model folder
-- Jan ships with a default model folders containing recommended models
-- Only the Model Object `json` files are included
-- Users must later explicitly download the model binaries
-```sh
-models/
-  mistral-7b/
-    mistral-7b.json
-  hermes-7b/
-    hermes-7b.json
-```
-### Multiple quantizations
-
-- Each quantization has its own `Jan Model Object` file
-
-```sh
-llama2-7b-gguf/
-  llama2-7b-gguf-Q2.json
-  llama2-7b-gguf-Q3_K_L.json
-  .bin
-```
-### Multiple model partitions
-
-- A Model that is partitioned into several binaries use just 1 file
-
-```sh
-llava-ggml/
-  llava-ggml-Q5.json
-  .proj
-  ggml
-```
-### Your locally fine-tuned model
-
-- ??
-
-```sh
-llama-70b-finetune/
-  llama-70b-finetune-q5.json
-  .bin
-```
-## Jan API
-### Model API Object
+- OpenAI Equivalent: https://platform.openai.com/docs/api-reference/models/retrieve
+- OpenAI Equivalent: https://platform.openai.com/docs/api-reference/models/object
 
 - The `Jan Model Object` maps into the `OpenAI Model Object`.
 - Properties marked with `*` are compatible with the [OpenAI `model` object](https://platform.openai.com/docs/api-reference/models)
 - Note: The `Jan Model Object` has additional properties when retrieved via its API endpoint.
-> OpenAI Equivalent: https://platform.openai.com/docs/api-reference/models/object
-### Model lifecycle
-Model has 4 states (enum)
-- `to_download`
-- `downloading`
-- `ready`
-- `running`
+#### Request
-### Get Model
-> OpenAI Equivalent: https://platform.openai.com/docs/api-reference/models/retrieve
-- Example request
 
 ```shell
 curl {JAN_URL}/v1/models/{model_id}
 ```
-- Example response
+
+#### Response
+
 ```json
 {
   "id": "model-zephyr-7B",
@@ -273,14 +256,19 @@ curl {JAN_URL}/v1/models/{model_id}
   }
 }
 ```
+
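+To pull a single field out of this response, e.g. the runtime parameters, the call can be piped through `jq` (a sketch; assumes `jq` is installed locally):
+
+```shell
+curl {JAN_URL}/v1/models/model-zephyr-7B | jq '.parameters.runtime'
+```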
 ### List models
 
 Lists the currently available models, and provides basic information about each one, such as the owner and availability.
 
 > OpenAI Equivalent: https://platform.openai.com/docs/api-reference/models/list
-- Example request
+
+#### Request
+
 ```shell
 curl {JAN_URL}/v1/models
 ```
-- Example response
+
+#### Response
+
 ```json
 {
   "object": "list",
@@ -310,13 +298,18 @@ curl {JAN_URL}/v1/models
   "object": "list"
 }
 ```
+
 ### Delete Model
 
 > OpenAI Equivalent: https://platform.openai.com/docs/api-reference/models/delete
-- Example request
+
+#### Request
+
 ```shell
 curl -X DELETE {JAN_URL}/v1/models/{model_id}
 ```
-- Example response
+
+#### Response
+
 ```json
 {
   "id": "model-zephyr-7B",
@@ -325,14 +318,19 @@ curl -X DELETE {JAN_URL}/v1/models/{model_id}
   "object": "model",
   "state": "to_download"
 }
 ```
+
 ### Start Model
 
 > Jan-only endpoint
 
 The request to start a `model`, changing its state from `ready` to `running`.
-- Example request
+
+#### Request
+
 ```shell
 curl -X PUT {JAN_URL}/v1/models/{model_id}/start
 ```
-- Example response
+
+#### Response
+
 ```json
 {
   "id": "model-zephyr-7B",
@@ -340,14 +338,19 @@ curl -X PUT {JAN_URL}/v1/models/{model_id}/start
   "object": "model",
   "state": "running"
 }
 ```
+
 ### Stop Model
 
 > Jan-only endpoint
 
 The request to stop a `model`, changing its state from `running` to `ready`.
-- Example request
+
+#### Request
+
 ```shell
 curl -X PUT {JAN_URL}/v1/models/{model_id}/stop
 ```
-- Example response
+
+#### Response
+
 ```json
 {
   "id": "model-zephyr-7B",
@@ -355,18 +358,95 @@ curl -X PUT {JAN_URL}/v1/models/{model_id}/stop
   "object": "model",
   "state": "ready"
 }
 ```
+
 ### Download Model
 
 > Jan-only endpoint
 
 The request to download a `model`, changing its state from `to_download` to `downloading`, then `ready` once it's done.
-- Example request
+
+#### Request
 ```shell
 curl -X POST {JAN_URL}/v1/models/
 ```
-- Example response
+
+#### Response
 ```json
 {
   "id": "model-zephyr-7B",
   "object": "model",
   "state": "downloading"
 }
+```
+
+## Examples
+
+### Pre-loaded Models
+
+- Jan ships with default model folders containing recommended models
+- Only the Model Object `json` files are included
+- Users must later explicitly download the model binaries
+
+```sh
+models/
+  mistral-7b/
+    mistral-7b.json
+  hermes-7b/
+    hermes-7b.json
+```
+
+### Azure OpenAI
+
+- Using a remote API to access a model, e.g. `model-azure-openai-gpt4-turbo.json`
+- See [source](https://learn.microsoft.com/en-us/azure/ai-services/openai/quickstart?tabs=command-line%2Cpython&pivots=rest-api)
+
+```json
+"source_url": "https://docs-test-001.openai.azure.com/openai.azure.com/docs-test-001/gpt4-turbo",
+"parameters": {
+  "init": {
+    "API-KEY": "",
+    "DEPLOYMENT-NAME": "",
+    "api-version": "2023-05-15"
+  },
+  "runtime": {
+    "temperature": "0.7",
+    "max_tokens": "2048",
+    "presence_penalty": "0",
+    "top_p": "1",
+    "stream": "true"
+  }
+},
+"metadata": {
+  "engine": "api"
+}
+```
+
+### Multiple quantizations
+
+- Each quantization has its own `Jan Model Object` file
+
+```sh
+llama2-7b-gguf/
+  llama2-7b-gguf-Q2.json
+  llama2-7b-gguf-Q3_K_L.json
+  .bin
+```
+
+### Multiple model partitions
+
+- A model that is partitioned into several binaries uses just one `json` file
+
+```sh
+llava-ggml/
+  llava-ggml-Q5.json
+  .proj
+  ggml
+```
+
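+In the `model.json`, the partitions could be listed via the `metadata.binaries` array from the properties table above (a sketch; the exact shape is not pinned down by this spec and the filenames are illustrative):
+
+```json
+"metadata": {
+  "binaries": ["ggml-model-q5_k.gguf", "mmproj-model-f16.gguf"]
+}
+```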
+### Your locally fine-tuned model
+
+- ??
+
+```sh
+llama-70b-finetune/
+  llama-70b-finetune-q5.json
+  .bin
+```
\ No newline at end of file
diff --git a/docs/docs/specs/prompts.md b/docs/docs/specs/prompts.md
new file mode 100644
index 000000000..2ec008d8a
--- /dev/null
+++ b/docs/docs/specs/prompts.md
@@ -0,0 +1,7 @@
+---
+title: Prompts
+---
+
+- [ ] /prompts folder
+- [ ] How to add prompts
+- [ ] Assistants can have suggested prompts
\ No newline at end of file
diff --git a/docs/docs/specs/settings.md b/docs/docs/specs/settings.md
index 2f5cfc061..a8cf8809c 100644
--- a/docs/docs/specs/settings.md
+++ b/docs/docs/specs/settings.md
@@ -1,3 +1,5 @@
 ---
 title: Settings
----
\ No newline at end of file
+---
+
+- [ ] .jan folder in jan root
\ No newline at end of file
diff --git a/docs/sidebars.js b/docs/sidebars.js
index 973d8ccb2..fe04d330a 100644
--- a/docs/sidebars.js
+++ b/docs/sidebars.js
@@ -68,6 +68,7 @@ const sidebars = {
         "specs/jan",
         "specs/fine-tuning",
         "specs/settings",
+        "specs/prompts",
       ],
     },
   ],