# Models
:::warning
Draft Specification: functionality has not been implemented yet.
Feedback: HackMD: Models Spec
:::
Models are AI models, like Llama and Mistral.
OpenAI Equivalent: https://platform.openai.com/docs/api-reference/models
## User Stories
- Users can download a model via a web URL
  - Wireframes here
- Users can import a model from a local directory
  - Wireframes here
- Users can configure model settings, like run parameters
  - Wireframes here
- Users can override run settings at runtime
  - See Assistant Spec and Thread
## Jan Model Object
- A `Jan Model Object` is a "representation" of a model.
- Objects are defined by `model-name.json` files in `json` format.
- Objects are identified by `folder-name/model-name`, where its `id` is indicative of its file location.
- Objects are designed to be compatible with `OpenAI Model Objects`, with additional properties needed to run on our infrastructure.
- ALL object properties are optional, i.e. users should be able to run a model declared by an empty `json` file.
| Property | Type | Description | Validation |
|---|---|---|---|
| `source_url` | string | The model download source. It can be an external URL or a local filepath. | Defaults to `pwd`. See Source_url |
| `object` | enum: `model`, `assistant`, `thread`, `message` | Type of the Jan Object. Always `model`. | Defaults to `"model"` |
| `name` | string | A vanity name | Defaults to filename |
| `description` | string | A vanity description of the model | Defaults to `""` |
| `state` | enum: `to_download`, `downloading`, `ready`, `running` | Needs more thought | Defaults to `to_download` |
| `parameters` | map | Defines default model run parameters used by any assistant. | Defaults to `{}` |
| `metadata` | map | Stores additional structured information about the model. | Defaults to `{}` |
| `metadata.engine` | enum: `llamacpp`, `api`, `tensorrt` | The model backend used to run the model. | Defaults to `"llamacpp"` |
| `metadata.quantization` | string | Supported formats only | See Custom importers |
| `metadata.binaries` | array | Supported formats only | See Custom importers |
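Since every property is optional, loading a model file amounts to filling in the defaults above. The helper below is an illustrative sketch of that normalization step, not Jan's actual implementation; the field names come from the table, the function name is hypothetical:

```typescript
// Illustrative sketch (not Jan's actual code): apply the spec's defaults to a
// possibly-empty model json file, so an empty `{}` still yields a usable object.
interface JanModel {
  source_url: string;
  object: string;
  name: string;
  description: string;
  state: string;
  parameters: Record<string, unknown>;
  metadata: Record<string, unknown>;
}

function withDefaults(raw: Partial<JanModel>, filename: string): JanModel {
  return {
    source_url: raw.source_url ?? "./",  // defaults to the current directory
    object: raw.object ?? "model",
    name: raw.name ?? filename,          // defaults to filename
    description: raw.description ?? "",
    state: raw.state ?? "to_download",
    parameters: raw.parameters ?? {},
    metadata: { engine: "llamacpp", ...(raw.metadata ?? {}) },
  };
}
```

For example, `withDefaults({}, "llama2-7b")` yields a model named `llama2-7b` with `object: "model"` and `state: "to_download"`.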
### Source_url
- Users can download models from a `remote` source or reference an existing `local` model.
- If this property is not specified in the Model Object file, then the default behavior is to look in the current directory.
#### Local source_url
- Users can import a local model by providing the filepath to the model
```json
// ./models/llama2/llama2-7bn-gguf.json
"source_url": "~/Downloads/llama-2-7bn-q5-k-l.gguf",

// Default, if property is omitted
"source_url": "./",
```
#### Remote source_url
- Users can download a model by remote URL.
- Supported URL formats:
  - `https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGUF/blob/main/llama-2-7b-chat.Q3_K_L.gguf`
  - `https://any-source.com/.../model-binary.bin`
### Custom importers
Additionally, Jan supports importing popular formats. For example, if you provide a HuggingFace URL for a TheBloke model, Jan automatically downloads and catalogs all quantizations. Custom importers autofill properties like `metadata.quantization` and `metadata.size`.
Supported URL formats with custom importers:
- `huggingface/thebloke`: Link
- `janhq`: TODO: put URL here
- `azure_openai`: `https://docs-test-001.openai.azure.com/openai.azure.com/docs-test-001/gpt4-turbo`
- `openai`: `api.openai.com`
### Generic Example
- Model has 1 binary
- `model-zephyr-7B.json` - See source
```json
// ./models/zephyr/zephyr-7b-beta-Q4_K_M.json
// Note: Default fields omitted for brevity
{
  "source_url": "https://huggingface.co/TheBloke/zephyr-7B-beta-GGUF/blob/main/zephyr-7b-beta.Q4_K_M.gguf",
  "parameters": {
    "init": {
      "ctx_len": "2048",
      "ngl": "100",
      "embedding": "true",
      "n_parallel": "4",
      "pre_prompt": "A chat between a curious user and an artificial intelligence",
      "user_prompt": "USER: ",
      "ai_prompt": "ASSISTANT: "
    },
    "runtime": {
      "temperature": "0.7",
      "token_limit": "2048",
      "top_k": "0",
      "top_p": "1",
      "stream": "true"
    }
  },
  "metadata": {
    "engine": "llamacpp",
    "quantization": "Q4_K_M",
    "size": "7B"
  }
}
```
### Example: multiple binaries
- Model has multiple binaries
- `model-llava-1.5-ggml.json` - See source
"source_url": "https://huggingface.co/mys/ggml_llava-v1.5-13b",
"parameters": {"init": {}, "runtime": {}}
"metadata": {
"mmproj_binary": "https://huggingface.co/mys/ggml_llava-v1.5-13b/blob/main/mmproj-model-f16.gguf",
"ggml_binary": "https://huggingface.co/mys/ggml_llava-v1.5-13b/blob/main/ggml-model-q5_k.gguf",
"engine": "llamacpp",
"quantization": "Q5_K"
}
### Example: Azure API
- Using a remote API to access the model
- `model-azure-openai-gpt4-turbo.json` - See source
"source_url": "https://docs-test-001.openai.azure.com/openai.azure.com/docs-test-001/gpt4-turbo",
"parameters": {
"init" {
"API-KEY": "",
"DEPLOYMENT-NAME": "",
"api-version": "2023-05-15"
},
"runtime": {
"temperature": "0.7",
"max_tokens": "2048",
"presence_penalty": "0",
"top_p": "1",
"stream": "true"
}
}
"metadata": {
"engine": "api",
}
## Filesystem
- Everything needed to represent a `model` is packaged into a `Model folder`.
- The `folder` is standalone and can be easily zipped, imported, and exported, e.g. to GitHub.
- The `folder` always contains at least one `Model Object`, declared in `json` format.
- The `folder` and `file` do not have to share the same name.
- The model `id` is made up of `folder_name/filename` and is thus always unique.
```
/janroot
  /models
    azure-openai/              # Folder name
      azure-openai-gpt3-5.json # File name
    llama2-70b/
      model.json
      .gguf
```
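Because the model `id` is `folder_name/filename`, it can be derived mechanically from a model file's location. A minimal sketch, assuming Node's `path` module (the helper name is illustrative, not Jan's actual code):

```typescript
import * as path from "path";

// Derive the unique model id `folder_name/filename` from the location of a
// Model Object json file under the models directory.
function modelIdFromPath(jsonPath: string): string {
  const folder = path.basename(path.dirname(jsonPath)); // e.g. "llama2-70b"
  const file = path.basename(jsonPath, ".json");        // e.g. "model"
  return `${folder}/${file}`;
}
```

For the tree above, `/janroot/models/llama2-70b/model.json` would yield the id `llama2-70b/model`.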
### Default `./models` folder
- Jan ships with a default `models` folder containing recommended models
- Only the Model Object `json` files are included
- Users must later explicitly download the model binaries
```
models/
  mistral-7b/
    mistral-7b.json
  hermes-7b/
    hermes-7b.json
```
### Multiple quantizations
- Each quantization has its own `Jan Model Object` file
```
llama2-7b-gguf/
  llama2-7b-gguf-Q2.json
  llama2-7b-gguf-Q3_K_L.json
  .bin
```
### Multiple model partitions
- A model that is partitioned into several binaries uses just one file
```
llava-ggml/
  llava-ggml-Q5.json
  .proj
  ggml
```
### Your locally fine-tuned model
- ??
```
llama-70b-finetune/
  llama-70b-finetune-q5.json
  .bin
```
## Jan API
### Model API Object
- The `Jan Model Object` maps into the `OpenAI Model Object`.
- Properties marked with `*` are compatible with the OpenAI `model` object.
- Note: The `Jan Model Object` has additional properties when retrieved via its API endpoint.
OpenAI Equivalent: https://platform.openai.com/docs/api-reference/models/object
### Model lifecycle
Model has 4 states (enum): `to_download`, `downloading`, `ready`, `running`.
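The transitions implied by the Download, Start, and Stop endpoints in this spec can be sketched as a small table; the `canTransition` helper is illustrative only, and any transitions not shown in the endpoint examples (e.g. deleting a running model) are left out:

```typescript
type ModelState = "to_download" | "downloading" | "ready" | "running";

// State transitions implied by the Download, Start, and Stop endpoints.
const TRANSITIONS: Record<ModelState, ModelState[]> = {
  to_download: ["downloading"], // Download Model
  downloading: ["ready"],       // download completes
  ready: ["running"],           // Start Model
  running: ["ready"],           // Stop Model
};

function canTransition(from: ModelState, to: ModelState): boolean {
  return TRANSITIONS[from].includes(to);
}
```

For instance, a model must reach `ready` before it can be started; `canTransition("to_download", "running")` is false.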
### Get Model
OpenAI Equivalent: https://platform.openai.com/docs/api-reference/models/retrieve
- Example request
```sh
curl {JAN_URL}/v1/models/{model_id}
```
- Example response
```json
{
  "id": "model-zephyr-7B",
  "object": "model",
  "created_at": 1686935002,
  "owned_by": "thebloke",
  "state": "running",
  "source_url": "https://huggingface.co/TheBloke/zephyr-7B-beta-GGUF/blob/main/zephyr-7b-beta.Q4_K_M.gguf",
  "parameters": {
    "ctx_len": 2048,
    "ngl": 100,
    "embedding": true,
    "n_parallel": 4,
    "pre_prompt": "A chat between a curious user and an artificial intelligence",
    "user_prompt": "USER: ",
    "ai_prompt": "ASSISTANT: ",
    "temperature": "0.7",
    "token_limit": "2048",
    "top_k": "0",
    "top_p": "1"
  },
  "metadata": {
    "engine": "llamacpp",
    "quantization": "Q4_K_M",
    "size": "7B"
  }
}
```
### List models
Lists the currently available models, and provides basic information about each one such as the owner and availability.
OpenAI Equivalent: https://platform.openai.com/docs/api-reference/models/list
- Example request
```sh
curl {JAN_URL}/v1/models
```
- Example response
```json
{
  "object": "list",
  "data": [
    {
      "id": "model-zephyr-7B",
      "object": "model",
      "created_at": 1686935002,
      "owned_by": "thebloke",
      "state": "running"
    },
    {
      "id": "ft-llama-70b-gguf",
      "object": "model",
      "created_at": 1686935002,
      "owned_by": "you",
      "state": "ready"
    },
    {
      "id": "model-azure-openai-gpt4-turbo",
      "object": "model",
      "created_at": 1686935002,
      "owned_by": "azure_openai",
      "state": "running"
    }
  ]
}
```
### Delete Model
OpenAI Equivalent: https://platform.openai.com/docs/api-reference/models/delete
- Example request
```sh
curl -X DELETE {JAN_URL}/v1/models/{model_id}
```
- Example response
```json
{
  "id": "model-zephyr-7B",
  "object": "model",
  "deleted": true,
  "state": "to_download"
}
```
### Start Model
Jan-only endpoint. Starts the `model` by changing the model state from `ready` to `running`.
- Example request
```sh
curl -X PUT {JAN_URL}/v1/models/{model_id}/start
```
- Example response
```json
{
  "id": "model-zephyr-7B",
  "object": "model",
  "state": "running"
}
```
### Stop Model
Jan-only endpoint. Stops the `model` by changing the model state from `running` to `ready`.
- Example request
```sh
curl -X PUT {JAN_URL}/v1/models/{model_id}/stop
```
- Example response
```json
{
  "id": "model-zephyr-7B",
  "object": "model",
  "state": "ready"
}
```
### Download Model
Jan-only endpoint. Downloads the `model` by changing the model state from `to_download` to `downloading`, then `ready` once the download completes.
- Example request
```sh
curl -X POST {JAN_URL}/v1/models/
```
- Example response
```json
{
  "id": "model-zephyr-7B",
  "object": "model",
  "state": "downloading"
}
```