feat: Hiro revision Nov 17

hiro 2023-11-18 10:02:16 +07:00
parent 867c2d9b99
commit bc10225a64


@@ -71,10 +71,10 @@ Additionally, Jan supports importing popular formats. For example, if you provid
Supported URL formats with custom importers:
- `huggingface/thebloke`: [Link](https://huggingface.co/TheBloke/Llama-2-7B-GGUF)
- `janhq`: `TODO: put URL here`
- `azure_openai`: `https://docs-test-001.openai.azure.com/openai/deployments/gpt4-turbo`
- `openai`: `https://api.openai.com`
### Generic Example
@@ -92,33 +92,38 @@ Supported URL formats with custom importers:
    "n_parallel": 4,
    "pre_prompt": "A chat between a curious user and an artificial intelligence",
    "user_prompt": "USER: ",
    "ai_prompt": "ASSISTANT: ",
    "temperature": 0.7,
    "token_limit": 2048,
    "top_k": 0,
    "top_p": 1
  },
  "metadata": {
    "engine": "llamacpp",
    "quantization": "Q3_K_L",
    "size": "7B"
  }
```
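The object above is plain JSON, so a client can load it and layer per-call overrides on top of the stored defaults. A minimal Python sketch; the helper names and merge behavior are assumptions, not Jan's actual API:

```python
import json

# Illustrative sketch, not Jan's implementation: parse a Jan Model Object
# document and merge its default run parameters with per-call overrides.
# Field names follow the generic example above.

def load_model_object(raw):
    """Parse a Jan Model Object JSON document."""
    obj = json.loads(raw)
    obj.setdefault("parameters", {})
    obj.setdefault("metadata", {})
    return obj

def resolve_parameters(model_obj, overrides=None):
    """Model defaults first, then per-call overrides on top."""
    params = dict(model_obj["parameters"])
    params.update(overrides or {})
    return params

# Trimmed-down version of the generic example.
example = """
{
  "parameters": {"temperature": 0.7, "token_limit": 2048, "top_k": 0, "top_p": 1},
  "metadata": {"engine": "llamacpp", "quantization": "Q3_K_L", "size": "7B"}
}
"""
model = load_model_object(example)
print(resolve_parameters(model, {"temperature": 0.2}))
```

An assistant that wants a colder sampling run would pass only the parameters it changes; everything else falls back to the file's defaults.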
### Example: multiple binaries
- Model has multiple binaries: `model-llava-1.5-ggml.json`
- See [source](https://huggingface.co/mys/ggml_llava-v1.5-13b)
```json
"source_url": "https://huggingface.co/mys/ggml_llava-v1.5-13b",
"parameters": {},
"metadata": {
  "mmproj_binary": "https://huggingface.co/mys/ggml_llava-v1.5-13b/blob/main/mmproj-model-f16.gguf",
  "ggml_binary": "https://huggingface.co/mys/ggml_llava-v1.5-13b/blob/main/ggml-model-q5_k.gguf",
  "engine": "llamacpp",
  "quantization": "Q5_K"
}
```
### Example: Azure API
- Using a remote API to access the model: `model-azure-openai-gpt4-turbo.json`
- See [source](https://learn.microsoft.com/en-us/azure/ai-services/openai/quickstart?tabs=command-line%2Cpython&pivots=rest-api)
```json
@@ -158,11 +163,9 @@ Supported URL formats with custom importers:
```
### Default ./model folder
- Jan ships with a default `models/` folder containing recommended models
- Only the Model Object `json` files are included
- Users must later explicitly download the model binaries
```sh
models/
  mistral-7b/
@@ -170,7 +173,6 @@ models/
  hermes-7b/
    hermes-7b.json
```
### Multiple quantizations
- Each quantization has its own `Jan Model Object` file
@@ -181,7 +183,6 @@ llama2-7b-gguf/
  llama2-7b-gguf-Q3_K_L.json
  .bin
```
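Given the naming convention above (one `<model>-<quantization>.json` per quantization), a loader could resolve the right file from a folder listing. A hypothetical sketch; the convention is taken from the example, the helper itself is not part of the spec:

```python
# Hypothetical helper: pick the Jan Model Object file for a requested
# quantization, assuming filenames end in "-<quantization>.json" as in
# the llama2-7b-gguf example above.

def select_quantization(filenames, quantization):
    suffix = "-%s.json" % quantization
    for name in filenames:
        if name.endswith(suffix):
            return name
    raise FileNotFoundError("no Jan Model Object for quantization " + quantization)

# Folder listing modeled on the example (the Q5_K_M entry is made up).
listing = [
    "llama2-7b-gguf-Q3_K_L.json",
    "llama2-7b-gguf-Q5_K_M.json",
    "llama2-7b-gguf-Q3_K_L.bin",
]
print(select_quantization(listing, "Q3_K_L"))  # llama2-7b-gguf-Q3_K_L.json
```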
### Multiple model partitions
- A model that is partitioned into several binaries uses just 1 file
@@ -192,8 +193,7 @@ llava-ggml/
  .proj
  ggml
```
### Your locally fine-tuned model
- A locally fine-tuned model also gets its own `Jan Model Object` file
@@ -202,11 +202,8 @@ llama-70b-finetune/
  llama-70b-finetune-q5.json
  .bin
```
## Jan API
### Model API Object
- The `Jan Model Object` maps into the `OpenAI Model Object`.
- Properties marked with `*` are compatible with the [OpenAI `model` object](https://platform.openai.com/docs/api-reference/models)
- Note: The `Jan Model Object` has additional properties when retrieved via its API endpoint.
@@ -224,45 +221,131 @@ llama-70b-finetune/
| `parameters` | map | Defines default model run parameters used by any assistant. | |
| `metadata` | map | Stores additional structured information about the model. | |
### Model lifecycle
A model has 4 states (enum):
- `not_downloaded`
- `downloaded`
- `running`
- `stopped`
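One way to read the lifecycle is as a transition table keyed by the endpoint that moves a model between states. The transitions below are inferred from the Download/Start/Stop endpoints later in this spec (Stop's example returns the model to `downloaded`), so treat this as an assumption rather than a definitive state machine:

```python
# Assumed transitions, inferred from the Download/Start/Stop endpoints;
# not an official Jan state machine.
TRANSITIONS = {
    ("not_downloaded", "download"): "downloaded",
    ("downloaded", "start"): "running",
    ("running", "stop"): "downloaded",
}

def next_state(state, action):
    """Return the state after an action, or raise if the move is invalid."""
    try:
        return TRANSITIONS[(state, action)]
    except KeyError:
        raise ValueError("cannot %s from state %r" % (action, state))

print(next_state("downloaded", "start"))  # running
```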
### List models
Lists the currently available models, and provides basic information about each one, such as the owner and availability.
- [OAI Reference](https://platform.openai.com/docs/api-reference/models/list)
- Example request
```sh
curl {JAN_URL}/v1/models
```
- Example response
```json
{
  "object": "list",
  "data": [
    {
      "id": "model-zephyr-7B",
      "object": "model",
      "created": 1686935002,
      "owned_by": "thebloke",
      "state": "running"
    },
    {
      "id": "ft-llama-70b-gguf",
      "object": "model",
      "created": 1686935002,
      "owned_by": "you",
      "state": "stopped"
    },
    {
      "id": "model-azure-openai-gpt4-turbo",
      "object": "model",
      "created": 1686935002,
      "owned_by": "azure_openai",
      "state": "running"
    }
  ]
}
```
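Because the response is standard OpenAI-style JSON plus a Jan `state` field, filtering client-side is straightforward. A sketch against the example payload (not an official Jan client):

```python
import json

# Sketch of a client-side filter over the /v1/models response above.
# The response shape is taken from the example payload.

def running_models(response_body):
    payload = json.loads(response_body)
    return [m["id"] for m in payload["data"] if m.get("state") == "running"]

# Trimmed-down version of the example response.
body = json.dumps({
    "object": "list",
    "data": [
        {"id": "model-zephyr-7B", "object": "model", "state": "running"},
        {"id": "ft-llama-70b-gguf", "object": "model", "state": "stopped"},
    ],
})
print(running_models(body))  # ['model-zephyr-7B']
```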
### Get Model
Retrieves a model instance, providing basic information about the model such as the owner and permissioning.
- [OAI Reference](https://platform.openai.com/docs/api-reference/models/retrieve)
- Example request
```sh
curl {JAN_URL}/v1/models/model-zephyr-7B
```
- Example response
```json
{
  "id": "model-zephyr-7B",
  "object": "model",
  "created": 1686935002,
  "owned_by": "thebloke",
  "state": "running", // enum: not_downloaded | downloaded | running | stopped
  "source_url": "https://huggingface.co/TheBloke/zephyr-7B-beta-GGUF/blob/main/zephyr-7b-beta.Q4_K_M.gguf",
  "parameters": {
    "ctx_len": 2048,
    "ngl": 100,
    "embedding": true,
    "n_parallel": 4,
    "pre_prompt": "A chat between a curious user and an artificial intelligence",
    "user_prompt": "USER: ",
    "ai_prompt": "ASSISTANT: ",
    "temperature": 0.7,
    "token_limit": 2048,
    "top_k": 0,
    "top_p": 1
  },
  "metadata": {
    "engine": "llamacpp",
    "quantization": "Q3_K_L",
    "size": "7B"
  }
}
```
### Delete Model
Delete a fine-tuned model.
- [OAI Reference](https://platform.openai.com/docs/api-reference/models/delete)
- Example request
```sh
curl -X DELETE {JAN_URL}/v1/models/model-zephyr-7B
```
- Example response
```json
{
  "id": "model-zephyr-7B",
  "object": "model",
  "deleted": true
}
```
### Start Model
> Jan-only endpoint

Starts the `model`, changing its state from `downloaded` to `running`.
- Example request
```sh
curl -X PUT {JAN_URL}/v1/models/model-zephyr-7B/start
```
- Example response
```json
{
  "id": "model-zephyr-7B",
  "object": "model",
  "state": "running"
}
```
### Stop Model
> Jan-only endpoint

Stops the `model`, changing its state from `running` to `downloaded`.
- Example request
```sh
curl -X PUT {JAN_URL}/v1/models/model-zephyr-7B/stop
```
- Example response
```json
{
  "id": "model-zephyr-7B",
  "object": "model",
  "state": "downloaded"
}
```
### Download Model
> Jan-only endpoint
> TODO: @hiro
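The endpoints above differ only in HTTP method and path, so a thin client can be sketched as a request builder. The start/stop paths follow the curl examples; the base URL and the remaining routes are assumptions, not a definitive client:

```python
# Request builder for the Jan model endpoints shown in this spec.
# No network I/O; returns (method, url) pairs. The default base URL
# is an assumption -- substitute your own {JAN_URL}.
JAN_URL = "http://localhost:1337"

def model_request(action, model_id=None):
    base = JAN_URL + "/v1/models"
    if action == "list":
        return ("GET", base)
    if action == "get":
        return ("GET", base + "/" + model_id)
    if action == "delete":
        return ("DELETE", base + "/" + model_id)
    if action in ("start", "stop"):
        return ("PUT", base + "/" + model_id + "/" + action)
    raise ValueError("unknown action: " + action)

print(model_request("stop", "model-zephyr-7B"))
# ('PUT', 'http://localhost:1337/v1/models/model-zephyr-7B/stop')
```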