Merge pull request #3473 from janhq/dev

Release Cut 0.5.3
This commit is contained in:
Van Pham 2024-08-27 16:58:55 +07:00 committed by GitHub
commit c0ffd03f61
No known key found for this signature in database
GPG Key ID: B5690EEEBB952194
202 changed files with 3208 additions and 8412 deletions

37 .github/ISSUE_TEMPLATE/bug_report.md vendored Normal file
View File

@ -0,0 +1,37 @@
---
name: "🖋️ Report"
about: Create a report to help us improve Jan
title: 'bug: [DESCRIPTION]'
labels: 'type: bug'
assignees: ''
---
**Describe the bug**
A clear and concise description of what the bug is.
**Steps to reproduce**
Steps to reproduce the behavior:
1. Go to '...'
2. Click on '....'
3. Scroll down to '....'
4. See error
**Expected behavior**
A clear and concise description of what you expected to happen.
**Screenshots**
If applicable, add screenshots to help explain your issue.
**Environment details**
- Operating System: [Specify your OS. e.g., macOS Sonoma 14.2.1, Windows 11, Ubuntu 22, etc]
- Jan Version: [e.g., 0.4.xxx nightly or manual]
- Processor: [e.g., Apple M1, Intel Core i7, AMD Ryzen 5, etc]
- RAM: [e.g., 8GB, 16GB]
- Any additional relevant hardware specifics: [e.g., Graphics card, SSD/HDD]
**Logs**
If the cause of the error is not clear, kindly provide your usage logs: https://jan.ai/docs/troubleshooting#how-to-get-error-logs
**Additional context**
Add any other context or information that could be helpful in diagnosing the problem.

View File

@ -10,7 +10,7 @@ on:
description: 'Public Provider'
options:
- none
- cloudflare-r2
- aws-s3
default: none
jobs:
@ -28,10 +28,10 @@ jobs:
echo "::set-output name=ref::${{ github.ref }}"
else
if [ "${{ github.event_name }}" == "schedule" ]; then
echo "::set-output name=public_provider::cloudflare-r2"
echo "::set-output name=public_provider::aws-s3"
echo "::set-output name=ref::refs/heads/dev"
elif [ "${{ github.event_name }}" == "push" ]; then
echo "::set-output name=public_provider::cloudflare-r2"
echo "::set-output name=public_provider::aws-s3"
echo "::set-output name=ref::${{ github.ref }}"
else
echo "::set-output name=public_provider::none"
@ -112,13 +112,13 @@ jobs:
cat ./latest-mac.yml
- name: Upload latest-mac.yml
if: ${{ needs.set-public-provider.outputs.public_provider == 'cloudflare-r2' }}
if: ${{ needs.set-public-provider.outputs.public_provider == 'aws-s3' }}
run: |
aws s3api put-object --endpoint-url https://${{ secrets.CLOUDFLARE_ACCOUNT_ID }}.r2.cloudflarestorage.com --bucket ${{ secrets.CLOUDFLARE_R2_BUCKET_NAME }} --key "latest/latest-mac.yml" --body "./latest-mac.yml"
aws s3 cp ./latest-mac.yml "s3://${{ secrets.DELTA_AWS_S3_BUCKET_NAME }}/latest/latest-mac.yml"
env:
AWS_ACCESS_KEY_ID: ${{ secrets.CLOUDFLARE_R2_ACCESS_KEY_ID }}
AWS_SECRET_ACCESS_KEY: ${{ secrets.CLOUDFLARE_R2_SECRET_ACCESS_KEY }}
AWS_DEFAULT_REGION: auto
AWS_ACCESS_KEY_ID: ${{ secrets.DELTA_AWS_ACCESS_KEY_ID }}
AWS_SECRET_ACCESS_KEY: ${{ secrets.DELTA_AWS_SECRET_ACCESS_KEY }}
AWS_DEFAULT_REGION: ${{ secrets.DELTA_AWS_REGION }}
AWS_EC2_METADATA_DISABLED: "true"
@ -147,7 +147,7 @@ jobs:
noti-discord-manual-and-update-url-readme:
needs: [build-macos-x64, build-macos-arm64, build-windows-x64, build-linux-x64, get-update-version, set-public-provider, combine-latest-mac-yml]
secrets: inherit
if: github.event_name == 'workflow_dispatch' && github.event.inputs.public_provider == 'cloudflare-r2'
if: github.event_name == 'workflow_dispatch' && github.event.inputs.public_provider == 'aws-s3'
uses: ./.github/workflows/template-noti-discord-and-update-url-readme.yml
with:
ref: refs/heads/dev

View File

@ -1,40 +0,0 @@
name: Docker Builder - Nightly / Manual
on:
push:
branches:
- main
- feature/helmchart-and-ci-jan-server
paths-ignore:
- 'README.md'
- 'docs/**'
schedule:
- cron: '0 21 * * 1,2,3' # At 9 PM UTC on Monday, Tuesday, and Wednesday, which is 4 AM UTC+7 on Tuesday, Wednesday, and Thursday
workflow_dispatch:
jobs:
# Job: create/update the app version based on the latest release tag with build number and save it to output
get-update-version:
uses: ./.github/workflows/template-get-update-version.yml
build-cpu:
uses: ./.github/workflows/template-build-jan-server.yml
permissions:
packages: write
secrets: inherit
needs: [get-update-version]
with:
dockerfile_path: ./Dockerfile
docker_image_tag: "ghcr.io/janhq/jan-server:dev-cpu-latest,ghcr.io/janhq/jan-server:dev-cpu-${{ needs.get-update-version.outputs.new_version }}"
build-gpu:
uses: ./.github/workflows/template-build-jan-server.yml
permissions:
packages: write
secrets: inherit
needs: [get-update-version]
with:
dockerfile_path: ./Dockerfile.gpu
docker_image_tag: "ghcr.io/janhq/jan-server:dev-cuda-12.2-latest,ghcr.io/janhq/jan-server:dev-cuda-12.2-${{ needs.get-update-version.outputs.new_version }}"

View File

@ -1,30 +0,0 @@
name: Docker Builder - Tag
on:
push:
tags: ["v[0-9]+.[0-9]+.[0-9]+"]
jobs:
# Job: create/update the app version based on the latest release tag with build number and save it to output
get-update-version:
uses: ./.github/workflows/template-get-update-version.yml
build-cpu:
permissions:
packages: write
uses: ./.github/workflows/template-build-jan-server.yml
secrets: inherit
needs: [get-update-version]
with:
dockerfile_path: ./Dockerfile
docker_image_tag: "ghcr.io/janhq/jan-server:cpu-latest,ghcr.io/janhq/jan-server:cpu-${{ needs.get-update-version.outputs.new_version }}"
build-gpu:
permissions:
packages: write
uses: ./.github/workflows/template-build-jan-server.yml
secrets: inherit
needs: [get-update-version]
with:
dockerfile_path: ./Dockerfile.gpu
docker_image_tag: "ghcr.io/janhq/jan-server:cuda-12.2-latest,ghcr.io/janhq/jan-server:cuda-12.2-${{ needs.get-update-version.outputs.new_version }}"

View File

@ -10,23 +10,21 @@ on:
required: true
type: string
default: none
description: 'none: build only, github: build and publish to github, cloudflare: build and publish to cloudflare'
description: 'none: build only, github: build and publish to github, aws s3: build and publish to aws s3'
new_version:
required: true
type: string
default: ''
cloudflare_r2_path:
aws_s3_prefix:
required: false
type: string
default: '/latest/'
secrets:
CLOUDFLARE_R2_BUCKET_NAME:
DELTA_AWS_S3_BUCKET_NAME:
required: false
CLOUDFLARE_R2_ACCESS_KEY_ID:
DELTA_AWS_ACCESS_KEY_ID:
required: false
CLOUDFLARE_R2_SECRET_ACCESS_KEY:
required: false
CLOUDFLARE_ACCOUNT_ID:
DELTA_AWS_SECRET_ACCESS_KEY:
required: false
jobs:
@ -58,7 +56,7 @@ jobs:
mv /tmp/package.json electron/package.json
jq --arg version "${{ inputs.new_version }}" '.version = $version' web/package.json > /tmp/package.json
mv /tmp/package.json web/package.json
jq '.build.publish = [{"provider": "generic", "url": "${{ secrets.CLOUDFLARE_R2_PUBLIC_URL }}", "channel": "latest"}, {"provider": "s3", "bucket": "${{ secrets.CLOUDFLARE_R2_BUCKET_NAME }}", "region": "auto", "endpoint": "https://${{ secrets.CLOUDFLARE_ACCOUNT_ID }}.r2.cloudflarestorage.com", "path": "${{ inputs.cloudflare_r2_path }}", "channel": "latest"}]' electron/package.json > /tmp/package.json
jq '.build.publish = [{"provider": "generic", "url": "${{ secrets.CLOUDFLARE_R2_PUBLIC_URL }}", "channel": "latest"}, {"provider": "s3", "acl": null, "bucket": "${{ secrets.DELTA_AWS_S3_BUCKET_NAME }}", "region": "${{ secrets.DELTA_AWS_REGION}}", "path": "${{ inputs.aws_s3_prefix }}", "channel": "latest"}]' electron/package.json > /tmp/package.json
mv /tmp/package.json electron/package.json
cat electron/package.json
@ -76,7 +74,7 @@ jobs:
env:
VERSION_TAG: ${{ inputs.new_version }}
- name: Build and publish app to cloudflare r2 or github artifactory
- name: Build and publish app to aws s3 or github artifactory
if: inputs.public_provider != 'github'
run: |
# check public_provider is true or not
@ -88,9 +86,10 @@ jobs:
fi
env:
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
AWS_ACCESS_KEY_ID: ${{ secrets.CLOUDFLARE_R2_ACCESS_KEY_ID }}
AWS_SECRET_ACCESS_KEY: ${{ secrets.CLOUDFLARE_R2_SECRET_ACCESS_KEY }}
AWS_ACCESS_KEY_ID: ${{ secrets.DELTA_AWS_ACCESS_KEY_ID }}
AWS_SECRET_ACCESS_KEY: ${{ secrets.DELTA_AWS_SECRET_ACCESS_KEY }}
AWS_EC2_METADATA_DISABLED: "true"
AWS_MAX_ATTEMPTS: "5"
- name: Build and publish app to github
if: github.event_name == 'push' && startsWith(github.ref, 'refs/tags/') && inputs.public_provider == 'github'

View File

@ -10,23 +10,21 @@ on:
required: true
type: string
default: none
description: 'none: build only, github: build and publish to github, cloudflare: build and publish to cloudflare'
description: 'none: build only, github: build and publish to github, aws s3: build and publish to aws s3'
new_version:
required: true
type: string
default: ''
cloudflare_r2_path:
aws_s3_prefix:
required: false
type: string
default: '/latest/'
secrets:
CLOUDFLARE_R2_BUCKET_NAME:
DELTA_AWS_S3_BUCKET_NAME:
required: false
CLOUDFLARE_R2_ACCESS_KEY_ID:
DELTA_AWS_ACCESS_KEY_ID:
required: false
CLOUDFLARE_R2_SECRET_ACCESS_KEY:
required: false
CLOUDFLARE_ACCOUNT_ID:
DELTA_AWS_SECRET_ACCESS_KEY:
required: false
CODE_SIGN_P12_BASE64:
required: false
@ -70,7 +68,7 @@ jobs:
jq --arg version "${{ inputs.new_version }}" '.version = $version' web/package.json > /tmp/package.json
mv /tmp/package.json web/package.json
jq '.build.publish = [{"provider": "generic", "url": "${{ secrets.CLOUDFLARE_R2_PUBLIC_URL }}", "channel": "latest"}, {"provider": "s3", "bucket": "${{ secrets.CLOUDFLARE_R2_BUCKET_NAME }}", "region": "auto", "endpoint": "https://${{ secrets.CLOUDFLARE_ACCOUNT_ID }}.r2.cloudflarestorage.com", "path": "${{ inputs.cloudflare_r2_path }}", "channel": "latest"}]' electron/package.json > /tmp/package.json
jq '.build.publish = [{"provider": "generic", "url": "${{ secrets.CLOUDFLARE_R2_PUBLIC_URL }}", "channel": "latest"}, {"provider": "s3", "acl": null, "bucket": "${{ secrets.DELTA_AWS_S3_BUCKET_NAME }}", "region": "${{ secrets.DELTA_AWS_REGION}}", "path": "${{ inputs.aws_s3_prefix }}", "channel": "latest"}]' electron/package.json > /tmp/package.json
mv /tmp/package.json electron/package.json
jq --arg teamid "${{ secrets.APPLE_TEAM_ID }}" '.build.mac.notarize.teamId = $teamid' electron/package.json > /tmp/package.json
@ -107,7 +105,7 @@ jobs:
p12-file-base64: ${{ secrets.CODE_SIGN_P12_BASE64 }}
p12-password: ${{ secrets.CODE_SIGN_P12_PASSWORD }}
- name: Build and publish app to cloudflare r2 or github artifactory
- name: Build and publish app to aws s3 or github artifactory
if: inputs.public_provider != 'github'
run: |
# check public_provider is true or not
@ -126,10 +124,11 @@ jobs:
APPLE_APP_SPECIFIC_PASSWORD: ${{ secrets.APPLE_APP_SPECIFIC_PASSWORD }}
APP_PATH: "."
DEVELOPER_ID: ${{ secrets.DEVELOPER_ID }}
AWS_ACCESS_KEY_ID: ${{ secrets.CLOUDFLARE_R2_ACCESS_KEY_ID }}
AWS_SECRET_ACCESS_KEY: ${{ secrets.CLOUDFLARE_R2_SECRET_ACCESS_KEY }}
AWS_ACCESS_KEY_ID: ${{ secrets.DELTA_AWS_ACCESS_KEY_ID }}
AWS_SECRET_ACCESS_KEY: ${{ secrets.DELTA_AWS_SECRET_ACCESS_KEY }}
AWS_DEFAULT_REGION: auto
AWS_EC2_METADATA_DISABLED: "true"
AWS_MAX_ATTEMPTS: "5"
- name: Build and publish app to github
if: github.event_name == 'push' && startsWith(github.ref, 'refs/tags/') && inputs.public_provider == 'github'

View File

@ -10,23 +10,21 @@ on:
required: true
type: string
default: none
description: 'none: build only, github: build and publish to github, cloudflare: build and publish to cloudflare'
description: 'none: build only, github: build and publish to github, aws s3: build and publish to aws s3'
new_version:
required: true
type: string
default: ''
cloudflare_r2_path:
aws_s3_prefix:
required: false
type: string
default: '/latest/'
secrets:
CLOUDFLARE_R2_BUCKET_NAME:
DELTA_AWS_S3_BUCKET_NAME:
required: false
CLOUDFLARE_R2_ACCESS_KEY_ID:
DELTA_AWS_ACCESS_KEY_ID:
required: false
CLOUDFLARE_R2_SECRET_ACCESS_KEY:
required: false
CLOUDFLARE_ACCOUNT_ID:
DELTA_AWS_SECRET_ACCESS_KEY:
required: false
CODE_SIGN_P12_BASE64:
required: false
@ -70,7 +68,7 @@ jobs:
jq --arg version "${{ inputs.new_version }}" '.version = $version' web/package.json > /tmp/package.json
mv /tmp/package.json web/package.json
jq '.build.publish = [{"provider": "generic", "url": "${{ secrets.CLOUDFLARE_R2_PUBLIC_URL }}", "channel": "latest"}, {"provider": "s3", "bucket": "${{ secrets.CLOUDFLARE_R2_BUCKET_NAME }}", "region": "auto", "endpoint": "https://${{ secrets.CLOUDFLARE_ACCOUNT_ID }}.r2.cloudflarestorage.com", "path": "${{ inputs.cloudflare_r2_path }}", "channel": "latest"}]' electron/package.json > /tmp/package.json
jq '.build.publish = [{"provider": "generic", "url": "${{ secrets.CLOUDFLARE_R2_PUBLIC_URL }}", "channel": "latest"}, {"provider": "s3", "acl": null, "bucket": "${{ secrets.DELTA_AWS_S3_BUCKET_NAME }}", "region": "${{ secrets.DELTA_AWS_REGION}}", "path": "${{ inputs.aws_s3_prefix }}", "channel": "latest"}]' electron/package.json > /tmp/package.json
mv /tmp/package.json electron/package.json
jq --arg teamid "${{ secrets.APPLE_TEAM_ID }}" '.build.mac.notarize.teamId = $teamid' electron/package.json > /tmp/package.json
@ -107,7 +105,7 @@ jobs:
p12-file-base64: ${{ secrets.CODE_SIGN_P12_BASE64 }}
p12-password: ${{ secrets.CODE_SIGN_P12_PASSWORD }}
- name: Build and publish app to cloudflare r2 or github artifactory
- name: Build and publish app to aws s3 or github artifactory
if: inputs.public_provider != 'github'
run: |
# check public_provider is true or not
@ -126,10 +124,11 @@ jobs:
APPLE_APP_SPECIFIC_PASSWORD: ${{ secrets.APPLE_APP_SPECIFIC_PASSWORD }}
APP_PATH: "."
DEVELOPER_ID: ${{ secrets.DEVELOPER_ID }}
AWS_ACCESS_KEY_ID: ${{ secrets.CLOUDFLARE_R2_ACCESS_KEY_ID }}
AWS_SECRET_ACCESS_KEY: ${{ secrets.CLOUDFLARE_R2_SECRET_ACCESS_KEY }}
AWS_ACCESS_KEY_ID: ${{ secrets.DELTA_AWS_ACCESS_KEY_ID }}
AWS_SECRET_ACCESS_KEY: ${{ secrets.DELTA_AWS_SECRET_ACCESS_KEY }}
AWS_DEFAULT_REGION: auto
AWS_EC2_METADATA_DISABLED: "true"
AWS_MAX_ATTEMPTS: "5"
- name: Build and publish app to github
if: github.event_name == 'push' && startsWith(github.ref, 'refs/tags/') && inputs.public_provider == 'github'

View File

@ -10,23 +10,21 @@ on:
required: true
type: string
default: none
description: 'none: build only, github: build and publish to github, cloudflare: build and publish to cloudflare'
description: 'none: build only, github: build and publish to github, aws s3: build and publish to aws s3'
new_version:
required: true
type: string
default: ''
cloudflare_r2_path:
aws_s3_prefix:
required: false
type: string
default: '/latest/'
secrets:
CLOUDFLARE_R2_BUCKET_NAME:
DELTA_AWS_S3_BUCKET_NAME:
required: false
CLOUDFLARE_R2_ACCESS_KEY_ID:
DELTA_AWS_ACCESS_KEY_ID:
required: false
CLOUDFLARE_R2_SECRET_ACCESS_KEY:
required: false
CLOUDFLARE_ACCOUNT_ID:
DELTA_AWS_SECRET_ACCESS_KEY:
required: false
AZURE_KEY_VAULT_URI:
required: false
@ -71,7 +69,7 @@ jobs:
jq --arg version "${{ inputs.new_version }}" '.version = $version' web/package.json > /tmp/package.json
mv /tmp/package.json web/package.json
jq '.build.publish = [{"provider": "generic", "url": "${{ secrets.CLOUDFLARE_R2_PUBLIC_URL }}", "channel": "latest"}, {"provider": "s3", "bucket": "${{ secrets.CLOUDFLARE_R2_BUCKET_NAME }}", "region": "auto", "endpoint": "https://${{ secrets.CLOUDFLARE_ACCOUNT_ID }}.r2.cloudflarestorage.com", "path": "${{ inputs.cloudflare_r2_path }}", "channel": "latest"}]' electron/package.json > /tmp/package.json
jq '.build.publish = [{"provider": "generic", "url": "${{ secrets.CLOUDFLARE_R2_PUBLIC_URL }}", "channel": "latest"}, {"provider": "s3", "acl": null, "bucket": "${{ secrets.DELTA_AWS_S3_BUCKET_NAME }}", "region": "${{ secrets.DELTA_AWS_REGION}}", "path": "${{ inputs.aws_s3_prefix }}", "channel": "latest"}]' electron/package.json > /tmp/package.json
mv /tmp/package.json electron/package.json
jq '.build.win.sign = "./sign.js"' electron/package.json > /tmp/package.json
@ -99,7 +97,7 @@ jobs:
run: |
dotnet tool install --global AzureSignTool
- name: Build and publish app to cloudflare r2 or github artifactory
- name: Build and publish app to aws s3 or github artifactory
shell: bash
if: inputs.public_provider != 'github'
run: |
@ -116,10 +114,11 @@ jobs:
AZURE_TENANT_ID: ${{ secrets.AZURE_TENANT_ID }}
AZURE_CLIENT_SECRET: ${{ secrets.AZURE_CLIENT_SECRET }}
AZURE_CERT_NAME: ${{ secrets.AZURE_CERT_NAME }}
AWS_ACCESS_KEY_ID: ${{ secrets.CLOUDFLARE_R2_ACCESS_KEY_ID }}
AWS_SECRET_ACCESS_KEY: ${{ secrets.CLOUDFLARE_R2_SECRET_ACCESS_KEY }}
AWS_ACCESS_KEY_ID: ${{ secrets.DELTA_AWS_ACCESS_KEY_ID }}
AWS_SECRET_ACCESS_KEY: ${{ secrets.DELTA_AWS_SECRET_ACCESS_KEY }}
AWS_DEFAULT_REGION: auto
AWS_EC2_METADATA_DISABLED: "true"
AWS_MAX_ATTEMPTS: "5"
- name: Build app and publish app to github
if: github.event_name == 'push' && startsWith(github.ref, 'refs/tags/') && inputs.public_provider == 'github'

2 .gitignore vendored
View File

@ -39,3 +39,5 @@ extensions/*-extension/bin/vulkaninfo
# Turborepo
.turbo
electron/test-data
electron/test-results

View File

@ -1,4 +1 @@
#!/usr/bin/env sh
. "$(dirname -- "$0")/_/husky.sh"
npx pretty-quick --staged
npm run lint --fix

View File

@ -1,60 +0,0 @@
FROM node:20-bookworm AS base
# 1. Install dependencies only when needed
FROM base AS builder
# Install g++ 11
RUN apt update && apt install -y gcc-11 g++-11 cpp-11 jq xsel && rm -rf /var/lib/apt/lists/*
WORKDIR /app
# Install dependencies based on the preferred package manager
COPY . ./
RUN export NITRO_VERSION=$(cat extensions/inference-nitro-extension/bin/version.txt) && \
jq --arg nitroVersion $NITRO_VERSION '(.scripts."downloadnitro:linux" | gsub("\\${NITRO_VERSION}"; $nitroVersion)) | gsub("\r"; "")' extensions/inference-nitro-extension/package.json > /tmp/newcommand.txt && export NEW_COMMAND=$(sed 's/^"//;s/"$//' /tmp/newcommand.txt) && jq --arg newCommand "$NEW_COMMAND" '.scripts."downloadnitro:linux" = $newCommand' extensions/inference-nitro-extension/package.json > /tmp/package.json && mv /tmp/package.json extensions/inference-nitro-extension/package.json
RUN make install-and-build
# # 2. Rebuild the source code only when needed
FROM base AS runner
# Install g++ 11
RUN apt update && apt install -y gcc-11 g++-11 cpp-11 jq xsel && rm -rf /var/lib/apt/lists/*
WORKDIR /app
# Copy the package.json and yarn.lock of root yarn space to leverage Docker cache
COPY --from=builder /app/package.json ./package.json
COPY --from=builder /app/node_modules ./node_modules/
COPY --from=builder /app/yarn.lock ./yarn.lock
# Copy the package.json, yarn.lock, and build output of server yarn space to leverage Docker cache
COPY --from=builder /app/core ./core/
COPY --from=builder /app/server ./server/
RUN cd core && yarn install && yarn run build
RUN yarn workspace @janhq/server install && yarn workspace @janhq/server build
COPY --from=builder /app/docs/openapi ./docs/openapi/
# Copy pre-install dependencies
COPY --from=builder /app/pre-install ./pre-install/
# Copy the package.json, yarn.lock, and output of web yarn space to leverage Docker cache
COPY --from=builder /app/joi ./joi/
COPY --from=builder /app/web ./web/
RUN yarn workspace @janhq/joi install && yarn workspace @janhq/joi build
RUN yarn workspace @janhq/web install
RUN npm install -g serve@latest
EXPOSE 1337 3000 3928
ENV JAN_API_HOST 0.0.0.0
ENV JAN_API_PORT 1337
ENV API_BASE_URL http://localhost:1337
CMD ["sh", "-c", "export NODE_ENV=production && yarn workspace @janhq/web build && cd web && npx serve out & cd server && node build/main.js"]
# docker build -t jan .
# docker run -p 1337:1337 -p 3000:3000 -p 3928:3928 jan

View File

@ -1,87 +0,0 @@
# Please change the base image to the appropriate CUDA version based on NVIDIA driver compatibility
# Run nvidia-smi to check the CUDA version and the corresponding driver version
# Then update the base image to the appropriate CUDA version; refer to https://catalog.ngc.nvidia.com/orgs/nvidia/containers/cuda/tags
FROM nvidia/cuda:12.2.0-runtime-ubuntu22.04 AS base
# 1. Install dependencies only when needed
FROM base AS builder
# Install g++ 11
RUN apt update && apt install -y gcc-11 g++-11 cpp-11 jq xsel curl gnupg make python3-dev && curl -sL https://deb.nodesource.com/setup_20.x | bash - && apt install nodejs -y && rm -rf /var/lib/apt/lists/*
# Update alternatives for GCC and related tools
RUN update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-11 110 \
--slave /usr/bin/g++ g++ /usr/bin/g++-11 \
--slave /usr/bin/gcov gcov /usr/bin/gcov-11 \
--slave /usr/bin/gcc-ar gcc-ar /usr/bin/gcc-ar-11 \
--slave /usr/bin/gcc-ranlib gcc-ranlib /usr/bin/gcc-ranlib-11 && \
update-alternatives --install /usr/bin/cpp cpp /usr/bin/cpp-11 110
RUN npm install -g yarn
WORKDIR /app
# Install dependencies based on the preferred package manager
COPY . ./
RUN export NITRO_VERSION=$(cat extensions/inference-nitro-extension/bin/version.txt) && \
jq --arg nitroVersion $NITRO_VERSION '(.scripts."downloadnitro:linux" | gsub("\\${NITRO_VERSION}"; $nitroVersion)) | gsub("\r"; "")' extensions/inference-nitro-extension/package.json > /tmp/newcommand.txt && export NEW_COMMAND=$(sed 's/^"//;s/"$//' /tmp/newcommand.txt) && jq --arg newCommand "$NEW_COMMAND" '.scripts."downloadnitro:linux" = $newCommand' extensions/inference-nitro-extension/package.json > /tmp/package.json && mv /tmp/package.json extensions/inference-nitro-extension/package.json
RUN make install-and-build
# # 2. Rebuild the source code only when needed
FROM base AS runner
# Install g++ 11
RUN apt update && apt install -y gcc-11 g++-11 cpp-11 jq xsel curl gnupg make python3-dev && curl -sL https://deb.nodesource.com/setup_20.x | bash - && apt-get install nodejs -y && rm -rf /var/lib/apt/lists/*
# Update alternatives for GCC and related tools
RUN update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-11 110 \
--slave /usr/bin/g++ g++ /usr/bin/g++-11 \
--slave /usr/bin/gcov gcov /usr/bin/gcov-11 \
--slave /usr/bin/gcc-ar gcc-ar /usr/bin/gcc-ar-11 \
--slave /usr/bin/gcc-ranlib gcc-ranlib /usr/bin/gcc-ranlib-11 && \
update-alternatives --install /usr/bin/cpp cpp /usr/bin/cpp-11 110
RUN npm install -g yarn
WORKDIR /app
# Copy the package.json and yarn.lock of root yarn space to leverage Docker cache
COPY --from=builder /app/package.json ./package.json
COPY --from=builder /app/node_modules ./node_modules/
COPY --from=builder /app/yarn.lock ./yarn.lock
# Copy the package.json, yarn.lock, and build output of server yarn space to leverage Docker cache
COPY --from=builder /app/core ./core/
COPY --from=builder /app/server ./server/
RUN cd core && yarn install && yarn run build
RUN yarn workspace @janhq/server install && yarn workspace @janhq/server build
COPY --from=builder /app/docs/openapi ./docs/openapi/
# Copy pre-install dependencies
COPY --from=builder /app/pre-install ./pre-install/
# Copy the package.json, yarn.lock, and output of web yarn space to leverage Docker cache
COPY --from=builder /app/joi ./joi/
COPY --from=builder /app/web ./web/
RUN yarn workspace @janhq/joi install && yarn workspace @janhq/joi build
RUN yarn workspace @janhq/web install
RUN npm install -g serve@latest
EXPOSE 1337 3000 3928
ENV LD_LIBRARY_PATH=/usr/local/cuda/targets/x86_64-linux/lib:/usr/local/cuda-12.0/compat${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
ENV JAN_API_HOST 0.0.0.0
ENV JAN_API_PORT 1337
ENV API_BASE_URL http://localhost:1337
CMD ["sh", "-c", "export NODE_ENV=production && yarn workspace @janhq/web build && cd web && npx serve out & cd server && node build/main.js"]
# pre-requisites: nvidia-docker
# docker build -t jan-gpu . -f Dockerfile.gpu
# docker run -p 1337:1337 -p 3000:3000 -p 3928:3928 --gpus all jan-gpu

View File

@ -1,6 +0,0 @@
dependencies:
- name: common
repository: oci://ghcr.io/janhq/charts
version: 0.1.2
digest: sha256:35e98bde174130787755b0f8ea2359b7b6790d965a7157c2f7cabf1bc8c04471
generated: "2024-02-20T16:20:37.6530108+07:00"

View File

@ -1,10 +0,0 @@
apiVersion: v2
name: jan-server
description: A Helm chart for Kubernetes
type: application
version: 0.1.0
appVersion: '1.0.0'
dependencies:
- name: common
version: 0.1.2 # common-chart-version
repository: oci://ghcr.io/janhq/charts

View File

@ -1,4 +0,0 @@
{
"image-list": "server=ghcr.io/janhq/jan-server",
"platforms": "linux/amd64"
}

View File

@ -1,256 +0,0 @@
common:
imageTag: v0.4.6-cpu
# DO NOT CHANGE THE LINE ABOVE. MAKE ALL CHANGES BELOW
# Global pvc for all workload
pvc:
enabled: false
name: 'janroot'
accessModes: 'ReadWriteOnce'
storageClassName: ''
capacity: '50Gi'
# Global image pull secret
imagePullSecrets: []
externalSecret:
create: false
name: ''
annotations: {}
nameOverride: 'jan-server'
fullnameOverride: 'jan-server'
serviceAccount:
create: true
annotations: {}
name: 'jan-server-service-account'
podDisruptionBudget:
create: false
minAvailable: 1
workloads:
- name: server
image:
repository: ghcr.io/janhq/jan-server
pullPolicy: Always
command: ['/bin/sh', '-c']
args: ['cd server && node build/main.js']
replicaCount: 1
ports:
containerPort: 1337
strategy:
canary:
steps:
- setWeight: 50
- pause: { duration: 1m }
ingress:
enabled: true
className: 'nginx'
annotations:
nginx.ingress.kubernetes.io/proxy-body-size: '100m'
nginx.ingress.kubernetes.io/proxy-read-timeout: '1800'
nginx.ingress.kubernetes.io/proxy-send-timeout: '1800'
# cert-manager.io/cluster-issuer: 'jan-ai-dns01-cluster-issuer'
# nginx.ingress.kubernetes.io/force-ssl-redirect: 'true'
nginx.ingress.kubernetes.io/backend-protocol: HTTP
hosts:
- host: server.local
paths:
- path: /
pathType: Prefix
tls:
[]
# - hosts:
# - server-dev.jan.ai
# secretName: jan-server-prod-tls-v2
instrumentation:
enabled: false
podAnnotations: {}
podSecurityContext: {}
securityContext: {}
service:
externalLabel: {}
type: ClusterIP
port: 1337
targetPort: 1337
# If you want to use GPU, please uncomment the following lines and change imageTag to the one with GPU support
resources:
# limits:
# nvidia.com/gpu: 1
requests:
cpu: 2000m
memory: 8192M
# If you want to use pv, please uncomment the following lines and enable pvc.enabled
volumes:
[]
# - name: janroot
# persistentVolumeClaim:
# claimName: janroot
volumeMounts:
[]
# - name: janroot
# mountPath: /app/server/build/jan
# AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, S3_BUCKET_NAME, AWS_ENDPOINT, AWS_REGION should be mounted as secret env vars instead of being set in plain text here
# Change API_BASE_URL to your server's public domain
env:
- name: API_BASE_URL
value: 'http://server.local'
lifecycle: {}
autoscaling:
enabled: false
minReplicas: 2
maxReplicas: 3
targetCPUUtilizationPercentage: 95
targetMemoryUtilizationPercentage: 95
kedaScaling:
enabled: false # ignore if autoscaling.enable = true
cooldownPeriod: 30
pollingInterval: 2
minReplicas: 1
maxReplicas: 5
metricName: celery_queue_length
query: celery_queue_length{queue_name="myqueue"} # change queue_name here
serverAddress: http://prometheus-prod-kube-prome-prometheus.monitoring.svc:9090
threshold: '3'
nodeSelector: {}
tolerations: []
podSecurityGroup:
enabled: false
securitygroupid: []
# Reloader Option
reloader: 'false'
vpa:
enabled: false
- name: web
image:
repository: ghcr.io/janhq/jan-server
pullPolicy: Always
command: ['/bin/sh', '-c']
args:
[
'export NODE_ENV=production && yarn workspace @janhq/web build && cd web && npx serve out',
]
replicaCount: 1
ports:
containerPort: 3000
strategy:
canary:
steps:
- setWeight: 50
- pause: { duration: 1m }
ingress:
enabled: true
className: 'nginx'
annotations:
nginx.ingress.kubernetes.io/proxy-body-size: '100m'
nginx.ingress.kubernetes.io/proxy-read-timeout: '1800'
nginx.ingress.kubernetes.io/proxy-send-timeout: '1800'
# cert-manager.io/cluster-issuer: 'jan-ai-dns01-cluster-issuer'
# nginx.ingress.kubernetes.io/force-ssl-redirect: 'true'
nginx.ingress.kubernetes.io/backend-protocol: HTTP
hosts:
- host: web.local
paths:
- path: /
pathType: Prefix
tls:
[]
# - hosts:
# - server-dev.jan.ai
# secretName: jan-server-prod-tls-v2
instrumentation:
enabled: false
podAnnotations: {}
podSecurityContext: {}
securityContext: {}
service:
externalLabel: {}
type: ClusterIP
port: 3000
targetPort: 3000
resources:
limits:
cpu: 1000m
memory: 2048M
requests:
cpu: 50m
memory: 500M
volumes:
[]
# - name: janroot
# persistentVolumeClaim:
# claimName: janroot
volumeMounts:
[]
# - name: janroot
# mountPath: /app/server/build/jan
# AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, S3_BUCKET_NAME, AWS_ENDPOINT, AWS_REGION should be mounted as secret env vars instead of being set in plain text here
# Change API_BASE_URL to your server's public domain
env:
- name: API_BASE_URL
value: 'http://server.local'
lifecycle: {}
autoscaling:
enabled: true
minReplicas: 1
maxReplicas: 3
targetCPUUtilizationPercentage: 95
targetMemoryUtilizationPercentage: 95
kedaScaling:
enabled: false # ignore if autoscaling.enable = true
cooldownPeriod: 30
pollingInterval: 2
minReplicas: 1
maxReplicas: 5
metricName: celery_queue_length
query: celery_queue_length{queue_name="myqueue"} # change queue_name here
serverAddress: http://prometheus-prod-kube-prome-prometheus.monitoring.svc:9090
threshold: '3'
nodeSelector: {}
tolerations: []
podSecurityGroup:
enabled: false
securitygroupid: []
# Reloader Option
reloader: 'false'
vpa:
enabled: false

View File

@ -118,10 +118,21 @@ export abstract class BaseExtension implements ExtensionType {
setting.extensionName = this.name
})
try {
await fs.mkdir(extensionSettingFolderPath)
if (!(await fs.existsSync(extensionSettingFolderPath)))
await fs.mkdir(extensionSettingFolderPath)
const settingFilePath = await joinPath([extensionSettingFolderPath, this.settingFileName])
if (await fs.existsSync(settingFilePath)) return
// Persists new settings
if (await fs.existsSync(settingFilePath)) {
const oldSettings = JSON.parse(await fs.readFileSync(settingFilePath, 'utf-8'))
settings.forEach((setting) => {
// Keep setting value
if (setting.controllerProps && Array.isArray(oldSettings))
setting.controllerProps.value = oldSettings.find(
(e: any) => e.key === setting.key
)?.controllerProps?.value
})
}
await fs.writeFileSync(settingFilePath, JSON.stringify(settings, null, 2))
} catch (err) {
console.error(err)
@ -168,6 +179,7 @@ export abstract class BaseExtension implements ExtensionType {
])
try {
if (!(await fs.existsSync(settingPath))) return []
const content = await fs.readFileSync(settingPath, 'utf-8')
const settings: SettingComponentProps[] = JSON.parse(content)
return settings

View File

@ -89,6 +89,7 @@ export abstract class OAIEngine extends AIEngine {
model: model.id,
stream: true,
...model.parameters,
...(this.provider === 'nitro' ? { engine: 'cortex.llamacpp'} : {}),
}
if (this.transformPayload) {
requestBody = this.transformPayload(requestBody)
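
For illustration, a rough sketch of the body this step now produces for a local (provider === 'nitro') model; only model, stream, and engine come from the lines above, while the messages shape and parameter values are hypothetical:

const exampleRequestBody = {
  messages: [{ role: 'user', content: 'Hello' }], // assumed OpenAI-compatible chat history
  model: 'llama3-8b-instruct', // model.id (hypothetical)
  stream: true,
  temperature: 0.7, // spread in from model.parameters (hypothetical)
  engine: 'cortex.llamacpp', // appended only when this.provider === 'nitro'
}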

View File

@ -58,6 +58,15 @@ const appendFileSync = (...args: any[]) => globalThis.core.api?.appendFileSync(.
const copyFile: (src: string, dest: string) => Promise<void> = (src, dest) =>
globalThis.core.api?.copyFile(src, dest)
/**
* Gets the list of GGUF files from the given paths
*
* @param paths - The file or directory paths to scan.
* @returns {Promise<any>} - A promise that resolves with the lists of gguf and non-gguf files
*/
const getGgufFiles: (paths: string[]) => Promise<any> = (paths) =>
globalThis.core.api?.getGgufFiles(paths)
/**
* Gets the file's stats.
*
@ -84,4 +93,5 @@ export const fs = {
copyFile,
fileStat,
writeBlob,
getGgufFiles,
}
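
A minimal usage sketch of the new helper, assuming it runs where the @janhq/core fs module is importable; the dropped paths are hypothetical, and the return shape follows the node-side handler shown further below ({ supportedFiles, unsupportedFiles }, each entry { path, name, size }):

import { fs } from '@janhq/core'

const importDroppedModels = async (droppedPaths: string[]) => {
  // Separates *.gguf files from everything else (one directory level deep)
  const { supportedFiles, unsupportedFiles } = await fs.getGgufFiles(droppedPaths)
  console.log(`${supportedFiles.length} GGUF file(s) found, ${unsupportedFiles.length} skipped`)
  return supportedFiles
}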

View File

@ -77,8 +77,8 @@ export class App implements Processor {
port: args?.port,
isCorsEnabled: args?.isCorsEnabled,
isVerboseEnabled: args?.isVerboseEnabled,
schemaPath: join(await appResourcePath(), 'docs', 'openapi', 'jan.yaml'),
baseDir: join(await appResourcePath(), 'docs', 'openapi'),
schemaPath: join(appResourcePath(), 'docs', 'openapi', 'jan.yaml'),
baseDir: join(appResourcePath(), 'docs', 'openapi'),
prefix: args?.prefix,
})
}

View File

@ -42,7 +42,7 @@ export class Extension implements Processor {
* @returns An array of paths to the base extensions.
*/
async baseExtensions() {
const baseExtensionPath = join(await appResourcePath(), 'pre-install')
const baseExtensionPath = join(appResourcePath(), 'pre-install')
return readdirSync(baseExtensionPath)
.filter((file) => extname(file) === '.tgz')
.map((file) => join(baseExtensionPath, file))

View File

@ -1,7 +1,7 @@
import { join } from 'path'
import fs from 'fs'
import { basename, join } from 'path'
import fs, { readdirSync } from 'fs'
import { appResourcePath, normalizeFilePath, validatePath } from '../../helper/path'
import { getJanDataFolderPath, getJanDataFolderPath as getPath } from '../../helper'
import { defaultAppConfig, getJanDataFolderPath, getJanDataFolderPath as getPath } from '../../helper'
import { Processor } from './Processor'
import { FileStat } from '../../../types'
@ -28,9 +28,10 @@ export class FSExt implements Processor {
return appResourcePath()
}
// Handles the 'getUserHomePath' IPC event. This event is triggered to get the user home path.
// Handles the 'getUserHomePath' IPC event. This event is triggered to get the user app data path.
// CAUTION: This does not return the OS home path but the app data path.
getUserHomePath() {
return process.env[process.platform == 'win32' ? 'USERPROFILE' : 'HOME']
return defaultAppConfig().data_folder
}
// handle fs is directory here
@ -79,4 +80,53 @@ export class FSExt implements Processor {
})
})
}
async getGgufFiles(paths: string[]) {
const sanitizedFilePaths: {
path: string
name: string
size: number
}[] = []
for (const filePath of paths) {
const normalizedPath = normalizeFilePath(filePath)
const isExist = fs.existsSync(normalizedPath)
if (!isExist) continue
const fileStats = fs.statSync(normalizedPath)
if (!fileStats) continue
if (!fileStats.isDirectory()) {
const fileName = await basename(normalizedPath)
sanitizedFilePaths.push({
path: normalizedPath,
name: fileName,
size: fileStats.size,
})
} else {
// allowing only one level of directory
const files = await readdirSync(normalizedPath)
for (const file of files) {
const fullPath = await join(normalizedPath, file)
const fileStats = await fs.statSync(fullPath)
if (!fileStats || fileStats.isDirectory()) continue
sanitizedFilePaths.push({
path: fullPath,
name: file,
size: fileStats.size,
})
}
}
}
const unsupportedFiles = sanitizedFilePaths.filter(
(file) => !file.path.endsWith('.gguf')
)
const supportedFiles = sanitizedFilePaths.filter((file) =>
file.path.endsWith('.gguf')
)
return {
unsupportedFiles,
supportedFiles,
}
}
}

View File

@ -1,16 +1,16 @@
import { HttpServer } from '../HttpServer'
import { commonRouter } from './common'
import { downloadRouter } from './app/download'
import { handleRequests } from './app/handlers'
export const v1Router = async (app: HttpServer) => {
// MARK: Public API Routes
app.register(commonRouter)
// MARK: Internal Application Routes
handleRequests(app)
// DEPRECATED: possible vulnerability issues
// handleRequests(app)
// Expanded route for tracking download progress
// TODO: Replace by Observer Wrapper (ZeroMQ / Vanilla Websocket)
app.register(downloadRouter)
// DEPRECATED: Jan FE Docker deploy is deprecated
// app.register(downloadRouter)
}

View File

@ -1,25 +1,18 @@
import { AppConfiguration, SettingComponentProps } from '../../types'
import { join } from 'path'
import { join, resolve } from 'path'
import fs from 'fs'
import os from 'os'
import childProcess from 'child_process'
const configurationFileName = 'settings.json'
// TODO: do not specify app name in framework module
// TODO: do not default the os.homedir
const defaultJanDataFolder = join(os?.homedir() || '', 'jan')
const defaultAppConfig: AppConfiguration = {
data_folder: defaultJanDataFolder,
quick_ask: false,
}
/**
* Getting App Configurations.
*
* @returns {AppConfiguration} The app configurations.
*/
export const getAppConfigurations = (): AppConfiguration => {
const appDefaultConfiguration = defaultAppConfig()
if (process.env.CI === 'e2e') return appDefaultConfiguration
// Retrieve Application Support folder path
// Fallback to user home directory if not found
const configurationFile = getConfigurationFilePath()
@ -27,8 +20,8 @@ export const getAppConfigurations = (): AppConfiguration => {
if (!fs.existsSync(configurationFile)) {
// create default app config if we don't have one
console.debug(`App config not found, creating default config at ${configurationFile}`)
fs.writeFileSync(configurationFile, JSON.stringify(defaultAppConfig))
return defaultAppConfig
fs.writeFileSync(configurationFile, JSON.stringify(appDefaultConfiguration))
return appDefaultConfiguration
}
try {
@ -38,7 +31,7 @@ export const getAppConfigurations = (): AppConfiguration => {
return appConfigurations
} catch (err) {
console.error(`Failed to read app config, return default config instead! Err: ${err}`)
return defaultAppConfig
return defaultAppConfig()
}
}
@ -155,3 +148,22 @@ export const getEngineConfiguration = async (engineId: string) => {
full_url: fullUrl,
}
}
/**
* Default app configurations
* App Data Folder default to Electron's userData
* %APPDATA% on Windows
* $XDG_CONFIG_HOME or ~/.config on Linux
* ~/Library/Application Support on macOS
*/
export const defaultAppConfig = (): AppConfiguration => {
const { app } = require('electron')
const defaultJanDataFolder = join(app?.getPath('userData') ?? os?.homedir() ?? '', 'data')
return {
data_folder:
process.env.CI === 'e2e'
? (process.env.APP_CONFIG_PATH ?? resolve('./test-data'))
: defaultJanDataFolder,
quick_ask: false,
}
}
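
As a brief illustration, assuming a packaged build whose Electron userData folder is named Jan and a hypothetical macOS user alice, the returned configuration would be:

// Sketch only: app.getPath('userData') resolves to ~/Library/Application Support/Jan here
const exampleConfig = {
  data_folder: '/Users/alice/Library/Application Support/Jan/data',
  quick_ask: false,
}
// In e2e CI runs the data_folder instead falls back to APP_CONFIG_PATH or ./test-data.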

View File

@ -11,34 +11,41 @@ export function normalizeFilePath(path: string): string {
return path.replace(/^(file:[\\/]+)([^:\s]+)$/, '$2')
}
export async function appResourcePath(): Promise<string> {
let electron: any = undefined
/**
* App resources path
* Returns string - The current application directory.
*/
export function appResourcePath() {
try {
const moduleName = 'electron'
electron = await import(moduleName)
const electron = require('electron')
// electron
if (electron && electron.protocol) {
let appPath = join(electron.app.getAppPath(), '..', 'app.asar.unpacked')
if (!electron.app.isPackaged) {
// for development mode
appPath = join(electron.app.getAppPath())
}
return appPath
}
} catch (err) {
console.error('Electron is not available')
}
// electron
if (electron && electron.protocol) {
let appPath = join(electron.app.getAppPath(), '..', 'app.asar.unpacked')
if (!electron.app.isPackaged) {
// for development mode
appPath = join(electron.app.getAppPath())
}
return appPath
}
// server
return join(global.core.appPath(), '../../..')
}
export function validatePath(path: string) {
const janDataFolderPath = getJanDataFolderPath()
const appDataFolderPath = getJanDataFolderPath()
const resourcePath = appResourcePath()
const applicationSupportPath = global.core?.appPath() ?? resourcePath
const absolutePath = resolve(__dirname, path)
if (!absolutePath.startsWith(janDataFolderPath)) {
if (
![appDataFolderPath, resourcePath, applicationSupportPath].some((whiteListedPath) =>
absolutePath.startsWith(whiteListedPath)
)
) {
throw new Error(`Invalid path: ${absolutePath}`)
}
}
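
To illustrate the widened whitelist, a small sketch with hypothetical folders (data folder /home/alice/jan, unpacked resources /opt/Jan/resources/app.asar.unpacked):

// Sketch of the check above, not actual tests
validatePath('/home/alice/jan/models/llama.gguf') // ok: inside the Jan data folder
validatePath('/opt/Jan/resources/app.asar.unpacked/docs/openapi/jan.yaml') // ok: now whitelisted
// validatePath('/etc/passwd') // would throw Error('Invalid path: /etc/passwd')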

View File

@ -105,6 +105,7 @@ export enum FileManagerRoute {
getUserHomePath = 'getUserHomePath',
fileStat = 'fileStat',
writeBlob = 'writeBlob',
getGgufFiles = 'getGgufFiles',
}
export type ApiFunction = (...args: any[]) => any

View File

@ -25,6 +25,10 @@ export enum InferenceEngine {
triton_trtllm = 'triton_trtllm',
nitro_tensorrt_llm = 'nitro-tensorrt-llm',
cohere = 'cohere',
nvidia = 'nvidia',
cortex_llamacpp = 'cortex.llamacpp',
cortex_onnx = 'cortex.onnx',
cortex_tensorrtllm = 'cortex.tensorrt-llm',
}
export type ModelArtifact = {
@ -103,6 +107,9 @@ export type ModelMetadata = {
tags: string[]
size: number
cover?: string
// These settings to preserve model settings across threads
default_ctx_len?: number
default_max_tokens?: number
}
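
For example, a model that should keep a 4K context window and a 2K completion limit when reused across threads could carry (partial object, hypothetical values):

const exampleMetadata: Partial<ModelMetadata> = {
  tags: ['Featured'],
  size: 4_920_000_000,
  default_ctx_len: 4096, // preserved per-model context length
  default_max_tokens: 2048, // preserved per-model max output tokens
}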
/**

View File

@ -1,171 +0,0 @@
# Docker Compose file for setting up Minio, createbuckets, app_cpu, and app_gpu services
version: '3.7'
services:
# Minio service for object storage
minio:
image: minio/minio
volumes:
- minio_data:/data
ports:
- '9000:9000'
- '9001:9001'
environment:
# Set the root user and password for Minio
MINIO_ROOT_USER: minioadmin # This acts as AWS_ACCESS_KEY
MINIO_ROOT_PASSWORD: minioadmin # This acts as AWS_SECRET_ACCESS_KEY
command: server --console-address ":9001" /data
restart: always
healthcheck:
test: ['CMD', 'curl', '-f', 'http://localhost:9000/minio/health/live']
interval: 30s
timeout: 20s
retries: 3
networks:
vpcbr:
ipv4_address: 10.5.0.2
# createbuckets service to create a bucket and set its policy
createbuckets:
image: minio/mc
depends_on:
- minio
entrypoint: >
/bin/sh -c "
/usr/bin/mc alias set myminio http://minio:9000 minioadmin minioadmin;
/usr/bin/mc mb myminio/mybucket;
/usr/bin/mc policy set public myminio/mybucket;
exit 0;
"
networks:
vpcbr:
# app_cpu service for running the CPU version of the application
app_cpu_s3fs:
image: jan:latest
volumes:
- app_data_cpu_s3fs:/app/server/build/jan
build:
context: .
dockerfile: Dockerfile
environment:
# Set the AWS access key, secret access key, bucket name, endpoint, and region for app_cpu
AWS_ACCESS_KEY_ID: minioadmin
AWS_SECRET_ACCESS_KEY: minioadmin
S3_BUCKET_NAME: mybucket
AWS_ENDPOINT: http://10.5.0.2:9000
AWS_REGION: us-east-1
API_BASE_URL: http://localhost:1337
restart: always
profiles:
- cpu-s3fs
ports:
- '3000:3000'
- '1337:1337'
- '3928:3928'
networks:
vpcbr:
ipv4_address: 10.5.0.3
# app_gpu service for running the GPU version of the application
app_gpu_s3fs:
deploy:
resources:
reservations:
devices:
- driver: nvidia
count: all
capabilities: [gpu]
image: jan-gpu:latest
volumes:
- app_data_gpu_s3fs:/app/server/build/jan
build:
context: .
dockerfile: Dockerfile.gpu
restart: always
environment:
# Set the AWS access key, secret access key, bucket name, endpoint, and region for app_gpu
AWS_ACCESS_KEY_ID: minioadmin
AWS_SECRET_ACCESS_KEY: minioadmin
S3_BUCKET_NAME: mybucket
AWS_ENDPOINT: http://10.5.0.2:9000
AWS_REGION: us-east-1
API_BASE_URL: http://localhost:1337
profiles:
- gpu-s3fs
ports:
- '3000:3000'
- '1337:1337'
- '3928:3928'
networks:
vpcbr:
ipv4_address: 10.5.0.4
app_cpu_fs:
image: jan:latest
volumes:
- app_data_cpu_fs:/app/server/build/jan
build:
context: .
dockerfile: Dockerfile
environment:
API_BASE_URL: http://localhost:1337
restart: always
profiles:
- cpu-fs
ports:
- '3000:3000'
- '1337:1337'
- '3928:3928'
networks:
vpcbr:
ipv4_address: 10.5.0.5
# app_gpu service for running the GPU version of the application
app_gpu_fs:
deploy:
resources:
reservations:
devices:
- driver: nvidia
count: all
capabilities: [gpu]
image: jan-gpu:latest
volumes:
- app_data_gpu_fs:/app/server/build/jan
build:
context: .
dockerfile: Dockerfile.gpu
restart: always
environment:
API_BASE_URL: http://localhost:1337
profiles:
- gpu-fs
ports:
- '3000:3000'
- '1337:1337'
- '3928:3928'
networks:
vpcbr:
ipv4_address: 10.5.0.6
volumes:
minio_data:
app_data_cpu_s3fs:
app_data_gpu_s3fs:
app_data_cpu_fs:
app_data_gpu_fs:
networks:
vpcbr:
driver: bridge
ipam:
config:
- subnet: 10.5.0.0/16
gateway: 10.5.0.1
# Usage:
# - Run 'docker compose -f docker-compose-dev.yml --profile cpu-s3fs up -d' to start the app_cpu service
# - Run 'docker compose -f docker-compose-dev.yml --profile gpu-s3fs up -d' to start the app_gpu service
# - Run 'docker compose -f docker-compose-dev.yml --profile cpu-fs up -d' to start the app_cpu service
# - Run 'docker compose -f docker-compose-dev.yml --profile gpu-fs up -d' to start the app_gpu service

View File

@ -1,159 +0,0 @@
# Docker Compose file for setting up Minio, createbuckets, app_cpu, and app_gpu services
version: '3.7'
services:
# Minio service for object storage
minio:
image: minio/minio
volumes:
- minio_data:/data
ports:
- '9000:9000'
- '9001:9001'
environment:
# Set the root user and password for Minio
MINIO_ROOT_USER: minioadmin # This acts as AWS_ACCESS_KEY
MINIO_ROOT_PASSWORD: minioadmin # This acts as AWS_SECRET_ACCESS_KEY
command: server --console-address ":9001" /data
restart: always
healthcheck:
test: ['CMD', 'curl', '-f', 'http://localhost:9000/minio/health/live']
interval: 30s
timeout: 20s
retries: 3
networks:
vpcbr:
ipv4_address: 10.5.0.2
# createbuckets service to create a bucket and set its policy
createbuckets:
image: minio/mc
depends_on:
- minio
entrypoint: >
/bin/sh -c "
/usr/bin/mc alias set myminio http://minio:9000 minioadmin minioadmin;
/usr/bin/mc mb myminio/mybucket;
/usr/bin/mc policy set public myminio/mybucket;
exit 0;
"
networks:
vpcbr:
# app_cpu service for running the CPU version of the application
app_cpu_s3fs:
volumes:
- app_data_cpu_s3fs:/app/server/build/jan
image: ghcr.io/janhq/jan-server:dev-cpu-latest
environment:
# Set the AWS access key, secret access key, bucket name, endpoint, and region for app_cpu
AWS_ACCESS_KEY_ID: minioadmin
AWS_SECRET_ACCESS_KEY: minioadmin
S3_BUCKET_NAME: mybucket
AWS_ENDPOINT: http://10.5.0.2:9000
AWS_REGION: us-east-1
API_BASE_URL: http://localhost:1337
restart: always
profiles:
- cpu-s3fs
ports:
- '3000:3000'
- '1337:1337'
- '3928:3928'
networks:
vpcbr:
ipv4_address: 10.5.0.3
# app_gpu service for running the GPU version of the application
app_gpu_s3fs:
deploy:
resources:
reservations:
devices:
- driver: nvidia
count: all
capabilities: [gpu]
image: ghcr.io/janhq/jan-server:dev-cuda-12.2-latest
volumes:
- app_data_gpu_s3fs:/app/server/build/jan
restart: always
environment:
# Set the AWS access key, secret access key, bucket name, endpoint, and region for app_gpu
AWS_ACCESS_KEY_ID: minioadmin
AWS_SECRET_ACCESS_KEY: minioadmin
S3_BUCKET_NAME: mybucket
AWS_ENDPOINT: http://10.5.0.2:9000
AWS_REGION: us-east-1
API_BASE_URL: http://localhost:1337
profiles:
- gpu-s3fs
ports:
- '3000:3000'
- '1337:1337'
- '3928:3928'
networks:
vpcbr:
ipv4_address: 10.5.0.4
app_cpu_fs:
image: ghcr.io/janhq/jan-server:dev-cpu-latest
volumes:
- app_data_cpu_fs:/app/server/build/jan
environment:
API_BASE_URL: http://localhost:1337
restart: always
profiles:
- cpu-fs
ports:
- '3000:3000'
- '1337:1337'
- '3928:3928'
networks:
vpcbr:
ipv4_address: 10.5.0.5
# app_gpu service for running the GPU version of the application
app_gpu_fs:
deploy:
resources:
reservations:
devices:
- driver: nvidia
count: all
capabilities: [gpu]
image: ghcr.io/janhq/jan-server:dev-cuda-12.2-latest
volumes:
- app_data_gpu_fs:/app/server/build/jan
restart: always
environment:
API_BASE_URL: http://localhost:1337
profiles:
- gpu-fs
ports:
- '3000:3000'
- '1337:1337'
- '3928:3928'
networks:
vpcbr:
ipv4_address: 10.5.0.6
volumes:
minio_data:
app_data_cpu_s3fs:
app_data_gpu_s3fs:
app_data_cpu_fs:
app_data_gpu_fs:
networks:
vpcbr:
driver: bridge
ipam:
config:
- subnet: 10.5.0.0/16
gateway: 10.5.0.1
# Usage:
# - Run 'docker compose --profile cpu-s3fs up -d' to start the app_cpu service
# - Run 'docker compose --profile gpu-s3fs up -d' to start the app_gpu service
# - Run 'docker compose --profile cpu-fs up -d' to start the app_cpu service
# - Run 'docker compose --profile gpu-fs up -d' to start the app_gpu service

View File

@ -1,8 +1,10 @@
const DEFAULT_MIN_WIDTH = 400
const DEFAULT_MIN_HEIGHT = 600
export const mainWindowConfig: Electron.BrowserWindowConstructorOptions = {
skipTaskbar: false,
minWidth: DEFAULT_MIN_WIDTH,
minHeight: DEFAULT_MIN_HEIGHT,
show: true,
transparent: true,
frame: false,

View File

@ -12,9 +12,9 @@ import {
} from 'fs'
import Store from 'electron-store'
import {
getJanExtensionsPath,
getJanDataFolderPath,
appResourcePath,
getJanExtensionsPath,
} from '@janhq/core/node'
/**
@ -28,8 +28,9 @@ export async function migrate() {
if (store.get('migrated_version') !== app.getVersion()) {
console.debug('start migration:', store.get('migrated_version'))
// if (existsSync(getJanExtensionsPath()))
// rmdirSync(getJanExtensionsPath(), { recursive: true })
if (existsSync(getJanExtensionsPath()))
rmdirSync(getJanExtensionsPath(), { recursive: true })
await migrateThemes()
store.set('migrated_version', app.getVersion())
@ -43,9 +44,9 @@ async function migrateThemes() {
if (!existsSync(join(getJanDataFolderPath(), 'themes')))
mkdirSync(join(getJanDataFolderPath(), 'themes'), { recursive: true })
const themes = readdirSync(join(await appResourcePath(), 'themes'))
const themes = readdirSync(join(appResourcePath(), 'themes'))
for (const theme of themes) {
const themePath = join(await appResourcePath(), 'themes', theme)
const themePath = join(appResourcePath(), 'themes', theme)
if (existsSync(themePath) && !lstatSync(themePath).isDirectory()) {
continue
}

View File

@ -1,4 +1,16 @@
[
{
"key": "anthropic-api-key",
"title": "API Key",
"description": "The Anthropic API uses API keys for authentication. Visit your [API Keys](https://console.anthropic.com/settings/keys) page to retrieve the API key you'll use in your requests.",
"controllerType": "input",
"controllerProps": {
"placeholder": "Insert API Key",
"value": "",
"type": "password",
"inputActions": ["unobscure", "copy"]
}
},
{
"key": "chat-completions-endpoint",
"title": "Chat Completions Endpoint",
@ -8,16 +20,5 @@
"placeholder": "https://api.anthropic.com/v1/messages",
"value": "https://api.anthropic.com/v1/messages"
}
},
{
"key": "anthropic-api-key",
"title": "API Key",
"description": "The Anthropic API uses API keys for authentication. Visit your [API Keys](https://console.anthropic.com/settings/keys) page to retrieve the API key you'll use in your requests.",
"controllerType": "input",
"controllerProps": {
"placeholder": "sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
"value": "",
"type": "password"
}
}
]
]

View File

@ -1,4 +1,16 @@
[
{
"key": "cohere-api-key",
"title": "API Key",
"description": "The Cohere API uses API keys for authentication. Visit your [API Keys](https://dashboard.cohere.com/api-keys) page to retrieve the API key you'll use in your requests.",
"controllerType": "input",
"controllerProps": {
"placeholder": "Insert API Key",
"value": "",
"type": "password",
"inputActions": ["unobscure", "copy"]
}
},
{
"key": "chat-completions-endpoint",
"title": "Chat Completions Endpoint",
@ -8,16 +20,5 @@
"placeholder": "https://api.cohere.ai/v1/chat",
"value": "https://api.cohere.ai/v1/chat"
}
},
{
"key": "cohere-api-key",
"title": "API Key",
"description": "The Cohere API uses API keys for authentication. Visit your [API Keys](https://dashboard.cohere.com/api-keys) page to retrieve the API key you'll use in your requests.",
"controllerType": "input",
"controllerProps": {
"placeholder": "sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
"value": "",
"type": "password"
}
}
]

View File

@ -1,4 +1,16 @@
[
{
"key": "groq-api-key",
"title": "API Key",
"description": "The Groq API uses API keys for authentication. Visit your [API Keys](https://console.groq.com/keys) page to retrieve the API key you'll use in your requests.",
"controllerType": "input",
"controllerProps": {
"placeholder": "Insert API Key",
"value": "",
"type": "password",
"inputActions": ["unobscure", "copy"]
}
},
{
"key": "chat-completions-endpoint",
"title": "Chat Completions Endpoint",
@ -8,16 +20,5 @@
"placeholder": "https://api.groq.com/openai/v1/chat/completions",
"value": "https://api.groq.com/openai/v1/chat/completions"
}
},
{
"key": "groq-api-key",
"title": "API Key",
"description": "The Groq API uses API keys for authentication. Visit your [API Keys](https://console.groq.com/keys) page to retrieve the API key you'll use in your requests.",
"controllerType": "input",
"controllerProps": {
"placeholder": "gsk_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
"value": "",
"type": "password"
}
}
]

View File

@ -1,4 +1,16 @@
[
{
"key": "martian-api-key",
"title": "API Key",
"description": "The Martian API uses API keys for authentication. Visit your [API Keys](https://withmartian.com/dashboard) page to retrieve the API key you'll use in your requests.",
"controllerType": "input",
"controllerProps": {
"placeholder": "Insert API Key",
"value": "",
"type": "password",
"inputActions": ["unobscure", "copy"]
}
},
{
"key": "chat-completions-endpoint",
"title": "Chat Completions Endpoint",
@ -8,16 +20,5 @@
"placeholder": "https://withmartian.com/api/openai/v1/chat/completions",
"value": "https://withmartian.com/api/openai/v1/chat/completions"
}
},
{
"key": "martian-api-key",
"title": "API Key",
"description": "The Martian API uses API keys for authentication. Visit your [API Keys](https://withmartian.com/dashboard) page to retrieve the API key you'll use in your requests.",
"controllerType": "input",
"controllerProps": {
"placeholder": "sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
"value": "",
"type": "password"
}
}
]

View File

@ -1,4 +1,16 @@
[
{
"key": "mistral-api-key",
"title": "API Key",
"description": "The Mistral API uses API keys for authentication. Visit your [API Keys](https://console.mistral.ai/api-keys/) page to retrieve the API key you'll use in your requests.",
"controllerType": "input",
"controllerProps": {
"placeholder": "Insert API Key",
"value": "",
"type": "password",
"inputActions": ["unobscure", "copy"]
}
},
{
"key": "chat-completions-endpoint",
"title": "Chat Completions Endpoint",
@ -8,16 +20,5 @@
"placeholder": "https://api.mistral.ai/v1/chat/completions",
"value": "https://api.mistral.ai/v1/chat/completions"
}
},
{
"key": "mistral-api-key",
"title": "API Key",
"description": "The Mistral API uses API keys for authentication. Visit your [API Keys](https://console.mistral.ai/api-keys/) page to retrieve the API key you'll use in your requests.",
"controllerType": "input",
"controllerProps": {
"placeholder": "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
"value": "",
"type": "password"
}
}
]

View File

@ -1 +1 @@
0.4.20
0.5.0

View File

@ -1,3 +1,3 @@
@echo off
set /p CORTEX_VERSION=<./bin/version.txt
.\node_modules\.bin\download https://github.com/janhq/cortex/releases/download/v%CORTEX_VERSION%/cortex-cpp-%CORTEX_VERSION%-windows-amd64-avx2-cuda-12-0.tar.gz -e --strip 1 -o ./bin/win-cuda-12-0 && .\node_modules\.bin\download https://github.com/janhq/cortex/releases/download/v%CORTEX_VERSION%/cortex-cpp-%CORTEX_VERSION%-windows-amd64-avx2-cuda-11-7.tar.gz -e --strip 1 -o ./bin/win-cuda-11-7 && .\node_modules\.bin\download https://github.com/janhq/nitro/releases/download/v%CORTEX_VERSION%/cortex-cpp-%CORTEX_VERSION%-windows-amd64-avx2.tar.gz -e --strip 1 -o ./bin/win-cpu && .\node_modules\.bin\download https://github.com/janhq/cortex/releases/download/v%CORTEX_VERSION%/cortex-cpp-%CORTEX_VERSION%-windows-amd64-vulkan.tar.gz -e --strip 1 -o ./bin/win-vulkan
.\node_modules\.bin\download https://github.com/janhq/cortex/releases/download/v%CORTEX_VERSION%/cortex-cpp-%CORTEX_VERSION%-windows-amd64.tar.gz -e --strip 1 -o ./bin/win-cuda-12-0 && .\node_modules\.bin\download https://github.com/janhq/cortex/releases/download/v%CORTEX_VERSION%/cortex-cpp-%CORTEX_VERSION%-windows-amd64.tar.gz -e --strip 1 -o ./bin/win-cuda-11-7 && .\node_modules\.bin\download https://github.com/janhq/cortex/releases/download/v%CORTEX_VERSION%/cortex-cpp-%CORTEX_VERSION%-windows-amd64.tar.gz -e --strip 1 -o ./bin/win-cpu && .\node_modules\.bin\download https://github.com/janhq/cortex/releases/download/v%CORTEX_VERSION%/cortex-cpp-%CORTEX_VERSION%-windows-amd64.tar.gz -e --strip 1 -o ./bin/win-vulkan && .\node_modules\.bin\download https://github.com/janhq/cortex.llamacpp/releases/download/v0.1.25/cortex.llamacpp-0.1.25-windows-amd64-noavx-cuda-12-0.tar.gz -e --strip 1 -o ./bin/win-cuda-12-0/engines/cortex.llamacpp && .\node_modules\.bin\download https://github.com/janhq/cortex.llamacpp/releases/download/v0.1.25/cortex.llamacpp-0.1.25-windows-amd64-noavx-cuda-11-7.tar.gz -e --strip 1 -o ./bin/win-cuda-11-7/engines/cortex.llamacpp && .\node_modules\.bin\download https://github.com/janhq/cortex.llamacpp/releases/download/v0.1.25/cortex.llamacpp-0.1.25-windows-amd64-noavx.tar.gz -e --strip 1 -o ./bin/win-cpu/engines/cortex.llamacpp && .\node_modules\.bin\download https://github.com/janhq/cortex.llamacpp/releases/download/v0.1.25/cortex.llamacpp-0.1.25-windows-amd64-vulkan.tar.gz -e --strip 1 -o ./bin/win-vulkan/engines/cortex.llamacpp

View File

@ -1,7 +1,7 @@
{
"name": "@janhq/inference-cortex-extension",
"productName": "Cortex Inference Engine",
"version": "1.0.14",
"version": "1.0.15",
"description": "This extension embeds cortex.cpp, a lightweight inference engine written in C++. See https://nitro.jan.ai.\nAdditional dependencies could be installed to run without Cuda Toolkit installation.",
"main": "dist/index.js",
"node": "dist/node/index.cjs.js",
@ -10,8 +10,8 @@
"scripts": {
"test": "jest",
"build": "tsc --module commonjs && rollup -c rollup.config.ts",
"downloadnitro:linux": "CORTEX_VERSION=$(cat ./bin/version.txt) && download https://github.com/janhq/cortex/releases/download/v${CORTEX_VERSION}/cortex-cpp-${CORTEX_VERSION}-linux-amd64-avx2.tar.gz -e --strip 1 -o ./bin/linux-cpu && chmod +x ./bin/linux-cpu/cortex-cpp && download https://github.com/janhq/cortex/releases/download/v${CORTEX_VERSION}/cortex-cpp-${CORTEX_VERSION}-linux-amd64-avx2-cuda-12-0.tar.gz -e --strip 1 -o ./bin/linux-cuda-12-0 && chmod +x ./bin/linux-cuda-12-0/cortex-cpp && download https://github.com/janhq/cortex/releases/download/v${CORTEX_VERSION}/cortex-cpp-${CORTEX_VERSION}-linux-amd64-avx2-cuda-11-7.tar.gz -e --strip 1 -o ./bin/linux-cuda-11-7 && chmod +x ./bin/linux-cuda-11-7/cortex-cpp && download https://github.com/janhq/cortex/releases/download/v${CORTEX_VERSION}/cortex-cpp-${CORTEX_VERSION}-linux-amd64-vulkan.tar.gz -e --strip 1 -o ./bin/linux-vulkan && chmod +x ./bin/linux-vulkan/cortex-cpp",
"downloadnitro:darwin": "CORTEX_VERSION=$(cat ./bin/version.txt) && download https://github.com/janhq/cortex/releases/download/v${CORTEX_VERSION}/cortex-cpp-${CORTEX_VERSION}-mac-arm64.tar.gz -o ./bin/ && mkdir -p ./bin/mac-arm64 && tar -zxvf ./bin/cortex-cpp-${CORTEX_VERSION}-mac-arm64.tar.gz --strip-components=1 -C ./bin/mac-arm64 && rm -rf ./bin/cortex-cpp-${CORTEX_VERSION}-mac-arm64.tar.gz && chmod +x ./bin/mac-arm64/cortex-cpp && download https://github.com/janhq/cortex/releases/download/v${CORTEX_VERSION}/cortex-cpp-${CORTEX_VERSION}-mac-amd64.tar.gz -o ./bin/ && mkdir -p ./bin/mac-amd64 && tar -zxvf ./bin/cortex-cpp-${CORTEX_VERSION}-mac-amd64.tar.gz --strip-components=1 -C ./bin/mac-amd64 && rm -rf ./bin/cortex-cpp-${CORTEX_VERSION}-mac-amd64.tar.gz && chmod +x ./bin/mac-amd64/cortex-cpp",
"downloadnitro:linux": "CORTEX_VERSION=$(cat ./bin/version.txt) && download https://github.com/janhq/cortex/releases/download/v${CORTEX_VERSION}/cortex-cpp-${CORTEX_VERSION}-linux-amd64.tar.gz -e --strip 1 -o ./bin/linux-cpu && chmod +x ./bin/linux-cpu/cortex-cpp && download https://github.com/janhq/cortex/releases/download/v${CORTEX_VERSION}/cortex-cpp-${CORTEX_VERSION}-linux-amd64.tar.gz -e --strip 1 -o ./bin/linux-cuda-12-0 && chmod +x ./bin/linux-cuda-12-0/cortex-cpp && download https://github.com/janhq/cortex/releases/download/v${CORTEX_VERSION}/cortex-cpp-${CORTEX_VERSION}-linux-amd64.tar.gz -e --strip 1 -o ./bin/linux-cuda-11-7 && chmod +x ./bin/linux-cuda-11-7/cortex-cpp && download https://github.com/janhq/cortex/releases/download/v${CORTEX_VERSION}/cortex-cpp-${CORTEX_VERSION}-linux-amd64.tar.gz -e --strip 1 -o ./bin/linux-vulkan && chmod +x ./bin/linux-vulkan/cortex-cpp && download https://github.com/janhq/cortex.llamacpp/releases/download/v0.1.25/cortex.llamacpp-0.1.25-linux-amd64-noavx.tar.gz -e --strip 1 -o ./bin/linux-cpu/engines/cortex.llamacpp && download https://github.com/janhq/cortex.llamacpp/releases/download/v0.1.25/cortex.llamacpp-0.1.25-linux-amd64-noavx-cuda-12-0.tar.gz -e --strip 1 -o ./bin/linux-cuda-12-0/engines/cortex.llamacpp && download https://github.com/janhq/cortex.llamacpp/releases/download/v0.1.25/cortex.llamacpp-0.1.25-linux-amd64-noavx-cuda-11-7.tar.gz -e --strip 1 -o ./bin/linux-cuda-11-7/engines/cortex.llamacpp && download https://github.com/janhq/cortex.llamacpp/releases/download/v0.1.25/cortex.llamacpp-0.1.25-linux-amd64-vulkan.tar.gz -e --strip 1 -o ./bin/linux-vulkan/engines/cortex.llamacpp",
"downloadnitro:darwin": "CORTEX_VERSION=$(cat ./bin/version.txt) && download https://github.com/janhq/cortex/releases/download/v${CORTEX_VERSION}/cortex-cpp-${CORTEX_VERSION}-mac-arm64.tar.gz -o ./bin/ && mkdir -p ./bin/mac-arm64 && tar -zxvf ./bin/cortex-cpp-${CORTEX_VERSION}-mac-arm64.tar.gz --strip-components=1 -C ./bin/mac-arm64 && rm -rf ./bin/cortex-cpp-${CORTEX_VERSION}-mac-arm64.tar.gz && chmod +x ./bin/mac-arm64/cortex-cpp && download https://github.com/janhq/cortex/releases/download/v${CORTEX_VERSION}/cortex-cpp-${CORTEX_VERSION}-mac-amd64.tar.gz -o ./bin/ && mkdir -p ./bin/mac-amd64 && tar -zxvf ./bin/cortex-cpp-${CORTEX_VERSION}-mac-amd64.tar.gz --strip-components=1 -C ./bin/mac-amd64 && rm -rf ./bin/cortex-cpp-${CORTEX_VERSION}-mac-amd64.tar.gz && chmod +x ./bin/mac-amd64/cortex-cpp && download https://github.com/janhq/cortex.llamacpp/releases/download/v0.1.25/cortex.llamacpp-0.1.25-mac-arm64.tar.gz -e --strip 1 -o ./bin/mac-arm64/engines/cortex.llamacpp && download https://github.com/janhq/cortex.llamacpp/releases/download/v0.1.25/cortex.llamacpp-0.1.25-mac-amd64.tar.gz -e --strip 1 -o ./bin/mac-amd64/engines/cortex.llamacpp",
"downloadnitro:win32": "download.bat",
"downloadnitro": "run-script-os",
"build:publish:darwin": "rimraf *.tgz --glob && yarn build && npm run downloadnitro && ../../.github/scripts/auto-sign.sh && cpx \"bin/**\" \"dist/bin\" && npm pack && cpx *.tgz ../../pre-install",

View File

@ -1,20 +1,20 @@
{
"sources": [
{
"filename": "gemma-2b-it-q4_k_m.gguf",
"url": "https://huggingface.co/lmstudio-ai/gemma-2b-it-GGUF/resolve/main/gemma-2b-it-q4_k_m.gguf"
"filename": "gemma-1.1-2b-it-q4_k_m.gguf",
"url": "https://huggingface.co/bartowski/gemma-1.1-2b-it-GGUF/resolve/main/gemma-1.1-2b-it-Q4_K_M.gguf"
}
],
"id": "gemma-2b",
"id": "gemma-1.1-2b-it",
"object": "model",
"name": "Gemma 2B Q4",
"name": "Gemma 1.1 2B Q4",
"version": "1.3",
"description": "Gemma is built from the same technology with Google's Gemini.",
"format": "gguf",
"settings": {
"ctx_len": 8192,
"prompt_template": "<start_of_turn>user\n{prompt}<end_of_turn>\n<start_of_turn>model",
"llama_model_path": "gemma-2b-it-q4_k_m.gguf",
"llama_model_path": "gemma-1.1-2b-it-Q4_K_M.gguf",
"ngl": 19
},
"parameters": {
@ -29,7 +29,7 @@
"metadata": {
"author": "Google",
"tags": ["2B", "Finetuned", "Tiny"],
"size": 1500000000
"size": 1630000000
},
"engine": "nitro"
}

View File

@ -1,20 +1,20 @@
{
"sources": [
{
"filename": "gemma-7b-it-q4_K_M.gguf",
"url": "https://huggingface.co/mmnga/gemma-7b-it-gguf/resolve/main/gemma-7b-it-q4_K_M.gguf"
"filename": "gemma-1.1-7b-it-q4_K_M.gguf",
"url": "https://huggingface.co/bartowski/gemma-1.1-7b-it-GGUF/resolve/main/gemma-1.1-7b-it-Q4_K_M.gguf"
}
],
"id": "gemma-7b",
"id": "gemma-1.1-7b-it",
"object": "model",
"name": "Gemma 7B Q4",
"name": "Gemma 1.1 7B Q4",
"version": "1.2",
"description": "Google's Gemma is built for multilingual purpose",
"format": "gguf",
"settings": {
"ctx_len": 8192,
"prompt_template": "<start_of_turn>user\n{prompt}<end_of_turn>\n<start_of_turn>model",
"llama_model_path": "gemma-7b-it-q4_K_M.gguf",
"llama_model_path": "gemma-1.1-7b-it-q4_K_M.gguf",
"ngl": 29
},
"parameters": {

View File

@ -0,0 +1,42 @@
{
"sources": [
{
"filename": "gemma-2-27b-it-Q4_K_M.gguf",
"url": "https://huggingface.co/bartowski/gemma-2-27b-it-GGUF/resolve/main/gemma-2-27b-it-Q4_K_M.gguf"
}
],
"id": "gemma-2-27b-it",
"object": "model",
"name": "Gemma 2 27B Q4",
"version": "1.0",
"description": "Gemma is built from the same technology with Google's Gemini.",
"format": "gguf",
"settings": {
"ctx_len": 8192,
"prompt_template": "<bos><start_of_turn>user\n{prompt}<end_of_turn>\n<start_of_turn>model\n<end_of_turn>\n<start_of_turn>model\n",
"llama_model_path": "gemma-2-27b-it-Q4_K_M.gguf",
"ngl": 47
},
"parameters": {
"temperature": 0.7,
"top_p": 0.95,
"stream": true,
"max_tokens": 8192,
"stop": [
"<end_of_turn>"
],
"frequency_penalty": 0,
"presence_penalty": 0
},
"metadata": {
"author": "Google",
"tags": [
"27B",
"Conversational",
"Text-generation",
"Featured"
],
"size": 16600000000
},
"engine": "nitro"
}

View File

@ -0,0 +1,43 @@
{
"sources": [
{
"filename": "gemma-2-2b-it-Q4_K_M.gguf",
"url": "https://huggingface.co/bartowski/gemma-2-2b-it-GGUF/resolve/main/gemma-2-2b-it-Q4_K_M.gguf"
}
],
"id": "gemma-2-2b-it",
"object": "model",
"name": "Gemma 2 2B Q4",
"version": "1.0",
"description": "Gemma is built from the same technology with Google's Gemini.",
"format": "gguf",
"settings": {
"ctx_len": 8192,
"prompt_template": "<bos><start_of_turn>user\n{prompt}<end_of_turn>\n<start_of_turn>model\n<end_of_turn>\n<start_of_turn>model\n",
"llama_model_path": "gemma-2-2b-it-Q4_K_M.gguf",
"ngl": 27
},
"parameters": {
"temperature": 0.7,
"top_p": 0.95,
"stream": true,
"max_tokens": 8192,
"stop": [
"<end_of_turn>"
],
"frequency_penalty": 0,
"presence_penalty": 0
},
"metadata": {
"author": "Google",
"tags": [
"2B",
"Tiny",
"Conversational",
"Text-generation",
"Featured"
],
"size": 1710000000
},
"engine": "nitro"
}

View File

@ -0,0 +1,42 @@
{
"sources": [
{
"filename": "gemma-2-9b-it-Q4_K_M.gguf",
"url": "https://huggingface.co/bartowski/gemma-2-9b-it-GGUF/resolve/main/gemma-2-9b-it-Q4_K_M.gguf"
}
],
"id": "gemma-2-9b-it",
"object": "model",
"name": "Gemma 2 9B Q4",
"version": "1.0",
"description": "Gemma is built from the same technology with Google's Gemini.",
"format": "gguf",
"settings": {
"ctx_len": 8192,
"prompt_template": "<bos><start_of_turn>user\n{prompt}<end_of_turn>\n<start_of_turn>model\n<end_of_turn>\n<start_of_turn>model\n",
"llama_model_path": "gemma-2-9b-it-Q4_K_M.gguf",
"ngl": 43
},
"parameters": {
"temperature": 0.7,
"top_p": 0.95,
"stream": true,
"max_tokens": 8192,
"stop": [
"<end_of_turn>"
],
"frequency_penalty": 0,
"presence_penalty": 0
},
"metadata": {
"author": "Google",
"tags": [
"9B",
"Conversational",
"Text-generation",
"Featured"
],
"size": 5760000000
},
"engine": "nitro"
}

View File

@ -2,7 +2,7 @@
"sources": [
{
"filename": "Meta-Llama-3-8B-Instruct-Q4_K_M.gguf",
"url": "https://huggingface.co/lmstudio-community/Meta-Llama-3-8B-Instruct-GGUF/resolve/main/Meta-Llama-3-8B-Instruct-Q4_K_M.gguf"
"url": "https://huggingface.co/bartowski/Meta-Llama-3-8B-Instruct-GGUF/resolve/main/Meta-Llama-3-8B-Instruct-Q4_K_M.gguf"
}
],
"id": "llama3-8b-instruct",
@ -28,7 +28,7 @@
},
"metadata": {
"author": "MetaAI",
"tags": ["7B", "Featured"],
"tags": ["8B", "Featured"],
"size": 4920000000
},
"engine": "nitro"

View File

@ -0,0 +1,42 @@
{
"sources": [
{
"filename": "Meta-Llama-3.1-70B-Instruct-Q4_K_M.gguf",
"url": "https://huggingface.co/bartowski/Meta-Llama-3.1-70B-Instruct-GGUF/resolve/main/Meta-Llama-3.1-70B-Instruct-Q4_K_M.gguf"
}
],
"id": "llama3.1-70b-instruct",
"object": "model",
"name": "Llama 3.1 70B Q4 Instruct",
"version": "1.0",
"description": "Meta's Llama 3.1 excels at general usage situations, including chat, general world knowledge, and coding.",
"format": "gguf",
"settings": {
"ctx_len": 131072,
"prompt_template": "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n{system_message}<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n{prompt}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n",
"llama_model_path": "Meta-Llama-3.1-70B-Instruct-Q4_K_M.gguf",
"ngl": 33
},
"parameters": {
"temperature": 0.7,
"top_p": 0.95,
"stream": true,
"max_tokens": 8192,
"stop": [
"<|end_of_text|>",
"<|eot_id|>",
"<|eom_id|>"
],
"frequency_penalty": 0,
"presence_penalty": 0
},
"metadata": {
"author": "MetaAI",
"tags": [
"70B",
"Featured"
],
"size": 42500000000
},
"engine": "nitro"
}

View File

@ -0,0 +1,42 @@
{
"sources": [
{
"filename": "Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf",
"url": "https://huggingface.co/bartowski/Meta-Llama-3.1-8B-Instruct-GGUF/resolve/main/Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf"
}
],
"id": "llama3.1-8b-instruct",
"object": "model",
"name": "Llama 3.1 8B Q4 Instruct",
"version": "1.0",
"description": "Meta's Llama 3.1 excels at general usage situations, including chat, general world knowledge, and coding.",
"format": "gguf",
"settings": {
"ctx_len": 131072,
"prompt_template": "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n{system_message}<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n{prompt}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n",
"llama_model_path": "Meta-Llama-3.1-8B-Instruct.Q4_K_M.gguf",
"ngl": 33
},
"parameters": {
"temperature": 0.7,
"top_p": 0.95,
"stream": true,
"max_tokens": 8192,
"stop": [
"<|end_of_text|>",
"<|eot_id|>",
"<|eom_id|>"
],
"frequency_penalty": 0,
"presence_penalty": 0
},
"metadata": {
"author": "MetaAI",
"tags": [
"8B",
"Featured"
],
"size": 4920000000
},
"engine": "nitro"
}
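
For readers unfamiliar with these model configs, the `prompt_template` fields above use simple `{system_message}` and `{prompt}` placeholders. A minimal sketch of how such a template gets filled before the text reaches the engine (the helper name is hypothetical and not part of this diff):

```ts
// Minimal sketch: substitute the placeholders used by Jan model templates.
// `applyPromptTemplate` is an illustrative helper, not code from this commit.
function applyPromptTemplate(
  template: string,
  prompt: string,
  systemMessage = ''
): string {
  return template
    .replace('{system_message}', systemMessage)
    .replace('{prompt}', prompt)
}

// Example with the Llama 3.1 template above:
const template =
  '<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n{system_message}<|eot_id|>' +
  '<|start_header_id|>user<|end_header_id|>\n\n{prompt}<|eot_id|>' +
  '<|start_header_id|>assistant<|end_header_id|>\n\n'

console.log(applyPromptTemplate(template, 'Hello!', 'You are a helpful assistant.'))
```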

View File

@ -12,8 +12,8 @@ const codeninja7bJson = require('./resources/models/codeninja-1.0-7b/model.json'
const commandr34bJson = require('./resources/models/command-r-34b/model.json')
const deepseekCoder13bJson = require('./resources/models/deepseek-coder-1.3b/model.json')
const deepseekCoder34bJson = require('./resources/models/deepseek-coder-34b/model.json')
const gemma2bJson = require('./resources/models/gemma-2b/model.json')
const gemma7bJson = require('./resources/models/gemma-7b/model.json')
const gemma112bJson = require('./resources/models/gemma-1.1-2b/model.json')
const gemma117bJson = require('./resources/models/gemma-1.1-7b/model.json')
const llama2Chat70bJson = require('./resources/models/llama2-chat-70b/model.json')
const llama2Chat7bJson = require('./resources/models/llama2-chat-7b/model.json')
const llamacorn1bJson = require('./resources/models/llamacorn-1.1b/model.json')
@ -40,7 +40,11 @@ const aya35bJson = require('./resources/models/aya-23-35b/model.json')
const phimediumJson = require('./resources/models/phi3-medium/model.json')
const codestralJson = require('./resources/models/codestral-22b/model.json')
const qwen2Json = require('./resources/models/qwen2-7b/model.json')
const llama318bJson = require('./resources/models/llama3.1-8b-instruct/model.json')
const llama3170bJson = require('./resources/models/llama3.1-70b-instruct/model.json')
const gemma22bJson = require('./resources/models/gemma-2-2b/model.json')
const gemma29bJson = require('./resources/models/gemma-2-9b/model.json')
const gemma227bJson = require('./resources/models/gemma-2-27b/model.json')
export default [
{
@ -60,8 +64,8 @@ export default [
commandr34bJson,
deepseekCoder13bJson,
deepseekCoder34bJson,
gemma2bJson,
gemma7bJson,
gemma112bJson,
gemma117bJson,
llama2Chat70bJson,
llama2Chat7bJson,
llamacorn1bJson,
@ -87,7 +91,12 @@ export default [
aya8bJson,
aya35bJson,
codestralJson,
qwen2Json
qwen2Json,
llama318bJson,
llama3170bJson,
gemma22bJson,
gemma29bJson,
gemma227bJson
]),
NODE: JSON.stringify(`${packageJson.name}/${packageJson.node}`),
DEFAULT_SETTINGS: JSON.stringify(defaultSettingJson),

View File

@ -260,9 +260,14 @@ function loadLLMModel(settings: any): Promise<Response> {
async function validateModelStatus(modelId: string): Promise<void> {
// Send a POST request to the validation URL.
// Retry the request up to 3 times if it fails, with a delay of 500 milliseconds between retries.
log(`[CORTEX]::Debug: Validating model ${modelId}`)
return fetchRetry(NITRO_HTTP_VALIDATE_MODEL_URL, {
method: 'POST',
body: JSON.stringify({ model: modelId }),
body: JSON.stringify({
model: modelId,
// TODO: force to use cortex llamacpp by default
engine: 'cortex.llamacpp'
}),
headers: {
'Content-Type': 'application/json',
},
@ -288,8 +293,9 @@ async function validateModelStatus(modelId: string): Promise<void> {
return Promise.resolve()
}
}
const errorBody = await res.text()
log(
`[CORTEX]::Debug: Validate model state failed with response ${JSON.stringify(
`[CORTEX]::Debug: Validate model state failed with response ${errorBody} and status is ${JSON.stringify(
res.statusText
)}`
)
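
The comment above describes `fetchRetry`'s behaviour (up to 3 attempts, 500 ms apart). The helper itself lives elsewhere in the extension; the following is only a rough sketch of that behaviour under those assumptions, returning the last response so callers can inspect the error body the way `validateModelStatus` does:

```ts
// Rough sketch of a retrying fetch, matching the behaviour described above.
// Not the extension's actual helper: the retry count, delay, and error
// handling here are assumptions for illustration.
async function fetchRetry(
  url: string,
  init: RequestInit,
  retries = 3,
  delayMs = 500
): Promise<Response> {
  let res = await fetch(url, init)
  for (let attempt = 1; attempt < retries && !res.ok; attempt++) {
    // Wait before the next attempt.
    await new Promise((resolve) => setTimeout(resolve, delayMs))
    res = await fetch(url, init)
  }
  // The last response is returned even when it is not OK, so the caller can
  // read the body and status, as the validation code above does.
  return res
}
```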

View File

@ -1,4 +1,16 @@
[
{
"key": "nvidia-api-key",
"title": "API Key",
"description": "The NVIDIA API uses API keys for authentication. Visit your [API Keys](https://org.ngc.nvidia.com/setup/personal-keys) page to retrieve the API key you'll use in your requests..",
"controllerType": "input",
"controllerProps": {
"placeholder": "Insert API Key",
"value": "",
"type": "password",
"inputActions": ["unobscure", "copy"]
}
},
{
"key": "chat-completions-endpoint",
"title": "Chat Completions Endpoint",
@ -8,17 +20,5 @@
"placeholder": "https://integrate.api.nvidia.com/v1/chat/completions",
"value": "https://integrate.api.nvidia.com/v1/chat/completions"
}
},
{
"key": "nvidia-api-key",
"title": "API Key",
"description": "The NVIDIA API uses API keys for authentication. Visit your [API Keys](https://org.ngc.nvidia.com/setup/personal-keys) page to retrieve the API key you'll use in your requests..",
"controllerType": "input",
"controllerProps": {
"placeholder": "nvapi-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
"value": "",
"type": "password",
"inputActions": ["unobscure", "copy"]
}
}
]
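
Settings entries like the two above are looked up by their `key` on the extension side. A minimal sketch of that pattern, mirroring the `getSetting` calls visible in the OpenRouter extension further down; this example class and its default values are illustrative, not code from this diff:

```ts
// Illustrative only: reads the two settings declared above by their keys.
// The class name and defaults are assumptions; the getSetting pattern follows
// the OpenRouter extension shown later in this diff.
import { RemoteOAIEngine } from '@janhq/core'

export default class ExampleNvidiaExtension extends RemoteOAIEngine {
  inferenceUrl: string = ''
  provider: string = 'nvidia'

  override async onLoad(): Promise<void> {
    super.onLoad()
    // Keys must match the "key" fields in settings.json.
    this.apiKey = await this.getSetting<string>('nvidia-api-key', '')
    this.inferenceUrl = await this.getSetting<string>(
      'chat-completions-endpoint',
      'https://integrate.api.nvidia.com/v1/chat/completions'
    )
  }
}
```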

View File

@ -1,4 +1,16 @@
[
{
"key": "openai-api-key",
"title": "API Key",
"description": "The OpenAI API uses API keys for authentication. Visit your [API Keys](https://platform.openai.com/account/api-keys) page to retrieve the API key you'll use in your requests.",
"controllerType": "input",
"controllerProps": {
"placeholder": "Insert API Key",
"value": "",
"type": "password",
"inputActions": ["unobscure", "copy"]
}
},
{
"key": "chat-completions-endpoint",
"title": "Chat Completions Endpoint",
@ -8,16 +20,5 @@
"placeholder": "https://api.openai.com/v1/chat/completions",
"value": "https://api.openai.com/v1/chat/completions"
}
},
{
"key": "openai-api-key",
"title": "API Key",
"description": "The OpenAI API uses API keys for authentication. Visit your [API Keys](https://platform.openai.com/account/api-keys) page to retrieve the API key you'll use in your requests.",
"controllerType": "input",
"controllerProps": {
"placeholder": "sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
"value": "",
"type": "password"
}
}
]

View File

@ -1,8 +1,20 @@
[
{
"key": "openrouter-api-key",
"title": "API Key",
"description": "The OpenRouter API uses API keys for authentication. Visit your [API Keys](https://openrouter.ai/keys) page to retrieve the API key you'll use in your requests.",
"controllerType": "input",
"controllerProps": {
"placeholder": "Insert API Key",
"value": "",
"type": "password",
"inputActions": ["unobscure", "copy"]
}
},
{
"key": "chat-completions-endpoint",
"title": "Chat Completions Endpoint",
"description": "The endpoint to use for chat completions. See the [OpenRouter API documentation](https://openrouter.ai/docs) for more information.",
"description": "The endpoint to use for chat completions. See the [OpenRouter API documentation](https://openrouter.ai/docs/requests) for more information.",
"controllerType": "input",
"controllerProps": {
"placeholder": "https://openrouter.ai/api/v1/chat/completions",
@ -10,14 +22,13 @@
}
},
{
"key": "openrouter-api-key",
"title": "API Key",
"description": "The OpenRouter API uses API keys for authentication. Visit your [API Keys](https://openrouter.ai/keys) page to retrieve the API key you'll use in your requests.",
"key": "openrouter-model",
"title": "Model",
"description": "If the model parameter is omitted, the user or payer's default is used. Otherwise, remember to select a value for model from the [supported models](https://openrouter.ai/docs/models) or API, and include the organization prefix.",
"controllerType": "input",
"controllerProps": {
"placeholder": "sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
"value": "",
"type": "password"
"placeholder": "Leave empty for default model",
"value": ""
}
}
]

View File

@ -8,22 +8,16 @@
import { RemoteOAIEngine } from '@janhq/core'
import { PayloadType } from '@janhq/core'
import { ChatCompletionRole } from '@janhq/core'
declare const SETTINGS: Array<any>
declare const MODELS: Array<any>
enum Settings {
apiKey = 'openrouter-api-key',
model = 'openrouter-model',
chatCompletionsEndPoint = 'chat-completions-endpoint',
}
enum RoleType {
user = 'USER',
chatbot = 'CHATBOT',
system = 'SYSTEM',
}
/**
* A class that implements the InferenceExtension interface from the @janhq/core package.
* The class provides methods for initializing and stopping a model, and for making inference requests.
@ -32,6 +26,7 @@ enum RoleType {
export default class JanInferenceOpenRouterExtension extends RemoteOAIEngine {
inferenceUrl: string = ''
provider: string = 'openrouter'
model?: string | undefined
override async onLoad(): Promise<void> {
super.onLoad()
@ -45,6 +40,9 @@ export default class JanInferenceOpenRouterExtension extends RemoteOAIEngine {
Settings.chatCompletionsEndPoint,
''
)
this.model = await this.getSetting<string>(Settings.model, '')
// OpenRouter falls back to its default model when no model param is set
if (!this.model?.length) this.model = undefined
if (this.inferenceUrl.length === 0) {
SETTINGS.forEach((setting) => {
if (setting.key === Settings.chatCompletionsEndPoint) {
@ -54,6 +52,14 @@ export default class JanInferenceOpenRouterExtension extends RemoteOAIEngine {
}
}
override async headers(): Promise<HeadersInit> {
return {
'Content-Type': 'application/json',
'HTTP-Referer': 'https://jan.ai',
'Authorization': `Bearer ${this.apiKey}`,
}
}
onSettingUpdate<T>(key: string, value: T): void {
if (key === Settings.apiKey) {
this.apiKey = value as string
@ -69,8 +75,14 @@ export default class JanInferenceOpenRouterExtension extends RemoteOAIEngine {
} else {
this.inferenceUrl = value
}
} else if (key === Settings.model) {
this.model =
typeof value === 'string' && value.length > 0 ? value : undefined
}
}
transformPayload = (payload: PayloadType)=>({...payload,model:"openrouter/auto"})
transformPayload = (payload: PayloadType) => ({
...payload,
model: this.model,
})
}
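
With this change the request body only carries a `model` field when the new `openrouter-model` setting is filled in; otherwise `model` stays `undefined` and OpenRouter applies the account default (previously `openrouter/auto` was always forced). A small illustration of the transform, with a hypothetical payload:

```ts
// Hypothetical payload, only to illustrate the transformPayload change above.
const payload = { messages: [{ role: 'user', content: 'Hi' }], stream: true }

// When the "openrouter-model" setting is empty, this.model is undefined,
// so JSON.stringify drops the key and OpenRouter picks its default model.
const withDefault = { ...payload, model: undefined }

// When the setting is filled in, the chosen model is forwarded as-is.
const withExplicitModel = { ...payload, model: 'meta-llama/llama-3.1-8b-instruct' }

console.log(JSON.stringify(withDefault)) // no "model" key in the body
console.log(JSON.stringify(withExplicitModel))
```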

View File

@ -1,4 +1,16 @@
[
{
"key": "tritonllm-api-key",
"title": "API Key",
"description": "The Triton LLM API uses API keys for authentication.",
"controllerType": "input",
"controllerProps": {
"placeholder": "Insert API Key",
"value": "",
"type": "password",
"inputActions": ["unobscure", "copy"]
}
},
{
"key": "chat-completions-endpoint",
"title": "Chat Completions Endpoint",
@ -8,16 +20,5 @@
"placeholder": "http://localhost:8000/v2/models/tensorrt_llm_bls/generate",
"value": "http://localhost:8000/v2/models/tensorrt_llm_bls/generate"
}
},
{
"key": "tritonllm-api-key",
"title": "Triton LLM API Key",
"description": "The Triton LLM API uses API keys for authentication.",
"controllerType": "input",
"controllerProps": {
"placeholder": "xxxxxxxxxxxxxxxxxxxx",
"value": "",
"type": "password"
}
}
]

View File

@ -1,3 +0,0 @@
@echo off
set /p LLAMA_CPP_VERSION=<./scripts/version.txt
.\node_modules\.bin\download https://github.com/ggerganov/llama.cpp/archive/refs/tags/%LLAMA_CPP_VERSION%.tar.gz -o . --filename ./scripts/llama.cpp.tar.gz && tar -xzf .\scripts\llama.cpp.tar.gz "llama.cpp-%LLAMA_CPP_VERSION%/convert.py" "llama.cpp-%LLAMA_CPP_VERSION%/convert-hf-to-gguf.py" "llama.cpp-%LLAMA_CPP_VERSION%/gguf-py" && cpx "./llama.cpp-%LLAMA_CPP_VERSION%/**" "scripts" && rimraf "./scripts/llama.cpp.tar.gz" && rimraf "./llama.cpp-%LLAMA_CPP_VERSION%"

View File

@ -9,31 +9,25 @@
"license": "AGPL-3.0",
"scripts": {
"build": "tsc --module commonjs && rollup -c rollup.config.ts --configPlugin @rollup/plugin-typescript --bundleConfigAsCjs",
"download:llama": "run-script-os",
"download:llama:linux": "LLAMA_CPP_VERSION=$(cat ./scripts/version.txt) && download https://github.com/ggerganov/llama.cpp/archive/refs/tags/${LLAMA_CPP_VERSION}.tar.gz -o . --filename ./scripts/llama.cpp.tar.gz && tar -xzf ./scripts/llama.cpp.tar.gz --wildcards '*/convert.py' '*/convert-hf-to-gguf.py' '*/gguf-py' && cpx \"./llama.cpp-$LLAMA_CPP_VERSION/**\" \"scripts\" && rimraf \"./scripts/llama.cpp.tar.gz\" && rimraf \"./llama.cpp-$LLAMA_CPP_VERSION\"",
"download:llama:darwin": "LLAMA_CPP_VERSION=$(cat ./scripts/version.txt) && download https://github.com/ggerganov/llama.cpp/archive/refs/tags/${LLAMA_CPP_VERSION}.tar.gz -o . --filename ./scripts/llama.cpp.tar.gz && tar -xzf ./scripts/llama.cpp.tar.gz '*/convert.py' '*/convert-hf-to-gguf.py' '*/gguf-py' && cpx \"./llama.cpp-$LLAMA_CPP_VERSION/**\" \"scripts\" && rimraf \"./scripts/llama.cpp.tar.gz\" && rimraf \"./llama.cpp-$LLAMA_CPP_VERSION\"",
"download:llama:win32": "download.bat",
"build:publish:linux": "rimraf *.tgz --glob && yarn build && yarn download:llama && cpx \"scripts/**\" \"dist/scripts\" && cpx \"bin/**\" \"dist/bin\" && npm pack && cpx *.tgz ../../pre-install",
"build:publish:darwin": "rimraf *.tgz --glob && yarn build && yarn download:llama && cpx \"scripts/**\" \"dist/scripts\" && cpx \"bin/**\" \"dist/bin\" && ../../.github/scripts/auto-sign.sh && npm pack && cpx *.tgz ../../pre-install",
"build:publish:win32": "rimraf *.tgz --glob && yarn build && yarn download:llama && cpx \"scripts/**\" \"dist/scripts\" && cpx \"bin/**\" \"dist/bin\" && npm pack && cpx *.tgz ../../pre-install",
"build:publish": "run-script-os"
"build:publish": "rimraf *.tgz --glob && yarn build && npm pack && cpx *.tgz ../../pre-install"
},
"devDependencies": {
"cpx": "^1.5.0",
"download-cli": "^1.1.1",
"rimraf": "^3.0.2",
"ts-loader": "^9.5.0",
"typescript": "5.3.3",
"@rollup/plugin-commonjs": "^25.0.7",
"@rollup/plugin-json": "^6.1.0",
"@rollup/plugin-node-resolve": "^15.2.3",
"@rollup/plugin-replace": "^5.0.5",
"@rollup/plugin-typescript": "^11.1.6",
"@types/pdf-parse": "^1.1.4",
"cpx": "^1.5.0",
"download-cli": "^1.1.1",
"rimraf": "^3.0.2",
"rollup": "^2.38.5",
"rollup-plugin-define": "^1.0.1",
"rollup-plugin-sourcemaps": "^0.6.3",
"rollup-plugin-typescript2": "^0.36.0"
"rollup-plugin-typescript2": "^0.36.0",
"run-script-os": "^1.1.6",
"ts-loader": "^9.5.0",
"typescript": "5.3.3"
},
"files": [
"dist/*",
@ -41,8 +35,15 @@
"README.md"
],
"dependencies": {
"@janhq/core": "file:../../core",
"@huggingface/gguf": "^0.0.11",
"@huggingface/jinja": "^0.3.0",
"@janhq/core": "file:../../core",
"hyllama": "^0.2.2",
"python-shell": "^5.0.0"
}
},
"bundleDependencies": [
"hyllama",
"@huggingface/gguf",
"@huggingface/jinja"
]
}
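
The dependency changes above swap the bundled llama.cpp Python conversion scripts for JS-side GGUF tooling (`hyllama`, `@huggingface/gguf`, `@huggingface/jinja`). As a rough sketch of what reading GGUF metadata from JS can look like, based on `@huggingface/gguf`'s documented API (the exact call shape is an assumption here, and the URL is simply the Llama 3.1 file referenced earlier in this diff):

```ts
// Sketch only: read GGUF metadata and tensor info from a remote model file.
// The gguf() helper and its return shape follow @huggingface/gguf's docs;
// treat both as assumptions rather than code from this commit.
import { gguf } from '@huggingface/gguf'

async function inspectModel(url: string): Promise<void> {
  const { metadata, tensorInfos } = await gguf(url)
  console.log('architecture:', metadata['general.architecture'])
  console.log('tensor count:', tensorInfos.length)
}

inspectModel(
  'https://huggingface.co/bartowski/Meta-Llama-3.1-8B-Instruct-GGUF/resolve/main/Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf'
)
```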

View File

@ -3,7 +3,7 @@ import sourceMaps from 'rollup-plugin-sourcemaps'
import typescript from 'rollup-plugin-typescript2'
import json from '@rollup/plugin-json'
import replace from '@rollup/plugin-replace'
import commonjs from '@rollup/plugin-commonjs'
const settingJson = require('./resources/settings.json')
const packageJson = require('./package.json')
const defaultModelJson = require('./resources/default-model.json')
@ -39,6 +39,39 @@ export default [
browser: true,
}),
// Resolve source maps to the original source
sourceMaps(),
],
},
{
input: `src/node/index.ts`,
output: [
{
file: 'dist/node/index.cjs.js',
format: 'cjs',
sourcemap: true,
inlineDynamicImports: true,
},
],
// Indicate here external modules you don't wanna include in your bundle (i.e.: 'lodash')
external: ['@janhq/core/node'],
watch: {
include: 'src/node/**',
},
plugins: [
// Allow json resolution
json(),
// Compile TypeScript files
typescript({ useTsconfigDeclarationDir: true }),
// Allow bundling cjs modules (unlike webpack, rollup doesn't understand cjs)
commonjs(),
// Allow node_modules resolution, so you can use 'external' to control
// which external modules to include in the bundle
// https://github.com/rollup/rollup-plugin-node-resolve#usage
resolve({
extensions: ['.ts', '.js', '.json'],
}),
// Resolve source maps to the original source
sourceMaps(),
],

File diff suppressed because it is too large

File diff suppressed because it is too large

View File

@ -1,21 +0,0 @@
MIT License
Copyright (c) 2023 Georgi Gerganov
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

View File

@ -1,81 +0,0 @@
## gguf
This is a Python package for writing binary files in the [GGUF](https://github.com/ggerganov/ggml/pull/302)
(GGML Universal File) format.
See [convert-llama-hf-to-gguf.py](https://github.com/ggerganov/llama.cpp/blob/master/convert-hf-to-gguf.py)
as an example for its usage.
## Installation
```sh
pip install gguf
```
## API Examples/Simple Tools
[examples/writer.py](https://github.com/ggerganov/llama.cpp/blob/master/gguf-py/examples/writer.py) — Generates `example.gguf` in the current directory to demonstrate generating a GGUF file. Note that this file cannot be used as a model.
[scripts/gguf-dump.py](https://github.com/ggerganov/llama.cpp/blob/master/gguf-py/scripts/gguf-dump.py) — Dumps a GGUF file's metadata to the console.
[scripts/gguf-set-metadata.py](https://github.com/ggerganov/llama.cpp/blob/master/gguf-py/scripts/gguf-set-metadata.py) — Allows changing simple metadata values in a GGUF file by key.
[scripts/gguf-convert-endian.py](https://github.com/ggerganov/llama.cpp/blob/master/gguf-py/scripts/gguf-convert-endian.py) — Allows converting the endianness of GGUF files.
## Development
Maintainers who participate in development of this package are advised to install it in editable mode:
```sh
cd /path/to/llama.cpp/gguf-py
pip install --editable .
```
**Note**: This may require upgrading your Pip installation, with a message saying that editable installation currently requires `setup.py`.
In this case, upgrade Pip to the latest:
```sh
pip install --upgrade pip
```
## Automatic publishing with CI
There's a GitHub workflow to make a release automatically upon creation of tags in a specified format.
1. Bump the version in `pyproject.toml`.
2. Create a tag named `gguf-vx.x.x` where `x.x.x` is the semantic version number.
```sh
git tag -a gguf-v1.0.0 -m "Version 1.0 release"
```
3. Push the tags.
```sh
git push origin --tags
```
## Manual publishing
If you want to publish the package manually for any reason, you need to have `twine` and `build` installed:
```sh
pip install build twine
```
Then, follow these steps to release a new version:
1. Bump the version in `pyproject.toml`.
2. Build the package:
```sh
python -m build
```
3. Upload the generated distribution archives:
```sh
python -m twine upload dist/*
```
## TODO
- [ ] Add tests
- [ ] Include conversion scripts as command line entry points in this package.

View File

@ -1,40 +0,0 @@
#!/usr/bin/env python3
import sys
from pathlib import Path
import numpy as np
# Necessary to load the local gguf package
sys.path.insert(0, str(Path(__file__).parent.parent))
from gguf import GGUFWriter # noqa: E402
# Example usage:
def writer_example() -> None:
# Example usage with a file
gguf_writer = GGUFWriter("example.gguf", "llama")
gguf_writer.add_architecture()
gguf_writer.add_block_count(12)
gguf_writer.add_uint32("answer", 42) # Write a 32-bit integer
gguf_writer.add_float32("answer_in_float", 42.0) # Write a 32-bit float
gguf_writer.add_custom_alignment(64)
tensor1 = np.ones((32,), dtype=np.float32) * 100.0
tensor2 = np.ones((64,), dtype=np.float32) * 101.0
tensor3 = np.ones((96,), dtype=np.float32) * 102.0
gguf_writer.add_tensor("tensor1", tensor1)
gguf_writer.add_tensor("tensor2", tensor2)
gguf_writer.add_tensor("tensor3", tensor3)
gguf_writer.write_header_to_file()
gguf_writer.write_kv_data_to_file()
gguf_writer.write_tensors_to_file()
gguf_writer.close()
if __name__ == '__main__':
writer_example()

View File

@ -1,5 +0,0 @@
from .constants import *
from .gguf_reader import *
from .gguf_writer import *
from .tensor_mapping import *
from .vocab import *

View File

@ -1,665 +0,0 @@
from __future__ import annotations
import sys
from enum import Enum, IntEnum, auto
from typing import Any
#
# constants
#
GGUF_MAGIC = 0x46554747 # "GGUF"
GGUF_VERSION = 3
GGUF_DEFAULT_ALIGNMENT = 32
#
# metadata keys
#
class Keys:
class General:
ARCHITECTURE = "general.architecture"
QUANTIZATION_VERSION = "general.quantization_version"
ALIGNMENT = "general.alignment"
NAME = "general.name"
AUTHOR = "general.author"
URL = "general.url"
DESCRIPTION = "general.description"
LICENSE = "general.license"
SOURCE_URL = "general.source.url"
SOURCE_HF_REPO = "general.source.huggingface.repository"
FILE_TYPE = "general.file_type"
class LLM:
CONTEXT_LENGTH = "{arch}.context_length"
EMBEDDING_LENGTH = "{arch}.embedding_length"
BLOCK_COUNT = "{arch}.block_count"
FEED_FORWARD_LENGTH = "{arch}.feed_forward_length"
USE_PARALLEL_RESIDUAL = "{arch}.use_parallel_residual"
TENSOR_DATA_LAYOUT = "{arch}.tensor_data_layout"
EXPERT_COUNT = "{arch}.expert_count"
EXPERT_USED_COUNT = "{arch}.expert_used_count"
class Attention:
HEAD_COUNT = "{arch}.attention.head_count"
HEAD_COUNT_KV = "{arch}.attention.head_count_kv"
MAX_ALIBI_BIAS = "{arch}.attention.max_alibi_bias"
CLAMP_KQV = "{arch}.attention.clamp_kqv"
KEY_LENGTH = "{arch}.attention.key_length"
VALUE_LENGTH = "{arch}.attention.value_length"
LAYERNORM_EPS = "{arch}.attention.layer_norm_epsilon"
LAYERNORM_RMS_EPS = "{arch}.attention.layer_norm_rms_epsilon"
class Rope:
DIMENSION_COUNT = "{arch}.rope.dimension_count"
FREQ_BASE = "{arch}.rope.freq_base"
SCALING_TYPE = "{arch}.rope.scaling.type"
SCALING_FACTOR = "{arch}.rope.scaling.factor"
SCALING_ORIG_CTX_LEN = "{arch}.rope.scaling.original_context_length"
SCALING_FINETUNED = "{arch}.rope.scaling.finetuned"
class Tokenizer:
MODEL = "tokenizer.ggml.model"
LIST = "tokenizer.ggml.tokens"
TOKEN_TYPE = "tokenizer.ggml.token_type"
SCORES = "tokenizer.ggml.scores"
MERGES = "tokenizer.ggml.merges"
BOS_ID = "tokenizer.ggml.bos_token_id"
EOS_ID = "tokenizer.ggml.eos_token_id"
UNK_ID = "tokenizer.ggml.unknown_token_id"
SEP_ID = "tokenizer.ggml.seperator_token_id"
PAD_ID = "tokenizer.ggml.padding_token_id"
ADD_BOS = "tokenizer.ggml.add_bos_token"
ADD_EOS = "tokenizer.ggml.add_eos_token"
ADD_PREFIX = "tokenizer.ggml.add_space_prefix"
HF_JSON = "tokenizer.huggingface.json"
RWKV = "tokenizer.rwkv.world"
CHAT_TEMPLATE = "tokenizer.chat_template"
#
# recommended mapping of model tensor names for storage in gguf
#
class MODEL_ARCH(IntEnum):
LLAMA = auto()
FALCON = auto()
BAICHUAN = auto()
GPT2 = auto()
GPTJ = auto()
GPTNEOX = auto()
MPT = auto()
STARCODER = auto()
PERSIMMON = auto()
REFACT = auto()
BERT = auto()
BLOOM = auto()
STABLELM = auto()
QWEN = auto()
QWEN2 = auto()
PHI2 = auto()
PLAMO = auto()
CODESHELL = auto()
ORION = auto()
INTERNLM2 = auto()
MINICPM = auto()
class MODEL_TENSOR(IntEnum):
TOKEN_EMBD = auto()
TOKEN_EMBD_NORM = auto()
TOKEN_TYPES = auto()
POS_EMBD = auto()
OUTPUT = auto()
OUTPUT_NORM = auto()
ROPE_FREQS = auto()
ATTN_Q = auto()
ATTN_K = auto()
ATTN_V = auto()
ATTN_QKV = auto()
ATTN_OUT = auto()
ATTN_NORM = auto()
ATTN_NORM_2 = auto()
ATTN_ROT_EMBD = auto()
FFN_GATE_INP = auto()
FFN_NORM = auto()
FFN_GATE = auto()
FFN_DOWN = auto()
FFN_UP = auto()
FFN_ACT = auto()
FFN_GATE_EXP = auto()
FFN_DOWN_EXP = auto()
FFN_UP_EXP = auto()
ATTN_Q_NORM = auto()
ATTN_K_NORM = auto()
MODEL_ARCH_NAMES: dict[MODEL_ARCH, str] = {
MODEL_ARCH.LLAMA: "llama",
MODEL_ARCH.FALCON: "falcon",
MODEL_ARCH.BAICHUAN: "baichuan",
MODEL_ARCH.GPT2: "gpt2",
MODEL_ARCH.GPTJ: "gptj",
MODEL_ARCH.GPTNEOX: "gptneox",
MODEL_ARCH.MPT: "mpt",
MODEL_ARCH.STARCODER: "starcoder",
MODEL_ARCH.PERSIMMON: "persimmon",
MODEL_ARCH.REFACT: "refact",
MODEL_ARCH.BERT: "bert",
MODEL_ARCH.BLOOM: "bloom",
MODEL_ARCH.STABLELM: "stablelm",
MODEL_ARCH.QWEN: "qwen",
MODEL_ARCH.QWEN2: "qwen2",
MODEL_ARCH.PHI2: "phi2",
MODEL_ARCH.PLAMO: "plamo",
MODEL_ARCH.CODESHELL: "codeshell",
MODEL_ARCH.ORION: "orion",
MODEL_ARCH.INTERNLM2: "internlm2",
MODEL_ARCH.MINICPM: "minicpm",
}
TENSOR_NAMES: dict[MODEL_TENSOR, str] = {
MODEL_TENSOR.TOKEN_EMBD: "token_embd",
MODEL_TENSOR.TOKEN_EMBD_NORM: "token_embd_norm",
MODEL_TENSOR.TOKEN_TYPES: "token_types",
MODEL_TENSOR.POS_EMBD: "position_embd",
MODEL_TENSOR.OUTPUT_NORM: "output_norm",
MODEL_TENSOR.OUTPUT: "output",
MODEL_TENSOR.ROPE_FREQS: "rope_freqs",
MODEL_TENSOR.ATTN_NORM: "blk.{bid}.attn_norm",
MODEL_TENSOR.ATTN_NORM_2: "blk.{bid}.attn_norm_2",
MODEL_TENSOR.ATTN_QKV: "blk.{bid}.attn_qkv",
MODEL_TENSOR.ATTN_Q: "blk.{bid}.attn_q",
MODEL_TENSOR.ATTN_K: "blk.{bid}.attn_k",
MODEL_TENSOR.ATTN_V: "blk.{bid}.attn_v",
MODEL_TENSOR.ATTN_OUT: "blk.{bid}.attn_output",
MODEL_TENSOR.ATTN_ROT_EMBD: "blk.{bid}.attn_rot_embd",
MODEL_TENSOR.ATTN_Q_NORM: "blk.{bid}.attn_q_norm",
MODEL_TENSOR.ATTN_K_NORM: "blk.{bid}.attn_k_norm",
MODEL_TENSOR.FFN_GATE_INP: "blk.{bid}.ffn_gate_inp",
MODEL_TENSOR.FFN_NORM: "blk.{bid}.ffn_norm",
MODEL_TENSOR.FFN_GATE: "blk.{bid}.ffn_gate",
MODEL_TENSOR.FFN_DOWN: "blk.{bid}.ffn_down",
MODEL_TENSOR.FFN_UP: "blk.{bid}.ffn_up",
MODEL_TENSOR.FFN_ACT: "blk.{bid}.ffn",
MODEL_TENSOR.FFN_GATE_EXP: "blk.{bid}.ffn_gate.{xid}",
MODEL_TENSOR.FFN_DOWN_EXP: "blk.{bid}.ffn_down.{xid}",
MODEL_TENSOR.FFN_UP_EXP: "blk.{bid}.ffn_up.{xid}",
}
MODEL_TENSORS: dict[MODEL_ARCH, list[MODEL_TENSOR]] = {
MODEL_ARCH.LLAMA: [
MODEL_TENSOR.TOKEN_EMBD,
MODEL_TENSOR.OUTPUT_NORM,
MODEL_TENSOR.OUTPUT,
MODEL_TENSOR.ROPE_FREQS,
MODEL_TENSOR.ATTN_NORM,
MODEL_TENSOR.ATTN_Q,
MODEL_TENSOR.ATTN_K,
MODEL_TENSOR.ATTN_V,
MODEL_TENSOR.ATTN_OUT,
MODEL_TENSOR.ATTN_ROT_EMBD,
MODEL_TENSOR.FFN_GATE_INP,
MODEL_TENSOR.FFN_NORM,
MODEL_TENSOR.FFN_GATE,
MODEL_TENSOR.FFN_DOWN,
MODEL_TENSOR.FFN_UP,
MODEL_TENSOR.FFN_GATE_EXP,
MODEL_TENSOR.FFN_DOWN_EXP,
MODEL_TENSOR.FFN_UP_EXP,
],
MODEL_ARCH.GPTNEOX: [
MODEL_TENSOR.TOKEN_EMBD,
MODEL_TENSOR.OUTPUT_NORM,
MODEL_TENSOR.OUTPUT,
MODEL_TENSOR.ATTN_NORM,
MODEL_TENSOR.ATTN_QKV,
MODEL_TENSOR.ATTN_OUT,
MODEL_TENSOR.FFN_NORM,
MODEL_TENSOR.FFN_DOWN,
MODEL_TENSOR.FFN_UP,
],
MODEL_ARCH.FALCON: [
MODEL_TENSOR.TOKEN_EMBD,
MODEL_TENSOR.OUTPUT_NORM,
MODEL_TENSOR.OUTPUT,
MODEL_TENSOR.ATTN_NORM,
MODEL_TENSOR.ATTN_NORM_2,
MODEL_TENSOR.ATTN_QKV,
MODEL_TENSOR.ATTN_OUT,
MODEL_TENSOR.FFN_DOWN,
MODEL_TENSOR.FFN_UP,
],
MODEL_ARCH.BAICHUAN: [
MODEL_TENSOR.TOKEN_EMBD,
MODEL_TENSOR.OUTPUT_NORM,
MODEL_TENSOR.OUTPUT,
MODEL_TENSOR.ROPE_FREQS,
MODEL_TENSOR.ATTN_NORM,
MODEL_TENSOR.ATTN_Q,
MODEL_TENSOR.ATTN_K,
MODEL_TENSOR.ATTN_V,
MODEL_TENSOR.ATTN_OUT,
MODEL_TENSOR.ATTN_ROT_EMBD,
MODEL_TENSOR.FFN_NORM,
MODEL_TENSOR.FFN_GATE,
MODEL_TENSOR.FFN_DOWN,
MODEL_TENSOR.FFN_UP,
],
MODEL_ARCH.STARCODER: [
MODEL_TENSOR.TOKEN_EMBD,
MODEL_TENSOR.POS_EMBD,
MODEL_TENSOR.OUTPUT_NORM,
MODEL_TENSOR.OUTPUT,
MODEL_TENSOR.ATTN_NORM,
MODEL_TENSOR.ATTN_QKV,
MODEL_TENSOR.ATTN_OUT,
MODEL_TENSOR.FFN_NORM,
MODEL_TENSOR.FFN_DOWN,
MODEL_TENSOR.FFN_UP,
],
MODEL_ARCH.BERT: [
MODEL_TENSOR.TOKEN_EMBD,
MODEL_TENSOR.TOKEN_TYPES,
MODEL_TENSOR.POS_EMBD,
MODEL_TENSOR.OUTPUT_NORM,
MODEL_TENSOR.ATTN_NORM,
MODEL_TENSOR.ATTN_Q,
MODEL_TENSOR.ATTN_K,
MODEL_TENSOR.ATTN_V,
MODEL_TENSOR.ATTN_OUT,
MODEL_TENSOR.FFN_NORM,
MODEL_TENSOR.FFN_DOWN,
MODEL_TENSOR.FFN_UP,
],
MODEL_ARCH.MPT: [
MODEL_TENSOR.TOKEN_EMBD,
MODEL_TENSOR.OUTPUT_NORM,
MODEL_TENSOR.OUTPUT,
MODEL_TENSOR.ATTN_NORM,
MODEL_TENSOR.ATTN_QKV,
MODEL_TENSOR.ATTN_OUT,
MODEL_TENSOR.FFN_NORM,
MODEL_TENSOR.FFN_DOWN,
MODEL_TENSOR.FFN_UP,
MODEL_TENSOR.FFN_ACT,
],
MODEL_ARCH.GPTJ: [
MODEL_TENSOR.TOKEN_EMBD,
MODEL_TENSOR.OUTPUT_NORM,
MODEL_TENSOR.OUTPUT,
MODEL_TENSOR.ATTN_NORM,
MODEL_TENSOR.ATTN_Q,
MODEL_TENSOR.ATTN_K,
MODEL_TENSOR.ATTN_V,
MODEL_TENSOR.ATTN_OUT,
MODEL_TENSOR.FFN_DOWN,
MODEL_TENSOR.FFN_UP,
],
MODEL_ARCH.PERSIMMON: [
MODEL_TENSOR.TOKEN_EMBD,
MODEL_TENSOR.OUTPUT,
MODEL_TENSOR.OUTPUT_NORM,
MODEL_TENSOR.ATTN_NORM,
MODEL_TENSOR.ATTN_QKV,
MODEL_TENSOR.ATTN_OUT,
MODEL_TENSOR.FFN_NORM,
MODEL_TENSOR.FFN_DOWN,
MODEL_TENSOR.FFN_UP,
MODEL_TENSOR.ATTN_Q_NORM,
MODEL_TENSOR.ATTN_K_NORM,
MODEL_TENSOR.ATTN_ROT_EMBD,
],
MODEL_ARCH.REFACT: [
MODEL_TENSOR.TOKEN_EMBD,
MODEL_TENSOR.OUTPUT_NORM,
MODEL_TENSOR.OUTPUT,
MODEL_TENSOR.ATTN_NORM,
MODEL_TENSOR.ATTN_Q,
MODEL_TENSOR.ATTN_K,
MODEL_TENSOR.ATTN_V,
MODEL_TENSOR.ATTN_OUT,
MODEL_TENSOR.FFN_NORM,
MODEL_TENSOR.FFN_GATE,
MODEL_TENSOR.FFN_DOWN,
MODEL_TENSOR.FFN_UP,
],
MODEL_ARCH.BLOOM: [
MODEL_TENSOR.TOKEN_EMBD,
MODEL_TENSOR.TOKEN_EMBD_NORM,
MODEL_TENSOR.OUTPUT_NORM,
MODEL_TENSOR.OUTPUT,
MODEL_TENSOR.ATTN_NORM,
MODEL_TENSOR.ATTN_QKV,
MODEL_TENSOR.ATTN_OUT,
MODEL_TENSOR.FFN_NORM,
MODEL_TENSOR.FFN_DOWN,
MODEL_TENSOR.FFN_UP,
],
MODEL_ARCH.STABLELM: [
MODEL_TENSOR.TOKEN_EMBD,
MODEL_TENSOR.OUTPUT_NORM,
MODEL_TENSOR.OUTPUT,
MODEL_TENSOR.ROPE_FREQS,
MODEL_TENSOR.ATTN_NORM,
MODEL_TENSOR.ATTN_Q,
MODEL_TENSOR.ATTN_K,
MODEL_TENSOR.ATTN_V,
MODEL_TENSOR.ATTN_OUT,
MODEL_TENSOR.FFN_NORM,
MODEL_TENSOR.FFN_GATE,
MODEL_TENSOR.FFN_DOWN,
MODEL_TENSOR.FFN_UP,
],
MODEL_ARCH.QWEN: [
MODEL_TENSOR.TOKEN_EMBD,
MODEL_TENSOR.OUTPUT_NORM,
MODEL_TENSOR.OUTPUT,
MODEL_TENSOR.ROPE_FREQS,
MODEL_TENSOR.ATTN_NORM,
MODEL_TENSOR.ATTN_QKV,
MODEL_TENSOR.ATTN_OUT,
MODEL_TENSOR.ATTN_ROT_EMBD,
MODEL_TENSOR.FFN_NORM,
MODEL_TENSOR.FFN_GATE,
MODEL_TENSOR.FFN_DOWN,
MODEL_TENSOR.FFN_UP,
],
MODEL_ARCH.QWEN2: [
MODEL_TENSOR.TOKEN_EMBD,
MODEL_TENSOR.OUTPUT_NORM,
MODEL_TENSOR.OUTPUT,
MODEL_TENSOR.ATTN_NORM,
MODEL_TENSOR.ATTN_Q,
MODEL_TENSOR.ATTN_K,
MODEL_TENSOR.ATTN_V,
MODEL_TENSOR.ATTN_OUT,
MODEL_TENSOR.FFN_NORM,
MODEL_TENSOR.FFN_GATE,
MODEL_TENSOR.FFN_DOWN,
MODEL_TENSOR.FFN_UP,
],
MODEL_ARCH.PLAMO: [
MODEL_TENSOR.TOKEN_EMBD,
MODEL_TENSOR.OUTPUT_NORM,
MODEL_TENSOR.OUTPUT,
MODEL_TENSOR.ROPE_FREQS,
MODEL_TENSOR.ATTN_NORM,
MODEL_TENSOR.ATTN_Q,
MODEL_TENSOR.ATTN_K,
MODEL_TENSOR.ATTN_V,
MODEL_TENSOR.ATTN_OUT,
MODEL_TENSOR.ATTN_ROT_EMBD,
MODEL_TENSOR.FFN_GATE,
MODEL_TENSOR.FFN_DOWN,
MODEL_TENSOR.FFN_UP,
],
MODEL_ARCH.GPT2: [
MODEL_TENSOR.TOKEN_EMBD,
MODEL_TENSOR.POS_EMBD,
MODEL_TENSOR.OUTPUT_NORM,
MODEL_TENSOR.OUTPUT,
MODEL_TENSOR.ATTN_NORM,
MODEL_TENSOR.ATTN_QKV,
MODEL_TENSOR.ATTN_OUT,
MODEL_TENSOR.FFN_NORM,
MODEL_TENSOR.FFN_DOWN,
MODEL_TENSOR.FFN_UP,
],
MODEL_ARCH.PHI2: [
MODEL_TENSOR.TOKEN_EMBD,
MODEL_TENSOR.OUTPUT_NORM,
MODEL_TENSOR.OUTPUT,
MODEL_TENSOR.ATTN_NORM,
MODEL_TENSOR.ATTN_QKV,
MODEL_TENSOR.ATTN_Q,
MODEL_TENSOR.ATTN_K,
MODEL_TENSOR.ATTN_V,
MODEL_TENSOR.ATTN_OUT,
MODEL_TENSOR.FFN_NORM,
MODEL_TENSOR.FFN_DOWN,
MODEL_TENSOR.FFN_UP,
],
MODEL_ARCH.CODESHELL: [
MODEL_TENSOR.TOKEN_EMBD,
MODEL_TENSOR.POS_EMBD,
MODEL_TENSOR.OUTPUT_NORM,
MODEL_TENSOR.OUTPUT,
MODEL_TENSOR.ATTN_NORM,
MODEL_TENSOR.ATTN_QKV,
MODEL_TENSOR.ATTN_OUT,
MODEL_TENSOR.ATTN_ROT_EMBD,
MODEL_TENSOR.FFN_NORM,
MODEL_TENSOR.FFN_DOWN,
MODEL_TENSOR.FFN_UP,
],
MODEL_ARCH.ORION: [
MODEL_TENSOR.TOKEN_EMBD,
MODEL_TENSOR.OUTPUT_NORM,
MODEL_TENSOR.OUTPUT,
MODEL_TENSOR.ROPE_FREQS,
MODEL_TENSOR.ATTN_NORM,
MODEL_TENSOR.ATTN_Q,
MODEL_TENSOR.ATTN_K,
MODEL_TENSOR.ATTN_V,
MODEL_TENSOR.ATTN_OUT,
MODEL_TENSOR.ATTN_ROT_EMBD,
MODEL_TENSOR.FFN_NORM,
MODEL_TENSOR.FFN_GATE,
MODEL_TENSOR.FFN_DOWN,
MODEL_TENSOR.FFN_UP,
],
MODEL_ARCH.INTERNLM2: [
MODEL_TENSOR.TOKEN_EMBD,
MODEL_TENSOR.OUTPUT_NORM,
MODEL_TENSOR.OUTPUT,
MODEL_TENSOR.ATTN_NORM,
MODEL_TENSOR.ATTN_Q,
MODEL_TENSOR.ATTN_K,
MODEL_TENSOR.ATTN_V,
MODEL_TENSOR.ATTN_OUT,
MODEL_TENSOR.ATTN_ROT_EMBD,
MODEL_TENSOR.FFN_NORM,
MODEL_TENSOR.FFN_GATE,
MODEL_TENSOR.FFN_DOWN,
MODEL_TENSOR.FFN_UP,
],
MODEL_ARCH.MINICPM: [
MODEL_TENSOR.TOKEN_EMBD,
MODEL_TENSOR.OUTPUT_NORM,
MODEL_TENSOR.ROPE_FREQS,
MODEL_TENSOR.ATTN_NORM,
MODEL_TENSOR.ATTN_Q,
MODEL_TENSOR.ATTN_K,
MODEL_TENSOR.ATTN_V,
MODEL_TENSOR.ATTN_OUT,
MODEL_TENSOR.ATTN_ROT_EMBD,
MODEL_TENSOR.FFN_GATE_INP,
MODEL_TENSOR.FFN_NORM,
MODEL_TENSOR.FFN_GATE,
MODEL_TENSOR.FFN_DOWN,
MODEL_TENSOR.FFN_UP,
MODEL_TENSOR.FFN_GATE_EXP,
MODEL_TENSOR.FFN_DOWN_EXP,
MODEL_TENSOR.FFN_UP_EXP,
],
# TODO
}
# tensors that will not be serialized
MODEL_TENSOR_SKIP: dict[MODEL_ARCH, list[MODEL_TENSOR]] = {
MODEL_ARCH.LLAMA: [
MODEL_TENSOR.ROPE_FREQS,
MODEL_TENSOR.ATTN_ROT_EMBD,
],
MODEL_ARCH.BAICHUAN: [
MODEL_TENSOR.ROPE_FREQS,
MODEL_TENSOR.ATTN_ROT_EMBD,
],
MODEL_ARCH.PERSIMMON: [
MODEL_TENSOR.ROPE_FREQS,
],
MODEL_ARCH.QWEN: [
MODEL_TENSOR.ROPE_FREQS,
MODEL_TENSOR.ATTN_ROT_EMBD,
],
MODEL_ARCH.CODESHELL: [
MODEL_TENSOR.ROPE_FREQS,
MODEL_TENSOR.ATTN_ROT_EMBD,
],
MODEL_ARCH.ORION: [
MODEL_TENSOR.ROPE_FREQS,
MODEL_TENSOR.ATTN_ROT_EMBD,
],
}
#
# types
#
class TokenType(IntEnum):
NORMAL = 1
UNKNOWN = 2
CONTROL = 3
USER_DEFINED = 4
UNUSED = 5
BYTE = 6
class RopeScalingType(Enum):
NONE = 'none'
LINEAR = 'linear'
YARN = 'yarn'
class GGMLQuantizationType(IntEnum):
F32 = 0
F16 = 1
Q4_0 = 2
Q4_1 = 3
Q5_0 = 6
Q5_1 = 7
Q8_0 = 8
Q8_1 = 9
Q2_K = 10
Q3_K = 11
Q4_K = 12
Q5_K = 13
Q6_K = 14
Q8_K = 15
class GGUFEndian(IntEnum):
LITTLE = 0
BIG = 1
class GGUFValueType(IntEnum):
UINT8 = 0
INT8 = 1
UINT16 = 2
INT16 = 3
UINT32 = 4
INT32 = 5
FLOAT32 = 6
BOOL = 7
STRING = 8
ARRAY = 9
UINT64 = 10
INT64 = 11
FLOAT64 = 12
@staticmethod
def get_type(val: Any) -> GGUFValueType:
if isinstance(val, (str, bytes, bytearray)):
return GGUFValueType.STRING
elif isinstance(val, list):
return GGUFValueType.ARRAY
elif isinstance(val, float):
return GGUFValueType.FLOAT32
elif isinstance(val, bool):
return GGUFValueType.BOOL
elif isinstance(val, int):
return GGUFValueType.INT32
# TODO: need help with 64-bit types in Python
else:
print("Unknown type:", type(val))
sys.exit()
# Note: Does not support GGML_QKK_64
QK_K = 256
# Items here are (block size, type size)
GGML_QUANT_SIZES = {
GGMLQuantizationType.F32: (1, 4),
GGMLQuantizationType.F16: (1, 2),
GGMLQuantizationType.Q4_0: (32, 2 + 16),
GGMLQuantizationType.Q4_1: (32, 2 + 2 + 16),
GGMLQuantizationType.Q5_0: (32, 2 + 4 + 16),
GGMLQuantizationType.Q5_1: (32, 2 + 2 + 4 + 16),
GGMLQuantizationType.Q8_0: (32, 2 + 32),
GGMLQuantizationType.Q8_1: (32, 4 + 4 + 32),
GGMLQuantizationType.Q2_K: (256, 2 + 2 + QK_K // 16 + QK_K // 4),
GGMLQuantizationType.Q3_K: (256, 2 + QK_K // 4 + QK_K // 8 + 12),
GGMLQuantizationType.Q4_K: (256, 2 + 2 + QK_K // 2 + 12),
GGMLQuantizationType.Q5_K: (256, 2 + 2 + QK_K // 2 + QK_K // 8 + 12),
GGMLQuantizationType.Q6_K: (256, 2 + QK_K // 2 + QK_K // 4 + QK_K // 16),
GGMLQuantizationType.Q8_K: (256, 4 + QK_K + QK_K // 8),
}
# Aliases for backward compatibility.
# general
KEY_GENERAL_ARCHITECTURE = Keys.General.ARCHITECTURE
KEY_GENERAL_QUANTIZATION_VERSION = Keys.General.QUANTIZATION_VERSION
KEY_GENERAL_ALIGNMENT = Keys.General.ALIGNMENT
KEY_GENERAL_NAME = Keys.General.NAME
KEY_GENERAL_AUTHOR = Keys.General.AUTHOR
KEY_GENERAL_URL = Keys.General.URL
KEY_GENERAL_DESCRIPTION = Keys.General.DESCRIPTION
KEY_GENERAL_LICENSE = Keys.General.LICENSE
KEY_GENERAL_SOURCE_URL = Keys.General.SOURCE_URL
KEY_GENERAL_SOURCE_HF_REPO = Keys.General.SOURCE_HF_REPO
KEY_GENERAL_FILE_TYPE = Keys.General.FILE_TYPE
# LLM
KEY_CONTEXT_LENGTH = Keys.LLM.CONTEXT_LENGTH
KEY_EMBEDDING_LENGTH = Keys.LLM.EMBEDDING_LENGTH
KEY_BLOCK_COUNT = Keys.LLM.BLOCK_COUNT
KEY_FEED_FORWARD_LENGTH = Keys.LLM.FEED_FORWARD_LENGTH
KEY_USE_PARALLEL_RESIDUAL = Keys.LLM.USE_PARALLEL_RESIDUAL
KEY_TENSOR_DATA_LAYOUT = Keys.LLM.TENSOR_DATA_LAYOUT
# attention
KEY_ATTENTION_HEAD_COUNT = Keys.Attention.HEAD_COUNT
KEY_ATTENTION_HEAD_COUNT_KV = Keys.Attention.HEAD_COUNT_KV
KEY_ATTENTION_MAX_ALIBI_BIAS = Keys.Attention.MAX_ALIBI_BIAS
KEY_ATTENTION_CLAMP_KQV = Keys.Attention.CLAMP_KQV
KEY_ATTENTION_LAYERNORM_EPS = Keys.Attention.LAYERNORM_EPS
KEY_ATTENTION_LAYERNORM_RMS_EPS = Keys.Attention.LAYERNORM_RMS_EPS
# RoPE
KEY_ROPE_DIMENSION_COUNT = Keys.Rope.DIMENSION_COUNT
KEY_ROPE_FREQ_BASE = Keys.Rope.FREQ_BASE
KEY_ROPE_SCALING_TYPE = Keys.Rope.SCALING_TYPE
KEY_ROPE_SCALING_FACTOR = Keys.Rope.SCALING_FACTOR
KEY_ROPE_SCALING_ORIG_CTX_LEN = Keys.Rope.SCALING_ORIG_CTX_LEN
KEY_ROPE_SCALING_FINETUNED = Keys.Rope.SCALING_FINETUNED
# tokenization
KEY_TOKENIZER_MODEL = Keys.Tokenizer.MODEL
KEY_TOKENIZER_LIST = Keys.Tokenizer.LIST
KEY_TOKENIZER_TOKEN_TYPE = Keys.Tokenizer.TOKEN_TYPE
KEY_TOKENIZER_SCORES = Keys.Tokenizer.SCORES
KEY_TOKENIZER_MERGES = Keys.Tokenizer.MERGES
KEY_TOKENIZER_BOS_ID = Keys.Tokenizer.BOS_ID
KEY_TOKENIZER_EOS_ID = Keys.Tokenizer.EOS_ID
KEY_TOKENIZER_UNK_ID = Keys.Tokenizer.UNK_ID
KEY_TOKENIZER_SEP_ID = Keys.Tokenizer.SEP_ID
KEY_TOKENIZER_PAD_ID = Keys.Tokenizer.PAD_ID
KEY_TOKENIZER_HF_JSON = Keys.Tokenizer.HF_JSON
KEY_TOKENIZER_RWKV = Keys.Tokenizer.RWKV
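
For reference, the `(block size, type size)` pairs in `GGML_QUANT_SIZES` above are exactly what the reader further down uses to turn a tensor's element count into bytes (`n_bytes = n_elems * type_size // block_size`). A small worked sketch of that arithmetic, with values copied from the table; the helper itself is illustrative:

```ts
// Illustrative arithmetic only, using values from GGML_QUANT_SIZES above.
// Q4_K: block size 256, type size 2 + 2 + 256/2 + 12 = 144 bytes per block.
const QUANT_SIZES: Record<string, [number, number]> = {
  F32: [1, 4],
  F16: [1, 2],
  Q8_0: [32, 34],
  Q4_K: [256, 144],
}

function tensorByteSize(quant: string, nElements: number): number {
  const [blockSize, typeSize] = QUANT_SIZES[quant]
  // Same formula the GGUF reader below applies per tensor.
  return Math.floor((nElements * typeSize) / blockSize)
}

// A 4096 x 4096 weight stored as Q4_K takes exactly 9 MiB:
console.log(tensorByteSize('Q4_K', 4096 * 4096)) // 9437184
```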

View File

@ -1,15 +0,0 @@
# This file left for compatibility. If you want to use the GGUF API from Python
# then don't import gguf/gguf.py directly. If you're looking for examples, see the
# examples/ directory for gguf-py
import importlib
import sys
from pathlib import Path
sys.path.insert(0, str(Path(__file__).parent.parent))
# Compatibility for people trying to import gguf/gguf.py directly instead of as a package.
importlib.invalidate_caches()
import gguf # noqa: E402
importlib.reload(gguf)

View File

@ -1,264 +0,0 @@
#
# GGUF file reading/modification support. For API usage information,
# please see the files scripts/ for some fairly simple examples.
#
from __future__ import annotations
import os
from collections import OrderedDict
from typing import Any, Literal, NamedTuple, TypeVar, Union
import numpy as np
import numpy.typing as npt
if __name__ == "__main__":
import sys
from pathlib import Path
# Allow running file in package as a script.
sys.path.insert(0, str(Path(__file__).parent.parent))
from gguf.constants import (
GGML_QUANT_SIZES,
GGUF_DEFAULT_ALIGNMENT,
GGUF_MAGIC,
GGUF_VERSION,
GGMLQuantizationType,
GGUFValueType,
)
READER_SUPPORTED_VERSIONS = [2, GGUF_VERSION]
class ReaderField(NamedTuple):
# Offset to start of this field.
offset: int
# Name of the field (not necessarily from file data).
name: str
# Data parts. Some types have multiple components, such as strings
# that consist of a length followed by the string data.
parts: list[npt.NDArray[Any]] = []
# Indexes into parts that we can call the actual data. For example
# an array of strings will be populated with indexes to the actual
# string data.
data: list[int] = [-1]
types: list[GGUFValueType] = []
class ReaderTensor(NamedTuple):
name: str
tensor_type: GGMLQuantizationType
shape: npt.NDArray[np.uint32]
n_elements: int
n_bytes: int
data_offset: int
data: npt.NDArray[Any]
field: ReaderField
class GGUFReader:
# I - same as host, S - swapped
byte_order: Literal['I' | 'S'] = 'I'
alignment: int = GGUF_DEFAULT_ALIGNMENT
# Note: Internal helper, API may change.
gguf_scalar_to_np: dict[GGUFValueType, type[np.generic]] = {
GGUFValueType.UINT8: np.uint8,
GGUFValueType.INT8: np.int8,
GGUFValueType.UINT16: np.uint16,
GGUFValueType.INT16: np.int16,
GGUFValueType.UINT32: np.uint32,
GGUFValueType.INT32: np.int32,
GGUFValueType.FLOAT32: np.float32,
GGUFValueType.UINT64: np.uint64,
GGUFValueType.INT64: np.int64,
GGUFValueType.FLOAT64: np.float64,
GGUFValueType.BOOL: np.bool_,
}
def __init__(self, path: os.PathLike[str] | str, mode: Literal['r' | 'r+' | 'c'] = 'r'):
self.data = np.memmap(path, mode = mode)
offs = 0
if self._get(offs, np.uint32, override_order = '<')[0] != GGUF_MAGIC:
raise ValueError('GGUF magic invalid')
offs += 4
temp_version = self._get(offs, np.uint32)
if temp_version[0] & 65535 == 0:
# If we get 0 here that means it's (probably) a GGUF file created for
# the opposite byte order of the machine this script is running on.
self.byte_order = 'S'
temp_version = temp_version.newbyteorder(self.byte_order)
version = temp_version[0]
if version not in READER_SUPPORTED_VERSIONS:
raise ValueError(f'Sorry, file appears to be version {version} which we cannot handle')
self.fields: OrderedDict[str, ReaderField] = OrderedDict()
self.tensors: list[ReaderTensor] = []
offs += self._push_field(ReaderField(offs, 'GGUF.version', [temp_version], [0], [GGUFValueType.UINT32]))
temp_counts = self._get(offs, np.uint64, 2)
offs += self._push_field(ReaderField(offs, 'GGUF.tensor_count', [temp_counts[:1]], [0], [GGUFValueType.UINT64]))
offs += self._push_field(ReaderField(offs, 'GGUF.kv_count', [temp_counts[1:]], [0], [GGUFValueType.UINT64]))
tensor_count, kv_count = temp_counts
offs = self._build_fields(offs, kv_count)
offs, tensors_fields = self._build_tensors_fields(offs, tensor_count)
new_align = self.fields.get('general.alignment')
if new_align is not None:
if new_align.types != [GGUFValueType.UINT32]:
raise ValueError('Bad type for general.alignment field')
self.alignment = new_align.parts[-1][0]
padding = offs % self.alignment
if padding != 0:
offs += self.alignment - padding
self._build_tensors(offs, tensors_fields)
_DT = TypeVar('_DT', bound = npt.DTypeLike)
# Fetch a key/value metadata field by key.
def get_field(self, key: str) -> Union[ReaderField, None]:
return self.fields.get(key, None)
# Fetch a tensor from the list by index.
def get_tensor(self, idx: int) -> ReaderTensor:
return self.tensors[idx]
def _get(
self, offset: int, dtype: npt.DTypeLike, count: int = 1, override_order: None | Literal['I' | 'S' | '<'] = None,
) -> npt.NDArray[Any]:
count = int(count)
itemsize = int(np.empty([], dtype = dtype).itemsize)
end_offs = offset + itemsize * count
return (
self.data[offset:end_offs]
.view(dtype = dtype)[:count]
.newbyteorder(override_order or self.byte_order)
)
def _push_field(self, field: ReaderField, skip_sum: bool = False) -> int:
if field.name in self.fields:
raise KeyError(f'Duplicate {field.name} already in list at offset {field.offset}')
self.fields[field.name] = field
return 0 if skip_sum else sum(int(part.nbytes) for part in field.parts)
def _get_str(self, offset: int) -> tuple[npt.NDArray[np.uint64], npt.NDArray[np.uint8]]:
slen = self._get(offset, np.uint64)
return slen, self._get(offset + 8, np.uint8, slen[0])
def _get_field_parts(
self, orig_offs: int, raw_type: int,
) -> tuple[int, list[npt.NDArray[Any]], list[int], list[GGUFValueType]]:
offs = orig_offs
types: list[GGUFValueType] = []
gtype = GGUFValueType(raw_type)
types.append(gtype)
# Handle strings.
if gtype == GGUFValueType.STRING:
sparts: list[npt.NDArray[Any]] = list(self._get_str(offs))
size = sum(int(part.nbytes) for part in sparts)
return size, sparts, [1], types
# Check if it's a simple scalar type.
nptype = self.gguf_scalar_to_np.get(gtype)
if nptype is not None:
val = self._get(offs, nptype)
return int(val.nbytes), [val], [0], types
# Handle arrays.
if gtype == GGUFValueType.ARRAY:
raw_itype = self._get(offs, np.uint32)
offs += int(raw_itype.nbytes)
alen = self._get(offs, np.uint64)
offs += int(alen.nbytes)
aparts: list[npt.NDArray[Any]] = [raw_itype, alen]
data_idxs: list[int] = []
for idx in range(alen[0]):
curr_size, curr_parts, curr_idxs, curr_types = self._get_field_parts(offs, raw_itype[0])
if idx == 0:
types += curr_types
idxs_offs = len(aparts)
aparts += curr_parts
data_idxs += (idx + idxs_offs for idx in curr_idxs)
offs += curr_size
return offs - orig_offs, aparts, data_idxs, types
# We can't deal with this one.
raise ValueError('Unknown/unhandled field type {gtype}')
def _get_tensor(self, orig_offs: int) -> ReaderField:
offs = orig_offs
name_len, name_data = self._get_str(offs)
offs += int(name_len.nbytes + name_data.nbytes)
n_dims = self._get(offs, np.uint32)
offs += int(n_dims.nbytes)
dims = self._get(offs, np.uint64, n_dims[0])
offs += int(dims.nbytes)
raw_dtype = self._get(offs, np.uint32)
offs += int(raw_dtype.nbytes)
offset_tensor = self._get(offs, np.uint64)
offs += int(offset_tensor.nbytes)
return ReaderField(
orig_offs,
str(bytes(name_data), encoding = 'utf-8'),
[name_len, name_data, n_dims, dims, raw_dtype, offset_tensor],
[1, 3, 4, 5],
)
def _build_fields(self, offs: int, count: int) -> int:
for _ in range(count):
orig_offs = offs
kv_klen, kv_kdata = self._get_str(offs)
offs += int(kv_klen.nbytes + kv_kdata.nbytes)
raw_kv_type = self._get(offs, np.uint32)
offs += int(raw_kv_type.nbytes)
parts: list[npt.NDArray[Any]] = [kv_klen, kv_kdata, raw_kv_type]
idxs_offs = len(parts)
field_size, field_parts, field_idxs, field_types = self._get_field_parts(offs, raw_kv_type[0])
parts += field_parts
self._push_field(ReaderField(
orig_offs,
str(bytes(kv_kdata), encoding = 'utf-8'),
parts,
[idx + idxs_offs for idx in field_idxs],
field_types,
), skip_sum = True)
offs += field_size
return offs
def _build_tensors_fields(self, offs: int, count: int) -> tuple[int, list[ReaderField]]:
tensor_fields = []
for _ in range(count):
field = self._get_tensor(offs)
offs += sum(int(part.nbytes) for part in field.parts)
tensor_fields.append(field)
return offs, tensor_fields
def _build_tensors(self, start_offs: int, fields: list[ReaderField]) -> None:
tensors = []
for field in fields:
_name_len, name_data, _n_dims, dims, raw_dtype, offset_tensor = field.parts
ggml_type = GGMLQuantizationType(raw_dtype[0])
n_elems = np.prod(dims)
block_size, type_size = GGML_QUANT_SIZES[ggml_type]
n_bytes = n_elems * type_size // block_size
data_offs = int(start_offs + offset_tensor[0])
item_type: npt.DTypeLike
if ggml_type == GGMLQuantizationType.F32:
item_count = n_elems
item_type = np.float32
elif ggml_type == GGMLQuantizationType.F16:
item_count = n_elems
item_type = np.float16
else:
item_count = n_bytes
item_type = np.uint8
tensors.append(ReaderTensor(
name = str(bytes(name_data), encoding = 'utf-8'),
tensor_type = ggml_type,
shape = dims,
n_elements = n_elems,
n_bytes = n_bytes,
data_offset = data_offs,
data = self._get(data_offs, item_type, item_count),
field = field,
))
self.tensors = tensors

View File

@ -1,427 +0,0 @@
from __future__ import annotations
import os
import shutil
import struct
import tempfile
from enum import Enum, auto
from io import BufferedWriter
from typing import IO, Any, Sequence
import numpy as np
from .constants import (
GGUF_DEFAULT_ALIGNMENT,
GGUF_MAGIC,
GGUF_VERSION,
GGMLQuantizationType,
GGUFEndian,
GGUFValueType,
Keys,
RopeScalingType,
TokenType,
)
class WriterState(Enum):
EMPTY = auto()
HEADER = auto()
KV_DATA = auto()
TI_DATA = auto()
class GGUFWriter:
fout: BufferedWriter
temp_file: tempfile.SpooledTemporaryFile[bytes] | None
tensors: list[np.ndarray[Any, Any]]
_simple_value_packing = {
GGUFValueType.UINT8: "B",
GGUFValueType.INT8: "b",
GGUFValueType.UINT16: "H",
GGUFValueType.INT16: "h",
GGUFValueType.UINT32: "I",
GGUFValueType.INT32: "i",
GGUFValueType.FLOAT32: "f",
GGUFValueType.UINT64: "Q",
GGUFValueType.INT64: "q",
GGUFValueType.FLOAT64: "d",
GGUFValueType.BOOL: "?",
}
def __init__(
self, path: os.PathLike[str] | str, arch: str, use_temp_file: bool = True,
endianess: GGUFEndian = GGUFEndian.LITTLE,
):
self.fout = open(path, "wb")
self.arch = arch
self.endianess = endianess
self.offset_tensor = 0
self.data_alignment = GGUF_DEFAULT_ALIGNMENT
self.kv_data = bytearray()
self.kv_data_count = 0
self.ti_data = bytearray()
self.ti_data_count = 0
self.use_temp_file = use_temp_file
self.temp_file = None
self.tensors = []
print("gguf: This GGUF file is for {0} Endian only".format(
"Big" if self.endianess == GGUFEndian.BIG else "Little",
))
self.state = WriterState.EMPTY
self.add_architecture()
def write_header_to_file(self) -> None:
if self.state is not WriterState.EMPTY:
raise ValueError(f'Expected output file to be empty, got {self.state}')
self._write_packed("<I", GGUF_MAGIC, skip_pack_prefix = True)
self._write_packed("I", GGUF_VERSION)
self._write_packed("Q", self.ti_data_count)
self._write_packed("Q", self.kv_data_count)
self.flush()
self.state = WriterState.HEADER
def write_kv_data_to_file(self) -> None:
if self.state is not WriterState.HEADER:
raise ValueError(f'Expected output file to contain the header, got {self.state}')
self.fout.write(self.kv_data)
self.flush()
self.state = WriterState.KV_DATA
def write_ti_data_to_file(self) -> None:
if self.state is not WriterState.KV_DATA:
raise ValueError(f'Expected output file to contain KV data, got {self.state}')
self.fout.write(self.ti_data)
self.flush()
self.state = WriterState.TI_DATA
def add_key(self, key: str) -> None:
self.add_val(key, GGUFValueType.STRING, add_vtype=False)
def add_uint8(self, key: str, val: int) -> None:
self.add_key(key)
self.add_val(val, GGUFValueType.UINT8)
def add_int8(self, key: str, val: int) -> None:
self.add_key(key)
self.add_val(val, GGUFValueType.INT8)
def add_uint16(self, key: str, val: int) -> None:
self.add_key(key)
self.add_val(val, GGUFValueType.UINT16)
def add_int16(self, key: str, val: int) -> None:
self.add_key(key)
self.add_val(val, GGUFValueType.INT16)
def add_uint32(self, key: str, val: int) -> None:
self.add_key(key)
self.add_val(val, GGUFValueType.UINT32)
def add_int32(self, key: str, val: int) -> None:
self.add_key(key)
self.add_val(val, GGUFValueType.INT32)
def add_float32(self, key: str, val: float) -> None:
self.add_key(key)
self.add_val(val, GGUFValueType.FLOAT32)
def add_uint64(self, key: str, val: int) -> None:
self.add_key(key)
self.add_val(val, GGUFValueType.UINT64)
def add_int64(self, key: str, val: int) -> None:
self.add_key(key)
self.add_val(val, GGUFValueType.INT64)
def add_float64(self, key: str, val: float) -> None:
self.add_key(key)
self.add_val(val, GGUFValueType.FLOAT64)
def add_bool(self, key: str, val: bool) -> None:
self.add_key(key)
self.add_val(val, GGUFValueType.BOOL)
def add_string(self, key: str, val: str) -> None:
if not val:
return
self.add_key(key)
self.add_val(val, GGUFValueType.STRING)
def add_array(self, key: str, val: Sequence[Any]) -> None:
if not isinstance(val, Sequence):
raise ValueError("Value must be a sequence for array type")
self.add_key(key)
self.add_val(val, GGUFValueType.ARRAY)
def add_val(self, val: Any, vtype: GGUFValueType | None = None, add_vtype: bool = True) -> None:
if vtype is None:
vtype = GGUFValueType.get_type(val)
if add_vtype:
self.kv_data += self._pack("I", vtype)
self.kv_data_count += 1
pack_fmt = self._simple_value_packing.get(vtype)
if pack_fmt is not None:
self.kv_data += self._pack(pack_fmt, val, skip_pack_prefix = vtype == GGUFValueType.BOOL)
elif vtype == GGUFValueType.STRING:
encoded_val = val.encode("utf8") if isinstance(val, str) else val
self.kv_data += self._pack("Q", len(encoded_val))
self.kv_data += encoded_val
elif vtype == GGUFValueType.ARRAY and isinstance(val, Sequence) and val:
ltype = GGUFValueType.get_type(val[0])
if not all(GGUFValueType.get_type(i) is ltype for i in val[1:]):
raise ValueError("All items in a GGUF array should be of the same type")
self.kv_data += self._pack("I", ltype)
self.kv_data += self._pack("Q", len(val))
for item in val:
self.add_val(item, add_vtype=False)
else:
raise ValueError("Invalid GGUF metadata value type or value")
@staticmethod
def ggml_pad(x: int, n: int) -> int:
return ((x + n - 1) // n) * n
def add_tensor_info(
self, name: str, tensor_shape: Sequence[int], tensor_dtype: np.dtype[np.float16] | np.dtype[np.float32],
tensor_nbytes: int, raw_dtype: GGMLQuantizationType | None = None,
) -> None:
if self.state is not WriterState.EMPTY:
raise ValueError(f'Expected output file to be empty, got {self.state}')
if raw_dtype is None and tensor_dtype not in (np.float32, np.float16):
raise ValueError("Only F32 and F16 tensors are supported for now")
encoded_name = name.encode("utf8")
self.ti_data += self._pack("Q", len(encoded_name))
self.ti_data += encoded_name
n_dims = len(tensor_shape)
self.ti_data += self._pack("I", n_dims)
for i in range(n_dims):
self.ti_data += self._pack("Q", tensor_shape[n_dims - 1 - i])
if raw_dtype is None:
dtype = GGMLQuantizationType.F32 if tensor_dtype == np.float32 else GGMLQuantizationType.F16
else:
dtype = raw_dtype
self.ti_data += self._pack("I", dtype)
self.ti_data += self._pack("Q", self.offset_tensor)
self.offset_tensor += GGUFWriter.ggml_pad(tensor_nbytes, self.data_alignment)
self.ti_data_count += 1
def add_tensor(
self, name: str, tensor: np.ndarray[Any, Any], raw_shape: Sequence[int] | None = None,
raw_dtype: GGMLQuantizationType | None = None,
) -> None:
if self.endianess == GGUFEndian.BIG:
tensor.byteswap(inplace=True)
if self.use_temp_file and self.temp_file is None:
fp = tempfile.SpooledTemporaryFile(mode="w+b", max_size=256 * 1024 * 1024)
fp.seek(0)
self.temp_file = fp
shape: Sequence[int] = raw_shape if raw_shape is not None else tensor.shape
self.add_tensor_info(name, shape, tensor.dtype, tensor.nbytes, raw_dtype = raw_dtype)
if self.temp_file is None:
self.tensors.append(tensor)
return
tensor.tofile(self.temp_file)
self.write_padding(self.temp_file, tensor.nbytes)
def write_padding(self, fp: IO[bytes], n: int, align: int | None = None) -> None:
pad = GGUFWriter.ggml_pad(n, align if align is not None else self.data_alignment) - n
if pad != 0:
fp.write(bytes([0] * pad))
def write_tensor_data(self, tensor: np.ndarray[Any, Any]) -> None:
if self.state is not WriterState.TI_DATA:
raise ValueError(f'Expected output file to contain tensor info, got {self.state}')
if self.endianess == GGUFEndian.BIG:
tensor.byteswap(inplace=True)
self.write_padding(self.fout, self.fout.tell())
tensor.tofile(self.fout)
self.write_padding(self.fout, tensor.nbytes)
def write_tensors_to_file(self) -> None:
self.write_ti_data_to_file()
self.write_padding(self.fout, self.fout.tell())
if self.temp_file is None:
while True:
try:
tensor = self.tensors.pop(0)
except IndexError:
break
tensor.tofile(self.fout)
self.write_padding(self.fout, tensor.nbytes)
return
self.temp_file.seek(0)
shutil.copyfileobj(self.temp_file, self.fout)
self.flush()
self.temp_file.close()
def flush(self) -> None:
self.fout.flush()
def close(self) -> None:
self.fout.close()
def add_architecture(self) -> None:
self.add_string(Keys.General.ARCHITECTURE, self.arch)
def add_author(self, author: str) -> None:
self.add_string(Keys.General.AUTHOR, author)
def add_tensor_data_layout(self, layout: str) -> None:
self.add_string(Keys.LLM.TENSOR_DATA_LAYOUT.format(arch=self.arch), layout)
def add_url(self, url: str) -> None:
self.add_string(Keys.General.URL, url)
def add_description(self, description: str) -> None:
self.add_string(Keys.General.DESCRIPTION, description)
def add_source_url(self, url: str) -> None:
self.add_string(Keys.General.SOURCE_URL, url)
def add_source_hf_repo(self, repo: str) -> None:
self.add_string(Keys.General.SOURCE_HF_REPO, repo)
def add_file_type(self, ftype: int) -> None:
self.add_uint32(Keys.General.FILE_TYPE, ftype)
def add_name(self, name: str) -> None:
self.add_string(Keys.General.NAME, name)
def add_quantization_version(self, quantization_version: GGMLQuantizationType) -> None:
self.add_uint32(
Keys.General.QUANTIZATION_VERSION, quantization_version)
def add_custom_alignment(self, alignment: int) -> None:
self.data_alignment = alignment
self.add_uint32(Keys.General.ALIGNMENT, alignment)
def add_context_length(self, length: int) -> None:
self.add_uint32(Keys.LLM.CONTEXT_LENGTH.format(arch=self.arch), length)
def add_embedding_length(self, length: int) -> None:
self.add_uint32(Keys.LLM.EMBEDDING_LENGTH.format(arch=self.arch), length)
def add_block_count(self, length: int) -> None:
self.add_uint32(Keys.LLM.BLOCK_COUNT.format(arch=self.arch), length)
def add_feed_forward_length(self, length: int) -> None:
self.add_uint32(Keys.LLM.FEED_FORWARD_LENGTH.format(arch=self.arch), length)
def add_parallel_residual(self, use: bool) -> None:
self.add_bool(Keys.LLM.USE_PARALLEL_RESIDUAL.format(arch=self.arch), use)
def add_head_count(self, count: int) -> None:
self.add_uint32(Keys.Attention.HEAD_COUNT.format(arch=self.arch), count)
def add_head_count_kv(self, count: int) -> None:
self.add_uint32(Keys.Attention.HEAD_COUNT_KV.format(arch=self.arch), count)
def add_key_length(self, length: int) -> None:
self.add_uint32(Keys.Attention.KEY_LENGTH.format(arch=self.arch), length)
def add_value_length(self, length: int) -> None:
self.add_uint32(Keys.Attention.VALUE_LENGTH.format(arch=self.arch), length)
def add_max_alibi_bias(self, bias: float) -> None:
self.add_float32(Keys.Attention.MAX_ALIBI_BIAS.format(arch=self.arch), bias)
def add_clamp_kqv(self, value: float) -> None:
self.add_float32(Keys.Attention.CLAMP_KQV.format(arch=self.arch), value)
def add_expert_count(self, count: int) -> None:
self.add_uint32(Keys.LLM.EXPERT_COUNT.format(arch=self.arch), count)
def add_expert_used_count(self, count: int) -> None:
self.add_uint32(Keys.LLM.EXPERT_USED_COUNT.format(arch=self.arch), count)
def add_layer_norm_eps(self, value: float) -> None:
self.add_float32(Keys.Attention.LAYERNORM_EPS.format(arch=self.arch), value)
def add_layer_norm_rms_eps(self, value: float) -> None:
self.add_float32(Keys.Attention.LAYERNORM_RMS_EPS.format(arch=self.arch), value)
def add_rope_dimension_count(self, count: int) -> None:
self.add_uint32(Keys.Rope.DIMENSION_COUNT.format(arch=self.arch), count)
def add_rope_freq_base(self, value: float) -> None:
self.add_float32(Keys.Rope.FREQ_BASE.format(arch=self.arch), value)
def add_rope_scaling_type(self, value: RopeScalingType) -> None:
self.add_string(Keys.Rope.SCALING_TYPE.format(arch=self.arch), value.value)
def add_rope_scaling_factor(self, value: float) -> None:
self.add_float32(Keys.Rope.SCALING_FACTOR.format(arch=self.arch), value)
def add_rope_scaling_orig_ctx_len(self, value: int) -> None:
self.add_uint32(Keys.Rope.SCALING_ORIG_CTX_LEN.format(arch=self.arch), value)
def add_rope_scaling_finetuned(self, value: bool) -> None:
self.add_bool(Keys.Rope.SCALING_FINETUNED.format(arch=self.arch), value)
def add_tokenizer_model(self, model: str) -> None:
self.add_string(Keys.Tokenizer.MODEL, model)
def add_token_list(self, tokens: Sequence[str] | Sequence[bytes] | Sequence[bytearray]) -> None:
self.add_array(Keys.Tokenizer.LIST, tokens)
def add_token_merges(self, merges: Sequence[str] | Sequence[bytes] | Sequence[bytearray]) -> None:
self.add_array(Keys.Tokenizer.MERGES, merges)
def add_token_types(self, types: Sequence[TokenType] | Sequence[int]) -> None:
self.add_array(Keys.Tokenizer.TOKEN_TYPE, types)
def add_token_scores(self, scores: Sequence[float]) -> None:
self.add_array(Keys.Tokenizer.SCORES, scores)
def add_bos_token_id(self, id: int) -> None:
self.add_uint32(Keys.Tokenizer.BOS_ID, id)
def add_eos_token_id(self, id: int) -> None:
self.add_uint32(Keys.Tokenizer.EOS_ID, id)
def add_unk_token_id(self, id: int) -> None:
self.add_uint32(Keys.Tokenizer.UNK_ID, id)
def add_sep_token_id(self, id: int) -> None:
self.add_uint32(Keys.Tokenizer.SEP_ID, id)
def add_pad_token_id(self, id: int) -> None:
self.add_uint32(Keys.Tokenizer.PAD_ID, id)
def add_add_bos_token(self, value: bool) -> None:
self.add_bool(Keys.Tokenizer.ADD_BOS, value)
def add_add_eos_token(self, value: bool) -> None:
self.add_bool(Keys.Tokenizer.ADD_EOS, value)
def add_add_space_prefix(self, value: bool) -> None:
self.add_bool(Keys.Tokenizer.ADD_PREFIX, value)
def add_chat_template(self, value: str) -> None:
self.add_string(Keys.Tokenizer.CHAT_TEMPLATE, value)
def _pack(self, fmt: str, value: Any, skip_pack_prefix: bool = False) -> bytes:
pack_prefix = ''
if not skip_pack_prefix:
pack_prefix = '<' if self.endianess == GGUFEndian.LITTLE else '>'
return struct.pack(f'{pack_prefix}{fmt}', value)
def _write_packed(self, fmt: str, value: Any, skip_pack_prefix: bool = False) -> None:
self.fout.write(self._pack(fmt, value, skip_pack_prefix))
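
A minimal, hedged sketch of the writer workflow above (the architecture string, key values and tensor are placeholders; the call order follows the WriterState checks in the class):

    import numpy as np
    from gguf import GGUFWriter

    gw = GGUFWriter("example.gguf", arch="llama")
    gw.add_name("tiny-example")
    gw.add_context_length(2048)
    # Tensors must be registered while the writer is still in the EMPTY state
    gw.add_tensor("token_embd.weight", np.zeros((16, 8), dtype=np.float32))
    # Header -> KV data -> tensor info + tensor data, as enforced by WriterState
    gw.write_header_to_file()
    gw.write_kv_data_to_file()
    gw.write_tensors_to_file()
    gw.close()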

View File

@ -1,332 +0,0 @@
from __future__ import annotations
from typing import Sequence
from .constants import MODEL_ARCH, MODEL_TENSOR, MODEL_TENSORS, TENSOR_NAMES
class TensorNameMap:
mappings_cfg: dict[MODEL_TENSOR, tuple[str, ...]] = {
# Token embeddings
MODEL_TENSOR.TOKEN_EMBD: (
"gpt_neox.embed_in", # gptneox
"transformer.wte", # gpt2 gpt-j mpt refact qwen
"transformer.word_embeddings", # falcon
"word_embeddings", # bloom
"model.embed_tokens", # llama-hf
"tok_embeddings", # llama-pth
"embeddings.word_embeddings", # bert
"language_model.embedding.word_embeddings", # persimmon
"wte", # gpt2
"transformer.embd.wte", # phi2
"model.tok_embeddings", # internlm2
),
# Token type embeddings
MODEL_TENSOR.TOKEN_TYPES: (
"embeddings.token_type_embeddings", # bert
),
# Normalization of token embeddings
MODEL_TENSOR.TOKEN_EMBD_NORM: (
"word_embeddings_layernorm", # bloom
),
# Position embeddings
MODEL_TENSOR.POS_EMBD: (
"transformer.wpe", # gpt2
"embeddings.position_embeddings", # bert
"wpe", # gpt2
),
# Output
MODEL_TENSOR.OUTPUT: (
"embed_out", # gptneox
"lm_head", # gpt2 mpt falcon llama-hf baichuan qwen
"output", # llama-pth bloom internlm2
"word_embeddings_for_head", # persimmon
"lm_head.linear", # phi2
),
# Output norm
MODEL_TENSOR.OUTPUT_NORM: (
"gpt_neox.final_layer_norm", # gptneox
"transformer.ln_f", # gpt2 gpt-j falcon
"model.norm", # llama-hf baichuan internlm2
"norm", # llama-pth
"embeddings.LayerNorm", # bert
"transformer.norm_f", # mpt
"ln_f", # refact bloom qwen gpt2
"language_model.encoder.final_layernorm", # persimmon
"model.final_layernorm", # persimmon
"lm_head.ln", # phi2
),
# Rope frequencies
MODEL_TENSOR.ROPE_FREQS: (
"rope.freqs", # llama-pth
),
}
block_mappings_cfg: dict[MODEL_TENSOR, tuple[str, ...]] = {
# Attention norm
MODEL_TENSOR.ATTN_NORM: (
"gpt_neox.layers.{bid}.input_layernorm", # gptneox
"transformer.h.{bid}.ln_1", # gpt2 gpt-j refact qwen
"transformer.blocks.{bid}.norm_1", # mpt
"transformer.h.{bid}.input_layernorm", # falcon7b
"h.{bid}.input_layernorm", # bloom
"transformer.h.{bid}.ln_mlp", # falcon40b
"model.layers.{bid}.input_layernorm", # llama-hf
"layers.{bid}.attention_norm", # llama-pth
"encoder.layer.{bid}.attention.output.LayerNorm", # bert
"language_model.encoder.layers.{bid}.input_layernorm", # persimmon
"model.layers.{bid}.ln1", # yi
"h.{bid}.ln_1", # gpt2
"transformer.h.{bid}.ln", # phi2
"model.layers.layers.{bid}.norm", # plamo
"model.layers.{bid}.attention_norm", # internlm2
),
# Attention norm 2
MODEL_TENSOR.ATTN_NORM_2: (
"transformer.h.{bid}.ln_attn", # falcon40b
),
# Attention query-key-value
MODEL_TENSOR.ATTN_QKV: (
"gpt_neox.layers.{bid}.attention.query_key_value", # gptneox
"transformer.h.{bid}.attn.c_attn", # gpt2 qwen
"transformer.blocks.{bid}.attn.Wqkv", # mpt
"transformer.h.{bid}.self_attention.query_key_value", # falcon
"h.{bid}.self_attention.query_key_value", # bloom
"language_model.encoder.layers.{bid}.self_attention.query_key_value", # persimmon
"model.layers.{bid}.self_attn.query_key_value", # persimmon
"h.{bid}.attn.c_attn", # gpt2
"transformer.h.{bid}.mixer.Wqkv", # phi2
),
# Attention query
MODEL_TENSOR.ATTN_Q: (
"model.layers.{bid}.self_attn.q_proj", # llama-hf
"layers.{bid}.attention.wq", # llama-pth
"encoder.layer.{bid}.attention.self.query", # bert
"transformer.h.{bid}.attn.q_proj", # gpt-j
"model.layers.layers.{bid}.self_attn.q_proj", # plamo
"model.layers.{bid}.attention.wq" # internlm2
),
# Attention key
MODEL_TENSOR.ATTN_K: (
"model.layers.{bid}.self_attn.k_proj", # llama-hf
"layers.{bid}.attention.wk", # llama-pth
"encoder.layer.{bid}.attention.self.key", # bert
"transformer.h.{bid}.attn.k_proj", # gpt-j
"model.layers.layers.{bid}.self_attn.k_proj", # plamo
"model.layers.{bid}.attention.wk" # internlm2
),
# Attention value
MODEL_TENSOR.ATTN_V: (
"model.layers.{bid}.self_attn.v_proj", # llama-hf
"layers.{bid}.attention.wv", # llama-pth
"encoder.layer.{bid}.attention.self.value", # bert
"transformer.h.{bid}.attn.v_proj", # gpt-j
"model.layers.layers.{bid}.self_attn.v_proj", # plamo
"model.layers.{bid}.attention.wv" # internlm2
),
# Attention output
MODEL_TENSOR.ATTN_OUT: (
"gpt_neox.layers.{bid}.attention.dense", # gptneox
"transformer.h.{bid}.attn.c_proj", # gpt2 refact qwen
"transformer.blocks.{bid}.attn.out_proj", # mpt
"transformer.h.{bid}.self_attention.dense", # falcon
"h.{bid}.self_attention.dense", # bloom
"model.layers.{bid}.self_attn.o_proj", # llama-hf
"layers.{bid}.attention.wo", # llama-pth
"encoder.layer.{bid}.attention.output.dense", # bert
"transformer.h.{bid}.attn.out_proj", # gpt-j
"language_model.encoder.layers.{bid}.self_attention.dense", # persimmon
"model.layers.{bid}.self_attn.dense", # persimmon
"h.{bid}.attn.c_proj", # gpt2
"transformer.h.{bid}.mixer.out_proj", # phi2
"model.layers.layers.{bid}.self_attn.o_proj", # plamo
"model.layers.{bid}.attention.wo", # internlm2
),
# Rotary embeddings
MODEL_TENSOR.ATTN_ROT_EMBD: (
"model.layers.{bid}.self_attn.rotary_emb.inv_freq", # llama-hf
"layers.{bid}.attention.inner_attention.rope.freqs", # llama-pth
"model.layers.layers.{bid}.self_attn.rotary_emb.inv_freq", # plamo
"transformer.h.{bid}.attn.rotary_emb.inv_freq", # codeshell
),
# Feed-forward norm
MODEL_TENSOR.FFN_NORM: (
"gpt_neox.layers.{bid}.post_attention_layernorm", # gptneox
"transformer.h.{bid}.ln_2", # gpt2 refact qwen
"h.{bid}.post_attention_layernorm", # bloom
"transformer.blocks.{bid}.norm_2", # mpt
"model.layers.{bid}.post_attention_layernorm", # llama-hf
"layers.{bid}.ffn_norm", # llama-pth
"encoder.layer.{bid}.output.LayerNorm", # bert
"language_model.encoder.layers.{bid}.post_attention_layernorm", # persimmon
"model.layers.{bid}.ln2", # yi
"h.{bid}.ln_2", # gpt2
"model.layers.{bid}.ffn_norm", # internlm2
),
MODEL_TENSOR.FFN_GATE_INP: (
"layers.{bid}.feed_forward.gate", # mixtral
"model.layers.{bid}.block_sparse_moe.gate", # mixtral
),
# Feed-forward up
MODEL_TENSOR.FFN_UP: (
"gpt_neox.layers.{bid}.mlp.dense_h_to_4h", # gptneox
"transformer.h.{bid}.mlp.c_fc", # gpt2
"transformer.blocks.{bid}.ffn.up_proj", # mpt
"transformer.h.{bid}.mlp.dense_h_to_4h", # falcon
"h.{bid}.mlp.dense_h_to_4h", # bloom
"model.layers.{bid}.mlp.up_proj", # llama-hf refact
"layers.{bid}.feed_forward.w3", # llama-pth
"encoder.layer.{bid}.intermediate.dense", # bert
"transformer.h.{bid}.mlp.fc_in", # gpt-j
"language_model.encoder.layers.{bid}.mlp.dense_h_to_4h", # persimmon
"model.layers.{bid}.mlp.dense_h_to_4h", # persimmon
"transformer.h.{bid}.mlp.w1", # qwen
"h.{bid}.mlp.c_fc", # gpt2
"transformer.h.{bid}.mlp.fc1", # phi2
"model.layers.{bid}.mlp.fc1", # phi2
"model.layers.layers.{bid}.mlp.up_proj", # plamo
"model.layers.{bid}.feed_forward.w3", # internlm2
),
MODEL_TENSOR.FFN_UP_EXP: (
"layers.{bid}.feed_forward.experts.{xid}.w3", # mixtral
"model.layers.{bid}.block_sparse_moe.experts.{xid}.w3", # mixtral
),
# AWQ-activation gate
MODEL_TENSOR.FFN_ACT: (
"transformer.blocks.{bid}.ffn.act", # mpt
),
# Feed-forward gate
MODEL_TENSOR.FFN_GATE: (
"model.layers.{bid}.mlp.gate_proj", # llama-hf refact
"layers.{bid}.feed_forward.w1", # llama-pth
"transformer.h.{bid}.mlp.w2", # qwen
"model.layers.layers.{bid}.mlp.gate_proj", # plamo
"model.layers.{bid}.feed_forward.w1", # internlm2
),
MODEL_TENSOR.FFN_GATE_EXP: (
"layers.{bid}.feed_forward.experts.{xid}.w1", # mixtral
"model.layers.{bid}.block_sparse_moe.experts.{xid}.w1", # mixtral
),
# Feed-forward down
MODEL_TENSOR.FFN_DOWN: (
"gpt_neox.layers.{bid}.mlp.dense_4h_to_h", # gptneox
"transformer.h.{bid}.mlp.c_proj", # gpt2 refact qwen
"transformer.blocks.{bid}.ffn.down_proj", # mpt
"transformer.h.{bid}.mlp.dense_4h_to_h", # falcon
"h.{bid}.mlp.dense_4h_to_h", # bloom
"model.layers.{bid}.mlp.down_proj", # llama-hf
"layers.{bid}.feed_forward.w2", # llama-pth
"encoder.layer.{bid}.output.dense", # bert
"transformer.h.{bid}.mlp.fc_out", # gpt-j
"language_model.encoder.layers.{bid}.mlp.dense_4h_to_h", # persimmon
"model.layers.{bid}.mlp.dense_4h_to_h", # persimmon
"h.{bid}.mlp.c_proj", # gpt2
"transformer.h.{bid}.mlp.fc2", # phi2
"model.layers.{bid}.mlp.fc2", # phi2
"model.layers.layers.{bid}.mlp.down_proj", # plamo
"model.layers.{bid}.feed_forward.w2", # internlm2
),
MODEL_TENSOR.FFN_DOWN_EXP: (
"layers.{bid}.feed_forward.experts.{xid}.w2", # mixtral
"model.layers.{bid}.block_sparse_moe.experts.{xid}.w2", # mixtral
),
MODEL_TENSOR.ATTN_Q_NORM: (
"language_model.encoder.layers.{bid}.self_attention.q_layernorm",
"model.layers.{bid}.self_attn.q_layernorm", # persimmon
),
MODEL_TENSOR.ATTN_K_NORM: (
"language_model.encoder.layers.{bid}.self_attention.k_layernorm",
"model.layers.{bid}.self_attn.k_layernorm", # persimmon
),
MODEL_TENSOR.ROPE_FREQS: (
"language_model.encoder.layers.{bid}.self_attention.rotary_emb.inv_freq", # persimmon
),
}
mapping: dict[str, tuple[MODEL_TENSOR, str]]
def __init__(self, arch: MODEL_ARCH, n_blocks: int):
self.mapping = {}
for tensor, keys in self.mappings_cfg.items():
if tensor not in MODEL_TENSORS[arch]:
continue
tensor_name = TENSOR_NAMES[tensor]
self.mapping[tensor_name] = (tensor, tensor_name)
for key in keys:
self.mapping[key] = (tensor, tensor_name)
for bid in range(n_blocks):
for tensor, keys in self.block_mappings_cfg.items():
if tensor not in MODEL_TENSORS[arch]:
continue
# TODO: make this configurable
n_experts = 8
for xid in range(n_experts):
tensor_name = TENSOR_NAMES[tensor].format(bid = bid, xid = xid)
self.mapping[tensor_name] = (tensor, tensor_name)
for key in keys:
key = key.format(bid = bid, xid = xid)
self.mapping[key] = (tensor, tensor_name)
def get_type_and_name(self, key: str, try_suffixes: Sequence[str] = ()) -> tuple[MODEL_TENSOR, str] | None:
result = self.mapping.get(key)
if result is not None:
return result
for suffix in try_suffixes:
if key.endswith(suffix):
result = self.mapping.get(key[:-len(suffix)])
if result is not None:
return result[0], result[1] + suffix
return None
def get_name(self, key: str, try_suffixes: Sequence[str] = ()) -> str | None:
result = self.get_type_and_name(key, try_suffixes = try_suffixes)
if result is None:
return None
return result[1]
def get_type(self, key: str, try_suffixes: Sequence[str] = ()) -> MODEL_TENSOR | None:
result = self.get_type_and_name(key, try_suffixes = try_suffixes)
if result is None:
return None
return result[0]
def __getitem__(self, key: str) -> str:
try:
return self.mapping[key][1]
except KeyError:
raise KeyError(key)
def __contains__(self, key: str) -> bool:
return key in self.mapping
def __repr__(self) -> str:
return repr(self.mapping)
def get_tensor_name_map(arch: MODEL_ARCH, n_blocks: int) -> TensorNameMap:
return TensorNameMap(arch, n_blocks)
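
A short sketch of how this mapping is typically used by converters (MODEL_ARCH.LLAMA is assumed to be defined in gguf.constants, and the Hugging Face tensor name is illustrative):

    import gguf

    tmap = gguf.get_tensor_name_map(gguf.MODEL_ARCH.LLAMA, n_blocks=32)
    # Resolve a framework-specific name to the canonical GGUF tensor name,
    # optionally stripping and re-appending a suffix such as ".weight"
    name = tmap.get_name("model.layers.0.self_attn.q_proj.weight",
                         try_suffixes=(".weight", ".bias"))
    print(name)  # expected: "blk.0.attn_q.weight"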

View File

@ -1,185 +0,0 @@
from __future__ import annotations
import json
import os
import sys
from pathlib import Path
from typing import Any, Callable
from .gguf_writer import GGUFWriter
class SpecialVocab:
merges: list[str]
add_special_token: dict[str, bool]
special_token_ids: dict[str, int]
chat_template: str | None
def __init__(
self, path: str | os.PathLike[str], load_merges: bool = False,
special_token_types: tuple[str, ...] | None = None,
n_vocab: int | None = None,
):
self.special_token_ids = {}
self.add_special_token = {}
self.n_vocab = n_vocab
self.load_merges = load_merges
self.merges = []
self.chat_template = None
if special_token_types is not None:
self.special_token_types = special_token_types
else:
self.special_token_types = ('bos', 'eos', 'unk', 'sep', 'pad')
self._load(Path(path))
def __repr__(self) -> str:
return '<SpecialVocab with {} merges, special tokens {}, add special tokens {}>'.format(
len(self.merges), self.special_token_ids or "unset", self.add_special_token or "unset",
)
def add_to_gguf(self, gw: GGUFWriter, quiet: bool = False) -> None:
if self.merges:
if not quiet:
print(f'gguf: Adding {len(self.merges)} merge(s).')
gw.add_token_merges(self.merges)
elif self.load_merges:
print(
'gguf: WARNING: Adding merges requested but no merges found, output may be non-functional.',
file = sys.stderr,
)
for typ, tokid in self.special_token_ids.items():
id_handler: Callable[[int], None] | None = getattr(gw, f'add_{typ}_token_id', None)
if id_handler is None:
print(
f'gguf: WARNING: No handler for special token type {typ} with id {tokid} - skipping',
file = sys.stderr,
)
continue
if not quiet:
print(f'gguf: Setting special token type {typ} to {tokid}')
id_handler(tokid)
for typ, value in self.add_special_token.items():
add_handler: Callable[[bool], None] | None = getattr(gw, f'add_add_{typ}_token', None)
if add_handler is None:
print(
f'gguf: WARNING: No handler for add_{typ}_token with value {value} - skipping',
file = sys.stderr,
)
continue
if not quiet:
print(f'gguf: Setting add_{typ}_token to {value}')
add_handler(value)
if self.chat_template is not None:
if not quiet:
print(f'gguf: Setting chat_template to {self.chat_template}')
gw.add_chat_template(self.chat_template)
def _load(self, path: Path) -> None:
self._try_load_from_tokenizer_json(path)
self._try_load_from_config_json(path)
if self.load_merges and not self.merges:
self._try_load_merges_txt(path)
def _try_load_merges_txt(self, path: Path) -> bool:
merges_file = path / 'merges.txt'
if not merges_file.is_file():
return False
with open(merges_file, 'r', encoding = 'utf-8') as fp:
first_line = next(fp, '').strip()
if not first_line.startswith('#'):
fp.seek(0)
line_num = 0
else:
line_num = 1
merges = []
for line in fp:
line_num += 1
line = line.strip()
if not line:
continue
parts = line.split(None, 3)
if len(parts) != 2:
print(
f'gguf: WARNING: {merges_file.name}: Line {line_num}: Entry malformed, ignoring',
file = sys.stderr,
)
continue
merges.append(f'{parts[0]} {parts[1]}')
self.merges = merges
return True
def _set_special_token(self, typ: str, tid: Any) -> None:
if not isinstance(tid, int):
return
if tid < 0:
raise ValueError(f'invalid value for special token type {typ}: {tid}')
if self.n_vocab is None or tid < self.n_vocab:
if typ in self.special_token_ids:
return
self.special_token_ids[typ] = tid
return
print(
f'gguf: WARNING: Special token type {typ}, id {tid} out of range, must be under {self.n_vocab} - skipping',
file = sys.stderr,
)
def _try_load_from_tokenizer_json(self, path: Path) -> bool:
tokenizer_file = path / 'tokenizer.json'
if tokenizer_file.is_file():
with open(tokenizer_file, encoding = 'utf-8') as f:
tokenizer = json.load(f)
if self.load_merges:
merges = tokenizer.get('model', {}).get('merges')
if isinstance(merges, list) and merges and isinstance(merges[0], str):
self.merges = merges
added_tokens = tokenizer.get('added_tokens', {})
else:
added_tokens = {}
tokenizer_config_file = path / 'tokenizer_config.json'
if not tokenizer_config_file.is_file():
return True
with open(tokenizer_config_file, encoding = 'utf-8') as f:
tokenizer_config = json.load(f)
chat_template = tokenizer_config.get('chat_template')
if chat_template is None or isinstance(chat_template, str):
self.chat_template = chat_template
else:
print(
f'gguf: WARNING: Bad type for chat_template field in {tokenizer_config_file!r} - ignoring',
file = sys.stderr
)
for typ in self.special_token_types:
add_entry = tokenizer_config.get(f'add_{typ}_token')
if isinstance(add_entry, bool):
self.add_special_token[typ] = add_entry
if not added_tokens:
# We will need this to get the content for the token, so if it's empty
# may as well just give up.
continue
entry = tokenizer_config.get(f'{typ}_token')
if isinstance(entry, str):
tc_content = entry
elif isinstance(entry, dict):
entry_content = entry.get('content')
if not isinstance(entry_content, str):
continue
tc_content = entry_content
else:
continue
# We only need the first match here.
maybe_token_id = next(
(atok.get('id') for atok in added_tokens if atok.get('content') == tc_content),
None,
)
self._set_special_token(typ, maybe_token_id)
return True
def _try_load_from_config_json(self, path: Path) -> bool:
config_file = path / 'config.json'
if not config_file.is_file():
return False
with open(config_file, encoding = 'utf-8') as f:
config = json.load(f)
for typ in self.special_token_types:
self._set_special_token(typ, config.get(f'{typ}_token_id'))
return True
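
A hedged sketch of wiring SpecialVocab into a writer (the model directory is a placeholder expected to contain the tokenizer.json / tokenizer_config.json / config.json files handled above):

    from gguf import GGUFWriter, SpecialVocab

    gw = GGUFWriter("out.gguf", arch="llama")
    sv = SpecialVocab("/path/to/hf-model", load_merges=True, n_vocab=32000)
    # Emits the tokenizer.ggml.* merges, special token ids, add_*_token flags
    # and chat_template onto the writer
    sv.add_to_gguf(gw)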

View File

@ -1,35 +0,0 @@
[tool.poetry]
name = "gguf"
version = "0.7.0"
description = "Read and write ML models in GGUF for GGML"
authors = ["GGML <ggml@ggml.ai>"]
packages = [
{include = "gguf"},
{include = "gguf/py.typed"},
{include = "scripts"},
]
readme = "README.md"
homepage = "https://ggml.ai"
repository = "https://github.com/ggerganov/llama.cpp"
keywords = ["ggml", "gguf", "llama.cpp"]
classifiers = [
"Programming Language :: Python :: 3",
"License :: OSI Approved :: MIT License",
"Operating System :: OS Independent",
]
[tool.poetry.dependencies]
python = ">=3.8"
numpy = ">=1.17"
[tool.poetry.dev-dependencies]
pytest = "^5.2"
[build-system]
requires = ["poetry-core>=1.0.0"]
build-backend = "poetry.core.masonry.api"
[tool.poetry.scripts]
gguf-convert-endian = "scripts:gguf_convert_endian_entrypoint"
gguf-dump = "scripts:gguf_dump_entrypoint"
gguf-set-metadata = "scripts:gguf_set_metadata_entrypoint"

View File

@ -1,12 +0,0 @@
import os
from importlib import import_module
os.environ["NO_LOCAL_GGUF"] = "TRUE"
gguf_convert_endian_entrypoint = import_module("scripts.gguf-convert-endian").main
gguf_dump_entrypoint = import_module("scripts.gguf-dump").main
gguf_set_metadata_entrypoint = import_module("scripts.gguf-set-metadata").main
del import_module, os

View File

@ -1,112 +0,0 @@
#!/usr/bin/env python3
from __future__ import annotations
import argparse
import os
import sys
from pathlib import Path
import numpy as np
# Necessary to load the local gguf package
if "NO_LOCAL_GGUF" not in os.environ and (Path(__file__).parent.parent.parent / 'gguf-py').exists():
sys.path.insert(0, str(Path(__file__).parent.parent))
import gguf
def convert_byteorder(reader: gguf.GGUFReader, args: argparse.Namespace) -> None:
if np.uint32(1) == np.uint32(1).newbyteorder("<"):
# Host is little endian
host_endian = "little"
swapped_endian = "big"
else:
# Sorry PDP or other weird systems that don't use BE or LE.
host_endian = "big"
swapped_endian = "little"
if reader.byte_order == "S":
file_endian = swapped_endian
else:
file_endian = host_endian
order = host_endian if args.order == "native" else args.order
print(f"* Host is {host_endian.upper()} endian, GGUF file seems to be {file_endian.upper()} endian")
if file_endian == order:
print(f"* File is already {order.upper()} endian. Nothing to do.")
sys.exit(0)
print("* Checking tensors for conversion compatibility")
for tensor in reader.tensors:
if tensor.tensor_type not in (
gguf.GGMLQuantizationType.F32,
gguf.GGMLQuantizationType.F16,
gguf.GGMLQuantizationType.Q8_0,
):
raise ValueError(f"Cannot handle type {tensor.tensor_type.name} for tensor {repr(tensor.name)}")
print(f"* Preparing to convert from {file_endian.upper()} to {order.upper()}")
if args.dry_run:
return
print("\n*** Warning *** Warning *** Warning **")
print("* This conversion process may damage the file. Ensure you have a backup.")
if order != host_endian:
print("* Requested endian differs from host, you will not be able to load the model on this machine.")
print("* The file will be modified immediately, so if conversion fails or is interrupted")
print("* the file will be corrupted. Enter exactly YES if you are positive you want to proceed:")
response = input("YES, I am sure> ")
if response != "YES":
print("You didn't enter YES. Okay then, see ya!")
sys.exit(0)
print(f"\n* Converting fields ({len(reader.fields)})")
for idx, field in enumerate(reader.fields.values()):
print(f"- {idx:4}: Converting field {repr(field.name)}, part count: {len(field.parts)}")
for part in field.parts:
part.byteswap(inplace=True)
print(f"\n* Converting tensors ({len(reader.tensors)})")
for idx, tensor in enumerate(reader.tensors):
print(
f" - {idx:4}: Converting tensor {repr(tensor.name)}, type={tensor.tensor_type.name}, "
f"elements={tensor.n_elements}... ",
end="",
)
tensor_type = tensor.tensor_type
for part in tensor.field.parts:
part.byteswap(inplace=True)
if tensor_type != gguf.GGMLQuantizationType.Q8_0:
tensor.data.byteswap(inplace=True)
print()
continue
# A Q8_0 block consists of a f16 delta followed by 32 int8 quants, so 34 bytes
block_size = 34
n_blocks = len(tensor.data) // block_size
for block_num in range(n_blocks):
block_offs = block_num * block_size
# I know I said f16, but it doesn't matter here - any simple 16 bit type works.
delta = tensor.data[block_offs:block_offs + 2].view(dtype=np.uint16)
delta.byteswap(inplace=True)
if block_num % 100000 == 0:
print(f"[{(n_blocks - block_num) // 1000}K]", end="")
sys.stdout.flush()
print()
print("* Completion")
def main() -> None:
parser = argparse.ArgumentParser(description="Convert GGUF file byte order")
parser.add_argument(
"model", type=str,
help="GGUF format model filename",
)
parser.add_argument(
"order", type=str, choices=['big', 'little', 'native'],
help="Requested byte order",
)
parser.add_argument(
"--dry-run", action="store_true",
help="Don't actually change anything",
)
args = parser.parse_args(None if len(sys.argv) > 1 else ["--help"])
print(f'* Loading: {args.model}')
reader = gguf.GGUFReader(args.model, 'r' if args.dry_run else 'r+')
convert_byteorder(reader, args)
if __name__ == "__main__":
main()

View File

@ -1,117 +0,0 @@
#!/usr/bin/env python3
from __future__ import annotations
import argparse
import os
import sys
from pathlib import Path
from typing import Any
import numpy as np
# Necessary to load the local gguf package
if "NO_LOCAL_GGUF" not in os.environ and (Path(__file__).parent.parent.parent / 'gguf-py').exists():
sys.path.insert(0, str(Path(__file__).parent.parent))
from gguf import GGUFReader, GGUFValueType # noqa: E402
def get_file_host_endian(reader: GGUFReader) -> tuple[str, str]:
host_endian = 'LITTLE' if np.uint32(1) == np.uint32(1).newbyteorder("<") else 'BIG'
if reader.byte_order == 'S':
file_endian = 'BIG' if host_endian == 'LITTLE' else 'LITTLE'
else:
file_endian = host_endian
return (host_endian, file_endian)
# For more information about what field.parts and field.data represent,
# please see the comments in the modify_gguf.py example.
def dump_metadata(reader: GGUFReader, args: argparse.Namespace) -> None:
host_endian, file_endian = get_file_host_endian(reader)
print(f'* File is {file_endian} endian, script is running on a {host_endian} endian host.')
print(f'\n* Dumping {len(reader.fields)} key/value pair(s)')
for n, field in enumerate(reader.fields.values(), 1):
if not field.types:
pretty_type = 'N/A'
elif field.types[0] == GGUFValueType.ARRAY:
nest_count = len(field.types) - 1
pretty_type = '[' * nest_count + str(field.types[-1].name) + ']' * nest_count
else:
pretty_type = str(field.types[-1].name)
print(f' {n:5}: {pretty_type:10} | {len(field.data):8} | {field.name}', end = '')
if len(field.types) == 1:
curr_type = field.types[0]
if curr_type == GGUFValueType.STRING:
print(' = {0}'.format(repr(str(bytes(field.parts[-1]), encoding='utf8')[:60])), end = '')
elif field.types[0] in reader.gguf_scalar_to_np:
print(' = {0}'.format(field.parts[-1][0]), end = '')
print()
if args.no_tensors:
return
print(f'\n* Dumping {len(reader.tensors)} tensor(s)')
for n, tensor in enumerate(reader.tensors, 1):
prettydims = ', '.join('{0:5}'.format(d) for d in list(tensor.shape) + [1] * (4 - len(tensor.shape)))
print(f' {n:5}: {tensor.n_elements:10} | {prettydims} | {tensor.tensor_type.name:7} | {tensor.name}')
def dump_metadata_json(reader: GGUFReader, args: argparse.Namespace) -> None:
import json
host_endian, file_endian = get_file_host_endian(reader)
metadata: dict[str, Any] = {}
tensors: dict[str, Any] = {}
result = {
"filename": args.model,
"endian": file_endian,
"metadata": metadata,
"tensors": tensors,
}
for idx, field in enumerate(reader.fields.values()):
curr: dict[str, Any] = {
"index": idx,
"type": field.types[0].name if field.types else 'UNKNOWN',
"offset": field.offset,
}
metadata[field.name] = curr
if field.types[:1] == [GGUFValueType.ARRAY]:
curr["array_types"] = [t.name for t in field.types][1:]
if not args.json_array:
continue
itype = field.types[-1]
if itype == GGUFValueType.STRING:
curr["value"] = [str(bytes(field.parts[idx]), encoding="utf-8") for idx in field.data]
else:
curr["value"] = [pv for idx in field.data for pv in field.parts[idx].tolist()]
elif field.types[0] == GGUFValueType.STRING:
curr["value"] = str(bytes(field.parts[-1]), encoding="utf-8")
else:
curr["value"] = field.parts[-1].tolist()[0]
if not args.no_tensors:
for idx, tensor in enumerate(reader.tensors):
tensors[tensor.name] = {
"index": idx,
"shape": tensor.shape.tolist(),
"type": tensor.tensor_type.name,
"offset": tensor.field.offset,
}
json.dump(result, sys.stdout)
def main() -> None:
parser = argparse.ArgumentParser(description="Dump GGUF file metadata")
parser.add_argument("model", type=str, help="GGUF format model filename")
parser.add_argument("--no-tensors", action="store_true", help="Don't dump tensor metadata")
parser.add_argument("--json", action="store_true", help="Produce JSON output")
parser.add_argument("--json-array", action="store_true", help="Include full array values in JSON output (long)")
args = parser.parse_args(None if len(sys.argv) > 1 else ["--help"])
if not args.json:
print(f'* Loading: {args.model}')
reader = GGUFReader(args.model, 'r')
if args.json:
dump_metadata_json(reader, args)
else:
dump_metadata(reader, args)
if __name__ == '__main__':
main()

View File

@ -1,90 +0,0 @@
#!/usr/bin/env python3
import argparse
import os
import sys
from pathlib import Path
# Necessary to load the local gguf package
if "NO_LOCAL_GGUF" not in os.environ and (Path(__file__).parent.parent.parent / 'gguf-py').exists():
sys.path.insert(0, str(Path(__file__).parent.parent))
from gguf import GGUFReader # noqa: E402
def minimal_example(filename: str) -> None:
reader = GGUFReader(filename, 'r+')
field = reader.fields['tokenizer.ggml.bos_token_id']
if field is None:
return
part_index = field.data[0]
field.parts[part_index][0] = 2 # Set tokenizer.ggml.bos_token_id to 2
#
# So what's this field.data thing? It's helpful because field.parts contains
# _every_ part of the GGUF field. For example, tokenizer.ggml.bos_token_id consists
# of:
#
# Part index 0: Key length (27)
# Part index 1: Key data ("tokenizer.ggml.bos_token_id")
# Part index 2: Field type (4, the id for GGUFValueType.UINT32)
# Part index 3: Field value
#
# Note also that each part is an NDArray slice, so even a part that
is only a single value like the key length will be an NDArray of
# the key length type (numpy.uint32).
#
# The .data attribute in the Field is a list of relevant part indexes
# and doesn't contain internal GGUF details like the key length part.
# In this case, .data will be [3] - just the part index of the
# field value itself.
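# A hedged recap of the layout above (the value itself is hypothetical):
#   field.parts -> [key_len (27), key bytes, field type (4 = UINT32), value]
#   field.data  -> [3]
# so field.parts[field.data[0]][0] reads or writes the value without
# hard-coding the part index.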
def set_metadata(reader: GGUFReader, args: argparse.Namespace) -> None:
field = reader.get_field(args.key)
if field is None:
print(f'! Field {repr(args.key)} not found', file = sys.stderr)
sys.exit(1)
# Note that field.types is a list of types. This is because the GGUF
# format supports arrays. For example, an array of UINT32 would
# look like [GGUFValueType.ARRAY, GGUFValueType.UINT32]
handler = reader.gguf_scalar_to_np.get(field.types[0]) if field.types else None
if handler is None:
print(
f'! This tool only supports changing simple values, {repr(args.key)} has unsupported type {field.types}',
file = sys.stderr,
)
sys.exit(1)
current_value = field.parts[field.data[0]][0]
new_value = handler(args.value)
print(f'* Preparing to change field {repr(args.key)} from {current_value} to {new_value}')
if current_value == new_value:
print(f'- Key {repr(args.key)} already set to requested value {current_value}')
sys.exit(0)
if args.dry_run:
sys.exit(0)
if not args.force:
print('*** Warning *** Warning *** Warning **')
print('* Changing fields in a GGUF file can make it unusable. Proceed at your own risk.')
print('* Enter exactly YES if you are positive you want to proceed:')
response = input('YES, I am sure> ')
if response != 'YES':
print("You didn't enter YES. Okay then, see ya!")
sys.exit(0)
field.parts[field.data[0]][0] = new_value
print('* Field changed. Successful completion.')
def main() -> None:
parser = argparse.ArgumentParser(description="Set a simple value in GGUF file metadata")
parser.add_argument("model", type=str, help="GGUF format model filename")
parser.add_argument("key", type=str, help="Metadata key to set")
parser.add_argument("value", type=str, help="Metadata value to set")
parser.add_argument("--dry-run", action="store_true", help="Don't actually change anything")
parser.add_argument("--force", action="store_true", help="Change the field without confirmation")
args = parser.parse_args(None if len(sys.argv) > 1 else ["--help"])
print(f'* Loading: {args.model}')
reader = GGUFReader(args.model, 'r' if args.dry_run else 'r+')
set_metadata(reader, args)
if __name__ == '__main__':
main()

View File

@ -1,7 +0,0 @@
import gguf # noqa: F401
# TODO: add tests
def test_write_gguf() -> None:
pass

View File

@ -1,14 +0,0 @@
import subprocess
import sys
deps = [
'numpy~=1.24.4',
'sentencepiece~=0.1.98',
'transformers>=4.35.2,<5.0.0',
'gguf>=0.1.0',
'protobuf>=4.21.0,<5.0.0',
'torch~=2.1.1',
'packaging>=20.0',
'tiktoken~=0.5.0'
]
subprocess.check_call([sys.executable, '-m', 'pip', 'install', '--upgrade', '--force-reinstall', *deps])

View File

@ -1 +0,0 @@
b2106

View File

@ -19,8 +19,6 @@ import {
DownloadRequest,
executeOnMain,
HuggingFaceRepoData,
Quantization,
log,
getFileSize,
AllQuantizations,
ModelEvent,
@ -353,7 +351,7 @@ export default class JanModelExtension extends ModelExtension {
}
/**
* Saves a machine learning model.
* Saves a model file.
* @param model - The model to save.
* @returns A Promise that resolves when the model is saved.
*/
@ -565,6 +563,19 @@ export default class JanModelExtension extends ModelExtension {
}
const defaultModel = (await this.getDefaultModel()) as Model
const metadata = await executeOnMain(
NODE,
'retrieveGGUFMetadata',
await joinPath([
await getJanDataFolderPath(),
'models',
dirName,
binaryFileName,
])
)
const eos_id = metadata?.['tokenizer.ggml.eos_token_id']
if (!defaultModel) {
console.error('Unable to find default model')
return
@ -581,8 +592,20 @@ export default class JanModelExtension extends ModelExtension {
filename: binaryFileName,
},
],
parameters: {
...defaultModel.parameters,
stop: eos_id
? [metadata['tokenizer.ggml.tokens'][eos_id] ?? '']
: defaultModel.parameters.stop,
},
settings: {
...defaultModel.settings,
prompt_template:
metadata?.parsed_chat_template ??
defaultModel.settings.prompt_template,
ctx_len:
metadata?.['llama.context_length'] ?? defaultModel.settings.ctx_len,
ngl: (metadata?.['llama.block_count'] ?? 32) + 1,
llama_model_path: binaryFileName,
},
created: Date.now(),
@ -657,6 +680,13 @@ export default class JanModelExtension extends ModelExtension {
return
}
const metadata = await executeOnMain(
NODE,
'retrieveGGUFMetadata',
modelBinaryPath
)
const eos_id = metadata?.['tokenizer.ggml.eos_token_id']
const binaryFileName = await baseName(modelBinaryPath)
const model: Model = {
@ -669,8 +699,21 @@ export default class JanModelExtension extends ModelExtension {
filename: binaryFileName,
},
],
parameters: {
...defaultModel.parameters,
stop: eos_id
? [metadata?.['tokenizer.ggml.tokens'][eos_id] ?? '']
: defaultModel.parameters.stop,
},
settings: {
...defaultModel.settings,
prompt_template:
metadata?.parsed_chat_template ??
defaultModel.settings.prompt_template,
ctx_len:
metadata?.['llama.context_length'] ?? defaultModel.settings.ctx_len,
ngl: (metadata?.['llama.block_count'] ?? 32) + 1,
llama_model_path: binaryFileName,
},
created: Date.now(),
@ -710,9 +753,17 @@ export default class JanModelExtension extends ModelExtension {
const updatedModel: Model = {
...model,
...modelInfo,
parameters: {
...model.parameters,
...modelInfo.parameters,
},
settings: {
...model.settings,
...modelInfo.settings,
},
metadata: {
...model.metadata,
tags: modelInfo.metadata?.tags ?? [],
...modelInfo.metadata,
},
}
@ -826,218 +877,4 @@ export default class JanModelExtension extends ModelExtension {
importedModels
)
}
private getGgufFileList(
repoData: HuggingFaceRepoData,
selectedQuantization: Quantization
): string[] {
return repoData.siblings
.map((file) => file.rfilename)
.filter((file) => file.indexOf(selectedQuantization) !== -1)
.filter((file) => file.endsWith('.gguf'))
}
private getFileList(repoData: HuggingFaceRepoData): string[] {
// SafeTensors first, if not, then PyTorch
const modelFiles = repoData.siblings
.map((file) => file.rfilename)
.filter((file) =>
JanModelExtension._safetensorsRegexs.some((regex) => regex.test(file))
)
if (modelFiles.length === 0) {
repoData.siblings.forEach((file) => {
if (
JanModelExtension._pytorchRegexs.some((regex) =>
regex.test(file.rfilename)
)
) {
modelFiles.push(file.rfilename)
}
})
}
const vocabFiles = [
'tokenizer.model',
'vocab.json',
'tokenizer.json',
].filter((file) =>
repoData.siblings.some((sibling) => sibling.rfilename === file)
)
const etcFiles = repoData.siblings
.map((file) => file.rfilename)
.filter(
(file) =>
(file.endsWith('.json') && !vocabFiles.includes(file)) ||
file.endsWith('.txt') ||
file.endsWith('.py') ||
file.endsWith('.tiktoken')
)
return [...modelFiles, ...vocabFiles, ...etcFiles]
}
private async getModelDirPath(repoID: string): Promise<string> {
const modelName = repoID.split('/').slice(1).join('/')
return joinPath([await getJanDataFolderPath(), 'models', modelName])
}
private async getConvertedModelPath(repoID: string): Promise<string> {
const modelName = repoID.split('/').slice(1).join('/')
const modelDirPath = await this.getModelDirPath(repoID)
return joinPath([modelDirPath, modelName + '.gguf'])
}
private async getQuantizedModelPath(
repoID: string,
quantization: Quantization
): Promise<string> {
const modelName = repoID.split('/').slice(1).join('/')
const modelDirPath = await this.getModelDirPath(repoID)
return joinPath([
modelDirPath,
modelName + `-${quantization.toLowerCase()}.gguf`,
])
}
private getCtxLength(config: {
max_sequence_length?: number
max_position_embeddings?: number
n_ctx?: number
}): number {
if (config.max_sequence_length) return config.max_sequence_length
if (config.max_position_embeddings) return config.max_position_embeddings
if (config.n_ctx) return config.n_ctx
return 2048
}
/**
* Converts a Hugging Face model to GGUF.
* @param repoID - The repo ID of the model to convert.
* @returns A promise that resolves when the conversion is complete.
*/
async convert(repoID: string): Promise<void> {
if (this.interrupted) return
const modelDirPath = await this.getModelDirPath(repoID)
const modelOutPath = await this.getConvertedModelPath(repoID)
if (!(await fs.existsSync(modelDirPath))) {
throw new Error('Model dir not found')
}
if (await fs.existsSync(modelOutPath)) return
await executeOnMain(NODE, 'installDeps')
if (this.interrupted) return
try {
await executeOnMain(
NODE,
'convertHf',
modelDirPath,
modelOutPath + '.temp'
)
} catch (err) {
log(`[Conversion]::Debug: Error using hf-to-gguf.py, trying convert.py`)
let ctx = 2048
try {
const config = await fs.readFileSync(
await joinPath([modelDirPath, 'config.json']),
'utf8'
)
const configParsed = JSON.parse(config)
ctx = this.getCtxLength(configParsed)
configParsed.max_sequence_length = ctx
await fs.writeFileSync(
await joinPath([modelDirPath, 'config.json']),
JSON.stringify(configParsed, null, 2)
)
} catch (err) {
log(`${err}`)
// ignore missing config.json
}
const bpe = await fs.existsSync(
await joinPath([modelDirPath, 'vocab.json'])
)
await executeOnMain(
NODE,
'convert',
modelDirPath,
modelOutPath + '.temp',
{
ctx,
bpe,
}
)
}
await executeOnMain(
NODE,
'renameSync',
modelOutPath + '.temp',
modelOutPath
)
for (const file of await fs.readdirSync(modelDirPath)) {
if (
modelOutPath.endsWith(file) ||
(file.endsWith('config.json') && !file.endsWith('_config.json'))
)
continue
await fs.unlinkSync(await joinPath([modelDirPath, file]))
}
}
/**
* Quantizes a GGUF model.
* @param repoID - The repo ID of the model to quantize.
* @param quantization - The quantization to use.
* @returns A promise that resolves when the quantization is complete.
*/
async quantize(repoID: string, quantization: Quantization): Promise<void> {
if (this.interrupted) return
const modelDirPath = await this.getModelDirPath(repoID)
const modelOutPath = await this.getQuantizedModelPath(repoID, quantization)
if (!(await fs.existsSync(modelDirPath))) {
throw new Error('Model dir not found')
}
if (await fs.existsSync(modelOutPath)) return
await executeOnMain(
NODE,
'quantize',
await this.getConvertedModelPath(repoID),
modelOutPath + '.temp',
quantization
)
await executeOnMain(
NODE,
'renameSync',
modelOutPath + '.temp',
modelOutPath
)
await fs.unlinkSync(await this.getConvertedModelPath(repoID))
}
/**
* Cancels the convert of current Hugging Face model.
* @param repoID - The repository ID to cancel.
* @param repoData - The repository data to cancel.
* @returns {Promise<void>} A promise that resolves when the download has been cancelled.
*/
async cancelConvert(
repoID: string,
repoData: HuggingFaceRepoData
): Promise<void> {
this.interrupted = true
const modelDirPath = await this.getModelDirPath(repoID)
const files = this.getFileList(repoData)
for (const file of files) {
const filePath = file
const localPath = await joinPath([modelDirPath, filePath])
await abortDownload(localPath)
}
executeOnMain(NODE, 'killProcesses')
}
}

View File

@ -1,182 +1,47 @@
import { PythonShell } from 'python-shell'
import { spawn, ChildProcess } from 'child_process'
import { resolve as presolve, join as pjoin } from 'path'
import { log, Quantization } from '@janhq/core/node'
import { statSync } from 'fs'
export { renameSync } from 'fs'
import { closeSync, openSync, readSync } from 'fs'
import { Template } from '@huggingface/jinja'
/**
* This is to retrieve the metadata from a GGUF file
* It uses hyllama and jinja from @huggingface module
*/
export const retrieveGGUFMetadata = async (ggufPath: string) => {
try {
const { ggufMetadata } = await import('hyllama')
// Read the first 10 MB of the GGUF file
const fd = openSync(ggufPath, 'r')
const buffer = new Uint8Array(10_000_000)
readSync(fd, buffer, 0, 10_000_000, 0)
closeSync(fd)
let pythonShell: PythonShell | undefined = undefined
let quantizeProcess: ChildProcess | undefined = undefined
// Parse metadata and tensor info
const { metadata } = ggufMetadata(buffer.buffer)
export const getSize = (path: string): number => statSync(path).size
export const killProcesses = () => {
if (pythonShell) {
pythonShell.kill()
pythonShell = undefined
}
if (quantizeProcess) {
quantizeProcess.kill()
quantizeProcess = undefined
const template = new Template(metadata['tokenizer.chat_template'])
const eos_id = metadata['tokenizer.ggml.eos_token_id']
const bos_id = metadata['tokenizer.ggml.bos_token_id']
const eos_token = metadata['tokenizer.ggml.tokens'][eos_id]
const bos_token = metadata['tokenizer.ggml.tokens'][bos_id]
// Parse jinja template
const renderedTemplate = template.render({
add_generation_prompt: true,
eos_token,
bos_token,
messages: [
{
role: 'system',
content: '{system_message}',
},
{
role: 'user',
content: '{prompt}',
},
],
})
return {
...metadata,
parsed_chat_template: renderedTemplate,
}
} catch (e) {
console.log('[MODEL_EXT]', e)
}
}
export const getQuantizeExecutable = (): string => {
let binaryFolder = pjoin(__dirname, '..', 'bin') // Current directory by default
let binaryName = 'quantize'
/**
* The binary folder is different for each platform.
*/
if (process.platform === 'win32') {
binaryFolder = pjoin(binaryFolder, 'win')
binaryName = 'quantize.exe'
} else if (process.platform === 'darwin') {
/**
For macOS: mac-universal covers both Apple Silicon and Intel
*/
binaryFolder = pjoin(binaryFolder, 'mac-universal')
} else {
binaryFolder = pjoin(binaryFolder, 'linux-cpu')
}
return pjoin(binaryFolder, binaryName)
}
export const installDeps = (): Promise<void> => {
return new Promise((resolve, reject) => {
const _pythonShell = new PythonShell(
presolve(__dirname, '..', 'scripts', 'install_deps.py')
)
_pythonShell.on('message', (message) => {
log(`[Install Deps]::Debug: ${message}`)
})
_pythonShell.on('stderr', (stderr) => {
log(`[Install Deps]::Error: ${stderr}`)
})
_pythonShell.on('error', (err) => {
pythonShell = undefined
log(`[Install Deps]::Error: ${err}`)
reject(err)
})
_pythonShell.on('close', () => {
const exitCode = _pythonShell.exitCode
pythonShell = undefined
log(
`[Install Deps]::Debug: Deps installation exited with code: ${exitCode}`
)
exitCode === 0 ? resolve() : reject(exitCode)
})
})
}
export const convertHf = async (
modelDirPath: string,
outPath: string
): Promise<void> => {
return await new Promise<void>((resolve, reject) => {
const _pythonShell = new PythonShell(
presolve(__dirname, '..', 'scripts', 'convert-hf-to-gguf.py'),
{
args: [modelDirPath, '--outfile', outPath],
}
)
pythonShell = _pythonShell
_pythonShell.on('message', (message) => {
log(`[Conversion]::Debug: ${message}`)
})
_pythonShell.on('stderr', (stderr) => {
log(`[Conversion]::Error: ${stderr}`)
})
_pythonShell.on('error', (err) => {
pythonShell = undefined
log(`[Conversion]::Error: ${err}`)
reject(err)
})
_pythonShell.on('close', () => {
const exitCode = _pythonShell.exitCode
pythonShell = undefined
if (exitCode !== 0) {
log(`[Conversion]::Debug: Conversion exited with code: ${exitCode}`)
reject(exitCode)
} else {
resolve()
}
})
})
}
export const convert = async (
modelDirPath: string,
outPath: string,
{ ctx, bpe }: { ctx?: number; bpe?: boolean }
): Promise<void> => {
const args = [modelDirPath, '--outfile', outPath]
if (ctx) {
args.push('--ctx')
args.push(ctx.toString())
}
if (bpe) {
args.push('--vocab-type')
args.push('bpe')
}
return await new Promise<void>((resolve, reject) => {
const _pythonShell = new PythonShell(
presolve(__dirname, '..', 'scripts', 'convert.py'),
{
args,
}
)
_pythonShell.on('message', (message) => {
log(`[Conversion]::Debug: ${message}`)
})
_pythonShell.on('stderr', (stderr) => {
log(`[Conversion]::Error: ${stderr}`)
})
_pythonShell.on('error', (err) => {
pythonShell = undefined
log(`[Conversion]::Error: ${err}`)
reject(err)
})
_pythonShell.on('close', () => {
const exitCode = _pythonShell.exitCode
pythonShell = undefined
if (exitCode !== 0) {
log(`[Conversion]::Debug: Conversion exited with code: ${exitCode}`)
reject(exitCode)
} else {
resolve()
}
})
})
}
export const quantize = async (
modelPath: string,
outPath: string,
quantization: Quantization
): Promise<void> => {
return await new Promise<void>((resolve, reject) => {
const quantizeExecutable = getQuantizeExecutable()
const _quantizeProcess = spawn(quantizeExecutable, [
modelPath,
outPath,
quantization,
])
quantizeProcess = _quantizeProcess
_quantizeProcess.stdout?.on('data', (data) => {
log(`[Quantization]::Debug: ${data}`)
})
_quantizeProcess.stderr?.on('data', (data) => {
log(`[Quantization]::Error: ${data}`)
})
_quantizeProcess.on('close', (code) => {
if (code !== 0) {
log(`[Quantization]::Debug: Quantization exited with code: ${code}`)
reject(code)
} else {
resolve()
}
})
})
}

View File

@ -1,8 +1,8 @@
[
{
"key": "log-enabled",
"title": "App Logging Enabled",
"description": "We recommend enabling this setting to help us improve the app. Your data will be kept private on your computer, and you can opt out at any time.",
"title": "Enable App Logs",
"description": "Saves app logs locally on your computer. This enables you to send us crash reports.",
"controllerType": "checkbox",
"controllerProps": {
"value": true
@ -11,7 +11,7 @@
{
"key": "log-cleaning-interval",
"title": "Log Cleaning Interval",
"description": "Log cleaning interval in milliseconds.",
"description": "Automatically delete local logs after a certain time interval (in milliseconds).",
"controllerType": "input",
"controllerProps": {
"value": "120000",
@ -19,4 +19,4 @@
"textAlign": "right"
}
}
]
]

View File

@ -2,17 +2,30 @@ import React, { ReactNode, forwardRef } from 'react'
import { twMerge } from 'tailwind-merge'
import './styles.scss'
import { Cross2Icon } from '@radix-ui/react-icons'
export interface Props extends React.InputHTMLAttributes<HTMLInputElement> {
textAlign?: 'left' | 'right'
prefixIcon?: ReactNode
suffixIcon?: ReactNode
onClick?: () => void
clearable?: boolean
onClear?: () => void
}
const Input = forwardRef<HTMLInputElement, Props>(
(
{ className, type, textAlign, prefixIcon, suffixIcon, onClick, ...props },
{
className,
type,
textAlign,
prefixIcon,
suffixIcon,
onClick,
onClear,
clearable,
...props
},
ref
) => {
return (
@ -27,6 +40,11 @@ const Input = forwardRef<HTMLInputElement, Props>(
{suffixIcon}
</div>
)}
{clearable && (
<div className="input__clear-icon" onClick={onClear}>
<Cross2Icon className="text-red-200" />
</div>
)}
<input
type={type}
className={twMerge(
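Below is a hedged usage sketch of the new `clearable` / `onClear` props added to this Input; the import path and the state wiring are assumptions for illustration only.

```tsx
// Hypothetical consumer of the extended Input component shown above.
import React, { useState } from 'react'
import { Input } from '@janhq/joi' // assumed import path

const ModelSearchBox = () => {
  const [query, setQuery] = useState('')

  return (
    <Input
      value={query}
      placeholder="Search models..."
      clearable={query.length > 0}               // show the clear icon only while there is text
      onChange={(e) => setQuery(e.target.value)}
      onClear={() => setQuery('')}               // clicking the Cross2Icon empties the field
    />
  )
}

export default ModelSearchBox
```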

View File

@ -40,4 +40,11 @@
padding-right: 32px;
}
}
&__clear-icon {
@apply absolute right-3 top-1/2 -translate-y-1/2 cursor-pointer;
color: hsla(var(--input-icon));
+ .input {
padding: 0 32px;
}
}
}

View File

@ -33,13 +33,16 @@ const Modal = ({
<DialogPrimitive.Portal>
<DialogPrimitive.Overlay className="modal__overlay" />
<DialogPrimitive.Content
aria-describedby={undefined}
className={twMerge(
'modal__content',
fullPage && 'modal__content--fullpage',
className
)}
>
<div className="modal__title">{title}</div>
<DialogPrimitive.Title className="modal__title">
{title}
</DialogPrimitive.Title>
{content}
{!hideClose && (
<ModalClose asChild>

View File

@ -42,7 +42,7 @@ fieldset,
}
&__title {
@apply line-clamp-1;
@apply leading-relaxed;
margin: 0 0 8px 0;
padding-right: 16px;
font-weight: 600;

View File

@ -9,7 +9,7 @@ const ScrollArea = React.forwardRef<
React.ComponentPropsWithoutRef<typeof ScrollAreaPrimitive.Root>
>(({ className, children, onScroll, ...props }, ref) => (
<ScrollAreaPrimitive.Root
type="scroll"
type="auto"
className={twMerge('scroll-area__root', className)}
{...props}
>

View File

@ -53,8 +53,8 @@
}
::-webkit-scrollbar {
width: 6px;
height: 6px;
width: 8px;
height: 8px;
}
::-webkit-scrollbar-track,
::-webkit-scrollbar-thumb {

View File

@ -10,7 +10,7 @@
animation-timing-function: cubic-bezier(0.16, 1, 0.3, 1);
will-change: transform, opacity;
font-weight: 500;
z-index: 100;
z-index: 999999999;
max-width: 240px;
@apply text-sm leading-normal;
}

View File

@ -41,14 +41,16 @@
"build": "yarn build:web && yarn build:electron",
"build:publish": "yarn copy:assets && yarn build:web && yarn workspace jan build:publish",
"dev:joi": "yarn workspace @janhq/joi install && yarn workspace @janhq/joi dev",
"build:joi": "yarn workspace @janhq/joi install && yarn workspace @janhq/joi build"
"build:joi": "yarn workspace @janhq/joi install && yarn workspace @janhq/joi build",
"prepare": "husky"
},
"devDependencies": {
"concurrently": "^8.2.1",
"cpx": "^1.5.0",
"husky": "^9.1.5",
"rimraf": "^3.0.2",
"wait-on": "^7.0.1",
"run-script-os": "^1.1.6"
"run-script-os": "^1.1.6",
"wait-on": "^7.0.1"
},
"version": "0.0.0"
}

View File

@ -1,14 +0,0 @@
spec:
@echo "Initiating a Spec..."
@last_number=$$(ls $(CURDIR)/jan-[0-9][0-9][0-9]-* | sort -V | tail -n 1 | cut -d '-' -f 2); \
last_number=$$(echo $$last_number | sed 's/^0*//'); \
next_number=$$(printf "%03d" $$(( $$last_number + 1 ))); \
read -p "Enter Spec title: " title; \
title=$$(echo $$title | tr ' ' '-'); \
cp $(CURDIR)/spec-template.md $(CURDIR)/jan-$$next_number-$$title.md; \
date=$$(date +%Y-%m-%d); \
usernames=$$(git config user.name); \
sed -i '' 's/{SPEC-NUM}/'$$next_number'/g' $(CURDIR)/jan-$$next_number-$$title.md; \
sed -i '' 's/{TITLE}/'$$title'/g' $(CURDIR)/jan-$$next_number-$$title.md; \
sed -i '' 's/{DATE}/'$$date'/g' $(CURDIR)/jan-$$next_number-$$title.md; \
sed -i '' 's/{USERNAMES}/'$$usernames'/g' $(CURDIR)/jan-$$next_number-$$title.md

188
specs/QA-checklist.md Normal file
View File

@ -0,0 +1,188 @@
# Regression test
**Release Version:** v0.6.0
**Operating System:**
---
## A. Installation, Update, and Uninstallation
### 1. Users install app (New user flow)
- [ ] :rocket: Installation package is not corrupted and passes all security checks.
- [ ] :key: App launches successfully after installation.
### 2. Users update app (Existing user flow)
- [ ] :key: Validate that the update does not corrupt user data or settings.
- [ ] :key: App restarts or prompts the user to restart after an update.
- [ ] When updating the app, check whether the JSON/YML files in the `/models` directory change according to the update.
- [ ] Updating the app also updates extensions correctly; test any resulting functionality changes.
### 3. Users uninstall / close app
- [ ] :key: After closing the app, all models are unloaded.
- [ ] :key::warning: Uninstallation process removes the app successfully from the system.
- [ ] Clean the data folder and open the app to check if it creates all the necessary folders, especially models and extensions.
## B. Overview
### 1. Shortcut key
- [ ] :key: Test each shortcut key to confirm it works as described (My models, navigating, opening, closing, etc.).
### 2. Users check the `active model`
- [ ] :key: The app correctly displays the state of the loading model (e.g., loading, ready, error).
- [ ] :key: Confirm that the app allows users to switch between models if multiple are available.
- [ ] Check that the app provides feedback or instructions if the model fails to load.
- [ ] Verify the troubleshooting assistant correctly captures hardware / log info [#1784](https://github.com/janhq/jan/issues/1784)
## C. Thread
### 1. Users can chat with Jan, the default assistant
- [ ] :key: Sending a message enables users to receive responses from model.
- [ ] :key: Conversation thread is maintained without any loss of data upon sending multiple messages.
- [ ] Users should be able to edit a message, and the assistant will re-generate the answer based on the edited version of the message.
- [ ] Test for the ability to send different types of messages (e.g., text, emojis, code blocks).
- [ ] Check the output format of the AI (code blocks, JSON, markdown, ...).
- [ ] :key: Validate the scroll functionality in the chat window for lengthy conversations.
- [ ] User can copy / delete the response.
- [ ] :key: Check that the `clear message` / `delete entire chat` buttons work.
- [ ] Deleting the entire chat retains the model instructions and settings.
- [ ] :key: Appropriate error handling and messaging if the assistant fails to respond.
- [ ] Test assistant's ability to maintain context over multiple exchanges.
- [ ] :key: Check the `create new chat` button; a new conversation should get an automatically generated thread title based on the user's message.
- [ ] The app can handle changing `models` mid-thread.
- [ ] Check that the `regenerate` button renews the response (single / multiple times).
- [ ] Check that the `Instructions` are applied correctly after the user updates them mid-thread.
### 2. Users can customize chat settings like model parameters via both the GUI & model.yml
- [ ] Adjust model parameters (e.g., Temperature, Top K, Top P) from the GUI and verify they are reflected in the chat behavior.
- [ ] :key: Changes can be saved and persisted between sessions.
- [ ] Users can access and modify the model.yml file.
- [ ] :key: Changes made in model.yml are correctly applied to the chat session upon reload or restart.
- [ ] Check the maximum and minimum limits of the adjustable parameters and how they affect the assistant's responses.
- [ ] :key: The app can handle users switching between threads that use different models.
### 3. Model dropdown
- [ ] :key: The model list should highlight recommended models based on the user's RAM (note: this may actually be based on a static formula rather than measured RAM).
- [ ] Model size should display (for both installed and imported models)
### 4. Users can click on a history thread
- [ ] Chat window displays the entire conversation from the selected history thread without any missing messages.
- [ ] Historical threads reflect the exact state of the chat at that time, including settings.
- [ ] :key: Ability to delete or clean old threads.
- [ ] Changing the title of the thread updates correctly.
### 5. Users can configure instructions for the assistant.
- [ ] Instructions set by the user are being followed by the assistant in subsequent conversations.
- [ ] :key: Changes to instructions are updated in real time and do not require a restart of the application or session.
- [ ] :key: Ability to reset instructions to default or clear them completely.
- [ ] :key: RAG - Users can import documents and the system should process queries about the uploaded file, providing accurate and appropriate responses in the conversation thread.
- [ ] :key: Jan can see - Users can import an image, and a model with vision can generate responses (e.g. the LLaVa model). [#294](https://github.com/janhq/jan/issues/294)
## D. Hub
### 1. Users can discover recommended models
- [ ] :key: Each model's recommendations are consistent with the user's activity and preferences.
- [ ] Search for models and verify the results / actions on the results.
### 2. Users can download models suitable for their devices, e.g. compatible with their RAM
- [ ] Model list should be in order: Featured > Remote > Local
- [ ] :key: Ensure that models are labeled with RAM requirements.
- [ ] :key: Check the download model functionality and validate if the cancel download feature works correctly.
### 3. Users can download models via a HuggingFace URL [#1740](https://github.com/janhq/jan/issues/1740)
- [ ] :key: Import via a Hugging Face ID / full HuggingFace URL, and check that the progress bar reflects download progress
- [ ] :key: Test deeplink import [#2876](https://github.com/janhq/jan/issues/2876)
- [ ] :key: Users can use / remove the imported model.
### 4. Users can import new models to the Hub
- [ ] :key: Ensure models import successfully via drag / drop or GGUF upload.
- [ ] :key: Verify that the Move model binary file / Keep Original Files & Symlink options work
- [ ] Users can add more info to the imported model / edit its name
- [ ] :key: Ensure the new model updates after restarting the app.
### 5. Users can use the model as they want
- [ ] :key: Check that the `start` / `stop` / `delete` buttons do exactly what they say.
- [ ] Check that starting another model entirely stops the currently running model.
- [ ] :rocket: Navigate to `hub` > Click the `Use` button to use a model. Expect to jump to the thread and see the model in the model selector dropdown.
- [ ] :key: Check that deleting a model removes all of its files from the user's computer.
- [ ] :warning: The recommended tags should display correctly for the user's hardware.
### 6. Users can Integrate With a Remote Server
- [ ] :key: Import an OpenAI GPT model (https://jan.ai/guides/using-models/integrate-with-remote-server/) and verify the model is displayed in the Hub / Thread dropdown
- [ ] Users can use the remote model properly (OpenAI GPT, Groq)
## E. System Monitor
### 1. Users can see disk and RAM utilization
- [ ] :key: Verify that the RAM and VRAM utilization graphs are accurately reported in real time.
- [ ] :key: Validate that the utilization percentages reflect the actual usage compared to the system's total available resources.
- [ ] :key: Ensure that the system monitor updates dynamically as models run and stop.
### 2. Users can start and stop models based on system health
- [ ] :key: Verify that the `Start/Stop` action for a model is reflected in system resource usage.
- [ ] Confirm that any changes in model status (start/stop) are logged or reported to the user for transparency.
- [ ] :key: Check the functionality of `App log` to ensure it opens the correct folder in the system file explorer.
## F. Settings
### 1. Appearance
- [ ] :key: Test the `Light`, `Dark`, and `System` theme settings to ensure they are functioning as expected.
- [ ] Confirm that the application saves the theme preference and persists it across sessions.
- [ ] Validate that all elements of the UI are compatible with the theme changes and maintain legibility and contrast.
### 2. Extensions [TBU]
- [ ] Validate the `Install Extensions` process by selecting and installing a plugin file.
- [ ] Enable / disable extensions and verify the UI reflects the change accordingly
### 3. Extension group
- [ ] :key: Users can set valid Endpoint and API Key to use remote models
- [ ] The Monitoring extension should allow users to enable / disable logging and set the log cleaning interval
### 4. Advanced settings
- [ ] :key: Test the `Experimental Mode` toggle to confirm it enables or disables experimental features as intended.
- [ ] :key: Check the functionality of `Open App Directory` to ensure it opens the correct folder in the system file explorer.
- [ ] Users can move **Jan data folder**
- [ ] Validate that changes in advanced settings are applied immediately or provide appropriate instructions if a restart is needed.
- [ ] Attempt to download a model from the Hub using an **HTTP Proxy** [guideline](https://github.com/janhq/jan/pull/1562)
- [ ] Logs that are older than 7 days or exceed 1MB in size will be automatically cleared upon starting the application.
- [ ] Users can click on Reset button to **factory reset** app settings to its original state & delete all usage data.
- [ ] Keep the current app data location
- [ ] Reset the current app data location
- [ ] Users can enable the setting and chat using Quick Ask.
### 5. Engine
- [ ] :key: TensorRT Engine - Users are able to chat with the model
- [ ] :key: Onnx Engine - Users are able to chat with the model
- [ ] :key: Other remote Engines - Users are able to chat with the model
## G. Local API server
### 1. Local Server Usage with Server Options
- [ ] :key: Explore the API Reference: Swagger API for sending/receiving requests (see the request sketch after this checklist)
- [ ] Use default server option
- [ ] Configure and use custom server options
- [ ] Test starting/stopping the local API server with different models / model settings
- [ ] Server logs are captured with the correct server options provided
- [ ] Verify functionality of Open logs/Clear feature
- [ ] Ensure that threads and other functions impacting the model are disabled while the local server is running
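As a reference point for the checks above, a minimal request sketch against the local API server; the port, route, and payload shape are assumptions based on an OpenAI-compatible endpoint and should be verified against the Swagger page.

```typescript
// Hypothetical smoke test for the local API server; confirm the real values in the API Reference.
async function pingLocalServer(): Promise<void> {
  const response = await fetch('http://localhost:1337/v1/chat/completions', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      model: 'llama3-8b-instruct', // assumed model id
      messages: [{ role: 'user', content: 'Hello from the QA checklist' }],
      stream: false,
    }),
  })

  if (!response.ok) {
    throw new Error(`Server responded with ${response.status}`)
  }
  console.log(await response.json())
}
```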

View File

@ -1,19 +0,0 @@
# Jan Improvement Proposals
This is a repo of key architecture decisions for Jan.
[Read more about ADRs](https://github.com/joelparkerhenderson/architecture-decision-record)
### Get started:
```sh
# In root:
make newadr
```
### Template:
- **Status**: `pending`, `approved`, or `rejected`
- **Context**: a clearly defined problem/goal
- **Decisions**: the proposed architecture choices & changes
- **Consequences**: pros and cons of the decision
- **References**: any relevant materials to read

View File

@ -1,54 +0,0 @@
# ADR #001: Jan deployable cloud-native
## Changelog
- 23.10.03: Initial unfinished draft
- 23.10.16: Remove authentication
## Authors
- @nam-john-ho
- @louis
## Context
### Status Quo
* User doesn't have a local GPU machine but wants to run Jan on a rented server
* User wants a quick, fast way to experiment with Jan on a rented GPU
* https://github.com/janhq/jan/issues/255
## Decision
* This ADR aims to outline design decisions for deploying Jan in cloud-native environments such as Runpod, AWS, Azure, and GCP in a fast and simple way.
* The current code-base should not change too much.
* The current plugins must be reusable across environments (Desktop, Cloud-native).
### Key Design Decisions
![Key Design](images/adr-001-02.png "Key Design")
#### Why middleware
* The /web codebase needs to operate in both browser and electron environments
* The /web codebase needs to route plugin routes accordingly, either to /server or /electron
* Middleware takes care of this
* We will have a /server codebase that takes care of routing to plugins
#### Unsuitable Alternatives
* Not possible to just run electron headless
* /web is on a different chromium window
* Does not have all the electron handlers
* Does not have the IPC handler
## Alternative Approaches
A separate server process runs alongside Electron. https://github.com/janhq/jan/pull/184/commits/6005409a945bb0e80a61132b9eb77f47f19d0aa6
## Considerations
* Due to the limitation of accessing the file system in web browsers, the first version of the web app will load all the current plugins by default, and users will not be able to add, remove, or update plugins.
* Simple authentication will be implemented as a plugin.
## References
- https://www.runpod.io/console/templates
- https://repost.aws/articles/ARQ0Tz9eorSL6EAus7XPMG-Q/how-to-install-textgen-webui-on-aws
- https://www.youtube.com/watch?v=_59AsSyMERQ
- https://gpus.llm-utils.org/running-llama-2-on-runpod-with-oobaboogas-text-generation-webui/
- https://medium.com/@jarimh1984/installing-oobabooga-and-oobabooga-api-to-runpod-cloud-step-by-step-tutorial-47457974dfa5

View File

@ -1,55 +0,0 @@
# ADR #002: Jan AI apps
## Changelog
- Oct 4th 2023: Initial draft
- Oct 6th 2023: Update sample API
## Authors
- @vuonghoainam - Hiro
- @louis-jan
## Status
Proposed
## Context
### Business context
Jan can be a platform and let builders build their own `AI app` using existing tools
- Use-case 1: Medical AI startup uploads "case notes" to Jan, wants to ask it questions (i.e. medical audit)
- Use-case 2: Legal e-discovery: very large amount of documents (~10-15k pages) are uploaded, data is very private and cannot be leaked
- Use-case 3: Jan wants to use Jan to have a QnA chatbot to answer questions on docs
- Use-case 4: Jan wants to use Jan to have a codellama RAG on its own codebase, to generate new PRs
### Extra context
- There are many use cases that the community can develop and sell to users through Jan as plugins. Jan needs to streamline this higher value chain.
- This brings more value and more options to all kinds of users
- This can help build the ecosystem and streamline value end to end (Jan, plugin/model creators, Jan users - enterprise/individual)
- We at Jan cannot build every plugin on our own, but this one should serve as a featured example, much as the [OpenAI Retrieval plugin](https://github.com/openai/chatgpt-retrieval-plugin) does.
- [#232](https://github.com/janhq/jan/issues/232)
## Decision
- Users can browse and install plugins (with a recommended model - llama2, claude, openai …) - This requires plugin dependencies.
- Jan provides a consistent interface for plugin developers to use:
  - Use an LLM (switchable at runtime) - e.g., develop against llama2-7b while the user runs llama2-70b or another model
  - A plugin can have an API for CRUD operations on indices in a vector DB / DB, and Jan only exposes the corresponding data to the app
  - A place for a plugin to store files for persistence
- This works seamlessly on the desktop / Jan-hosted version thanks to the Jan API abstraction.
### Simple UX
![UX](images/adr-002-01.png "UX")
### Component design
![Component design](images/adr-002-02.png "Component design")
## API
- `jan.plugin.<plugin_name>.<function_name>(**args)`
- `jan.core.db.sql.command()` -> CRUD/ query
- `jan.plugin.vectra.<function_name>(**args)` -> CRUD/ query for
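To make the proposed surface more concrete, here is a hypothetical TypeScript sketch of what calling it from an app might look like; the namespace object, the `query` method, and the SQL table are illustrative assumptions drawn from the bullets above, not an implemented interface.

```typescript
// Hypothetical shape of the proposed `jan.*` API; nothing here is implemented as-is.
declare const jan: {
  core: { db: { sql: { command: (query: string, params?: unknown[]) => Promise<unknown> } } }
  plugin: Record<string, Record<string, (...args: unknown[]) => Promise<unknown>>>
}

async function answerFromCaseNotes(question: string): Promise<unknown> {
  // Persist the question through the core SQL surface (table name is made up).
  await jan.core.db.sql.command('INSERT INTO questions (text) VALUES (?)', [question])

  // Ask the vectra plugin to run a similarity query over the uploaded documents.
  return jan.plugin.vectra.query(question, { topK: 5 })
}
```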
## Consequences
- Jan users can build their own AI apps (and buy from others too) in an easy way
- Clear design for plugin and Jan platform development
## Reference
- [ADR-003](adr-003-jan-plugins.md)

View File

@ -1,65 +0,0 @@
# ADR 003: JAN PLUGINS
## Changelog
- Oct 5th 2023: Initial draft
## Status
Accepted
## Context
Modular Architecture w/ Plugins:
- Jan will have an architecture similar to VSCode or k8Lens
- "Desktop Application" whose functionality can be extended thru plugins
- Jan's architecture will need to accommodate plugins for (a) Persistence, (b) IAM, (c) Teams and RBAC, (d) Policy engines, (e) "Apps" (i.e. higher-order business logic), (f) Themes (UI)
- Nitro's architecture will need to accommodate plugins for different "model backends": (a) llama.cpp, (b) rkwk (and others), (c) 3rd-party AIs
## Decision
![Architecture](./images/adr-003-01.png)
## Consequences
What becomes easier or more difficult to do because of this change?
## CoreService API
Jan frontend components will communicate with plugin functions via Service Interfaces:
All of the available APIs are listed in [CoreService](../web/shared/coreService.ts)
- Data Service:
- GET_CONVERSATIONS: retrieve all of the conversations
- CREATE_CONVERSATION: start a new conversation
- DELETE_CONVERSATION: delete an existing conversation
- GET_CONVERSATION_MESSAGES: retrieve a certain conversation messages
- CREATE_MESSAGE: store a new message (both sent & received)
- UPDATE_MESSAGE: update an existing message (streaming)
- STORE_MODEL: store new model information (when clicking download)
- UPDATE_FINISHED_DOWNLOAD: mark a model as downloaded
- GET_UNFINISHED_DOWNLOAD_MODELS: retrieve all unfinished downloading models (TBD)
- GET_FINISHED_DOWNLOAD_MODELS: retrieve all finished downloading models (TBD)
- DELETE_DOWNLOAD_MODEL: delete a model (TBD)
- GET_MODEL_BY_ID: retrieve model information by its ID
- Inference Service:
- INFERENCE_URL: retrieve inference endpoint served by plugin
- INIT_MODEL: runs a model
- STOP_MODEL: stop a running model
- Model Management Service: (TBD)
- GET_AVAILABLE_MODELS: retrieve available models (deprecate soon)
- GET_DOWNLOADED_MODELS: (deprecated)
- DELETE_MODEL: (deprecated)
- DOWNLOAD_MODEL: start to download a model
- SEARCH_MODELS: explore models with search query on HuggingFace (TBD)
- Monitoring service:
- GET_RESOURCES_INFORMATION: retrieve total & used memory information
- GET_CURRENT_LOAD_INFORMATION: retrieve CPU load information
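For illustration, a hypothetical sketch of how a frontend component might call one of these interfaces; the `invokeService` helper and the `Conversation` shape are stand-ins, not the actual CoreService signatures.

```typescript
// Hypothetical invocation of a Data Service interface listed above.
type ServiceName = 'GET_CONVERSATIONS' | 'CREATE_CONVERSATION' | 'DELETE_CONVERSATION'

declare function invokeService<T>(name: ServiceName, ...args: unknown[]): Promise<T>

interface Conversation {
  id: string
  title: string
}

async function loadConversations(): Promise<Conversation[]> {
  // The call is routed to whichever plugin registered GET_CONVERSATIONS.
  return invokeService<Conversation[]>('GET_CONVERSATIONS')
}
```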

View File

@ -1,52 +0,0 @@
# ADR 004: UI Service
## Changelog
- 10 Oct 2023: initial vision @dan-jan @0xSage
## Status
Proposed
## Context
Plugin devs need an API to change the Jan UI. Before we layer on more features, let's ensure good devex for feature building.
## Decision
![Jan UI Framework](./images/jan-ui-framework.png)
- Side-Ribbon: Jan Apps
- This is a protected area, for Apps
- Apps can define Left Panel, Center, and Right Panel
- We will only have 1 App for now (no need to build this abstraction yet)
- Future: Server mode (see LMStudio), Art Studio (Stable Diffusion)
- Side-Ribbon: Global Settings
- These will all open in a modal
- Currently: Model Store, Running Models
- Currently: User Login, Settings
- Main Window and Right Panel
- These will mainly be session-based
- Console: production logs
## UiService API
We need a UI API for Plugins
- e.g. Model Store plugin -> Registers "Global Settings" Icon, defines what will show up in the Modal
- e.g. Model Runner plugin -> Inference Parameters
## Consequences
- Increased code complexity
## Reference
- VSCode
- Obsidian

View File

@ -1,48 +0,0 @@
# ADR 005: model-installation
## Changelog
- 2023-10-18: Initial draft
## Authors
- 0xSage
## Status
Proposed
## Context
There are a few issues with our current model installation method (hardcoding jsons in /models repo):
- Users want to add their own model binaries
- Maintaining /models is too manual
## Decision
Let users download models on their own & manually import them to Jan via an "add a model" UI
Links:
- Github issue: https://github.com/janhq/jan/issues/359
- Related issue: https://github.com/janhq/jan/issues/304
- Designs: https://www.figma.com/file/JdK7cNIBeVdYeHxKiYeWtk/JAN---Web?type=design&node-id=4092-58218&mode=design&t=8OmFSG0E6I8Y3IjY-0
## Consequences
Closed alternate solutions:
- https://github.com/janhq/jan/issues/328
## Alternatives
Thinking through the model selection experience, there are a few possibilities:
1. [current] We hardcode models (via Github) to show up in Explore Models => unnecessarily manual, missing models users want
1. We mirror HF models for a faster download => users can also do nitro add llama2
1. [CHOSEN] Users download models on their own & manually import them to Jan via an "add a model" UI => I like this option actually
1. [LATER] Users paste in a HF link and download the model in Explore Models => do we still render model cards for them?
1. Users manage their own models folder, e.g. /Users/nicole/models, then they set folder path in Jan. => this one needs a lot of designs/fe work
## Reference

View File

@ -1,36 +0,0 @@
# ADR 006: jan-core-module
## Changelog
- 2023-10-19: Initial draft
## Authors
- Louis
## Status
Accepted
## Context
Currently, developers face several challenges while writing a plugin, which include:
- Registering functions using the function name as a string
- Invoking anonymous functions
- No access to native APIs or common functions for data insertion or retrieval
- Lack of communication between the app and plugins.
## Decision
Let developers install and import an npm module to make plugin development easier.
Upon boot, the web app plugs the core modules into the window; its components and plugins can then import the core to access the exposed functions.
![Jan Core Module](./images/jan-core-module.png)
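A hypothetical sketch of the developer experience this decision targets; the helper names and the registered function are illustrative assumptions, not the published core API.

```typescript
// Hypothetical core-module surface; the declarations below are stand-ins, not a real package.
declare function registerFunction(
  name: string,
  fn: (...args: unknown[]) => Promise<unknown>
): void
declare function invokeNative<T>(name: string, ...args: unknown[]): Promise<T>

// A plugin registers a named function instead of passing function names around as raw strings.
registerFunction('summarizeConversation', async (conversationId) => {
  // Data retrieval goes through a core-exposed API rather than ad-hoc IPC.
  const messages = await invokeNative<string[]>('getConversationMessages', conversationId)
  return messages.join('\n').slice(0, 500)
})
```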
## Consequences
Separate PRs should be created for updating the core and app. For instance, if a new app enhancement requires the core module to expose a new API, a new core update must be published on npm to prevent CI failure.
## Alternatives
## Reference

View File

@ -1,35 +0,0 @@
# ADR 007: jan-plugin-catalog
## Changelog
- 2023-10-19: Initial draft
## Authors
- Louis
## Status
Proposed
## Context
Users should be able to explore plugins, and developers need a channel to publish their plugins
Lesson learned from the Model Catalog: we hosted everything on GitHub and attempted to retrieve it anonymously, which cost us a lot of effort and led to a rate limit issue. Say there are N items in the catalog; we attempted to send N+1 requests at a time, which was costly and hit the API rate limit.
## Decision
1. Combine all JSON items in the catalog into one JSON catalog. Now we just need to work with one catalog file, which means only one request, but the rate limit issue still exists.
2. CDN - there are cool services out there which support OSS projects, such as [JSDELIVR](https://www.jsdelivr.com).
3. Downloading a JSON file is not a good approach, though. Exporting a module works better. Webpack + DefinePlugin should work.
4. Since we have created a new module, we want to publish it as well. Let's publish it on npm so everyone can install and use it. This is also to add a versioning feature.
5. Installing this npm module would require the user to update their app to the latest version. Instead, let's import the remote module via CDN, which requires just a few lines of code.
![Jan Plugin Catalog](./images/jan-plugin-catalog.png)
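A hypothetical sketch of option 5, importing the catalog as a remote module from a CDN; the URL, package name, and export shape are assumptions for illustration.

```typescript
// Hypothetical remote import of the plugin catalog (option 5 above); URL and shape are made up.
interface PluginCatalogEntry {
  name: string
  version: string
  downloadUrl: string
}

async function loadPluginCatalog(): Promise<PluginCatalogEntry[]> {
  // Keeping the specifier in a variable avoids bundling it; the catalog can then be
  // updated on the CDN without shipping a new app version.
  const catalogUrl = 'https://cdn.jsdelivr.net/npm/jan-plugin-catalog@latest/+esm'
  const mod = await import(/* webpackIgnore: true */ catalogUrl)
  return mod.default as PluginCatalogEntry[]
}
```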
## Consequences
## Alternatives
## Reference

Some files were not shown because too many files have changed in this diff.