jan/docs/src/pages/api-reference/architecture.mdx
2025-09-21 19:24:48 +08:00

192 lines
5.8 KiB
Plaintext

---
title: Architecture
description: Technical architecture and system design of Jan Server components.
---
## System Overview
Jan Server implements a microservices architecture on Kubernetes with three core components communicating over HTTP and managed by Helm charts.
```mermaid
graph TD
Client[Client/Browser] --> Gateway[jan-api-gateway:8080]
Gateway --> Model[jan-inference-model:8101]
Gateway --> DB[(PostgreSQL:5432)]
Gateway --> Serper[Serper API]
Gateway --> OAuth[Google OAuth2]
```
## Components
### API Gateway (`jan-api-gateway`)
**Technology Stack:**
- **Language**: Go 1.24.6
- **Framework**: Gin web framework
- **ORM**: GORM with PostgreSQL driver
- **DI**: Google Wire for dependency injection
- **Documentation**: Swagger/OpenAPI auto-generated
**Responsibilities:**
- HTTP request routing and middleware
- User authentication via JWT and OAuth2
- Database operations and data persistence
- External API integration (Serper, Google OAuth)
- OpenAI-compatible API endpoints
- Request forwarding to inference service
**Key Directories:**
```
application/
├── cmd/server/ # Main entry point and DI wiring
├── app/ # Core business logic
├── config/ # Environment variables and settings
└── docs/ # Auto-generated Swagger docs
```
### Inference Model (`jan-inference-model`)
**Technology Stack:**
- **Base Image**: VLLM OpenAI v0.10.0
- **Model**: Jan-v1-4B (downloaded from Hugging Face)
- **Protocol**: OpenAI-compatible HTTP API
- **Features**: Tool calling, reasoning parsing
**Configuration:**
- **Model Path**: `/models/Jan-v1-4B`
- **Served Name**: `jan-v1-4b`
- **Port**: 8101
- **Batch Tokens**: 1024 max
- **Tool Parser**: Hermes
- **Reasoning Parser**: Qwen3
**Capabilities:**
- Text generation and completion
- Tool calling and function execution
- Multi-turn conversations
- Reasoning and chain-of-thought
### Database (PostgreSQL)
**Configuration:**
- **Database**: `jan`
- **User**: `jan-user`
- **Password**: `jan-password`
- **Port**: 5432
**Schema:**
- User accounts and authentication
- Conversation history
- Project and organization management
- API keys and access control
## Data Flow
### Request Processing
1. **Client Request**: HTTP request to API gateway on port 8080
2. **Authentication**: JWT token validation or OAuth2 flow
3. **Request Routing**: Gateway routes to appropriate handler
4. **Database Operations**: GORM queries for user data/state
5. **Inference Call**: HTTP request to model service on port 8101
6. **Response Assembly**: Gateway combines results and returns to client
### Authentication Flow
**JWT Authentication:**
1. User provides credentials
2. Gateway validates against database
3. JWT token issued with HMAC-SHA256 signing
4. Subsequent requests include JWT in Authorization header
**OAuth2 Flow:**
1. Client redirected to Google OAuth2
2. Authorization code returned to redirect URL
3. Gateway exchanges code for access token
4. User profile retrieved from Google
5. Local JWT token issued
## Deployment Architecture
### Kubernetes Resources
**Deployments:**
- `jan-api-gateway`: Single replica Go application
- `jan-inference-model`: Single replica VLLM server
- `postgresql`: StatefulSet with persistent storage
**Services:**
- `jan-api-gateway`: ClusterIP exposing port 8080
- `jan-inference-model`: ClusterIP exposing port 8101
- `postgresql`: ClusterIP exposing port 5432
**Configuration:**
- Environment variables via Helm values
- Secrets for sensitive data (JWT keys, OAuth credentials)
- ConfigMaps for application settings
### Helm Chart Structure
```
charts/
├── umbrella-chart/ # Main deployment chart
│ ├── Chart.yaml
│ ├── values.yaml # Configuration values
│ └── Chart.lock
└── apps-charts/ # Individual service charts
├── jan-api-gateway/
└── jan-inference-model/
```
## Security Architecture
### Authentication Methods
- **JWT Tokens**: HMAC-SHA256 signed tokens for API access
- **OAuth2**: Google OAuth2 integration for user login
- **API Keys**: HMAC-SHA256 signed keys for service access
### Network Security
- **Internal Communication**: Services communicate over Kubernetes cluster network
- **External Access**: Only API gateway exposed via port forwarding or ingress
- **Database Access**: PostgreSQL accessible only within cluster
### Data Security
- **Secrets Management**: Kubernetes secrets for sensitive configuration
- **Environment Variables**: Non-sensitive config via environment variables
- **Database Encryption**: Standard PostgreSQL encryption at rest
Production deployments should implement additional security measures including TLS termination, network policies, and secret rotation.
## Scalability Considerations
**Current Limitations:**
- Single replica deployments
- No horizontal pod autoscaling
- Local storage for database
**Future Enhancements:**
- Multi-replica API gateway with load balancing
- Horizontal pod autoscaling based on CPU/memory
- External database with clustering
- Redis caching layer
- Message queue for async processing
## Development Architecture
### Code Generation
- **Swagger**: API documentation generated from Go annotations
- **Wire**: Dependency injection code generated from providers
- **GORM Gen**: Database model generation from schema
### Build Process
1. **API Gateway**: Multi-stage Docker build with Go compilation
2. **Inference Model**: Base VLLM image with model download
3. **Helm Charts**: Dependency management and templating
4. **Documentation**: Auto-generation during development
### Local Development
- **Hot Reload**: Source code changes reflected without full rebuild
- **Database Migrations**: Automated schema updates
- **API Testing**: Swagger UI for interactive testing
- **Logging**: Structured logging with configurable levels