This commit introduces embedding functionality to the llamacpp extension. It allows users to generate embeddings for text inputs using the 'sentence-transformer-mini' model. The changes include:
- Adding a new `embed` method to the `llamacpp_extension` class.
- Implementing model loading and API interaction for embeddings.
- Handling potential errors during API requests.
- Adding necessary types for embedding responses and data.
- The load method now accepts a boolean parameter to determine if it should load embedding model.
The changes include:
- Renaming interfaces (sessionInfo -> SessionInfo, unloadResult -> UnloadResult) for consistency
- Adding getLoadedModels() method to retrieve active model IDs
- Updating variable names from modelId to model_id for alignment
- Updating cleanup paths to use XDG-standard locations
- Improving type consistency across extension implementation
Rename variable, struct, and enum names from camelCase to snake_case throughout the llamacpp extension codebase to align with Rust naming conventions. This change improves readability and consistency without altering functionality.
Add comprehensive sampling parameters for fine-grained control over AI output generation, including dynamic temperature, Mirostat sampling, repetition penalties, and advanced prompt handling. These parameters enable more precise tuning of model behavior and output quality.
Change the llama_server_process state from an Option<Child> to a HashMap<String, Child> to support managing multiple server instances by PID. This allows precise process tracking and termination, replacing the previous single-process limitation.
Previously, only one server process could be tracked at a time. Now, each process is stored with its PID as the key, enabling:
- Accurate session matching during unloading
- Proper termination of specific processes
- Better error handling for mismatched PIDs
The load_llama_model function now inserts processes into the map, and unload_llama_model removes them by PID.
The loop now extracts session info to retrieve the model ID, ensuring correct unloading of sessions by their associated model identifiers rather than session IDs. This aligns the cleanup process with the actual model resources being managed.
The generateApiKey method now incorporates the model's port to create a unique,
port-specific API key, enhancing security by ensuring keys are tied to both
model ID and port. This change supports better isolation between models
running on different ports. Code formatting improvements were also made
for consistency and readability.
The changes standardize identifier names across the codebase for clarity:
- Replaced `sessionId` with `pid` to reflect process ID usage
- Changed `modelName` to `modelId` for consistency with identifier naming
- Renamed `api_key` to `apiKey` for camelCase consistency
- Updated corresponding methods to use these new identifiers
- Improved type safety and readability by aligning variable names with their semantic meaning
This change allows the port to be specified via command line arguments, providing flexibility. The port is parsed from the arguments, defaulting to 8080 if not provided.
The changes improve the robustness of command-line argument parsing in the Llama model server by replacing direct index access with safe iteration methods. A new generate_api_key function was added to handle API key generation securely. The sessionId parameter was standardized to match the renamed property in the client code.
- Changed load method to accept modelId instead of loadOptions for better clarity and simplicity
- Renamed engineBasePath parameter to backendPath for consistency with the backend's directory structure
- Added getRandomPort method to ensure unique ports for each session to prevent conflicts
- Refactored configuration and model loading logic to improve maintainability and reduce redundancy