* Fix: Llama.cpp server hangs on model load
Resolves an issue where the llama.cpp server would hang indefinitely when loading certain models, as described in the attached ticket. The server's readiness message was not being correctly detected, causing the application to stall.
The previous implementation used a line-buffered reader (BufReader::lines()) to process the stderr stream. This method proved to be unreliable for the specific output of the llama.cpp server.
This commit refactors the stderr handling logic to use a more robust, chunk-based approach (read_until(b'\n', ...)). This ensures that the output is processed as it arrives, reliably capturing critical status messages and preventing the application from hanging during model initialization.
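A minimal sketch of the chunk-based approach, not the code from this commit: it reads a stream with `read_until(b'\n', ...)` and decodes each chunk lossily before checking it for a readiness marker. The function name `wait_for_marker` and the marker string passed to it are hypothetical. One documented difference from `BufReader::lines()` is that `lines()` yields an error when a line is not valid UTF-8, while `read_until` hands back raw bytes; whether that was the failure mode behind this hang is an assumption.

```rust
use std::io::{self, BufRead};

/// Scans `reader` one newline-terminated chunk at a time and reports whether
/// any chunk contains `marker` (a hypothetical readiness string).
fn wait_for_marker<R: BufRead>(mut reader: R, marker: &str) -> io::Result<bool> {
    let mut buf = Vec::new();
    loop {
        buf.clear();
        // read_until returns Ok(0) at EOF. Otherwise `buf` holds the bytes up to
        // and including the delimiter, or up to EOF if no trailing newline exists.
        let n = reader.read_until(b'\n', &mut buf)?;
        if n == 0 {
            return Ok(false);
        }
        // Lossy decoding replaces invalid UTF-8 instead of returning an error,
        // so a malformed line cannot stop the scan.
        let line = String::from_utf8_lossy(&buf);
        if line.contains(marker) {
            return Ok(true);
        }
    }
}
```

In a server wrapper, this would typically be called with a `BufReader` wrapped around the child process's stderr handle.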
Fixes: #6021
* Handle error gracefully with ServerError
Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>
* Revert "Handle error gracefully with ServerError"
This reverts commit 267a8a8a3262fbe36a445a30b8b3ba9a39697643.
* Revert "Fix: Llama.cpp server hangs on model load"
This reverts commit 44e5447f82f0ae32b6db7ffb213025f130d655c4.
* Add more guards, refactor and fix error sending to FE
---------
Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>
This commit introduces a significant restructuring of the documentation deployment and content strategy to support a gradual migration from Nextra to Astro.
- **New Astro Workflow (`jan-astro-docs.yml`)**: Implemented a new, separate GitHub Actions workflow to build and deploy the Astro site from the `/website` directory to a new subdomain (`v2.jan.ai`). This isolates the new site from the existing one, allowing for independent development and testing.
- **Removed Combined Workflow**: Deleted the previous, more complex combined workflow (`jan-combined-docs.yml`) and its associated test scripts to simplify the deployment process and eliminate routing conflicts.
- **Astro Config Update**: Simplified the Astro configuration (`astro.config.mjs`) by removing the conditional `base` path. The Astro site is now configured to deploy to the root of its own subdomain.
- **Mirrored Content**: Recreated the entire `/products` section from the Astro site within the Nextra site at `/docs/src/pages/products`. This provides content parity and a consistent user experience on both platforms during the transition period.
- **File Structure**: Established a clear, organized structure for platforms, models, and tools within the Nextra `products` directory.
- **Nextra Sidebar Fix**: Implemented the correct `_meta.json` structure for the new products section. Created nested meta files to build a collapsible sidebar, fixing the UI bug that caused duplicated navigation items.
- **"Coming Soon" Pages**: Added clear, concise "Coming Soon" and "In Development" banners and content for upcoming products like Jan V1, Mobile, Server, and native Tools, ensuring consistent messaging across both sites.
- **.gitignore**: Updated the root `.gitignore` to properly exclude build artifacts, caches, and environment files for both the Nextra (`/docs`) and Astro (`/website`) projects.
- **Repository Cleanup**: Removed temporary and unused files related to the previous combined deployment attempt.
This new architecture provides a stable, predictable, and low-risk path for migrating our documentation to Astro while ensuring the current production site remains unaffected.
* fix: generate a response button should appear when an incomplete tool call message is present
* fix: wording
* fix: do not send duplicate messages when regenerating
* fix: tests
* fix: remove CREATE_NEW_PROCESS_GROUP flag for proper Ctrl-C handling
CREATE_NEW_PROCESS_GROUP prevented GenerateConsoleCtrlEvent from working,
causing graceful shutdown failures. Removed to enable proper signal handling.
* Revert "fix: remove CREATE_NEW_PROCESS_GROUP flag for proper Ctrl-C handling"
This reverts commit 82ace3e72e4bf7338f422d5c79bdd6a0f8a2440e.
* fix: use direct process termination instead of console events
Simplified Windows process cleanup by removing console attachment logic
and using direct child.kill() method. More reliable for headless processes.
* Fix missing imports
* switch to tokio::time
* Don't wait when forcefully terminating a process with the kill API on Windows
Disabled use of windows-sys crate as graceful shutdown on Windows is unreliable in this context.
Updated cleanup.rs and server.rs to directly call child.kill().await for terminating processes on Windows.
Improved logging for process termination and error handling during kill and wait.
Removed timeout-based graceful shutdown attempt on Windows since TerminateProcess is inherently forceful and immediate.
This ensures more predictable process cleanup behavior on Windows platforms.
* final cleanups
This change improves the robustness of the llama.cpp extension's server port selection.
Previously, the `getRandomPort()` method only checked for ports already in use by active sessions, which could lead to model load failures if the chosen port was occupied by another external process.
This change introduces a new Tauri command, `is_port_available`, which performs a system-level check to ensure the randomly selected port is truly free before attempting to start the llama-server. It also adds a retry mechanism with a maximum number of attempts (20,000) to find an available port, throwing an error if no suitable port is found within the specified range after all attempts.
This enhancement prevents port conflicts and improves the reliability and user experience of the llama.cpp extension within Jan.
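The check-and-retry flow described above could be sketched as follows. This is an illustration under assumptions, not the actual Tauri command: the body of `is_port_available` is a guess at one way to perform a system-level check (try to bind a listener on localhost), and `find_available_port` with its `next_candidate` and `is_used_by_session` parameters is a hypothetical stand-in for the extension's `getRandomPort()` logic. The attempt limit (20,000 in this change) is passed in as an argument.

```rust
use std::net::TcpListener;

/// Returns true if a TCP listener can currently be bound to `port` on localhost.
/// The listener is dropped immediately, which releases the port again.
fn is_port_available(port: u16) -> bool {
    TcpListener::bind(("127.0.0.1", port)).is_ok()
}

/// Tries up to `max_attempts` candidate ports and returns the first one that is
/// neither claimed by an active session nor occupied at the system level.
fn find_available_port(
    mut next_candidate: impl FnMut() -> u16,
    is_used_by_session: impl Fn(u16) -> bool,
    max_attempts: u32,
) -> Result<u16, String> {
    for _ in 0..max_attempts {
        let port = next_candidate();
        if !is_used_by_session(port) && is_port_available(port) {
            return Ok(port);
        }
    }
    Err(format!(
        "no available port found after {} attempts",
        max_attempts
    ))
}
```

A check of this kind narrows the window for conflicts but cannot close it: another process may still claim the port between the check and the moment the server binds to it.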
Closes #5965
* fix: assistant with last used and fix metadata
* chore: revert instruction and desc
* chore: fix current assistant state
* chore: update metadata message assistant
* chore: update test case
Previously, the `autoUnload` flag was not being updated when set via config,
causing models to be auto-unloaded regardless of the intended behavior.
This patch ensures the setting is respected at runtime.
This commit addresses a race condition where, with "Auto-Unload Old Models" enabled, rapidly attempting to load multiple models could result in more than one model being loaded simultaneously.
Previously, the unloading logic did not account for models that were still in the process of loading when a new load operation was initiated. This allowed new models to start loading before the previous ones had fully completed their unload cycle.
To resolve this:
- A `loadingModels` map has been introduced to track promises for models currently in the loading state.
- The `load` method now checks if a model is already being loaded and, if so, returns the existing promise, preventing duplicate load operations for the same model.
- The `performLoad` method (which encapsulates the actual loading logic) now ensures that when `autoUnload` is active, it waits for any *other* models that are concurrently loading to finish before proceeding to unload all currently loaded models. This guarantees that the auto-unload mechanism properly unloads all models, including those initiated in quick succession, thereby preventing the race condition.
This fixes the issue where clicking the start button very fast on multiple models would bypass the auto-unload functionality.