feat: add recommended backend testcase

parent 054f64bd54
commit 703395ae10

autoqa/tested/models/recommended-backend.txt (new file, 60 lines)
@@ -0,0 +1,60 @@
prompt = """
You are a GUI agent. You are given a task and your action history, with screenshots. You need to perform the next action to complete the task.

## Output Format
```\nThought: ...
Action: ...\n```

## Action Space

click(start_box='<|box_start|>(x1,y1)<|box_end|>')
left_double(start_box='<|box_start|>(x1,y1)<|box_end|>')
right_single(start_box='<|box_start|>(x1,y1)<|box_end|>')
drag(start_box='<|box_start|>(x1,y1)<|box_end|>', end_box='<|box_start|>(x3,y3)<|box_end|>')
hotkey(key='')
type(content='') #If you want to submit your input, use \"\
\" at the end of `content`.
scroll(start_box='<|box_start|>(x1,y1)<|box_end|>', direction='down or up or right or left')
wait() #Sleep for 5s and take a screenshot to check for any changes.
finished()
call_user() # Submit the task and call the user when the task is unsolvable, or when you need the user's help.
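The action space above is a small call-style DSL. A minimal, hypothetical parser (the `parse_action` helper and its regexes are illustrative assumptions, not part of this commit) could turn one emitted action line into a dispatchable form:

```python
import re

# Hypothetical sketch: parse one line from the agent's "Action:" output
# into a (name, kwargs) pair. Coordinate arguments arrive wrapped in
# <|box_start|>(x,y)<|box_end|> tokens and are decoded to int tuples.
ACTION_RE = re.compile(r"^(\w+)\((.*)\)$")
ARG_RE = re.compile(r"(\w+)='([^']*)'")
BOX_RE = re.compile(r"<\|box_start\|>\((\d+),(\d+)\)<\|box_end\|>")

def parse_action(line):
    m = ACTION_RE.match(line.strip())
    if not m:
        raise ValueError(f"unrecognized action: {line!r}")
    name, raw_args = m.groups()
    kwargs = {}
    for key, value in ARG_RE.findall(raw_args):
        box = BOX_RE.match(value)
        kwargs[key] = (int(box.group(1)), int(box.group(2))) if box else value
    return name, kwargs
```

A harness could then dispatch on the returned name, e.g. `parse_action("wait()")` yields `("wait", {})`.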
## Note
- Use Chinese in `Thought` part.
- Summarize your next action (with its target element) in one sentence in `Thought` part.

## User Instruction
You are going to verify that **Llama.cpp shows the recommended version & backend description** under Model Providers in Settings.

Steps:
1. If a dialog appears in the bottom-right corner titled **"Help Us Improve Jan"**, click **Deny** to dismiss it before continuing.
2. In the bottom-left menu, click **Settings**.
3. In the left sidebar, click **Model Providers**.
4. In the left sidebar menu, under **Model Providers**, click on **Llama.cpp**.
   - Make sure to click the one in the **sidebar**, not the entry in the main panel.
   - Click directly in the center of the "Llama.cpp" text label in the sidebar to open its configuration page.
5. In the **Version & Backend** section, check the description.

Verification rule:
- Consider the check **passed** if the description under Version & Backend contains:
  - A **version string starting with `b<number>/`** (e.g., `b6097/win-avx2-cuda-cu12.0-x64`, `b6097/win-avx2-x64`, or `b6097/win-vulkan-x64`),
  - followed by the text **"Version and backend is the recommended backend"**.
- The exact version (e.g., b6097, b5857, b5833) may vary; any valid build number is acceptable as long as the description includes the phrase above.
- If this text is missing or different, the check **fails**.
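The verification rule above can be sketched as a predicate (the `backend_description_ok` name and its regex are illustrative assumptions, not code from this commit):

```python
import re

# The phrase the rule requires, copied from the verification rule above.
EXPECTED_PHRASE = "Version and backend is the recommended backend"

def backend_description_ok(description: str) -> bool:
    # Any build number passes: the description must start with b<digits>/
    # and contain the required phrase.
    has_version = re.match(r"b\d+/", description) is not None
    return has_version and EXPECTED_PHRASE in description
```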
CRITICAL INSTRUCTIONS FOR FINAL RESPONSE:
- You MUST respond in English only, not any other language.
- You MUST return ONLY the JSON format below, nothing else.
- Do NOT add any explanations, thoughts, or additional text.

If the description is exactly as expected, return: {"result": True}.
Otherwise, return: {"result": False}.

IMPORTANT:
- Your response must be ONLY the JSON above.
- Do NOT add any other text before or after the JSON.
"""
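Note that the requested `{"result": True}` uses a Python-style boolean, so a strict `json.loads` would reject it. A tolerant parse for the test harness might look like this (the `parse_result` helper is a hypothetical sketch, not part of this commit):

```python
import ast

def parse_result(text: str) -> bool:
    # Normalize possible lowercase JSON booleans, then evaluate the
    # Python-literal dict the prompt asks for, e.g. {"result": True}.
    cleaned = text.strip().replace("true", "True").replace("false", "False")
    value = ast.literal_eval(cleaned)
    return bool(value["result"])
```

The naive string replacement is good enough here only because the expected response contains nothing but the result dict.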
@@ -88,7 +88,7 @@ In `Llama.cpp`:
- [x] Disable `Auto-Unload Old Models`, and ensure that multiple models can run at the same time.
- [x] Enable `Context Shift` and ensure that long contexts can run without encountering memory errors. Use the `banana test`: turn on the fetch MCP, then ask a local model to fetch and summarize the history of the banana (bananas have a very long history on the wiki, it turns out). It should run out of context memory fairly quickly if `Context Shift` is not enabled.
- [x] [New] Ensure that users can change the Jinja chat template of an individual model without affecting the templates of other models.
- - [x] [New] Ensure that there is a recommended `llama.cpp` for each system and that it works out of the box for users. 🔥
+ - [x] [New] Ensure that there is a recommended `llama.cpp` for each system and that it works out of the box for users. ✅

In Remote Model Providers:
- [x] Check that the following providers are present: