feat: add recommended backend testcase

parent 054f64bd54
commit 703395ae10

autoqa/tested/models/recommended-backend.txt (new file, 60 lines)
@@ -0,0 +1,60 @@
prompt = """
You are a GUI agent. You are given a task and your action history, with screenshots. You need to perform the next action to complete the task.

## Output Format
```\nThought: ...
Action: ...\n```

## Action Space

click(start_box='<|box_start|>(x1,y1)<|box_end|>')
left_double(start_box='<|box_start|>(x1,y1)<|box_end|>')
right_single(start_box='<|box_start|>(x1,y1)<|box_end|>')
drag(start_box='<|box_start|>(x1,y1)<|box_end|>', end_box='<|box_start|>(x3,y3)<|box_end|>')
hotkey(key='')
type(content='') #If you want to submit your input, use \"\
\" at the end of `content`.
scroll(start_box='<|box_start|>(x1,y1)<|box_end|>', direction='down or up or right or left')
wait() #Sleep for 5s and take a screenshot to check for any changes.
finished()
call_user() # Submit the task and call the user when the task is unsolvable, or when you need the user's help.
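The action space above is a small call-style DSL. A minimal, hypothetical parser (the `parse_action` helper and its regexes are illustrative assumptions, not part of this commit) could turn one emitted action line into a dispatchable form:

```python
import re

# Hypothetical sketch: parse one line from the agent's "Action:" output
# into a (name, kwargs) pair. Coordinate arguments arrive wrapped in
# <|box_start|>(x,y)<|box_end|> tokens and are decoded to int tuples.
ACTION_RE = re.compile(r"^(\w+)\((.*)\)$")
ARG_RE = re.compile(r"(\w+)='([^']*)'")
BOX_RE = re.compile(r"<\|box_start\|>\((\d+),(\d+)\)<\|box_end\|>")

def parse_action(line):
    m = ACTION_RE.match(line.strip())
    if not m:
        raise ValueError(f"unrecognized action: {line!r}")
    name, raw_args = m.groups()
    kwargs = {}
    for key, value in ARG_RE.findall(raw_args):
        box = BOX_RE.match(value)
        kwargs[key] = (int(box.group(1)), int(box.group(2))) if box else value
    return name, kwargs
```

A harness could then dispatch on the returned name, e.g. `parse_action("wait()")` yields `("wait", {})`.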
## Note
- Use Chinese in `Thought` part.
- Summarize your next action (with its target element) in one sentence in `Thought` part.

## User Instruction
You are going to verify that **Llama.cpp shows the recommended version & backend description** under Model Providers in Settings.

Steps:
1. If a dialog appears in the bottom-right corner titled **"Help Us Improve Jan"**, click **Deny** to dismiss it before continuing.
2. In the bottom-left menu, click **Settings**.
3. In the left sidebar, click **Model Providers**.
4. In the left sidebar menu, under **Model Providers**, click on **Llama.cpp**.
   - Make sure to click the one in the **sidebar**, not the entry in the main panel.
   - Click directly in the center of the "Llama.cpp" text label in the sidebar to open its configuration page.
5. In the **Version & Backend** section, check the description.

Verification rule:
- Consider the check **passed** if the description under Version & Backend contains:
  - A **version string starting with `b<number>/`** (e.g., `b6097/win-avx2-cuda-cu12.0-x64`, `b6097/win-avx2-x64`, or `b6097/win-vulkan-x64`),
  - followed by the text **"Version and backend is the recommended backend"**.
- The exact version (e.g., b6097, b5857, b5833) may vary; any valid build number is acceptable as long as the description includes the phrase above.
- If this text is missing or different, the check **fails**.
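The verification rule above can be sketched as a predicate (the `backend_description_ok` name and its regex are illustrative assumptions, not code from this commit):

```python
import re

# The phrase the rule requires, copied from the verification rule above.
EXPECTED_PHRASE = "Version and backend is the recommended backend"

def backend_description_ok(description: str) -> bool:
    # Any build number passes: the description must start with b<digits>/
    # and contain the required phrase.
    has_version = re.match(r"b\d+/", description) is not None
    return has_version and EXPECTED_PHRASE in description
```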
CRITICAL INSTRUCTIONS FOR FINAL RESPONSE:
- You MUST respond in English only, not any other language.
- You MUST return ONLY the JSON format below, nothing else.
- Do NOT add any explanations, thoughts, or additional text.

If the description is exactly as expected, return: {"result": True}.
Otherwise, return: {"result": False}.

IMPORTANT:
- Your response must be ONLY the JSON above.
- Do NOT add any other text before or after the JSON.
"""
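Note that the requested `{"result": True}` uses a Python-style boolean, so a strict `json.loads` would reject it. A tolerant parse for the test harness might look like this (the `parse_result` helper is a hypothetical sketch, not part of this commit):

```python
import ast

def parse_result(text: str) -> bool:
    # Normalize possible lowercase JSON booleans, then evaluate the
    # Python-literal dict the prompt asks for, e.g. {"result": True}.
    cleaned = text.strip().replace("true", "True").replace("false", "False")
    value = ast.literal_eval(cleaned)
    return bool(value["result"])
```

The naive string replacement is good enough here only because the expected response contains nothing but the result dict.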
@@ -88,7 +88,7 @@ In `Llama.cpp`:
- [x] Disable `Auto-Unload Old Models`, and ensure that multiple models can run at the same time.
- [x] Enable `Context Shift` and ensure that long contexts can run without encountering memory errors. Use the `banana test`: turn on the fetch MCP, then ask a local model to fetch and summarize the history of the banana (bananas have a very long history on the wiki, it turns out). It should run out of context memory fairly quickly if `Context Shift` is not enabled.
- [x] [New] Ensure that users can change the Jinja chat template of an individual model without affecting the templates of other models.
- - [x] [New] Ensure that there is a recommended `llama.cpp` for each system and that it works out of the box for users. 🔥
+ - [x] [New] Ensure that there is a recommended `llama.cpp` for each system and that it works out of the box for users. ✅

In Remote Model Providers:
- [x] Check that the following providers are present: