Quality Assurance (QA)
The QA tab is your dedicated environment for evaluating and validating your application’s performance using defined test cases. It allows you to create, manage, and run Evaluation Datasets against specific versions of your agent or workflow to ensure that changes and updates maintain the expected quality and accuracy.
This feature is crucial for effective regression testing and for monitoring the impact of new application versions before they reach production.
Key Concepts
Datasets
A Dataset is a structured, reusable collection of test cases used to assess the model’s behavior under specific conditions. Datasets are scoped to a specific application, feature, or use-case, allowing for highly targeted testing.
Key features:
- Create multiple datasets for a single workflow/agent
- Choose which version the dataset runs on
- Enable direct comparison between a new Draft and a proven stable version
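Conceptually, a dataset ties a named, reusable collection of test cases to the application it belongs to and the version it runs against. The following sketch is illustrative only; the class and field names are assumptions used for explanation, not the platform's schema or API.

```python
from dataclasses import dataclass, field

@dataclass
class Dataset:
    # Hypothetical representation: a named, reusable collection of test
    # cases scoped to one agent/workflow and pinned to a specific version.
    name: str
    application: str      # the agent or workflow this dataset targets
    target_version: str   # the version the test cases run against
    test_cases: list = field(default_factory=list)

# Two datasets over the same agent, pinned to different versions,
# support a direct comparison between a Draft and a proven stable release.
stable = Dataset("refund-agent-regression", "refund-agent", target_version="v1.3")
draft = Dataset("refund-agent-regression", "refund-agent", target_version="Draft")
```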
Dataset Management Actions:
- Create Dataset — Define and name a new collection of test cases
- Delete Dataset — Permanently remove the currently selected dataset
⚠️ Warning: Deleting a dataset is irreversible. The dataset and all of its test cases will be permanently removed.
Test Cases
A test case consists of the minimum data required to execute an evaluation. You can add multiple tests to a single dataset and run one or several tests simultaneously.
Test Case Fields:
| Field | Description | Required |
|---|---|---|
| INPUT | The query, prompt, or data sent to the model (e.g., a user request) | ✓ |
| EXPECTED OUTPUT | The desired or correct response the model should generate | Optional |
| ACTUAL OUTPUT | The response produced by the currently selected version | Auto-filled |
| STATUS | The run status (e.g., Pending or Run) | Auto-filled |
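The fields above map naturally onto a simple record. The sketch below is a hypothetical illustration of which fields you provide and which the system fills in after a run; the names are assumptions, not the platform's schema.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class Status(Enum):
    PENDING = "Pending"   # not yet executed
    RUN = "Run"           # executed against the selected version

@dataclass
class TestCase:
    input: str                              # required: query, prompt, or data sent to the model
    expected_output: Optional[str] = None   # optional: the desired, validated response
    actual_output: Optional[str] = None     # auto-filled: what the selected version produced
    status: Status = Status.PENDING         # auto-filled: updated after a run

# You supply INPUT (and ideally EXPECTED OUTPUT); a run fills in the rest.
case = TestCase(
    input="What is your refund policy for digital goods?",
    expected_output="Digital goods can be refunded within 14 days of purchase.",
)
```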
Working with Test Cases
Adding Test Cases
Once a dataset is selected, you can populate it with the following steps:
- Add a row: click the button in the bottom-right corner of the table; a new empty row is appended
- Enter the data:
  - In the INPUT column, enter the test prompt or data
  - In the EXPECTED OUTPUT column, enter the precise, desired, and validated response the system should produce
Running Test Cases
After defining your test cases, you can execute the evaluation against the selected Version. The system updates the ACTUAL OUTPUT and STATUS columns upon completion (see the sketch after the execution options below).
Execution Options:
- Run one test: Click the button in the Action column of the desired test
- Run multiple tests: Select the tests using the checkboxes in the left column, then press the button
- Run all tests: Click the button to execute every test case in the currently active dataset
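Conceptually, a run executes each selected test case against the chosen version, records the response in ACTUAL OUTPUT, and updates STATUS. The sketch below is illustrative only: the `run_tests` function and the `call_version` stub are assumptions standing in for the platform's internal behavior, not its API.

```python
def run_tests(test_cases, call_version, version):
    # Execute each case against the selected version, record the actual
    # output, and mark the case as run. `call_version` is a placeholder
    # for however the agent/workflow is actually invoked.
    for case in test_cases:
        case["actual_output"] = call_version(version, case["input"])
        case["status"] = "Run"

    # Simple exact-match summary against expected outputs, where provided.
    passed = sum(
        1 for c in test_cases
        if c.get("expected_output") and c["actual_output"] == c["expected_output"]
    )
    print(f"{passed}/{len(test_cases)} cases matched their expected output")

# Hypothetical usage: an echo stub stands in for the real agent call.
cases = [{"input": "Hello", "expected_output": "Hello", "status": "Pending"}]
run_tests(cases, call_version=lambda version, prompt: prompt, version="Draft")
```

Exact-match comparison is only one possible check; in practice you review ACTUAL OUTPUT against EXPECTED OUTPUT in the table after the run completes.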
Deleting Test Cases
You have several options to delete test cases:
- Delete one test: Click the button in the Action column
- Delete multiple tests: Select multiple tests using the checkboxes, then press the button
⚠️ Warning: Deleting a test is irreversible. The test will be permanently deleted.
Best Practices
- Create separate datasets for different features or use cases
- Use meaningful names for your datasets to easily identify their purpose
- Regularly update your test cases to reflect changes in expected behavior
- Compare results between versions before deploying to production (see the sketch after this list)
- Keep your expected outputs up-to-date with your product requirements
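To illustrate the version-comparison practice above, the sketch below lines up results from two runs of the same dataset, one per version, and prints any cases where the outputs diverge. It is a hypothetical example; the data shape simply mirrors the test-case fields described earlier.

```python
def compare_versions(stable_results, draft_results):
    # Pair up results from two runs of the same dataset (one per version)
    # and surface differences before deploying the draft to production.
    for stable_case, draft_case in zip(stable_results, draft_results):
        if stable_case["actual_output"] != draft_case["actual_output"]:
            print("Input:          ", stable_case["input"])
            print("Expected output:", stable_case.get("expected_output"))
            print("Stable output:  ", stable_case["actual_output"])
            print("Draft output:   ", draft_case["actual_output"])

# Hypothetical, already-run results for the same test case on two versions.
stable_results = [{"input": "Hi", "expected_output": "Hello", "actual_output": "Hello"}]
draft_results = [{"input": "Hi", "expected_output": "Hello", "actual_output": "Hey"}]
compare_versions(stable_results, draft_results)
```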