Quality Assurance (QA)
The QA tab is your dedicated environment for evaluating and validating your application’s performance using defined test cases. It allows you to create, manage, and run Evaluation Datasets against specific versions of your agent or workflow to ensure that changes and updates maintain the expected quality and accuracy.
This feature is crucial for effective regression testing and for monitoring the impact of new application versions before they reach production.
Key Concepts
Datasets
A Dataset is a structured, reusable collection of test cases used to assess the model’s behavior under specific conditions. Datasets are scoped to a specific application, feature, or use-case, allowing for highly targeted testing.
Key features:
- Create multiple datasets for a single workflow/agent
- Choose which version the dataset runs on
- Enable direct comparison between a new Draft and a proven stable version
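Conceptually, a dataset ties a named, reusable collection of test cases to the application it belongs to and the version it runs against. The following sketch is illustrative only; the class and field names are assumptions used for explanation, not the platform's schema or API.

```python
from dataclasses import dataclass, field

@dataclass
class Dataset:
    # Hypothetical representation: a named, reusable collection of test
    # cases scoped to one agent/workflow and pinned to a specific version.
    name: str
    application: str      # the agent or workflow this dataset targets
    target_version: str   # the version the test cases run against
    test_cases: list = field(default_factory=list)

# Two datasets over the same agent, pinned to different versions,
# support a direct comparison between a Draft and a proven stable release.
stable = Dataset("refund-agent-regression", "refund-agent", target_version="v1.3")
draft = Dataset("refund-agent-regression", "refund-agent", target_version="Draft")
```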
Dataset Management Actions:
- Create Dataset — Define and name a new collection of test cases
- Delete Dataset — Permanently remove the currently selected dataset
⚠️ Warning: Deleting a dataset is irreversible. The dataset and all of its test cases will be permanently removed.
Test Cases
A test case consists of the minimum data required to execute an evaluation. You can add multiple tests to a single dataset and run one or several tests simultaneously.
Test Case Fields:
| Field | Description | Required |
|---|---|---|
| INPUT | The query, prompt, or data sent to the model (e.g., a user request) | ✓ |
| EXPECTED OUTPUT | The desired or correct response the model should generate | Optional |
| ACTUAL OUTPUT | The response produced by the currently selected version | Auto-filled |
| STATUS | The run status (e.g., Pending or Run) | Auto-filled |
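The fields above map naturally onto a simple record. The sketch below is a hypothetical illustration of which fields you provide and which the system fills in after a run; the names are assumptions, not the platform's schema.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class Status(Enum):
    PENDING = "Pending"   # not yet executed
    RUN = "Run"           # executed against the selected version

@dataclass
class TestCase:
    input: str                              # required: query, prompt, or data sent to the model
    expected_output: Optional[str] = None   # optional: the desired, validated response
    actual_output: Optional[str] = None     # auto-filled: what the selected version produced
    status: Status = Status.PENDING         # auto-filled: updated after a run

# You supply INPUT (and ideally EXPECTED OUTPUT); a run fills in the rest.
case = TestCase(
    input="What is your refund policy for digital goods?",
    expected_output="Digital goods can be refunded within 14 days of purchase.",
)
```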
Working with Test Cases
Adding Test Cases
Once a dataset is selected, you can populate it with the following steps:
- Add a row: click the button in the bottom-right corner of the table; a new empty row is appended
- Enter the data:
  - In the INPUT column, enter the test prompt or data
  - In the EXPECTED OUTPUT column, enter the precise, desired, and validated response the system should produce
Running Test Cases
After defining your test cases, you can execute the evaluation against the selected Version. The system updates the ACTUAL OUTPUT and STATUS columns upon completion (see the sketch after the execution options below).
Execution Options:
- Run one test: Click the button in the Action column of the desired test
- Run multiple tests: Select the tests using the checkboxes in the left column, then press the button
- Run all tests: Click the button to execute every test case in the currently active dataset
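Conceptually, a run executes each selected test case against the chosen version, records the response in ACTUAL OUTPUT, and updates STATUS. The sketch below is illustrative only: the `run_tests` function and the `call_version` stub are assumptions standing in for the platform's internal behavior, not its API.

```python
def run_tests(test_cases, call_version, version):
    # Execute each case against the selected version, record the actual
    # output, and mark the case as run. `call_version` is a placeholder
    # for however the agent/workflow is actually invoked.
    for case in test_cases:
        case["actual_output"] = call_version(version, case["input"])
        case["status"] = "Run"

    # Simple exact-match summary against expected outputs, where provided.
    passed = sum(
        1 for c in test_cases
        if c.get("expected_output") and c["actual_output"] == c["expected_output"]
    )
    print(f"{passed}/{len(test_cases)} cases matched their expected output")

# Hypothetical usage: an echo stub stands in for the real agent call.
cases = [{"input": "Hello", "expected_output": "Hello", "status": "Pending"}]
run_tests(cases, call_version=lambda version, prompt: prompt, version="Draft")
```

Exact-match comparison is only one possible check; in practice you review ACTUAL OUTPUT against EXPECTED OUTPUT in the table after the run completes.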
Deleting Test Cases
You have several options to delete test cases:
- Delete one test: Click the button in the Action column
- Delete multiple tests: Select multiple tests using the checkboxes, then press the button
⚠️ Warning: Deleting a test is irreversible. The test will be permanently deleted.
Best Practices
- Create separate datasets for different features or use cases
- Use meaningful names for your datasets to easily identify their purpose
- Regularly update your test cases to reflect changes in expected behavior
- Compare results between versions before deploying to production (see the sketch after this list)
- Keep your expected outputs up-to-date with your product requirements
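To illustrate the version-comparison practice above, the sketch below lines up results from two runs of the same dataset, one per version, and prints any cases where the outputs diverge. It is a hypothetical example; the data shape simply mirrors the test-case fields described earlier.

```python
def compare_versions(stable_results, draft_results):
    # Pair up results from two runs of the same dataset (one per version)
    # and surface differences before deploying the draft to production.
    for stable_case, draft_case in zip(stable_results, draft_results):
        if stable_case["actual_output"] != draft_case["actual_output"]:
            print("Input:          ", stable_case["input"])
            print("Expected output:", stable_case.get("expected_output"))
            print("Stable output:  ", stable_case["actual_output"])
            print("Draft output:   ", draft_case["actual_output"])

# Hypothetical, already-run results for the same test case on two versions.
stable_results = [{"input": "Hi", "expected_output": "Hello", "actual_output": "Hello"}]
draft_results = [{"input": "Hi", "expected_output": "Hello", "actual_output": "Hey"}]
compare_versions(stable_results, draft_results)
```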