
Quality Assurance (QA)

The QA tab is your dedicated environment for evaluating and validating your application’s performance using defined test cases. It allows you to create, manage, and run Evaluation Datasets against specific versions of your agent or workflow to ensure that changes and updates maintain the expected quality and accuracy.

This feature is crucial for effective regression testing and for monitoring the impact of new application versions before they reach production.

Key Concepts

Datasets

A Dataset is a structured, reusable collection of test cases used to assess the model’s behavior under specific conditions. Datasets are scoped to a specific application, feature, or use-case, allowing for highly targeted testing.

Key features:

  • Create multiple datasets for a single workflow/agent
  • Choose which version the dataset runs on
  • Enable direct comparison between a new Draft and a proven stable version
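
Conceptually, a dataset ties one application, one target version, and a collection of test cases together. The TypeScript sketch below is illustrative only; the field names and types are assumptions, not the platform's actual schema.

```typescript
// Illustrative sketch of the relationships described above; every name
// here is an assumption, not the platform's real schema.
interface Dataset {
  id: string;
  name: string;           // e.g. "Checkout agent regression suite"
  applicationId: string;  // datasets are scoped to a single agent or workflow
  targetVersion: string;  // the Draft or stable version the tests run against
  testCaseIds: string[];  // the test cases collected in this dataset
}
```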

Dataset Management Actions:

  • Create Dataset — Define and name a new collection of test cases
  • Delete Dataset — Permanently remove the currently selected dataset

⚠️ Warning: Deleting a dataset is irreversible. The dataset and all related tests will be permanently deleted.

Test Cases

A test case consists of the minimum data required to execute an evaluation. You can add multiple tests to a single dataset and run one or several tests simultaneously.

Test Case Fields:

| Field | Description | Required |
| --- | --- | --- |
| INPUT | The query, prompt, or data sent to the model (e.g., a user request) | Required |
| EXPECTED OUTPUT | The desired or correct response the model should generate | Optional |
| ACTUAL OUTPUT | The response produced by the currently selected model version | Auto-filled |
| STATUS | The run status (e.g., Pending or Run) | Auto-filled |
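
These fields can be pictured as a simple record. The sketch below mirrors the table; the identifiers are illustrative assumptions, not the platform's actual schema.

```typescript
// Illustrative record mirroring the Test Case Fields table above;
// identifiers are assumptions, not the platform's actual schema.
type RunStatus = "Pending" | "Run";

interface TestCase {
  input: string;            // INPUT: required; the prompt or data sent to the model
  expectedOutput?: string;  // EXPECTED OUTPUT: optional reference answer
  actualOutput?: string;    // ACTUAL OUTPUT: filled in automatically after a run
  status: RunStatus;        // STATUS: filled in automatically (Pending or Run)
}
```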

Working with Test Cases

Adding Test Cases

Once a dataset is selected, you can populate it with test cases as follows:

  1. Click the add button in the bottom-right corner of the table
  2. A new empty row will be appended
  3. Input Data:
    • In the INPUT column, enter the test prompt or data
    • In the EXPECTED OUTPUT column, enter the precise, validated response the system should produce (see the example below)
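
For example, a freshly added row might carry values like these (the values are invented for illustration and reuse the TestCase shape sketched earlier):

```typescript
// Hypothetical values for a freshly added row; it stays Pending until run.
const newTestCase: TestCase = {
  input: "Summarize our refund policy in two sentences.",
  expectedOutput: "A two-sentence summary covering eligibility and the 30-day window.",
  status: "Pending",
};
```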

Running Test Cases

After defining your test cases, you can execute the evaluation against the selected version. The system updates the ACTUAL OUTPUT and STATUS columns upon completion.
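
Conceptually, a run sends the test's INPUT to the selected version and writes back the ACTUAL OUTPUT and STATUS. The sketch below illustrates that flow using the TestCase shape from above; callModelVersion is a placeholder, not a real platform function.

```typescript
// Conceptual sketch of a run, not the platform's implementation.
// `callModelVersion` is a placeholder for whatever inference call the
// selected version exposes.
async function runTestCase(
  testCase: TestCase,
  callModelVersion: (input: string) => Promise<string>,
): Promise<TestCase> {
  const actualOutput = await callModelVersion(testCase.input);
  return { ...testCase, actualOutput, status: "Run" };
}

// Running several selected tests applies the same step to each row.
async function runSelected(
  testCases: TestCase[],
  callModelVersion: (input: string) => Promise<string>,
): Promise<TestCase[]> {
  return Promise.all(testCases.map((tc) => runTestCase(tc, callModelVersion)));
}
```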

Execution Options:

Run one test:
Click the run button in the Action column of the desired test

Run multiple tests:
Select the tests using the checkboxes in the left column, then press the run button

Run all tests:
Click the run button to execute every test case in the currently active dataset

Deleting Test Cases

You have several options to delete test cases:

  • Delete one test: Click the delete button in the Action column
  • Delete multiple tests: Select the tests using the checkboxes, then press the delete button

⚠️ Warning: Deleting a test is irreversible. The test will be permanently deleted.

Best Practices

  • Create separate datasets for different features or use cases
  • Use meaningful names for your datasets to easily identify their purpose
  • Regularly update your test cases to reflect changes in expected behavior
  • Compare results between versions before deploying to production
  • Keep your expected outputs up-to-date with your product requirements