Test Cases

The Test Cases section allows you to manage and evaluate your AI assistant’s responses systematically. By running test cases, you can ensure the assistant is providing accurate, relevant, and high-quality answers to user queries. This section also provides detailed insights into the assistant’s performance across various metrics.

Test Cases Tab

What are Test Cases?

Test cases consist of a message (query or input) and an expected response. These allow you to simulate real-world scenarios and assess how well the assistant performs. Test cases can include:

  • Training Data Test Cases: Automatically generated for all questions in the training data.
  • Custom Test Cases: Manually added for critical or edge-case queries specific to your application.
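
Conceptually, a test case is just a message paired with the response you expect. The sketch below shows one possible shape for that pair; the field names and the `source` field are illustrative assumptions, not the platform's actual schema.

```python
from dataclasses import dataclass

@dataclass
class TestCase:
    # Field names are illustrative assumptions, not the platform's schema.
    message: str            # the query or input sent to the assistant
    expected_response: str  # the answer the assistant is expected to give
    source: str = "custom"  # assumed flag: "training_data" for auto-generated cases

# A hypothetical custom test case for an edge-case query
refund_case = TestCase(
    message="Can I get a refund after 45 days?",
    expected_response="Refunds are only available within 30 days of purchase, "
                      "so an order placed 45 days ago is not eligible.",
)
```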

What happens when a Test Case is Run?

  1. Message: The test case message is sent to the assistant.
  2. Assistant Response: The assistant generates a response, which is saved for evaluation.
  3. Evaluation: A separate AI model analyzes the assistant’s response by comparing it to the expected response across several metrics:
    • Similarity: Measures how close the meaning of the actual response is to the expected response.
    • Relevance: Assesses how relevant the response is to the query.
    • Completeness: Evaluates how much of the expected response was covered.
    • Error Handling: Rates how well the assistant handles vague or incomplete questions.
    • Hallucination: Scores how well the assistant avoids generating false or irrelevant information (a higher score indicates less hallucination).

The individual metric scores are combined to give a Total Score for each response. These scores are aggregated across all test cases to provide an Overall Test Score.
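
As a rough illustration of that scoring step, the sketch below averages the five metric scores into a total score for one response. The 0-100 scale and equal weighting are assumptions made for the example; the exact weighting used by the platform is not documented here.

```python
# Illustrative only: assumes each metric is scored 0-100 and weighted equally.
METRICS = ["similarity", "relevance", "completeness", "error_handling", "hallucination"]

def total_score(metric_scores: dict) -> float:
    """Combine the individual metric scores into a Total Score for one response."""
    return sum(metric_scores[m] for m in METRICS) / len(METRICS)

scores = {
    "similarity": 92, "relevance": 88, "completeness": 85,
    "error_handling": 90, "hallucination": 95,  # higher = less hallucination
}
print(total_score(scores))  # 90.0
```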

Handling Vague Questions

If a test case contains a vague or incomplete query, its score will not count toward the overall test score. Instead, the system tracks a Vague Count, showing the number of test cases flagged as vague.
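
To make the aggregation concrete, here is a minimal sketch of how an Overall Test Score and Vague Count could be derived from individual results, assuming each result carries a total score and a vague flag; the field names are hypothetical.

```python
# Minimal sketch, assuming each result has a "total_score" and an optional "is_vague" flag.
def summarize(results: list) -> dict:
    scored = [r["total_score"] for r in results if not r.get("is_vague")]
    vague_count = sum(1 for r in results if r.get("is_vague"))
    overall = sum(scored) / len(scored) if scored else None  # vague cases are excluded
    return {"overall_test_score": overall, "vague_count": vague_count}

print(summarize([
    {"total_score": 91.0},
    {"total_score": 87.5},
    {"total_score": 0.0, "is_vague": True},  # flagged vague: counted separately, not scored
]))  # {'overall_test_score': 89.25, 'vague_count': 1}
```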

What to do after all Test Cases are Run?

After running test cases, you can review:

  • Individual Test Case Scores: View detailed scores for each test case, including breakdowns by metric.
  • Overall Test Score: Assess the assistant’s performance across all non-vague test cases.
  • Vague Count: Identify and refine vague or incomplete test cases for more accurate evaluation.

Add a Test Case

  1. Navigate to the Test Cases tab in the Evaluation section.

  2. Click on the Plus icon in the top-left corner to add a new test case.

    Add Test Case Button

  3. Fill in the Message and Expected Response fields.

    Add Test Case Form

  4. Click Save to create the test case. A success notification will confirm that the test case was created, and it will be added to the Test Cases tab.

    Add Test Case Success Message

    Add Test Case Added

Edit a Test Case

  1. Navigate to the Test Cases tab in the Evaluation section.

  2. Locate the test case you want to edit.

  3. Click on the Pencil icon on the top right of the test case.

    Edit Test Case Button

  4. Update the test case and click Save. A success notification will confirm that the test case was updated.

    Edit Test Case Form

    Edit Test Case Success Message

Delete a Test Case

  1. Navigate to the Test Cases tab in the Evaluation section.

  2. Locate the test case you want to delete.

  3. Click on the Trash icon on the top right of the test case.

    Delete Test Case Button

  4. Confirm the deletion in the modal that appears. A success notification will confirm that the test case was deleted.

    Delete Test Case Modal

    Delete Test Case Success Message

Running Test Cases

  1. Navigate to the Playground tab in the Evaluation section.

    Playground Tab

  2. Select the model you want to run the test cases on.

    Playground Model

  3. Navigate to the Tests tab.

    Tests Tab

  4. Choose the test set you want to run the test cases on.

    Test Set Options

  5. Click the Run Test Cases button to run all test cases in the current version.

  6. The test cases will be executed, and the results will be displayed in the Test Cases tab. While the test cases are running, the test run will show a loading animation on the left.

    Test Cases Running

Once completed, the results will be displayed in the collapsed test case run.

Evaluation Result

tip

When evaluating using All Knowledge, a total score above 85 is considered good.

You can also view the response and evaluation metrics for each individual test case. Click the View Test Case Responses button to see the individual test case results.

Test Case Result

Test Set Options

  • All Knowledge: Run the test cases for all knowledge in the project.
  • All Test Cases: Run all test cases in the current version.
  • All Knowledge and Test Cases: Run the test cases for all knowledge in the project as well as all test cases in the current version.

Best Practices for Test Cases

  • Comprehensive Coverage: Include test cases that address both common and edge-case scenarios.
  • Realistic Queries: Use real-world examples to ensure the assistant is tested in practical situations.
  • Iterative Testing: Regularly run and update test cases as you refine your assistant’s knowledge and training data.

Next Steps

While the test case evaluation is in progress, you can skip to the Playground section to see the assistant's responses in action.