# Testing Utilities
Testing utilities, test runners, and debugging helpers.
## TestCase

Bundle a trace with a set of checks to execute.

**Module:** `giskard.checks.core.testcase`
### Description

`TestCase` combines a trace (with pre-recorded interactions) and a sequence of checks to run against that trace. This is useful for testing against fixed interaction sequences or replaying recorded conversations.
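Conceptually, a test case is just a fixed trace plus a sequence of predicates evaluated against it. The sketch below re-implements that pattern in plain Python to make the idea concrete; the `MiniTestCase`, `Trace`, and `Interaction` classes here are illustrative stand-ins, not giskard's actual implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, Sequence

# Illustrative stand-ins only -- not giskard's actual classes.
@dataclass
class Interaction:
    inputs: str
    outputs: str

@dataclass
class Trace:
    interactions: list[Interaction] = field(default_factory=list)

@dataclass
class MiniTestCase:
    trace: Trace
    checks: Sequence[Callable[[Trace], bool]]

    def run(self) -> bool:
        # A test case simply evaluates every check against the same fixed trace.
        return all(check(self.trace) for check in self.checks)

trace = Trace(interactions=[Interaction(inputs="Hello", outputs="Hi there!")])
case = MiniTestCase(
    trace=trace,
    checks=[lambda t: "Hi" in t.interactions[0].outputs],
)
print(case.run())  # -> True
```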
### Attributes

| Attribute | Type | Description |
|---|---|---|
| `name` | `str \| None` | Optional label for the test case |
| `trace` | `Trace` | The trace containing interactions to test against |
| `checks` | `Sequence[Check]` | Sequence of checks to run against the trace |
### Creating Test Cases

```python
from giskard.checks import TestCase, Trace, Interaction, Equals

# Create a trace with interactions
trace = Trace(interactions=[
    Interaction(inputs="Hello", outputs="Hi there!"),
    Interaction(inputs="How are you?", outputs="I'm doing well!"),
])

# Create test case with checks
test_case = TestCase(
    name="greeting_test",
    trace=trace,
    checks=[
        Equals(expected="Hi there!", key="trace.interactions[0].outputs"),
        Equals(expected="I'm doing well!", key="trace.interactions[1].outputs"),
    ],
)

# Run the test case
result = await test_case.run()
```

### Methods
#### run(return_exception=False) -> TestCaseResult

Execute all checks against the trace.
**Parameters:**

- `return_exception` (`bool`): If `True`, return results even when exceptions occur instead of raising

**Returns:** `TestCaseResult` with check outcomes
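The effect of `return_exception` can be pictured with a small self-contained sketch (plain Python, no giskard imports; the names are illustrative): by default an exception raised inside a check propagates to the caller, while `return_exception=True` converts it into an error-status entry in the results.

```python
import asyncio

# Plain-Python sketch of the return_exception control flow.
# The real giskard runner is more involved; this only illustrates the idea.
async def run_checks(checks, trace, return_exception=False):
    results = []
    for check in checks:
        try:
            results.append(("PASS" if check(trace) else "FAIL", None))
        except Exception as exc:
            if not return_exception:
                raise  # default: exceptions propagate to the caller
            # return_exception=True: record the error and keep going
            results.append(("ERROR", exc))
    return results

def broken_check(trace):
    raise ValueError("boom")

results = asyncio.run(
    run_checks([lambda t: True, broken_check], {}, return_exception=True)
)
print([status for status, _ in results])  # -> ['PASS', 'ERROR']
```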
```python
result = await test_case.run()

print(f"Status: {result.status}")
print(f"Passed: {result.passed}")
print(f"Failed: {result.failed}")
```

#### assert_passed()
Run the test case and assert that it passed.

**Raises:**

- `AssertionError`: If the test case did not pass, with formatted failure messages
```python
# Use in tests - raises if any check fails
await test_case.assert_passed()

# Equivalent to:
result = await test_case.run()
result.assert_passed()
```

## TestCaseResult
Result of test case execution with check outcomes.

**Module:** `giskard.checks.core.result`
### Attributes

| Attribute | Type | Description |
|---|---|---|
| `status` | `CheckStatus` | Overall test case status (PASS/FAIL/ERROR) |
| `trace` | `Trace` | The trace that was tested |
| `check_results` | `list[CheckResult]` | Results from all checks |
| `passed` | `bool` | `True` if all checks passed |
| `failed` | `bool` | `True` if any check failed |
| `message` | `str \| None` | Optional summary message |
| `details` | `dict[str, Any]` | Additional execution details |
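The convenience flags are easiest to read as aggregates over the individual check results. A minimal sketch of the presumed relationship (status names here are illustrative strings, not giskard's actual `CheckStatus` enum):

```python
# Sketch: how passed/failed relate to the per-check statuses.
check_statuses = ["PASS", "PASS", "FAIL"]

passed = all(s == "PASS" for s in check_statuses)   # every check passed
failed = any(s == "FAIL" for s in check_statuses)   # at least one check failed
overall = "PASS" if passed else ("FAIL" if failed else "ERROR")

print(passed, failed, overall)  # -> False True FAIL
```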
### Accessing Results

```python
result = await test_case.run()

# Check overall status
if result.passed:
    print("✓ All checks passed!")
else:
    print(f"✗ Test failed: {result.message}")

# Review individual check results
for i, check_result in enumerate(result.check_results):
    status_icon = "✓" if check_result.passed else "✗"
    print(f"{status_icon} Check {i}: {check_result.message}")

# Assert passed (raises if failed)
result.assert_passed()
```

## TestCaseRunner
Low-level runner for executing test cases.

**Module:** `giskard.checks.testing.runner`

### Description

`TestCaseRunner` provides the execution engine for running test cases. Most users should call `test_case.run()` rather than using `TestCaseRunner` directly.
### Methods

#### run(test_case, return_exception=False) -> TestCaseResult

Execute a test case's checks against its trace.
**Parameters:**

- `test_case` (`TestCase`): The test case to execute
- `return_exception` (`bool`): If `True`, return results even when exceptions occur

**Returns:** `TestCaseResult` with check outcomes
```python
from giskard.checks.testing.runner import TestCaseRunner

runner = TestCaseRunner()
result = await runner.run(test_case)
```

## get_runner()
Get the default process-wide `TestCaseRunner` instance.

**Module:** `giskard.checks.testing.runner`
```python
from giskard.checks.testing.runner import get_runner

runner = get_runner()
result = await runner.run(test_case)
```

## WithSpy
Spy on function calls during interaction generation for debugging.

**Module:** `giskard.checks.testing.spy`

### Description

`WithSpy` wraps an interaction spec and records all function calls made during interaction generation. This is useful for debugging complex interaction specs or understanding what's happening during scenario execution.
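The underlying idea is the classic spy pattern: wrap a callable so that every invocation is recorded alongside its result. A plain-Python sketch of that pattern (not `WithSpy`'s actual mechanics, just the concept it applies to interaction specs):

```python
import functools

# Minimal spy: wrap a function and record every call it receives.
def spy(fn):
    calls = []

    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        result = fn(*args, **kwargs)
        calls.append({"args": args, "kwargs": kwargs, "result": result})
        return result

    wrapper.calls = calls
    return wrapper

@spy
def generate_output(inputs):
    return f"echo: {inputs}"

generate_output("Hello")
generate_output("How are you?")
print(len(generate_output.calls))           # -> 2
print(generate_output.calls[0]["result"])   # -> echo: Hello
```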
### Attributes

| Attribute | Type | Description |
|---|---|---|
| `interaction_generator` | `BaseInteractionSpec` | The interaction spec to spy on |
| `target` | `str` | JSONPath to the value to spy on |
### Using WithSpy

```python
from giskard.checks import scenario, InteractionSpec
from giskard.checks.testing.spy import WithSpy

# Create an interaction spec to spy on
interaction_spec = InteractionSpec(
    inputs=lambda trace: f"Context: {trace.last.outputs if trace.last else 'None'}",
    outputs=lambda inputs: my_model(inputs),  # my_model is your model under test
)

# Wrap with spy
spied_spec = WithSpy(
    interaction_generator=interaction_spec,
    target="trace.last.outputs",
)

# Use in scenario
test_scenario = (
    scenario("debug_test")
    .add_spec(spied_spec)
)

result = await test_scenario.run()

# Access spy data
print("Function calls recorded:")
print(result.details.get("spy_data"))
```

### Debugging Scenarios
```python
from giskard.checks import scenario, InteractionSpec
from giskard.checks.testing.spy import WithSpy

# Spy on a complex callable
def complex_output_generator(inputs):
    # Complex logic
    processed = process_input(inputs)
    enriched = enrich_data(processed)
    return generate_response(enriched)

# Create spied interaction
spied_interaction = WithSpy(
    interaction_generator=InteractionSpec(
        inputs="test input",
        outputs=complex_output_generator,
    ),
    target="trace.last.outputs",
)

# Run and debug
result = await scenario("debug").add_spec(spied_interaction).run()
```

## Usage Patterns
### Replaying Recorded Conversations

```python
from giskard.checks import TestCase, Trace, Interaction, from_fn

# Load recorded conversation
recorded_interactions = [
    Interaction(
        inputs="What's the weather?",
        outputs="It's sunny and 75°F.",
        metadata={"timestamp": "2024-01-01T10:00:00"},
    ),
    Interaction(
        inputs="Should I bring an umbrella?",
        outputs="No, you won't need one today.",
        metadata={"timestamp": "2024-01-01T10:00:15"},
    ),
]

trace = Trace(interactions=recorded_interactions)

# Test against recorded conversation
test_case = TestCase(
    name="weather_conversation_replay",
    trace=trace,
    checks=[
        from_fn(
            lambda t: "sunny" in t.interactions[0].outputs.lower(),
            name="mentions_weather",
        ),
        from_fn(
            lambda t: "no" in t.interactions[1].outputs.lower(),
            name="umbrella_not_needed",
        ),
    ],
)

await test_case.assert_passed()
```

### Batch Testing
Section titled “Batch Testing”# Run multiple test casestest_cases = [ TestCase(name="test1", trace=trace1, checks=checks1), TestCase(name="test2", trace=trace2, checks=checks2), TestCase(name="test3", trace=trace3, checks=checks3),]
# Run all in parallelresults = await asyncio.gather(*[tc.run() for tc in test_cases])
# Summarypassed = sum(1 for r in results if r.passed)failed = sum(1 for r in results if r.failed)print(f"Passed: {passed}/{len(results)}")print(f"Failed: {failed}/{len(results)}")Integration with pytest
Section titled “Integration with pytest”import pytestfrom giskard.checks import TestCase, Trace, Interaction, Equals
@pytest.mark.asyncioasync def test_greeting_response(): """Test that greeting responses are polite.""" trace = Trace(interactions=[ Interaction(inputs="Hello", outputs="Hi there! How can I help?") ])
test_case = TestCase( name="greeting_politeness", trace=trace, checks=[ from_fn( lambda t: any(word in t.last.outputs.lower() for word in ["hi", "hello", "hey"]), name="has_greeting" ), from_fn( lambda t: "help" in t.last.outputs.lower(), name="offers_help" ), ] )
await test_case.assert_passed()
@pytest.mark.asyncioasync def test_error_handling(): """Test that errors are handled gracefully.""" trace = Trace(interactions=[ Interaction( inputs="invalid command", outputs="I don't understand that command.", metadata={"error_handled": True} ) ])
test_case = TestCase( name="error_handling", trace=trace, checks=[ Equals(expected=True, key="trace.last.metadata.error_handled"), from_fn( lambda t: "don't understand" in t.last.outputs.lower(), name="error_message_present" ), ] )
await test_case.assert_passed()Parameterized Tests
```python
import pytest
from giskard.checks import TestCase, Trace, Interaction, Equals

# Test data
test_data = [
    ("Hello", "Hi there!"),
    ("Good morning", "Good morning!"),
    ("Hey", "Hey! How can I help?"),
]

@pytest.mark.asyncio
@pytest.mark.parametrize("greeting_input,expected_output", test_data)
async def test_greeting_variations(greeting_input, expected_output):
    """Test various greeting inputs."""
    trace = Trace(interactions=[
        Interaction(inputs=greeting_input, outputs=expected_output)
    ])

    test_case = TestCase(
        name=f"greeting_{greeting_input}",
        trace=trace,
        checks=[
            Equals(expected=expected_output, key="trace.last.outputs")
        ],
    )

    await test_case.assert_passed()
```

## See Also
- Core API - Trace, Interaction, and Check details
- Scenarios - Building multi-step test workflows
- Built-in Checks - Ready-to-use validation checks
- Testing Guide - Comprehensive testing patterns and best practices