Core API
Base classes and fundamental types for building checks and scenarios.
Check
Base class for all checks. Subclass it and register the subclass to create custom validation logic.
Module: giskard.checks.core.check
Description
The Check class provides the foundation for all validation logic in Giskard. Create custom checks by subclassing Check and registering the subclass under a unique kind identifier with the @Check.register("kind") decorator.
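To make the registration idea concrete, here is a minimal sketch of how a kind-keyed class registry can work in general. It is not Giskard's implementation: `BaseCheck`, `lookup`, and the duplicate-kind error are hypothetical stand-ins written only to illustrate the pattern behind a `register("kind")` decorator.

```python
from typing import Callable, Dict, Type


class BaseCheck:
    """Stand-in base class for illustration; NOT giskard.checks.Check."""

    # Maps each kind identifier to the class registered under it.
    _registry: Dict[str, Type["BaseCheck"]] = {}

    @classmethod
    def register(cls, kind: str) -> Callable[[Type["BaseCheck"]], Type["BaseCheck"]]:
        def decorator(subclass: Type["BaseCheck"]) -> Type["BaseCheck"]:
            # Assumption: a kind may only be registered once.
            if kind in cls._registry:
                raise ValueError(f"kind {kind!r} is already registered")
            subclass.kind = kind
            cls._registry[kind] = subclass
            return subclass

        return decorator

    @classmethod
    def lookup(cls, kind: str) -> Type["BaseCheck"]:
        # Hypothetical helper: resolve a kind identifier back to its class.
        return cls._registry[kind]


@BaseCheck.register("length_check")
class LengthCheck(BaseCheck):
    pass


registered = BaseCheck.lookup("length_check")
```

A registry of this shape lets a kind string stored in a configuration or serialized payload be mapped back to the class that handles it.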
Attributes
| Attribute | Type | Default | Description |
|---|---|---|---|
| name | str \| None | None | Optional check name for reporting |
| description | str \| None | None | Human-readable description of what the check validates |
Key Methods
run(trace: Trace) -> CheckResult
Execute the check logic against the provided trace.
Parameters:
trace (Trace): The trace containing the interaction history. Access the current interaction via trace.last or trace.interactions[-1].
Returns:
CheckResult: Success, failure, error, or skip result
Creating Custom Checks

    from giskard.checks import Check, CheckResult, Trace


    @Check.register("my_custom_check")
    class MyCustomCheck(Check):
        threshold: float = 0.8

        async def run(self, trace: Trace) -> CheckResult:
            # Your validation logic
            score = self._calculate_score(trace)

            # The score goes in details (a dict); metrics is documented as list[Metric]
            if score >= self.threshold:
                return CheckResult.success(
                    message=f"Score {score} meets threshold",
                    details={"score": score},
                )
            else:
                return CheckResult.failure(
                    message=f"Score {score} below threshold {self.threshold}",
                    details={"score": score},
                )

        def _calculate_score(self, trace: Trace) -> float:
            # Custom scoring logic
            return 0.85

CheckResult
Immutable result produced by running a check.
Module: giskard.checks.core.result
Description
CheckResult encapsulates the outcome of a check execution: the status (pass, fail, error, or skip), an optional message, metrics, and additional details.
Attributes
| Attribute | Type | Description |
|---|---|---|
| status | CheckStatus | Outcome status (PASS, FAIL, ERROR, SKIP) |
| message | str \| None | Optional short message to surface to users |
| metrics | list[Metric] | List of auxiliary metrics captured by the check |
| details | dict[str, Any] | Arbitrary structured payload with additional context |
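As a rough illustration of what "immutable result" means in practice, the sketch below models a result as a frozen dataclass. `Status` and `Result` are hypothetical stand-ins, not the Giskard classes, and the enum string values are assumptions made for the example.

```python
from dataclasses import FrozenInstanceError, dataclass, field
from enum import Enum
from typing import Any, Dict, Optional


class Status(Enum):  # stand-in for CheckStatus; values are assumed
    PASS = "pass"
    FAIL = "fail"


@dataclass(frozen=True)
class Result:  # stand-in for CheckResult
    status: Status
    message: Optional[str] = None
    details: Dict[str, Any] = field(default_factory=dict)

    @staticmethod
    def success(message: Optional[str] = None,
                details: Optional[Dict[str, Any]] = None) -> "Result":
        # Factory that fixes the status so callers cannot mislabel a result.
        return Result(Status.PASS, message, details or {})

    @property
    def passed(self) -> bool:
        return self.status is Status.PASS


result = Result.success(message="ok", details={"score": 0.95})

try:
    result.message = "changed"  # rejected: the instance is frozen
    mutated = True
except FrozenInstanceError:
    mutated = False
```

Freezing the object means a result can be passed between reporting layers without any of them altering what the check actually returned.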
Creating Results
Use the static factory methods to create results:

    from giskard.checks import CheckResult

    # Success - check passed
    result = CheckResult.success(
        message="All validations passed",
        details={"score": 0.95, "threshold": 0.8},
    )

    # Failure - check did not pass
    result = CheckResult.failure(
        message="Score below threshold",
        details={"score": 0.65, "threshold": 0.8, "reason": "low confidence"},
    )

    # Error - unexpected exception or condition
    result = CheckResult.error(
        message="Failed to connect to API",
        details={"error": "Connection timeout"},
    )

    # Skip - precondition not met
    result = CheckResult.skip(
        message="Skipped: No outputs available",
        details={"reason": "empty_response"},
    )

Instance Properties
| Property | Type | Description |
|---|---|---|
| passed | bool | True if status is PASS |
| failed | bool | True if status is FAIL |
| errored | bool | True if status is ERROR |
| skipped | bool | True if status is SKIP |

    result = CheckResult.success(message="Check passed")
    if result.passed:
        print("✓ Check succeeded")

CheckStatus
Enumeration of possible check execution outcomes.
Module: giskard.checks.core.result
Values
| Status | Description |
|---|---|
| PASS | Check validation succeeded |
| FAIL | Check validation failed |
| ERROR | Unexpected error during check execution |
| SKIP | Check was skipped (e.g., precondition not met) |
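One common use of a four-valued status is summarizing a batch of results. The sketch below tallies statuses with a stand-in enum. `Status` and `tally` are hypothetical names, and the enum's string values are assumptions; only the four member names come from the table above.

```python
from collections import Counter
from enum import Enum
from typing import Dict, Iterable


class Status(Enum):  # stand-in mirroring the four documented values
    PASS = "pass"
    FAIL = "fail"
    ERROR = "error"
    SKIP = "skip"


def tally(statuses: Iterable[Status]) -> Dict[str, int]:
    """Count how many times each status occurs, including zero counts."""
    counts = Counter(statuses)
    return {status.name: counts.get(status, 0) for status in Status}


summary = tally([Status.PASS, Status.PASS, Status.FAIL, Status.SKIP])
```

Keeping ERROR and SKIP separate from FAIL in such a summary distinguishes a check that could not run from one that ran and rejected the output.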

    from giskard.checks import CheckStatus

    # Use in conditional logic (result is a CheckResult returned by a check)
    if result.status == CheckStatus.PASS:
        print("Success!")
    elif result.status == CheckStatus.FAIL:
        print("Validation failed")

Interaction
A single exchange between inputs and outputs.
Module: giskard.checks.core.trace
Description
An interaction represents one exchange in a conversation or workflow, capturing the inputs provided, the outputs produced, and optional metadata.
Attributes
| Attribute | Type | Description |
|---|---|---|
| inputs | InputType | Input values for this interaction (e.g., user message, API request) |
| outputs | OutputType | Output values produced in response (e.g., assistant reply, API response) |
| metadata | dict[str, Any] | Optional metadata (timing, tool calls, intermediate states, etc.) |
Examples

    from giskard.checks import Interaction

    # Simple text interaction
    interaction = Interaction(
        inputs="What is the capital of France?",
        outputs="The capital of France is Paris.",
        metadata={"model": "gpt-4", "tokens": 15},
    )

    # Structured interaction
    interaction = Interaction(
        inputs={"query": "weather", "location": "Paris"},
        outputs={"temperature": 20, "conditions": "sunny"},
        metadata={"api": "weather_service", "latency_ms": 120},
    )

Trace
Immutable history of all interactions in a scenario.
Module: giskard.checks.core.trace
Description
A trace accumulates all interactions that have occurred during scenario execution. It is passed to checks for validation and to interaction specs for generating subsequent interactions.
The trace is immutable (frozen), ensuring that checks and specs cannot accidentally modify the history.
Attributes
| Attribute | Type | Description |
|---|---|---|
| interactions | list[Interaction] | Ordered list of all interactions. Most recent at [-1] |
| last | Interaction \| None | Computed property returning the last interaction, or None if empty |
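The combination of an immutable history and a computed `last` property can be sketched with frozen dataclasses. `Exchange`, `History`, and `with_interaction` are hypothetical stand-ins for illustration; how Giskard actually extends a trace is not described here.

```python
from dataclasses import dataclass
from typing import Any, Optional, Tuple


@dataclass(frozen=True)
class Exchange:  # stand-in for Interaction
    inputs: Any
    outputs: Any


@dataclass(frozen=True)
class History:  # stand-in for Trace
    interactions: Tuple[Exchange, ...] = ()

    @property
    def last(self) -> Optional[Exchange]:
        # None for an empty history, otherwise the most recent exchange.
        return self.interactions[-1] if self.interactions else None

    def with_interaction(self, exchange: Exchange) -> "History":
        # Hypothetical helper: return a new history instead of mutating this one.
        return History(self.interactions + (exchange,))


empty = History()
one = empty.with_interaction(Exchange("Hello", "Hi!"))
```

Because `last` returns None on an empty history, code that reads it should guard for that case, whereas indexing `interactions[-1]` on an empty sequence raises an IndexError.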
Accessing Trace Data

    from giskard.checks import Check, CheckResult, Trace, scenario

    # Create a scenario with multiple interactions
    test_scenario = (
        scenario("example_trace")
        .interact(inputs="Hello", outputs="Hi!")
        .interact(inputs="How are you?", outputs="I'm well!")
    )

    # Access trace in checks
    @Check.register("trace_check")
    class TraceCheck(Check):
        async def run(self, trace: Trace) -> CheckResult:
            # Access last interaction
            last_interaction = trace.last

            # Access all interactions
            all_interactions = trace.interactions

            # Count interactions
            count = len(trace.interactions)

            return CheckResult.success(
                message=f"Processed {count} interactions"
            )

InteractionSpec
Declarative specification for generating interactions.
Module: giskard.checks.core.interaction
Description
InteractionSpec defines how to generate interactions in a scenario. It can use static values or callables that compute values based on the current trace.
Creating Interaction Specs

    from giskard.checks import scenario

    # Static values
    test_case = (
        scenario("static_example")
        .interact(
            inputs="test input",
            outputs="test output",
        )
    )

    # Callable outputs - dynamic generation
    # (my_model is a placeholder for your own callable)
    test_case = (
        scenario("dynamic_example")
        .interact(
            inputs="test query",
            outputs=lambda inputs: my_model(inputs),
        )
    )

    # Access trace context
    # (generate_response is a placeholder for your own callable)
    test_case = (
        scenario("context_example")
        .interact(
            inputs=lambda trace: f"Previous: {trace.last.outputs if trace.last else 'None'}",
            outputs=lambda inputs: generate_response(inputs),
        )
    )

Scenario
An ordered sequence of interaction specs and checks that share a trace.
Module: giskard.checks.core.scenario
Description
A Scenario represents a multi-step test workflow combining interactions and checks. Each interaction is executed in order, building up the trace, and checks can be inserted at any point to validate the state.
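The ordering described above can be pictured with a small loop that walks a list of steps, appending to a shared trace on each interaction and evaluating each check against the trace as it stands at that point. This is a simplified sketch with hypothetical names (`run_steps`, tuple-based steps, a plain list as the trace), not the library's runner.

```python
from typing import Any, Callable, List, Tuple


def run_steps(steps: List[tuple]) -> Tuple[List[Tuple[Any, Any]], List[bool]]:
    """Execute steps in order; checks see only the interactions recorded so far."""
    trace: List[Tuple[Any, Any]] = []   # (inputs, outputs) pairs
    check_results: List[bool] = []      # one entry per check, in order
    for step in steps:
        if step[0] == "interact":
            _, inputs, outputs = step
            trace.append((inputs, outputs))
        elif step[0] == "check":
            _, predicate = step
            check_results.append(predicate(trace))
        else:
            raise ValueError(f"unknown step type: {step[0]!r}")
    return trace, check_results


steps = [
    ("interact", "Hello", "Hi there!"),
    ("check", lambda trace: "Hi" in trace[-1][1]),
    ("interact", "What's the weather?", "It's sunny!"),
    ("check", lambda trace: len(trace) == 2),
]
trace, check_results = run_steps(steps)
```

The first check runs when the trace holds one interaction and the second when it holds two, which is what placing a check "at any point" amounts to.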
Building Scenarios

    from giskard.checks import scenario, from_fn

    # Create a multi-step scenario
    test_scenario = (
        scenario("multi_step_flow")
        # First interaction
        .interact(inputs="Hello", outputs="Hi there!")
        # Check after first interaction
        .check(from_fn(lambda trace: "Hi" in trace.last.outputs, name="greeting_check"))
        # Second interaction
        .interact(inputs="What's the weather?", outputs="It's sunny!")
        # Final validation
        .check(from_fn(lambda trace: len(trace.interactions) == 2, name="count_check"))
    )

    # Run the scenario (await needs an async context, such as an async function or a notebook)
    result = await test_scenario.run()
    print(f"Status: {result.status}")
    print(f"Interactions: {len(result.trace.interactions)}")

Methods
interact(inputs, outputs, metadata=None)
Add an interaction spec to the scenario.
Parameters:
inputs: Static value or callable (trace) -> value
outputs: Static value or callable (inputs) -> value or (trace, inputs) -> value
metadata: Optional metadata dict
Returns: The scenario (for chaining)
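The parameter forms listed for interact (a static value, a one-argument callable, or a two-argument callable) can be resolved with a small dispatcher. The sketch below is one possible approach under stated assumptions: `resolve_inputs` and `resolve_outputs` are hypothetical helpers, the trace is a plain list of (inputs, outputs) pairs, and dispatching on parameter count via `inspect.signature` is an assumption rather than a description of how the library decides.

```python
import inspect
from typing import Any, List, Tuple

Trace = List[Tuple[Any, Any]]  # stand-in: list of (inputs, outputs) pairs


def resolve_inputs(spec: Any, trace: Trace) -> Any:
    # A callable receives the trace; anything else is used as-is.
    return spec(trace) if callable(spec) else spec


def resolve_outputs(spec: Any, trace: Trace, inputs: Any) -> Any:
    if not callable(spec):
        return spec
    # Assumption: choose the call form by counting declared parameters.
    n_params = len(inspect.signature(spec).parameters)
    if n_params == 1:
        return spec(inputs)
    return spec(trace, inputs)


trace = [("Hello", "Hi!")]
inputs = resolve_inputs(lambda t: f"Previous: {t[-1][1]}", trace)
static_out = resolve_outputs("fixed", trace, inputs)
one_arg = resolve_outputs(lambda i: i.upper(), trace, inputs)
two_arg = resolve_outputs(lambda t, i: f"{len(t)}:{i}", trace, inputs)
```

Resolving values lazily like this is what allows a later step to depend on what earlier steps produced.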
check(check: Check)
Add a check to the scenario.
Parameters:
check: A Check instance to validate the trace at this point
Returns: The scenario (for chaining)
run() -> ScenarioResult
Execute the scenario and return results.
Returns: ScenarioResult with status, trace, and check results
TestCase
Container for running multiple test scenarios.
Module: giskard.checks.core.testcase
Description
A TestCase groups multiple scenarios for batch execution. Each scenario runs independently with its own trace.
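Independent runs with separate traces can be batched with `asyncio.gather`. The sketch below is a generic illustration, not the TestCase API: `run_scenario` and `run_all` are hypothetical, each scenario is reduced to a list of (inputs, outputs) pairs, and whether the library runs scenarios concurrently is not stated in this reference.

```python
import asyncio
from typing import Any, Dict, List, Tuple

Exchanges = List[Tuple[Any, Any]]


async def run_scenario(name: str, exchanges: Exchanges) -> Tuple[str, Exchanges]:
    trace: Exchanges = []  # each run builds its own trace
    for inputs, outputs in exchanges:
        await asyncio.sleep(0)  # yield control, as an awaited model call would
        trace.append((inputs, outputs))
    return name, trace


async def run_all(scenarios: Dict[str, Exchanges]) -> Dict[str, Exchanges]:
    # Run every scenario concurrently and collect results by name.
    pairs = await asyncio.gather(
        *(run_scenario(name, exchanges) for name, exchanges in scenarios.items())
    )
    return dict(pairs)


results = asyncio.run(run_all({
    "greeting": [("Hello", "Hi!")],
    "weather": [("Weather?", "Sunny"), ("Tomorrow?", "Rain")],
}))
```

Because no trace is shared between runs, the outcome of one scenario cannot leak into another regardless of execution order.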
Using Scenarios (Recommended)

    from giskard.checks import scenario, from_fn

    # Create individual scenarios (recommended approach)
    test_scenario = (
        scenario("my_test")
        .interact(inputs="test", outputs="result")
        .check(from_fn(lambda trace: True, name="check1"))
        .check(from_fn(lambda trace: True, name="check2"))
    )

    # Run the scenario (await needs an async context)
    result = await test_scenario.run()

Extractors
Base classes for extracting values from traces.
Module: giskard.checks.core.extraction
Extractor
Base class for extracting values from traces.

    from typing import Any

    from giskard.checks import Trace
    from giskard.checks.core.extraction import Extractor


    class CustomExtractor(Extractor):
        def extract(self, trace: Trace) -> Any:
            # Custom extraction logic
            return trace.last.outputs if trace.last else None

JsonPathExtractor
Extract values using JSONPath expressions.
Module: giskard.checks.core.extraction

    from giskard.checks import JsonPathExtractor

    # Extract from trace using JSONPath
    extractor = JsonPathExtractor(key="trace.last.outputs.answer")
    value = extractor.extract(trace)

    # Extract nested values
    extractor = JsonPathExtractor(key="trace.interactions[0].metadata.model")
    model_name = extractor.extract(trace)

Common JSONPath Patterns:

| Pattern | Description |
|---|---|
| trace.last.inputs | Last interaction inputs |
| trace.last.outputs | Last interaction outputs |
| trace.last.metadata.key | Metadata from last interaction |
| trace.interactions[0] | First interaction |
| trace.interactions[-1] | Last interaction (same as trace.last) |
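To show how paths like those in the table map onto a trace object, here is a deliberately simplified resolver that handles only dotted names and integer indexes. It is not a JSONPath implementation and not how JsonPathExtractor works internally; `resolve_path` and the `SimpleNamespace` stand-ins for the trace and interactions are assumptions made for the example.

```python
import re
from types import SimpleNamespace
from typing import Any

# Matches either an identifier segment or a bracketed integer index.
_TOKEN = re.compile(r"([A-Za-z_]\w*)|\[(-?\d+)\]")


def resolve_path(root_name: str, root: Any, path: str) -> Any:
    """Walk `path` from `root`, using keys for dicts and attributes otherwise."""
    tokens = _TOKEN.findall(path)
    if not tokens or tokens[0][0] != root_name:
        raise ValueError(f"path must start with {root_name!r}")
    current = root
    for name, index in tokens[1:]:
        if index:
            current = current[int(index)]
        elif isinstance(current, dict):
            current = current[name]
        else:
            current = getattr(current, name)
    return current


first = SimpleNamespace(inputs="q1", outputs={"answer": "a1"}, metadata={"model": "m"})
second = SimpleNamespace(inputs="q2", outputs={"answer": "a2"}, metadata={})
trace = SimpleNamespace(interactions=[first, second], last=second)

answer = resolve_path("trace", trace, "trace.last.outputs.answer")
model = resolve_path("trace", trace, "trace.interactions[0].metadata.model")
same = resolve_path("trace", trace, "trace.interactions[-1]")
```

In this stand-in, the `[-1]` path and `trace.last` resolve to the same object, matching the equivalence noted in the last table row.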
Configuration
Global configuration functions for the checks system.
Module: giskard.checks
set_default_generator
Set the default LLM generator for LLM-based checks.

    from giskard.agents.generators import Generator
    from giskard.checks import set_default_generator

    # Configure default generator
    set_default_generator(Generator(model="openai/gpt-4"))

    # Now LLM checks will use this generator by default
    from giskard.checks import Groundedness

    check = Groundedness()  # Uses the default generator

Parameters:
generator (Generator): The generator instance to use as default
get_default_generator
Get the currently configured default generator.

    from giskard.checks import get_default_generator

    generator = get_default_generator()
    print(f"Using generator: {generator.model}")

Returns:
Generator: The current default generator instance
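The set/get pair follows a familiar "process-wide default with per-instance override" pattern, sketched below in isolation. Everything here is a hypothetical stand-in (`FakeGenerator`, `set_default`, `get_default`, `LLMCheck`), including the choice to raise when no default is configured; the library's actual fallback behavior is not described in this reference.

```python
from typing import Dict, Optional


class FakeGenerator:  # stand-in for a generator object with a model name
    def __init__(self, model: str) -> None:
        self.model = model


# Module-level holder for the default; a dict avoids needing `global`.
_state: Dict[str, Optional[FakeGenerator]] = {"default": None}


def set_default(generator: FakeGenerator) -> None:
    _state["default"] = generator


def get_default() -> FakeGenerator:
    generator = _state["default"]
    if generator is None:
        # Assumption: fail loudly rather than guess a model.
        raise RuntimeError("no default generator configured")
    return generator


class LLMCheck:  # stand-in for an LLM-based check
    def __init__(self, generator: Optional[FakeGenerator] = None) -> None:
        self._generator = generator

    @property
    def generator(self) -> FakeGenerator:
        # An explicitly passed generator wins; otherwise use the default.
        return self._generator or get_default()


set_default(FakeGenerator("openai/gpt-4"))
uses_default = LLMCheck().generator.model
uses_override = LLMCheck(FakeGenerator("other/model")).generator.model
```

Resolving the default lazily, at the moment the check needs it, means checks constructed before the default is set still pick it up.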
ScenarioRunner
Low-level runner for executing scenarios.
Module: giskard.checks.core.scenario
Description
ScenarioRunner provides the execution engine for running scenarios. Most users should use the higher-level scenario().run() API rather than using ScenarioRunner directly.
See Also
- Built-in Checks - Ready-to-use validation checks
- Scenarios - Multi-step workflow testing
- Testing Utilities - Testing utilities and helpers