
# Core API

Base classes and fundamental types for building checks and scenarios.


## `Check`

Base class for all checks. Subclass and register to create custom validation logic.

Module: giskard.checks.core.check

The `Check` class provides the foundation for all validation logic in Giskard. Create custom checks by subclassing `Check` and registering them under a unique kind identifier with the `@Check.register("kind")` decorator.

| Attribute | Type | Default | Description |
| --- | --- | --- | --- |
| `name` | `str \| None` | `None` | Optional check name for reporting |
| `description` | `str \| None` | `None` | Human-readable description of what the check validates |

### `run`

Execute the check logic against the provided trace.

Parameters:

  • trace (Trace): The trace containing interaction history. Access the current interaction via trace.last or trace.interactions[-1]

Returns:

  • CheckResult: Success, failure, error, or skip result
```python
from giskard.checks import Check, CheckResult, Trace

@Check.register("my_custom_check")
class MyCustomCheck(Check):
    threshold: float = 0.8

    async def run(self, trace: Trace) -> CheckResult:
        # Your validation logic
        score = self._calculate_score(trace)
        if score >= self.threshold:
            return CheckResult.success(
                message=f"Score {score} meets threshold",
                metrics={"score": score},
            )
        else:
            return CheckResult.failure(
                message=f"Score {score} below threshold {self.threshold}",
                metrics={"score": score},
            )

    def _calculate_score(self, trace: Trace) -> float:
        # Custom scoring logic
        return 0.85
```

## `CheckResult`

Immutable result produced by running a check.

Module: giskard.checks.core.result

CheckResult encapsulates the outcome of a check execution, including the status (pass/fail/error/skip), optional message, metrics, and additional details.

| Attribute | Type | Description |
| --- | --- | --- |
| `status` | `CheckStatus` | Outcome status (`PASS`, `FAIL`, `ERROR`, `SKIP`) |
| `message` | `str \| None` | Optional short message to surface to users |
| `metrics` | `list[Metric]` | List of auxiliary metrics captured by the check |
| `details` | `dict[str, Any]` | Arbitrary structured payload with additional context |

Use the static factory methods to create results:

```python
from giskard.checks import CheckResult

# Success - check passed
result = CheckResult.success(
    message="All validations passed",
    details={"score": 0.95, "threshold": 0.8},
)

# Failure - check did not pass
result = CheckResult.failure(
    message="Score below threshold",
    details={"score": 0.65, "threshold": 0.8, "reason": "low confidence"},
)

# Error - unexpected exception or condition
result = CheckResult.error(
    message="Failed to connect to API",
    details={"error": "Connection timeout"},
)

# Skip - precondition not met
result = CheckResult.skip(
    message="Skipped: No outputs available",
    details={"reason": "empty_response"},
)
```
| Property | Type | Description |
| --- | --- | --- |
| `passed` | `bool` | `True` if status is `PASS` |
| `failed` | `bool` | `True` if status is `FAIL` |
| `errored` | `bool` | `True` if status is `ERROR` |
| `skipped` | `bool` | `True` if status is `SKIP` |
```python
result = CheckResult.success(message="Check passed")
if result.passed:
    print("✓ Check succeeded")
```

## `CheckStatus`

Enumeration of possible check execution outcomes.

Module: giskard.checks.core.result

| Status | Description |
| --- | --- |
| `PASS` | Check validation succeeded |
| `FAIL` | Check validation failed |
| `ERROR` | Unexpected error during check execution |
| `SKIP` | Check was skipped (e.g., precondition not met) |
```python
from giskard.checks import CheckStatus

# Use in conditional logic
if result.status == CheckStatus.PASS:
    print("Success!")
elif result.status == CheckStatus.FAIL:
    print("Validation failed")
```

## `Interaction`

A single exchange of inputs and outputs.

Module: giskard.checks.core.trace

An interaction represents one exchange in a conversation or workflow, capturing the inputs provided, the outputs produced, and optional metadata.

| Attribute | Type | Description |
| --- | --- | --- |
| `inputs` | `InputType` | Input values for this interaction (e.g., user message, API request) |
| `outputs` | `OutputType` | Output values produced in response (e.g., assistant reply, API response) |
| `metadata` | `dict[str, Any]` | Optional metadata (timing, tool calls, intermediate states, etc.) |
```python
from giskard.checks import Interaction

# Simple text interaction
interaction = Interaction(
    inputs="What is the capital of France?",
    outputs="The capital of France is Paris.",
    metadata={"model": "gpt-4", "tokens": 15},
)

# Structured interaction
interaction = Interaction(
    inputs={"query": "weather", "location": "Paris"},
    outputs={"temperature": 20, "conditions": "sunny"},
    metadata={"api": "weather_service", "latency_ms": 120},
)
```

## `Trace`

Immutable history of all interactions in a scenario.

Module: giskard.checks.core.trace

A trace accumulates all interactions that have occurred during scenario execution. It is passed to checks for validation and to interaction specs for generating subsequent interactions.

The trace is immutable (frozen), ensuring that checks and specs cannot accidentally modify the history.

| Attribute | Type | Description |
| --- | --- | --- |
| `interactions` | `list[Interaction]` | Ordered list of all interactions; most recent at `[-1]` |
| `last` | `Interaction \| None` | Computed property returning the last interaction, or `None` if empty |
```python
from giskard.checks import Check, CheckResult, Trace, scenario

# Create a scenario with multiple interactions
test_scenario = (
    scenario("example_trace")
    .interact(inputs="Hello", outputs="Hi!")
    .interact(inputs="How are you?", outputs="I'm well!")
)

# Access the trace in checks
@Check.register("trace_check")
class TraceCheck(Check):
    async def run(self, trace: Trace) -> CheckResult:
        # Access the last interaction
        last_interaction = trace.last
        # Access all interactions
        all_interactions = trace.interactions
        # Count interactions
        count = len(trace.interactions)
        return CheckResult.success(
            message=f"Processed {count} interactions"
        )
```
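The frozen-trace behavior can be illustrated with a stand-in built on frozen dataclasses, one common way to implement immutability in Python (Giskard's internals may differ):

```python
from dataclasses import dataclass, FrozenInstanceError

@dataclass(frozen=True)
class Interaction:
    inputs: object
    outputs: object

@dataclass(frozen=True)
class Trace:
    # A tuple keeps the history itself immutable, not just the attribute
    interactions: tuple = ()

    @property
    def last(self):
        """Return the most recent interaction, or None if the trace is empty."""
        return self.interactions[-1] if self.interactions else None

trace = Trace((Interaction("Hello", "Hi!"),))
print(trace.last.outputs)  # Hi!

try:
    trace.interactions = ()  # attempt to mutate the frozen trace
except FrozenInstanceError:
    print("trace is immutable")
```

Any attempt to reassign an attribute raises `FrozenInstanceError`, which is how checks and specs are prevented from accidentally rewriting history.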

## `InteractionSpec`

Declarative specification for generating interactions.

Module: giskard.checks.core.interaction

InteractionSpec defines how to generate interactions in a scenario. It can use static values or callables that compute values based on the current trace.

```python
from giskard.checks import scenario

# Static values
test_case = (
    scenario("static_example")
    .interact(
        inputs="test input",
        outputs="test output",
    )
)

# Callable outputs - dynamic generation
test_case = (
    scenario("dynamic_example")
    .interact(
        inputs="test query",
        outputs=lambda inputs: my_model(inputs),
    )
)

# Access trace context
test_case = (
    scenario("context_example")
    .interact(
        inputs=lambda trace: f"Previous: {trace.last.outputs if trace.last else 'None'}",
        outputs=lambda inputs: generate_response(inputs),
    )
)
```
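The static-vs-callable resolution shown above boils down to a small dispatch: if the supplied value is callable, invoke it with the available context; otherwise use it as-is. A minimal sketch of that logic (illustrative only; the actual `InteractionSpec` resolution may differ):

```python
from typing import Any

def resolve(value: Any, *context: Any) -> Any:
    """Return `value` directly, or call it with the given context if callable."""
    return value(*context) if callable(value) else value

# A static value passes through unchanged
print(resolve("test input"))  # test input

# A callable is invoked with the context (here, the inputs)
print(resolve(lambda inputs: inputs.upper(), "hello"))  # HELLO
```

This is why the same `interact()` keyword can accept either a literal string or a lambda taking the trace or inputs.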

## `Scenario`

Ordered sequence of interaction specs and checks with a shared trace.

Module: giskard.checks.core.scenario

A Scenario represents a multi-step test workflow combining interactions and checks. Each interaction is executed in order, building up the trace, and checks can be inserted at any point to validate the state.

```python
from giskard.checks import scenario, from_fn

# Create a multi-step scenario
test_scenario = (
    scenario("multi_step_flow")
    # First interaction
    .interact(inputs="Hello", outputs="Hi there!")
    # Check after first interaction
    .check(from_fn(lambda trace: "Hi" in trace.last.outputs, name="greeting_check"))
    # Second interaction
    .interact(inputs="What's the weather?", outputs="It's sunny!")
    # Final validation
    .check(from_fn(lambda trace: len(trace.interactions) == 2, name="count_check"))
)

# Run the scenario
result = await test_scenario.run()
print(f"Status: {result.status}")
print(f"Interactions: {len(result.trace.interactions)}")
```
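The execution model, where interactions extend the trace and each check sees the trace as of its insertion point, can be sketched in plain Python. This is an illustration of the flow only, not Giskard's runner:

```python
import asyncio

async def run_steps(steps):
    """Execute ('interact', inputs, outputs) and ('check', fn) steps in order."""
    trace = []    # simplified trace: a list of (inputs, outputs) pairs
    results = []
    for step in steps:
        if step[0] == "interact":
            _, inputs, outputs = step
            trace.append((inputs, outputs))  # interactions build up the trace
        else:
            _, check_fn = step
            results.append(check_fn(trace))  # a check sees the trace so far
    return trace, results

steps = [
    ("interact", "Hello", "Hi there!"),
    ("check", lambda t: "Hi" in t[-1][1]),  # runs after the first interaction
    ("interact", "What's the weather?", "It's sunny!"),
    ("check", lambda t: len(t) == 2),       # runs after both interactions
]
trace, results = asyncio.run(run_steps(steps))
print(results)  # [True, True]
```

The ordering matters: moving the second check before the second interaction would make it fail, because it would only see one entry in the trace.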

### `interact`

Add an interaction spec to the scenario.

Parameters:

  • inputs: Static value or callable (trace) -> value
  • outputs: Static value or callable (inputs) -> value or (trace, inputs) -> value
  • metadata: Optional metadata dict

Returns: The scenario (for chaining)

### `check`

Add a check to the scenario.

Parameters:

  • check: A Check instance to validate the trace at this point

Returns: The scenario (for chaining)

### `run`

Execute the scenario and return results.

Returns: ScenarioResult with status, trace, and check results


## `TestCase`

Container for running multiple test scenarios.

Module: giskard.checks.core.testcase

A TestCase groups multiple scenarios for batch execution. Each scenario runs independently with its own trace.

```python
from giskard.checks import scenario, from_fn

# Create individual scenarios (recommended approach)
test_scenario = (
    scenario("my_test")
    .interact(inputs="test", outputs="result")
    .check(from_fn(lambda trace: True, name="check1"))
    .check(from_fn(lambda trace: True, name="check2"))
)

# Run the scenario
result = await test_scenario.run()
```
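Since each scenario runs independently with its own trace, a batch of scenarios can also be driven directly with `asyncio.gather` over each scenario's `run()` coroutine. The sketch below uses stand-in coroutines rather than real scenarios, so the batching pattern itself is testable in isolation:

```python
import asyncio

async def run_scenario(name: str) -> dict:
    """Stand-in for `scenario(...).run()`; each call has its own state."""
    await asyncio.sleep(0)  # yield control, as a real async run would
    return {"name": name, "status": "pass"}

async def run_all():
    # Each scenario runs independently, mirroring batch execution semantics
    return await asyncio.gather(
        run_scenario("login_flow"),
        run_scenario("checkout_flow"),
    )

results = asyncio.run(run_all())
print([r["name"] for r in results])  # ['login_flow', 'checkout_flow']
```

`gather` preserves the order of its arguments, so results line up with the scenarios that produced them even when execution interleaves.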

Base classes for extracting values from traces.

Module: giskard.checks.core.extraction

## `Extractor`

Base class for extracting values from traces.

```python
from typing import Any

from giskard.checks import Trace
from giskard.checks.core.extraction import Extractor

class CustomExtractor(Extractor):
    def extract(self, trace: Trace) -> Any:
        # Custom extraction logic
        return trace.last.outputs if trace.last else None
```

## `JsonPathExtractor`

Extract values using JSONPath expressions.

Module: giskard.checks.core.extraction

```python
from giskard.checks import JsonPathExtractor

# Extract from the trace using JSONPath
extractor = JsonPathExtractor(key="trace.last.outputs.answer")
value = extractor.extract(trace)

# Extract nested values
extractor = JsonPathExtractor(key="trace.interactions[0].metadata.model")
model_name = extractor.extract(trace)
```

Common JSONPath Patterns:

| Pattern | Description |
| --- | --- |
| `trace.last.inputs` | Inputs of the last interaction |
| `trace.last.outputs` | Outputs of the last interaction |
| `trace.last.metadata.key` | Metadata value from the last interaction |
| `trace.interactions[0]` | First interaction |
| `trace.interactions[-1]` | Last interaction (same as `trace.last`) |
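A dotted-path lookup like the patterns above can be sketched for plain dicts: split the path on dots, handle an optional `[index]` suffix per segment, and descend step by step. This is illustrative only; `JsonPathExtractor`'s real resolution rules may differ:

```python
import re
from typing import Any

def extract_path(root: Any, path: str) -> Any:
    """Resolve a dotted path with optional [index] segments, e.g. 'a.b[0].c'."""
    value = root
    for part in path.split("."):
        match = re.fullmatch(r"(\w+)(?:\[(-?\d+)\])?", part)
        name, index = match.group(1), match.group(2)
        # Dict access for mappings, attribute access for objects
        value = value[name] if isinstance(value, dict) else getattr(value, name)
        if index is not None:
            value = value[int(index)]  # apply the [n] subscript, if present
    return value

data = {"last": {"outputs": {"answer": "Paris"}}, "interactions": [{"id": 1}]}
print(extract_path(data, "last.outputs.answer"))  # Paris
print(extract_path(data, "interactions[0].id"))   # 1
```

Negative indices like `[-1]` work too, since the subscript is passed straight through to Python's list indexing.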

## Configuration

Global configuration functions for the checks system.

Module: giskard.checks

### `set_default_generator`

Set the default LLM generator for LLM-based checks.

```python
from giskard.agents.generators import Generator
from giskard.checks import set_default_generator

# Configure the default generator
set_default_generator(Generator(model="openai/gpt-4"))

# LLM-based checks now use this generator by default
from giskard.checks import Groundedness

check = Groundedness()  # Uses the default generator
```

Parameters:

  • generator (Generator): The generator instance to use as default

### `get_default_generator`

Get the currently configured default generator.

```python
from giskard.checks import get_default_generator

generator = get_default_generator()
print(f"Using generator: {generator.model}")
```

Returns:

  • Generator: The current default generator instance
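This pair of functions follows the module-level default pattern: a single module global holds the default, a setter replaces it, and a getter reads it. A stand-in sketch (not Giskard's code; `FakeGenerator` and the error-on-unset behavior are assumptions of this sketch):

```python
_default_generator = None  # module-level slot holding the process-wide default

def set_default_generator(generator) -> None:
    """Install a process-wide default used when no generator is passed explicitly."""
    global _default_generator
    _default_generator = generator

def get_default_generator():
    """Return the current default (raising when unset is an assumption here)."""
    if _default_generator is None:
        raise RuntimeError("No default generator configured")
    return _default_generator

class FakeGenerator:  # hypothetical stand-in for giskard.agents.generators.Generator
    model = "openai/gpt-4"

set_default_generator(FakeGenerator())
print(get_default_generator().model)  # openai/gpt-4
```

Because the default lives at module scope, it applies process-wide: every check constructed after `set_default_generator` runs picks it up.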

## `ScenarioRunner`

Low-level runner for executing scenarios.

Module: giskard.checks.core.scenario

`ScenarioRunner` is the execution engine behind scenarios. Most users should prefer the higher-level `scenario().run()` API rather than using `ScenarioRunner` directly.