Core Concepts
Understanding the key concepts in Giskard Checks will help you write effective tests for your AI applications.
Introduction
Giskard Checks is built around a few core primitives that work together:
- Interaction: A single turn of data exchange (inputs and outputs)
- InteractionSpec: A specification for generating interactions dynamically
- Trace: An immutable snapshot of all interactions in a scenario
- Check: A validation that runs on a trace and returns a result
- Scenario: A list of steps (interactions and checks) executed sequentially
At runtime, the flow looks like this:
- A Scenario is created with a sequence of steps.
- For each step in order:
  - Each InteractionSpec is resolved into a concrete Interaction.
  - The Interaction is appended to the Trace.
  - Checks run against the current Trace.
- Results are returned as a ScenarioResult.
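The flow above can be sketched in plain Python. This is a toy model for intuition only, not the library's implementation: ToySpec, ToyInteraction, and run_steps are hypothetical names, and the rule that a callable output receives the resolved input is an assumption made for this sketch.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class ToyInteraction:
    """Hypothetical stand-in for a resolved interaction."""
    inputs: Any
    outputs: Any


@dataclass(frozen=True)
class ToySpec:
    """Hypothetical stand-in for a spec: static values or callables."""
    inputs: Any   # static value or zero-argument callable
    outputs: Any  # static value or callable taking the resolved inputs


def run_steps(steps: list) -> list:
    """Walk the steps in order, growing the trace and collecting check results."""
    trace: tuple = ()  # rebuilt on every append, so earlier snapshots never change
    results = []
    for step in steps:
        if isinstance(step, ToySpec):
            # Resolve the spec into a concrete interaction and append it.
            inputs = step.inputs() if callable(step.inputs) else step.inputs
            outputs = step.outputs(inputs) if callable(step.outputs) else step.outputs
            trace = trace + (ToyInteraction(inputs, outputs),)
        else:
            # Anything else is treated as a check: a function of the current trace.
            results.append(step(trace))
    return results


results = run_steps([
    ToySpec(inputs="Hello", outputs=lambda text: text.upper()),
    lambda trace: trace[-1].outputs == "HELLO",
    ToySpec(inputs="Bye", outputs="See you"),
    lambda trace: len(trace) == 2,
])
print(results)  # [True, True]
```

Each check only sees the interactions recorded before it, which is why the first check can inspect the first turn without knowing about the second.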
Interaction
An Interaction represents a single turn of data exchange with the
system under test. Interactions are computed at execution time by
resolving InteractionSpec objects into the trace.
Properties:
- inputs: The input to your system (string, dict, Pydantic model, etc.)
- outputs: The output from your system (any serializable type)
- metadata: Optional dictionary for additional context (timings, model info, etc.)
Interactions are immutable, as they represent something that has already happened.
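As an illustration of what immutability means in practice, the sketch below uses a frozen standard-library dataclass. RecordedTurn is a hypothetical stand-in, not the library's Interaction class, whose internals may differ.

```python
from dataclasses import FrozenInstanceError, dataclass, field, replace
from typing import Any


@dataclass(frozen=True)
class RecordedTurn:
    """Hypothetical stand-in for an immutable interaction record."""
    inputs: Any
    outputs: Any
    metadata: dict = field(default_factory=dict)


recorded = RecordedTurn(
    inputs="Hello",
    outputs="Hi there!",
    metadata={"latency_ms": 120},
)

# Reassigning a field on a frozen dataclass raises instead of silently mutating.
try:
    recorded.outputs = "Something else"
    mutated = True
except FrozenInstanceError:
    mutated = False

# To "change" a record, build a new one; the original stays untouched.
corrected = replace(recorded, outputs="Hi!")
print(mutated, recorded.outputs, corrected.outputs)
```

Note that freezing only blocks attribute reassignment; the metadata dictionary in this toy version could still be mutated in place.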
InteractionSpec
An InteractionSpec describes how to generate an interaction and is
used to describe a scenario. When you call .interact(...) in the
fluent API, it adds an InteractionSpec to the scenario sequence.
Inputs and outputs can be static values or dynamic callables, and you
can mix both.
```python
import random

from giskard.checks import InteractionSpec
from openai import OpenAI


def generate_random_question() -> str:
    return f"What is 2 + {random.randint(0, 10)}?"


def generate_answer(inputs: str) -> str:
    client = OpenAI()
    response = client.chat.completions.create(
        model="gpt-5-mini",
        messages=[{"role": "user", "content": inputs}],
    )
    return response.choices[0].message.content


spec = InteractionSpec(
    inputs=generate_random_question,
    outputs=generate_answer,
    metadata={
        "category": "math",
        "difficulty": "easy",
    },
)
```
Specs are resolved into interactions during scenario execution. This is
common in multi-turn scenarios, where inputs and outputs are generated
based on previous interactions. See multi-turn for practical examples.
Trace
A Trace is an immutable snapshot of all data exchanged with the system
under test. In its simplest form, it is a list of interactions.
```python
from giskard.checks import Trace, Interaction

trace = Trace(interactions=[
    Interaction(inputs="Hello", outputs="Hi there!"),
    Interaction(inputs="How are you?", outputs="I'm doing well, thanks!"),
])
```
Traces are typically created during scenario execution by resolving each
InteractionSpec into a frozen interaction.
Checks
A Check validates something about a trace and returns a CheckResult.
There’s a library of built-in checks, but you can also create your own.
When referencing values in a trace, use JSONPath expressions that start
with "trace." (for example, trace.last.outputs). The last property is a
shortcut for interactions[-1] and can be used in both JSONPath keys and
Python code.
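To make the key syntax concrete, here is a minimal sketch of how a key such as trace.last.outputs could be looked up. It is a plain dotted-path walker over hypothetical ToyTrace and ToyInteraction classes, not real JSONPath and not the library's resolver; treat every name in it as an assumption.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class ToyInteraction:
    """Hypothetical stand-in for an interaction."""
    inputs: Any
    outputs: Any


@dataclass(frozen=True)
class ToyTrace:
    """Hypothetical stand-in for a trace."""
    interactions: tuple

    @property
    def last(self) -> ToyInteraction:
        # Shortcut for interactions[-1], mirroring the property described above.
        return self.interactions[-1]


def resolve_key(trace: ToyTrace, key: str) -> Any:
    """Follow a dotted key such as 'trace.last.outputs' through the trace."""
    root, *parts = key.split(".")
    if root != "trace":
        raise ValueError(f"key must start with 'trace.': {key!r}")
    value: Any = trace
    for part in parts:
        # Dict outputs are indexed by key; everything else by attribute.
        value = value[part] if isinstance(value, dict) else getattr(value, part)
    return value


trace = ToyTrace(interactions=(
    ToyInteraction(inputs="Hello", outputs="Hi there!"),
    ToyInteraction(inputs="How are you?", outputs={"text": "Fine", "tokens": 3}),
))

print(resolve_key(trace, "trace.last.inputs"))        # How are you?
print(resolve_key(trace, "trace.last.outputs.text"))  # Fine
```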
```python
from giskard.checks import Groundedness, Trace

check = Groundedness(
    answer_key="trace.last.outputs",
    context="Giskard Checks is a testing framework for AI systems.",
)
```
Scenario
A Scenario is a list of steps (interactions and checks) that are
executed sequentially with a shared trace. Scenarios work for both
single-turn and multi-turn tests.
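```python
from giskard.checks import scenario

# check1 and check2 stand for any Check instances defined elsewhere.
test_scenario = (
    scenario("test_with_checks")
    .interact(inputs="test input", outputs="test output")
    .check(check1)
    .check(check2)
)

# Top-level await works in a notebook or inside an async function.
result = await test_scenario.run()
```
The run returns a ScenarioResult object containing the results of the checks.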
Fluent API Mapping
The fluent API is the preferred user-facing entry point and maps directly to the core primitives above:
- scenario(name) creates a Scenario builder.
- .interact(...) adds an InteractionSpec to the scenario sequence.
- .check(...) adds a Check to the scenario sequence.
- .run() resolves specs to interactions, builds the Trace, runs checks, and returns a ScenarioResult.
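The mapping above follows a familiar builder pattern, which the toy class below illustrates. ToyScenario and toy_scenario are hypothetical names and this is not how the library is implemented; to stay short, the sketch only accepts static inputs and outputs and treats a check as a plain predicate over the trace.

```python
import asyncio
from typing import Any, Callable


class ToyScenario:
    """Hypothetical builder that records steps and replays them on run()."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.steps: list = []

    def interact(self, inputs: Any, outputs: Any) -> "ToyScenario":
        self.steps.append(("interact", inputs, outputs))
        return self  # returning self is what allows method chaining

    def check(self, predicate: Callable[[list], bool]) -> "ToyScenario":
        self.steps.append(("check", predicate))
        return self

    async def run(self) -> dict:
        trace: list = []
        passed: list = []
        for kind, *payload in self.steps:
            if kind == "interact":
                inputs, outputs = payload
                trace.append({"inputs": inputs, "outputs": outputs})
            else:
                (predicate,) = payload
                # Hand the check a copy so it cannot alter the recorded trace.
                passed.append(predicate(list(trace)))
        return {"name": self.name, "passed": passed}


def toy_scenario(name: str) -> ToyScenario:
    return ToyScenario(name)


# run() is a coroutine, so a plain script drives it with asyncio.run().
result = asyncio.run(
    toy_scenario("greeting")
    .interact(inputs="Hello", outputs="Hello, world")
    .check(lambda trace: trace[-1]["outputs"] == "Hello, world")
    .check(lambda trace: len(trace) == 2)
    .run()
)
print(result)  # {'name': 'greeting', 'passed': [True, False]}
```

Nothing executes while the chain is being built; the steps are only replayed, in order, once run() is awaited.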
For example, we can test a simple conversation flow with two turns:
```python
import asyncio

from giskard.checks import scenario, Conformity

# generate_answer is the helper defined in the InteractionSpec example above.
test_scenario = (
    scenario("conversation_flow")
    .interact(inputs="Hello", outputs=generate_answer)
    .check(Conformity(key="trace.last.outputs", rule="response should be a friendly greeting"))
    .interact(inputs="Who invented HTML?", outputs=generate_answer)
    .check(Conformity(key="trace.last.outputs", rule="response should mention Tim Berners-Lee as the inventor of HTML"))
)

# In a notebook or inside an async function, await the run directly.
result = await test_scenario.run()
# In a plain script, use instead: result = asyncio.run(test_scenario.run())
```
For a practical introduction to the fluent API, see Quickstart.