Quickstart

This guide will walk you through creating your first scenario with Giskard Checks in under 5 minutes.

A simple example

Let’s consider a simple question-answering bot. We want to test that the answers of our bot are correct according to some context information.

In the checks framework, you test a Trace. A Trace is an immutable record of everything exchanged with the system under test (SUT). It contains one or more Interactions, where each Interaction corresponds to a single turn (inputs + outputs).

For our simple Q&A bot, we can represent a single turn as a trace with just one interaction. The inputs and outputs can be anything the bot supports, as long as they are serializable to JSON. For now, we’ll assume our bot takes an input string (question) and returns a string (the answer).

from giskard.checks import scenario, Groundedness

# Use the fluent builder to create a scenario with an interaction and checks
test_scenario = (
    scenario("test_france_capital")
    .interact(
        inputs="What is the capital of France?",
        outputs="The capital of France is Paris."  # generated by the bot
    )
    .check(
        Groundedness(
            name="answer is grounded",
            answer_key="trace.last.outputs",
            context="""France is a country in Western Europe. Its capital
                       and largest city is Paris, known for the Eiffel Tower
                       and the Louvre Museum."""
        )
    )
)

In practice, we’ll get the outputs directly from the bot, or maybe from a dataset of previously recorded interactions.

Note how we created the groundedness check:

name: this is an (optional) name for the check, to make it easier to interpret the results
answer_key: this is the key (in JSONPath) to the answer in the trace. All JSONPath keys must start with trace. The last property is a shortcut for interactions[-1] and can be used in both JSONPath keys and Python code. In this case we want to check the outputs attribute of the last interaction in the trace (this is the default)
context: this is the context information that will be used to check if the answer is grounded. Note that a context_key is also available if we want to dynamically load the context from the trace itself (see next example).

We can now run the scenario and inspect the results. In a notebook, the ScenarioResult renders with a rich display:

result = await test_scenario.run()
result

Rich display for a ScenarioResult with trace and check results

The run() method is asynchronous. In a script, wrap it with asyncio.run():

import asyncio

async def main():
    result = await test_scenario.run()
    print(result)

asyncio.run(main())

If you’re already inside an async function (like in pytest with @pytest.mark.asyncio), you can call await test_scenario.run() directly.

Dynamic interactions

So far, we’ve used static values for inputs and outputs. In practice, you’ll often want to generate outputs dynamically by calling your SUT, or generate inputs based on previous interactions.

You can pass callables (functions or lambdas) to interact() instead of static values:

from openai import OpenAI
from giskard.checks import scenario, Groundedness

client = OpenAI()

def get_answer(inputs: str) -> str:
    response = client.chat.completions.create(
        model="gpt-5-mini",
        messages=[{"role": "user", "content": inputs}],
    )
    return response.choices[0].message.content

test_scenario = (
    scenario("test_dynamic_output")
    .interact(
        inputs="What is the capital of France?",
        outputs=get_answer
    )
    .check(
        Groundedness(
            name="answer is grounded",
            answer_key="trace.last.outputs",
            context="France is a country in Western Europe..."
        )
    )
)

No need to precompute outputs anymore. This is especially useful in multi-turn scenarios, where inputs or outputs depend on earlier interactions (see Multi-Turn Scenarios).

Structuring the interactions

As mentioned above, in practice the interaction inputs and outputs can take any form as long as they are serializable to JSON. For example, our bot could take input in the form of an OpenAI message object and return a structured output like this:

{
    "answer": "The capital of France is Paris.",
    "confidence": 0.93,
    "documents": [
         "France is a country in Western Europe. Its capital and largest city is Paris, known for the Eiffel Tower and the Louvre Museum.",
         "The Eiffel Tower is a wrought-iron lattice tower in Paris. It was completed in 1889."
     ]
}

We can easily create a trace based on this format, and adapt our scenario:

from giskard.checks import scenario, GreaterThan, Groundedness

test_scenario = (
    scenario("test_structured_output")
    .interact(
        inputs={"role": "user", "content": "What is the capital of France?"},
        outputs={
            "answer": "The capital of France is Paris.",
            "confidence": 0.93,
            "documents": ["France is a country in Western Europe. Its capital and largest city is Paris, known for the Eiffel Tower and the Louvre Museum.", "The Eiffel Tower is a wrought-iron lattice tower in Paris. It was completed in 1889."]
        }
    )
    .check(
        Groundedness(
            name="answer is grounded",
            answer_key="trace.last.outputs.answer",
            context_key="trace.last.outputs.documents",
        )
    )
    .check(
        GreaterThan(
            name="confidence is high",
            key="trace.last.outputs.confidence",
            expected_value=0.90,
        )
    )
)

Note how this time we used context_key to obtain the context from the documents present in the trace itself. This is a common case for RAG systems. We also added a check to ensure the confidence is high.

We can now run the scenario and inspect the results. In a notebook, the ScenarioResult renders with a rich display:

result = await test_scenario.run()
result

Rich display for a structured interaction scenario result

This will give us a result object with the results of the checks.

Check out the Multi-Turn Scenarios guide for more details on how to test multi-turn scenarios.