
# Checks

Ready-to-use validation checks for common testing scenarios, including function-based checks, string matching, comparisons, and LLM-powered semantic validation.


## from_fn

Create a check from a callable function.

Module: `giskard.checks.builtin.fn`

```python
from giskard.checks import from_fn

# Simple boolean check
check = from_fn(
    lambda trace: trace.last.outputs is not None,
    name="has_output",
    success_message="Output was provided",
    failure_message="No output found",
)

# Check with custom logic
check = from_fn(
    lambda trace: len(trace.last.outputs) > 10,
    name="min_length_check",
    description="Validates minimum output length",
)

# Async check
async def validate_response(trace):
    response = trace.last.outputs
    # Perform async validation
    is_valid = await external_validator(response)
    return is_valid

check = from_fn(validate_response, name="async_validation")
```

Parameters:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `fn` | `Callable` | required | Function taking a trace and returning `bool` or `CheckResult` |
| `name` | `str \| None` | `None` | Optional check name |
| `description` | `str \| None` | `None` | Optional description |
| `success_message` | `str \| None` | `None` | Message when the check passes |
| `failure_message` | `str \| None` | `None` | Message when the check fails |
| `details` | `dict \| None` | `None` | Additional details to include in the result |

Returns:

  • FnCheck: A check instance wrapping the function

## FnCheck

A Check whose logic is implemented as a Python callable.

Module: `giskard.checks.builtin.fn`

```python
from giskard.checks.builtin.fn import FnCheck

# Create directly
check = FnCheck(
    fn=lambda trace: "error" not in trace.last.outputs.lower(),
    name="no_errors",
    success_message="No errors detected",
    failure_message="Error found in output",
)
```

## StringMatching

Check that validates string patterns in trace values.

Module: `giskard.checks.builtin.string_matching`

```python
from giskard.checks import StringMatching

# Check if output contains specific text
check = StringMatching(
    keyword="success",
    text_key="trace.last.outputs",
    match_type="contains",
)

# Check with regex pattern
check = StringMatching(
    keyword=r"\d{3}-\d{3}-\d{4}",  # Phone number pattern
    text_key="trace.last.outputs.phone",
    match_type="regex",
)

# Case-insensitive matching
check = StringMatching(
    keyword="error",
    text_key="trace.last.outputs",
    match_type="contains",
    case_sensitive=False,
)
```

Parameters:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `keyword` | `str` | required | Pattern to match (plain string or regex) |
| `text_key` | `str` | required | JSONPath to extract the value from the trace |
| `match_type` | `str` | `"contains"` | One of `"contains"`, `"equals"`, `"startswith"`, `"endswith"`, `"regex"` |
| `case_sensitive` | `bool` | `True` | Whether matching is case-sensitive |
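Keys such as `text_key` are dotted paths walked from the trace object. A plain-Python sketch of that lookup (illustrative only; the `resolve` helper and the trace stub below are hypothetical, and the library's actual JSONPath handling may be richer):

```python
from types import SimpleNamespace

def resolve(trace, path):
    # Walk a dotted path such as "trace.last.outputs.phone",
    # trying attribute access first, then dict-style lookup.
    obj = trace
    for part in path.split(".")[1:]:  # skip the leading "trace"
        obj = getattr(obj, part) if hasattr(obj, part) else obj[part]
    return obj

# Hypothetical trace-like object for illustration
trace = SimpleNamespace(last=SimpleNamespace(outputs={"phone": "555-123-4567"}))
resolve(trace, "trace.last.outputs.phone")  # "555-123-4567"
```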

Match Types:

| Type | Description | Example |
| --- | --- | --- |
| `contains` | Pattern appears anywhere in the text | `"hello" in "hello world"` |
| `equals` | Exact match | `"hello" == "hello"` |
| `startswith` | Text starts with the pattern | `"hello world".startswith("hello")` |
| `endswith` | Text ends with the pattern | `"hello world".endswith("world")` |
| `regex` | Regular expression match | `re.search(r"\d+", "test123")` |
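The match types above map directly onto standard Python string operations. A plain-Python equivalent of the matching logic, to make the semantics concrete (a sketch, not the library's implementation):

```python
import re

def matches(text, keyword, match_type="contains", case_sensitive=True):
    # Plain-Python equivalent of the documented match types (illustrative)
    if match_type == "regex":
        flags = 0 if case_sensitive else re.IGNORECASE
        return re.search(keyword, text, flags) is not None
    if not case_sensitive:
        text, keyword = text.lower(), keyword.lower()
    if match_type == "contains":
        return keyword in text
    if match_type == "equals":
        return text == keyword
    if match_type == "startswith":
        return text.startswith(keyword)
    if match_type == "endswith":
        return text.endswith(keyword)
    raise ValueError(f"Unknown match type: {match_type}")

matches("hello world", "hello")                 # True
matches("test123", r"\d+", match_type="regex")  # True
matches("Hello", "hello", match_type="equals",
        case_sensitive=False)                   # True
```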

## Comparison checks

Validate numeric and comparable values against expected thresholds.

### Equals

Check that extracted values equal an expected value.

Module: `giskard.checks.builtin.comparison`

```python
from giskard.checks import Equals

# Check exact value
check = Equals(
    expected_value=42,
    key="trace.last.outputs.count",
)

# Check string equality
check = Equals(
    expected_value="success",
    key="trace.last.outputs.status",
)

# Compare against a value from the trace
check = Equals(
    expected_value_key="trace.interactions[0].outputs.baseline",
    key="trace.last.outputs.result",
)
```

Parameters:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `expected_value` | `Any \| None` | `None` | Static expected value |
| `expected_value_key` | `str \| None` | `None` | JSONPath to extract the expected value from the trace |
| `key` | `str` | required | JSONPath to extract the actual value |
| `normalization_form` | `str \| None` | `None` | Unicode normalization: `"NFC"`, `"NFD"`, `"NFKC"`, `"NFKD"` |
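The `normalization_form` option matters because visually identical strings can differ at the code-point level and fail a raw equality comparison. Python's standard `unicodedata` module shows the effect:

```python
import unicodedata

composed = "caf\u00e9"     # "café" with a precomposed é (U+00E9)
decomposed = "cafe\u0301"  # "cafe" followed by a combining acute accent (U+0301)

composed == decomposed  # False: the raw strings differ code point by code point

# After NFC normalization both collapse to the same composed form
unicodedata.normalize("NFC", composed) == unicodedata.normalize("NFC", decomposed)  # True
```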

### NotEquals

Check that extracted values do not equal an expected value.

```python
from giskard.checks import NotEquals

check = NotEquals(
    expected_value="error",
    key="trace.last.outputs.status",
)
```

### GreaterThan

Check that extracted values are greater than an expected value.

```python
from giskard.checks import GreaterThan

check = GreaterThan(
    expected_value=0.8,
    key="trace.last.metadata.confidence_score",
)
```

### GreaterEquals

Check that extracted values are greater than or equal to an expected value.

```python
from giskard.checks import GreaterEquals

check = GreaterEquals(
    expected_value=100,
    key="trace.last.outputs.user_count",
)
```

### LesserThan

Check that extracted values are less than an expected value.

```python
from giskard.checks import LesserThan

check = LesserThan(
    expected_value=500,
    key="trace.last.metadata.latency_ms",
)
```

### LesserThanEquals

Check that extracted values are less than or equal to an expected value.

```python
from giskard.checks import LesserThanEquals

check = LesserThanEquals(
    expected_value=1000,
    key="trace.last.metadata.token_count",
)
```

## LLM checks

Validation checks powered by Large Language Models for semantic understanding.

### BaseLLMCheck

Abstract base class for creating custom LLM-powered checks.

Module: `giskard.checks.judges.base`

BaseLLMCheck provides a framework for building checks that leverage Large Language Models for evaluation. It handles the LLM interaction, prompt rendering, and result parsing, so subclasses only need to define the evaluation prompt.

| Attribute | Type | Default | Description |
| --- | --- | --- | --- |
| `generator` | `BaseGenerator \| None` | `None` | LLM generator for evaluation; falls back to the global default if not provided |
| `name` | `str \| None` | `None` | Optional check name |
| `description` | `str \| None` | `None` | Optional description |
#### `get_prompt() -> str | Message | MessageTemplate | TemplateReference`

Returns the prompt to send to the LLM. Subclasses must implement this method.

Returns:

  • A string (automatically converted to a MessageTemplate)
  • A Message object
  • A MessageTemplate with Jinja2 templating
  • A TemplateReference pointing to a template file
#### `get_inputs(trace: Trace) -> dict[str, Any]`

Provides template variables for prompt rendering. Override to customize available variables.

Parameters:

  • trace (Trace): The trace containing interaction history

Returns:

  • dict[str, Any]: Template variables (default: {"trace": trace})

#### `run(trace: Trace) -> CheckResult`

Executes the LLM-based check (inherited; usually doesn't need to be overridden).

```python
from giskard.checks.judges.base import BaseLLMCheck
from giskard.agents.generators import Generator

@BaseLLMCheck.register("custom_llm_check")
class CustomLLMCheck(BaseLLMCheck):
    custom_instruction: str

    def get_prompt(self):
        return f"""
        Evaluate the interaction based on: {self.custom_instruction}

        Input: {{{{ trace.last.inputs }}}}
        Output: {{{{ trace.last.outputs }}}}

        Return passed=true if the interaction meets the criteria,
        passed=false otherwise. Include a reason for your decision.
        """

    def get_inputs(self, trace):
        # Optionally customize template variables
        return {
            "trace": trace,
            "custom_var": "additional context",
        }

# Usage
check = CustomLLMCheck(
    custom_instruction="Response must be concise and helpful",
    generator=Generator(model="openai/gpt-4"),
)
```

LLM checks expect the model to return structured output with:

  • passed (bool): Whether the check passed
  • reason (str, optional): Explanation of the result

The BaseLLMCheck automatically parses this structure into a CheckResult.
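For instance, a raw model response shaped like the following JSON would parse into a passing result (an illustration of the expected structure, not the exact wire format, which depends on the generator):

```python
import json

# Structured output the LLM is expected to produce
raw = '{"passed": true, "reason": "The answer is supported by the context."}'
result = json.loads(raw)

result["passed"]  # True
result["reason"]  # "The answer is supported by the context."
```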


### Result model

Default result model for LLM-based checks.

Module: `giskard.checks.judges.base`

| Attribute | Type | Description |
| --- | --- | --- |
| `passed` | `bool` | Whether the check passed |
| `reason` | `str \| None` | Optional explanation for the result |

This is the structured output format expected from the LLM when using BaseLLMCheck.


### Groundedness

Validates that answers are grounded in provided context documents.

Module: `giskard.checks.judges.groundedness`

Uses an LLM to determine if an answer is properly supported by the given context. This is crucial for RAG (Retrieval-Augmented Generation) systems to ensure responses don’t hallucinate information not present in the retrieved documents.

| Attribute | Type | Default | Description |
| --- | --- | --- | --- |
| `answer` | `str \| None` | `None` | The answer text to evaluate |
| `answer_key` | `str` | `"trace.last.outputs"` | JSONPath to extract the answer from the trace |
| `context` | `str \| list[str] \| None` | `None` | Context document(s) that should support the answer |
| `context_key` | `str` | `"trace.last.metadata.context"` | JSONPath to extract the context from the trace |
| `generator` | `BaseGenerator \| None` | `None` | LLM generator for evaluation |

Static values:

```python
from giskard.checks import Groundedness
from giskard.agents.generators import Generator

check = Groundedness(
    answer="The Eiffel Tower is in Paris.",
    context=["Paris is the capital of France.", "The Eiffel Tower is a famous landmark."],
    generator=Generator(model="openai/gpt-4"),
)
```

Extracting from trace:

```python
check = Groundedness(
    answer_key="trace.last.outputs.answer",
    context_key="trace.last.metadata.retrieved_docs",
    generator=Generator(model="openai/gpt-4"),
)

# Run against a trace
result = await check.run(trace)
```

### Conformity

Validates that interactions conform to a specified rule or requirement.

Module: `giskard.checks.judges.conformity`

Uses an LLM to evaluate whether an interaction (inputs, outputs, and metadata) conforms to a given rule. The rule supports Jinja2 templating, allowing for dynamic rules that reference trace data.

| Attribute | Type | Description |
| --- | --- | --- |
| `rule` | `str` | The conformity rule to evaluate. Supports Jinja2 templating with access to the `trace` object |
| `generator` | `BaseGenerator \| None` | LLM generator for evaluation (falls back to the default) |

Static rule:

```python
from giskard.checks import Conformity
from giskard.agents.generators import Generator

check = Conformity(
    rule="The response must be professional and polite",
    generator=Generator(model="openai/gpt-4"),
)
```

Dynamic rule with templating:

```python
check = Conformity(
    rule="The response must contain the keywords '{{ trace.last.inputs.required_keywords }}' and be concise",
    generator=Generator(model="openai/gpt-4"),
)

# The rule is rendered at runtime with access to trace data
result = await check.run(trace)
```

Accessing different trace elements:

```python
# Reference inputs
check = Conformity(
    rule="Respond to the user's query: '{{ trace.last.inputs }}'"
)

# Reference metadata
check = Conformity(
    rule="Use a {{ trace.last.metadata.tone }} tone in the response"
)

# Reference earlier interactions
check = Conformity(
    rule="Build upon the previous answer: '{{ trace.interactions[-2].outputs }}'"
)
```
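Because rules are ordinary Jinja2 templates, you can preview how one renders outside the check. A sketch assuming the `jinja2` package is installed; the `SimpleNamespace` object below is a hypothetical stand-in for a real trace:

```python
from types import SimpleNamespace
from jinja2 import Template

# Hypothetical stand-in for a real trace object
trace = SimpleNamespace(last=SimpleNamespace(metadata={"tone": "formal"}))

rule = "Use a {{ trace.last.metadata.tone }} tone in the response"
rendered = Template(rule).render(trace=trace)
print(rendered)  # Use a formal tone in the response
```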

### LLMJudge

General-purpose LLM-based validation with custom prompts.

Module: `giskard.checks.judges.judge`

The most flexible LLM check that allows you to define completely custom evaluation logic through prompts. Use this when the specialized checks (Groundedness, Conformity) don’t fit your needs.

| Attribute | Type | Description |
| --- | --- | --- |
| `prompt` | `str \| None` | Inline prompt content with Jinja2 templating support |
| `prompt_path` | `str \| None` | Path to a template file (e.g. `"checks::my_template.j2"`) |
| `generator` | `BaseGenerator \| None` | LLM generator for evaluation |

Inline prompt:

```python
from giskard.checks import LLMJudge
from giskard.agents.generators import Generator

check = LLMJudge(
    prompt="""
    Evaluate if the response is helpful and accurate.

    User Input: {{ trace.last.inputs }}
    AI Response: {{ trace.last.outputs }}

    Return passed=true if the response is helpful and accurate,
    passed=false otherwise. Provide a reason for your decision.
    """,
    generator=Generator(model="openai/gpt-4"),
)
```

Template file:

```python
# First, create a template file at templates/checks/safety.j2
check = LLMJudge(
    prompt_path="checks::safety.j2",
    generator=Generator(model="openai/gpt-4"),
)
```

Complex evaluation:

```python
check = LLMJudge(
    prompt="""
    Evaluate the multi-turn conversation quality.

    Conversation history:
    {% for interaction in trace.interactions %}
    User: {{ interaction.inputs }}
    Assistant: {{ interaction.outputs }}
    {% endfor %}

    Criteria:
    1. Consistency across turns
    2. Relevant responses
    3. Professional tone

    Return passed=true if all criteria are met, passed=false otherwise.
    Include specific reasons for any failures.
    """,
    generator=Generator(model="openai/gpt-4"),
)
```

The following variables are available in prompts:

| Variable | Description |
| --- | --- |
| `trace` | Full trace object with all interactions |
| `trace.interactions` | List of all interactions in order |
| `trace.last` | Most recent interaction (preferred) |
| `trace.last.inputs` | Inputs from the most recent interaction |
| `trace.last.outputs` | Outputs from the most recent interaction |
| `trace.last.metadata` | Metadata from the most recent interaction |

## SemanticSimilarity

Validate semantic similarity between outputs and expected content.

Module: `giskard.checks.builtin.semantic_similarity`

```python
from giskard.checks import SemanticSimilarity
from giskard.agents.generators import Generator

check = SemanticSimilarity(
    expected="The capital of France is Paris.",
    actual_key="trace.last.outputs",
    threshold=0.8,
    generator=Generator(model="openai/gpt-4"),
)
```

Parameters:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `expected` | `str` | required | Expected semantic content |
| `actual` | `str \| None` | `None` | Actual output to compare |
| `actual_key` | `str` | `"trace.last.outputs"` | JSONPath to extract the actual value |
| `threshold` | `float` | `0.8` | Similarity threshold (0.0 to 1.0) |
| `generator` | `BaseGenerator \| None` | `None` | LLM generator for evaluation |

## Using checks in scenarios

Checks can be attached to a scenario and evaluated against each interaction:

```python
from giskard.checks import Groundedness, Conformity, LLMJudge, Scenario

scenario = (
    Scenario()
    .interact(
        inputs="What is the capital of France?",
        outputs=lambda inputs: "Paris is the capital of France.",
    )
    .check(Groundedness(
        context=["France is a country in Europe.", "Paris is the capital."],
    ))
    .check(Conformity(
        rule="The response must be a complete sentence",
    ))
    .check(LLMJudge(
        prompt="Is the response educational and informative? Return passed=true/false.",
    ))
)
```
## Setting a default generator

Instead of passing a generator to every LLM check, set a global default once:

```python
from giskard.agents.generators import Generator
from giskard.checks import set_default_generator

# Set once, use everywhere
generator = Generator(model="openai/gpt-4", temperature=0.1)
set_default_generator(generator)

# No need to pass a generator anymore
check1 = Groundedness(answer="...", context=["..."])
check2 = Conformity(rule="...")
check3 = LLMJudge(prompt="...")
```
## Handling check results

Running a check returns a result whose status can be inspected:

```python
from giskard.checks import CheckStatus

result = await check.run(trace)

if result.status == CheckStatus.ERROR:
    print(f"Check failed with error: {result.message}")
elif result.status == CheckStatus.FAIL:
    print(f"Check failed: {result.message}")
    print(f"Details: {result.details}")
elif result.status == CheckStatus.PASS:
    print(f"Check passed: {result.message}")
```

## Custom checks

For validation logic that doesn’t fit the built-in checks, create a custom check:

```python
from giskard.checks import Check, CheckResult, Trace

@Check.register("custom_business_logic")
class CustomBusinessCheck(Check):
    threshold: float = 0.9
    allowed_categories: list[str] = []

    async def run(self, trace: Trace) -> CheckResult:
        # Extract data
        output = trace.last.outputs
        category = output.get("category")
        confidence = output.get("confidence", 0)

        # Validate category
        if category not in self.allowed_categories:
            return CheckResult.failure(
                message=f"Invalid category: {category}",
                details={"category": category, "allowed": self.allowed_categories},
            )

        # Validate confidence
        if confidence < self.threshold:
            return CheckResult.failure(
                message=f"Confidence {confidence} below threshold {self.threshold}",
                details={"confidence": confidence, "threshold": self.threshold},
            )

        return CheckResult.success(
            message="Validation passed",
            details={"confidence": confidence, "category": category},
        )

# Use the custom check
check = CustomBusinessCheck(
    threshold=0.85,
    allowed_categories=["sports", "news", "entertainment"],
)
result = await check.run(trace)
```