# Checks
Ready-to-use validation checks for common testing scenarios, including function-based checks, string matching, comparisons, and LLM-powered semantic validation.
## Function-based Checks

### from_fn

Create a check from a callable function.

Module: giskard.checks.builtin.fn

```python
from giskard.checks import from_fn

# Simple boolean check
check = from_fn(
    lambda trace: trace.last.outputs is not None,
    name="has_output",
    success_message="Output was provided",
    failure_message="No output found"
)

# Check with custom logic
check = from_fn(
    lambda trace: len(trace.last.outputs) > 10,
    name="min_length_check",
    description="Validates minimum output length"
)

# Async check
async def validate_response(trace):
    response = trace.last.outputs
    # Perform async validation
    is_valid = await external_validator(response)
    return is_valid

check = from_fn(validate_response, name="async_validation")
```

Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| fn | Callable | required | Function taking trace, returning bool or CheckResult |
| name | str \| None | None | Optional check name |
| description | str \| None | None | Optional description |
| success_message | str \| None | None | Message when check passes |
| failure_message | str \| None | None | Message when check fails |
| details | dict \| None | None | Additional details to include in result |
Returns:
FnCheck: A check instance wrapping the function
### FnCheck

A Check whose logic is implemented as a Python callable.

Module: giskard.checks.builtin.fn

```python
from giskard.checks.builtin.fn import FnCheck

# Create directly
check = FnCheck(
    fn=lambda trace: "error" not in trace.last.outputs.lower(),
    name="no_errors",
    success_message="No errors detected",
    failure_message="Error found in output"
)
```

## String Matching
### StringMatching

Check that validates string patterns in trace values.

Module: giskard.checks.builtin.string_matching

```python
from giskard.checks import StringMatching

# Check if output contains specific text
check = StringMatching(
    keyword="success",
    text_key="trace.last.outputs",
    match_type="contains"
)

# Check with regex pattern
check = StringMatching(
    keyword=r"\d{3}-\d{3}-\d{4}",  # Phone number pattern
    text_key="trace.last.outputs.phone",
    match_type="regex"
)

# Case-insensitive matching
check = StringMatching(
    keyword="error",
    text_key="trace.last.outputs",
    match_type="contains",
    case_sensitive=False
)
```

Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| keyword | str | required | Pattern to match (string or regex) |
| text_key | str | required | JSONPath to extract value from trace |
| match_type | str | "contains" | Match type: "contains", "equals", "startswith", "endswith", "regex" |
| case_sensitive | bool | True | Whether matching is case-sensitive |
Match Types:
| Type | Description | Example |
|---|---|---|
| contains | Pattern appears anywhere in text | "hello" in "hello world" |
| equals | Exact match | "hello" == "hello" |
| startswith | Text starts with pattern | "hello world".startswith("hello") |
| endswith | Text ends with pattern | "hello world".endswith("world") |
| regex | Regular expression match | re.search(r"\d+", "test123") |
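The match-type semantics above can be sketched in plain Python. This is illustrative only, not the StringMatching implementation (which also extracts the text via text_key, and may handle case-insensitive regex differently, e.g. via re.IGNORECASE):

```python
import re

def matches(keyword, text, match_type="contains", case_sensitive=True):
    """Toy sketch of the match-type semantics; not giskard's implementation."""
    if not case_sensitive:
        # Naive lowering; fine for literals, a real implementation would
        # likely use re.IGNORECASE for regex patterns.
        keyword, text = keyword.lower(), text.lower()
    if match_type == "contains":
        return keyword in text
    if match_type == "equals":
        return keyword == text
    if match_type == "startswith":
        return text.startswith(keyword)
    if match_type == "endswith":
        return text.endswith(keyword)
    if match_type == "regex":
        return re.search(keyword, text) is not None
    raise ValueError(f"unknown match_type: {match_type}")

matches("hello", "Hello world", case_sensitive=False)   # True
matches("hello", "Hello world")                         # False (case-sensitive)
matches(r"\d+", "test123", match_type="regex")          # True
```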
## Comparison Checks

Validate numeric and comparable values against expected thresholds.
### Equals

Check that extracted values equal an expected value.

Module: giskard.checks.builtin.comparison

```python
from giskard.checks import Equals

# Check exact value
check = Equals(
    expected_value=42,
    key="trace.last.outputs.count"
)

# Check string equality
check = Equals(
    expected_value="success",
    key="trace.last.outputs.status"
)

# Compare against value from trace
check = Equals(
    expected_value_key="trace.interactions[0].outputs.baseline",
    key="trace.last.outputs.result"
)
```

Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| expected_value | Any \| None | None | Static expected value |
| expected_value_key | str \| None | None | JSONPath to extract expected value from trace |
| key | str | required | JSONPath to extract actual value |
| normalization_form | str \| None | None | Unicode normalization: "NFC", "NFD", "NFKC", "NFKD" |
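Why normalization_form matters: two strings that render identically can differ at the codepoint level, so a raw equality check fails. A standalone illustration using Python's stdlib (independent of the Equals check itself):

```python
import unicodedata

# "café" in precomposed (NFC) and decomposed (NFD) forms
nfc = "caf\u00e9"    # é as a single codepoint (U+00E9)
nfd = "cafe\u0301"   # e followed by a combining acute accent (U+0301)

nfc == nfd                                        # False: raw comparison fails
unicodedata.normalize("NFC", nfd) == nfc          # True: equal after NFC normalization
```

Setting normalization_form on Equals avoids this class of spurious mismatch when comparing text extracted from different sources.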
### NotEquals

Check that extracted values do not equal an expected value.

```python
from giskard.checks import NotEquals

check = NotEquals(
    expected_value="error",
    key="trace.last.outputs.status"
)
```

### GreaterThan

Check that extracted values are greater than an expected value.

```python
from giskard.checks import GreaterThan

check = GreaterThan(
    expected_value=0.8,
    key="trace.last.metadata.confidence_score"
)
```

### GreaterEquals

Check that extracted values are greater than or equal to an expected value.

```python
from giskard.checks import GreaterEquals

check = GreaterEquals(
    expected_value=100,
    key="trace.last.outputs.user_count"
)
```

### LesserThan

Check that extracted values are less than an expected value.

```python
from giskard.checks import LesserThan

check = LesserThan(
    expected_value=500,
    key="trace.last.metadata.latency_ms"
)
```

### LesserThanEquals

Check that extracted values are less than or equal to an expected value.

```python
from giskard.checks import LesserThanEquals

check = LesserThanEquals(
    expected_value=1000,
    key="trace.last.metadata.token_count"
)
```

## LLM-based Checks

Validation checks powered by Large Language Models for semantic understanding.
### BaseLLMCheck

Abstract base class for creating custom LLM-powered checks.

Module: giskard.checks.judges.base

#### Description

BaseLLMCheck provides a framework for building checks that leverage Large Language Models for evaluation. It handles the LLM interaction, prompt rendering, and result parsing, so subclasses only need to define the evaluation prompt.
#### Attributes

| Attribute | Type | Default | Description |
|---|---|---|---|
| generator | BaseGenerator \| None | None | LLM generator for evaluation. Falls back to the global default if not provided |
| name | str \| None | None | Optional check name |
| description | str \| None | None | Optional description |
#### Key Methods

##### get_prompt() -> str | Message | MessageTemplate | TemplateReference

Returns the prompt to send to the LLM. Subclasses must implement this method.
Returns:
- Can be a string (automatically converted to MessageTemplate)
- A Message object
- A MessageTemplate with Jinja2 templating
- A TemplateReference pointing to a template file
##### get_inputs(trace: Trace) -> dict[str, Any]

Provides template variables for prompt rendering. Override to customize available variables.

Parameters:
- trace (Trace): The trace containing interaction history

Returns:
- dict[str, Any]: Template variables (default: {"trace": trace})
##### run(trace: Trace) -> CheckResult

Executes the LLM-based check (inherited; usually doesn’t need to be overridden).
#### Creating Custom LLM Checks

```python
from giskard.checks.judges.base import BaseLLMCheck
from giskard.agents.generators import Generator

@BaseLLMCheck.register("custom_llm_check")
class CustomLLMCheck(BaseLLMCheck):
    custom_instruction: str

    def get_prompt(self):
        return f"""
        Evaluate the interaction based on: {self.custom_instruction}

        Input: {{{{ trace.last.inputs }}}}
        Output: {{{{ trace.last.outputs }}}}

        Return passed=true if the interaction meets the criteria, passed=false otherwise.
        Include a reason for your decision.
        """

    def get_inputs(self, trace):
        # Optionally customize template variables
        return {
            "trace": trace,
            "custom_var": "additional context"
        }

# Usage
check = CustomLLMCheck(
    custom_instruction="Response must be concise and helpful",
    generator=Generator(model="openai/gpt-4")
)
```

#### Output Format
LLM checks expect the model to return structured output with:

- passed (bool): Whether the check passed
- reason (str, optional): Explanation of the result

The BaseLLMCheck automatically parses this structure into a CheckResult.
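To make the contract concrete, here is a plain-Python sketch of the passed/reason structure and how a raw model response maps onto it. The JudgeVerdict class and parse_verdict helper are illustrative stand-ins, not giskard's actual parser (BaseLLMCheck handles this internally):

```python
import json
from dataclasses import dataclass
from typing import Optional

@dataclass
class JudgeVerdict:
    """Stand-in for the structured output an LLM check expects."""
    passed: bool
    reason: Optional[str] = None

def parse_verdict(raw: str) -> JudgeVerdict:
    # Assumes the model was instructed to answer with a JSON object
    # containing "passed" and, optionally, "reason".
    data = json.loads(raw)
    return JudgeVerdict(passed=bool(data["passed"]), reason=data.get("reason"))

verdict = parse_verdict('{"passed": false, "reason": "Answer not supported by context."}')
# verdict.passed is False; verdict.reason explains why
```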
### LLMCheckResult

Default result model for LLM-based checks.
Module: giskard.checks.judges.base
#### Attributes

| Attribute | Type | Description |
|---|---|---|
| passed | bool | Whether the check passed |
| reason | str \| None | Optional explanation for the result |
This is the structured output format expected from the LLM when using BaseLLMCheck.
### Groundedness

Validates that answers are grounded in provided context documents.

Module: giskard.checks.judges.groundedness

#### Description

Uses an LLM to determine if an answer is properly supported by the given context. This is crucial for RAG (Retrieval-Augmented Generation) systems to ensure responses don’t hallucinate information not present in the retrieved documents.
#### Attributes

| Attribute | Type | Default | Description |
|---|---|---|---|
| answer | str \| None | None | The answer text to evaluate |
| answer_key | str | "trace.last.outputs" | JSONPath to extract answer from trace |
| context | str \| list[str] \| None | None | Context document(s) that should support the answer |
| context_key | str | "trace.last.metadata.context" | JSONPath to extract context from trace |
| generator | BaseGenerator \| None | None | LLM generator for evaluation |
Static values:
```python
from giskard.checks import Groundedness
from giskard.agents.generators import Generator

check = Groundedness(
    answer="The Eiffel Tower is in Paris.",
    context=[
        "Paris is the capital of France.",
        "The Eiffel Tower is a famous landmark."
    ],
    generator=Generator(model="openai/gpt-4")
)
```

Extracting from trace:

```python
check = Groundedness(
    answer_key="trace.last.outputs.answer",
    context_key="trace.last.metadata.retrieved_docs",
    generator=Generator(model="openai/gpt-4")
)

# Run against a trace
result = await check.run(trace)
```

### Conformity
Validates that interactions conform to a specified rule or requirement.

Module: giskard.checks.judges.conformity

#### Description

Uses an LLM to evaluate whether an interaction (inputs, outputs, and metadata) conforms to a given rule. The rule supports Jinja2 templating, allowing for dynamic rules that reference trace data.
#### Attributes

| Attribute | Type | Description |
|---|---|---|
| rule | str | The conformity rule to evaluate. Supports Jinja2 templating with access to the trace object |
| generator | BaseGenerator \| None | LLM generator for evaluation (falls back to default) |
Static rule:
```python
from giskard.checks import Conformity
from giskard.agents.generators import Generator

check = Conformity(
    rule="The response must be professional and polite",
    generator=Generator(model="openai/gpt-4")
)
```

Dynamic rule with templating:

```python
check = Conformity(
    rule="The response must contain the keywords '{{ trace.last.inputs.required_keywords }}' and be concise",
    generator=Generator(model="openai/gpt-4")
)

# The rule is rendered at runtime with access to trace data
result = await check.run(trace)
```

Accessing different trace elements:

```python
# Reference inputs
check = Conformity(
    rule="Respond to the user's query: '{{ trace.last.inputs }}'"
)

# Reference metadata
check = Conformity(
    rule="Use a {{ trace.last.metadata.tone }} tone in the response"
)

# Reference earlier interactions
check = Conformity(
    rule="Build upon the previous answer: '{{ trace.interactions[-2].outputs }}'"
)
```

### LLMJudge
General-purpose LLM-based validation with custom prompts.

Module: giskard.checks.judges.judge

#### Description

The most flexible LLM check that allows you to define completely custom evaluation logic through prompts. Use this when the specialized checks (Groundedness, Conformity) don’t fit your needs.
#### Attributes

| Attribute | Type | Description |
|---|---|---|
| prompt | str \| None | Inline prompt content with Jinja2 templating support |
| prompt_path | str \| None | Path to a template file (e.g., "checks::my_template.j2") |
| generator | BaseGenerator \| None | LLM generator for evaluation |
Inline prompt:
```python
from giskard.checks import LLMJudge
from giskard.agents.generators import Generator

check = LLMJudge(
    prompt="""
    Evaluate if the response is helpful and accurate.

    User Input: {{ trace.last.inputs }}
    AI Response: {{ trace.last.outputs }}

    Return passed=true if the response is helpful and accurate, passed=false otherwise.
    Provide a reason for your decision.
    """,
    generator=Generator(model="openai/gpt-4")
)
```

Template file:

```python
# First, create a template file at templates/checks/safety.j2
check = LLMJudge(
    prompt_path="checks::safety.j2",
    generator=Generator(model="openai/gpt-4")
)
```

Complex evaluation:

```python
check = LLMJudge(
    prompt="""
    Evaluate the multi-turn conversation quality.

    Conversation history:
    {% for interaction in trace.interactions %}
    User: {{ interaction.inputs }}
    Assistant: {{ interaction.outputs }}
    {% endfor %}

    Criteria:
    1. Consistency across turns
    2. Relevant responses
    3. Professional tone

    Return passed=true if all criteria are met, passed=false otherwise.
    Include specific reasons for any failures.
    """,
    generator=Generator(model="openai/gpt-4")
)
```

#### Template Variables
The following variables are available in prompts:
| Variable | Description |
|---|---|
| trace | Full trace object with all interactions |
| trace.interactions | List of all interactions in order |
| trace.last | Most recent interaction (preferred) |
| trace.last.inputs | Inputs from the most recent interaction |
| trace.last.outputs | Outputs from the most recent interaction |
| trace.last.metadata | Metadata from the most recent interaction |
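To see how such dotted variables resolve against a trace, here is a toy renderer. It is illustrative only: giskard uses a real Jinja2 engine, and the Trace/Interaction classes below are minimal stand-ins for the actual objects:

```python
import re

class Interaction:
    """Minimal stand-in for a trace interaction."""
    def __init__(self, inputs, outputs):
        self.inputs, self.outputs = inputs, outputs

class Trace:
    """Minimal stand-in for a trace: a list of interactions plus .last."""
    def __init__(self, interactions):
        self.interactions = interactions

    @property
    def last(self):
        return self.interactions[-1]

def render(prompt, **variables):
    # Resolve {{ obj.attr.attr }} placeholders by attribute lookup.
    # (Real Jinja2 also supports loops, filters, indexing, etc.)
    def resolve(match):
        obj_name, *attrs = match.group(1).strip().split(".")
        value = variables[obj_name]
        for attr in attrs:
            value = getattr(value, attr)
        return str(value)
    return re.sub(r"\{\{(.*?)\}\}", resolve, prompt)

trace = Trace([Interaction("What is 2+2?", "4")])
render("Q: {{ trace.last.inputs }} A: {{ trace.last.outputs }}", trace=trace)
# → "Q: What is 2+2? A: 4"
```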
### SemanticSimilarity

Validate semantic similarity between outputs and expected content.

Module: giskard.checks.builtin.semantic_similarity

```python
from giskard.checks import SemanticSimilarity
from giskard.agents.generators import Generator

check = SemanticSimilarity(
    expected="The capital of France is Paris.",
    actual_key="trace.last.outputs",
    threshold=0.8,
    generator=Generator(model="openai/gpt-4")
)
```

Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| expected | str | required | Expected semantic content |
| actual | str \| None | None | Actual output to compare |
| actual_key | str | "trace.last.outputs" | JSONPath to extract actual value |
| threshold | float | 0.8 | Similarity threshold (0.0 to 1.0) |
| generator | BaseGenerator \| None | None | LLM generator for evaluation |
## Common Patterns

### Combining Multiple Checks

```python
from giskard.checks import Groundedness, Conformity, LLMJudge, Scenario

scenario = (
    Scenario()
    .interact(
        inputs="What is the capital of France?",
        outputs=lambda inputs: "Paris is the capital of France."
    )
    .check(Groundedness(
        context=["France is a country in Europe.", "Paris is the capital."]
    ))
    .check(Conformity(
        rule="The response must be a complete sentence"
    ))
    .check(LLMJudge(
        prompt="Is the response educational and informative? Return passed=true/false."
    ))
)
```

### Reusing Generators
Section titled “Reusing Generators”from giskard.agents.generators import Generatorfrom giskard.checks import set_default_generator
# Set once, use everywheregenerator = Generator(model="openai/gpt-4", temperature=0.1)set_default_generator(generator)
# No need to pass generator anymorecheck1 = Groundedness(answer="...", context=["..."])check2 = Conformity(rule="...")check3 = LLMJudge(prompt="...")Error Handling
```python
from giskard.checks import CheckStatus

result = await check.run(trace)

if result.status == CheckStatus.ERROR:
    print(f"Check failed with error: {result.message}")
elif result.status == CheckStatus.FAIL:
    print(f"Check failed: {result.message}")
    print(f"Details: {result.details}")
elif result.status == CheckStatus.PASS:
    print(f"Check passed: {result.message}")
```

## Creating Custom Checks
For validation logic that doesn’t fit built-in checks, create a custom check:

```python
from giskard.checks import Check, CheckResult, Trace

@Check.register("custom_business_logic")
class CustomBusinessCheck(Check):
    threshold: float = 0.9
    allowed_categories: list[str] = []

    async def run(self, trace: Trace) -> CheckResult:
        # Extract data
        output = trace.last.outputs
        category = output.get("category")
        confidence = output.get("confidence", 0)

        # Validate category
        if category not in self.allowed_categories:
            return CheckResult.failure(
                message=f"Invalid category: {category}",
                details={"category": category, "allowed": self.allowed_categories}
            )

        # Validate confidence
        if confidence < self.threshold:
            return CheckResult.failure(
                message=f"Confidence {confidence} below threshold {self.threshold}",
                details={"confidence": confidence, "threshold": self.threshold}
            )

        return CheckResult.success(
            message="Validation passed",
            details={"confidence": confidence, "category": category}
        )

# Use the custom check
check = CustomBusinessCheck(
    threshold=0.85,
    allowed_categories=["sports", "news", "entertainment"]
)
result = await check.run(trace)
```

## See Also
- Core API - Base classes and fundamental types
- Scenarios - Multi-step workflow testing
- Custom Checks Guide - Detailed guide on creating custom checks