What is Giskard Checks?

Giskard Checks is a lightweight Python library for testing and evaluating non-deterministic applications such as LLM-based systems.

Introduction

Giskard Checks provides a framework for testing AI applications, including RAG systems, agents, and summarization models. Whether you’re building chatbots, question-answering systems, or complex multi-step workflows, it helps you verify output quality and reliability.

Key Features

Built-in Check Library: Ready-to-use checks including LLM-as-a-judge evaluations, string matching, equality assertions, and more
Flexible Testing Framework: Support for both single-turn and multi-turn scenarios with stateful trace management
Type-Safe & Modern: Built on Pydantic for full type safety and validation
Async-First: Native async/await support for efficient concurrent testing
Highly Customizable: Easy extension points for custom checks and interaction patterns
Serializable Results: Immutable, JSON-serializable results for easy storage and analysis
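
To make the features above concrete, here is a minimal, dependency-free sketch of the general pattern: async checks that return immutable, JSON-serializable results. It is not the Giskard Checks API. The names `CheckResult`, `contains_check`, `equals_check`, and `run_checks` are hypothetical, and the sketch uses standard-library dataclasses where the library itself builds on Pydantic.

```python
import asyncio
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class CheckResult:
    """Immutable outcome of one check (hypothetical type, not the library's)."""

    name: str
    passed: bool
    details: str


async def contains_check(output: str, expected: str) -> CheckResult:
    """String-matching check: passes when `expected` occurs in `output`."""
    passed = expected.lower() in output.lower()
    return CheckResult("contains", passed, f"expected substring {expected!r}")


async def equals_check(output: str, expected: str) -> CheckResult:
    """Equality check: passes when the texts match after trimming whitespace."""
    passed = output.strip() == expected.strip()
    return CheckResult("equals", passed, f"expected exactly {expected!r}")


async def run_checks(output: str) -> list[CheckResult]:
    """Run both checks concurrently; the checks are independent of each other."""
    return await asyncio.gather(
        contains_check(output, "Paris"),
        equals_check(output, "The capital of France is Paris."),
    )


# The output string stands in for a response from the application under test.
results = asyncio.run(run_checks("The capital of France is Paris."))

# Frozen dataclasses convert to plain dicts, so results serialize to JSON.
print(json.dumps([asdict(r) for r in results], indent=2))
```

Freezing the result type means a stored result cannot be altered after the check has run, which is what makes results safe to compare across runs. See the Quickstart and API Reference for the library's actual classes.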

Quick Links

🚀 Quickstart: Installation, configuration, and your first test

📚 AI Testing Guide: Learn core concepts, single-turn and multi-turn testing

💡 Tutorials: Practical examples for RAG, agents, and more

🔧 API Reference: Complete API documentation

Use Cases

Giskard Checks is designed for:

RAG Evaluation: Test groundedness, relevance, and context usage in retrieval-augmented generation systems
Agent Testing: Validate multi-step agent workflows with tool calls and complex reasoning
Quality Assurance: Ensure consistent output quality across model updates and deployments
LLM Guardrails: Implement safety checks, content moderation, and compliance validation
Regression Testing: Track model behavior changes over time with reproducible test suites
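
As an illustration of the regression-testing use case, the sketch below runs one fixed suite against two model versions and reports the prompts that passed before an update but fail after it. Everything here is an assumption made for the example: `SUITE`, `model_v1`, `model_v2`, `run_suite`, and `regressions` are hypothetical names, the models are hard-coded stubs, and none of it uses the Giskard Checks API.

```python
from typing import Callable

# Hypothetical test cases: (prompt, substring the answer must contain).
SUITE = [
    ("What is the capital of France?", "Paris"),
    ("What is 2 + 2?", "4"),
]


def model_v1(prompt: str) -> str:
    """Stub standing in for the currently deployed model."""
    answers = {
        "What is the capital of France?": "Paris is the capital of France.",
        "What is 2 + 2?": "2 + 2 equals 4.",
    }
    return answers[prompt]


def model_v2(prompt: str) -> str:
    """Stub standing in for a candidate update that regresses on arithmetic."""
    answers = {
        "What is the capital of France?": "The capital is Paris.",
        "What is 2 + 2?": "I cannot do math.",
    }
    return answers[prompt]


def run_suite(model: Callable[[str], str]) -> dict[str, bool]:
    """Run every case against `model` and record pass/fail per prompt."""
    return {prompt: expected in model(prompt) for prompt, expected in SUITE}


def regressions(before: dict[str, bool], after: dict[str, bool]) -> list[str]:
    """Return prompts that passed before the update but fail after it."""
    return [prompt for prompt, ok in before.items() if ok and not after[prompt]]


print(regressions(run_suite(model_v1), run_suite(model_v2)))
# prints ['What is 2 + 2?']
```

Because the suite is fixed data and the comparison is deterministic, the same run can be repeated after every model update or deployment.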