# Overview

## Summary
These tutorials provide complete, working examples that you can adapt for your own use cases. Each tutorial includes:
- Full working code
- Explanation of key concepts
- Common pitfalls and how to avoid them
- Extensions and variations
## Available Tutorials

- **🔍 RAG Evaluation**: Test retrieval quality, groundedness, and answer relevance in RAG systems
- **🤖 Testing Agents**: Validate multi-step agent workflows with tool usage and reasoning
- **💬 Chatbot Testing**: Test conversational flows, context handling, and response quality
- **🛡️ Content Moderation**: Implement safety checks and content filtering
## What You’ll Learn

Through these tutorials, you’ll learn how to:
- Design effective test suites for different AI application types
- Combine built-in and custom checks for comprehensive validation
- Handle both single-turn and multi-turn scenarios
- Use LLM-as-a-judge for nuanced evaluation
- Track metrics and analyze test results
- Integrate checks into CI/CD pipelines
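To give a feel for what combining a built-in-style check with an LLM-as-a-judge check can look like, here is a minimal sketch. All names (`CheckResult`, `contains_keywords`, `llm_as_judge`) are illustrative, not this library's API, and `judge_llm` is a stub standing in for a real LLM call:

```python
# Hypothetical sketch: pairing a deterministic check with an LLM-as-a-judge
# check. None of these names come from the library; `judge_llm` is a stub
# you would replace with your provider's client.
from dataclasses import dataclass

@dataclass
class CheckResult:
    name: str
    passed: bool
    detail: str = ""

def contains_keywords(output: str, keywords: list[str]) -> CheckResult:
    """Deterministic check: the answer must mention every required keyword."""
    missing = [k for k in keywords if k.lower() not in output.lower()]
    return CheckResult("contains_keywords", not missing, f"missing: {missing}")

def judge_llm(prompt: str) -> str:
    """Stub for an LLM API call; always approves in this sketch."""
    return "PASS"

def llm_as_judge(output: str, rubric: str) -> CheckResult:
    """LLM-as-a-judge: ask a model to grade the output against a rubric."""
    verdict = judge_llm(f"Rubric: {rubric}\nAnswer: {output}\nReply PASS or FAIL.")
    return CheckResult("llm_as_judge", verdict.strip().upper() == "PASS", verdict)

def run_checks(output: str) -> list[CheckResult]:
    # A test suite typically mixes cheap deterministic checks with
    # judge-based checks for qualities that are hard to pin down in code.
    return [
        contains_keywords(output, ["refund", "30 days"]),
        llm_as_judge(output, "The answer is polite and factually grounded."),
    ]

results = run_checks("Refunds are available within 30 days of purchase.")
print(all(r.passed for r in results))  # True with the stub judge
```

The deterministic check catches hard requirements cheaply; the judge handles nuance such as tone or groundedness, which the tutorials cover in depth.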
## Prerequisites

Before starting these tutorials, you should:
- Have completed the Install & Configure guide
- Be familiar with the Core Concepts
- Have basic Python and async/await knowledge
- Have access to an LLM API (OpenAI, Anthropic, or compatible)
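If async/await is new to you, this plain-`asyncio` refresher (independent of any one library) shows the pattern the tutorials rely on: running several slow LLM calls concurrently. `fake_llm_call` is a stand-in for a real provider request:

```python
# Minimal async/await refresher with the standard library only.
# `fake_llm_call` simulates a slow network request to an LLM API.
import asyncio

async def fake_llm_call(prompt: str) -> str:
    await asyncio.sleep(0.01)  # simulate network latency
    return f"response to: {prompt}"

async def main() -> list[str]:
    prompts = ["q1", "q2", "q3"]
    # gather() awaits all calls concurrently instead of one after another
    return await asyncio.gather(*(fake_llm_call(p) for p in prompts))

responses = asyncio.run(main())
print(len(responses))  # 3
```

Concurrency matters here because evaluation suites often issue many LLM calls; awaiting them together keeps total runtime close to the slowest single call.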
## Getting Help

If you run into issues with these tutorials:
- Check the Core Concepts for concept clarifications
- Review the Custom Checks guide for check creation patterns
- Look at the API reference for detailed documentation
- Open an issue on GitHub if you find bugs or have suggestions