Welcome to Giskard! This section will help you understand what Giskard is, choose the right offering for your needs, and get started quickly.
Giskard Hub is our enterprise platform for LLM agent testing, with advanced team collaboration and continuous red teaming. It provides a set of tools for business users and developers to test and evaluate LLM agents in production environments, including:
Giskard Hub UI
As a business user, you can use the Giskard Hub to create test datasets, run evaluations, and manage your team.
Giskard Hub SDK
As a developer, you can use an SDK to interact with the Giskard Hub programmatically.
Giskard Open Source is a Python library for LLM testing and evaluation. It is available on GitHub and formed the basis for our course on Red Teaming LLM Applications on Deeplearning.AI.
The library provides a set of tools for testing and evaluating LLMs, including:
Giskard Open Source
As a developer, you can use the open-source library to get familiar with basic test set generation for business and security failures.
Deeplearning.AI
Our course on red teaming LLM applications on Deeplearning.AI helps you understand how to test, red-team, and evaluate LLM applications.
Giskard Research contributes to research on AI safety and security to showcase and understand the latest advancements in the field. Some of this work has been funded by the European Commission and Bpifrance, and we've collaborated with leading AI research organizations such as the AI Incident Database and Google DeepMind.
Phare
Phare is a multilingual benchmark that evaluates LLMs across key safety and security dimensions, including hallucination, factual accuracy, bias, and potential harm.
RealHarm
RealHarm is a dataset of problematic interactions with textual AI agents built from a systematic review of publicly reported incidents.
RealPerformance
RealPerformance is a dataset of functional issues in language models, mirroring failure patterns identified through rigorous testing of real LLM agents.