Welcome to Giskard! This section will help you understand what Giskard is, choose the right offering for your needs, and get started quickly.
Giskard Hub is our enterprise platform for LLM agent testing, with advanced team collaboration and continuous red teaming. It provides a set of tools for business users and developers to test and evaluate LLM agents in production environments, including:
Giskard Hub UI
As a business user, you can use the Giskard Hub to create test datasets, run evaluations, and manage your team.
Giskard Hub SDK
As a developer, you can use an SDK to interact with the Giskard Hub programmatically.
Giskard Open Source is a Python library for LLM testing and evaluation. It is available on GitHub and formed the basis for our course on Red Teaming LLM Applications on Deeplearning.AI.
The library provides a set of tools for testing and evaluating LLMs, including:
Giskard Open Source
As a developer, you can use the open-source library to get familiar with basic test set generation for business and security failures.
Deeplearning.AI
Our course on red teaming LLM applications on Deeplearning.AI helps you understand how to test, red-team, and evaluate LLM applications.
Giskard Research contributes to research on AI safety and security to showcase and understand the latest advancements in the field. Some of this work has been funded by the European Commission and Bpifrance, and we've collaborated with leading AI research organizations such as the AI Incident Database and Google DeepMind.
Phare
Phare is a multilingual benchmark that evaluates LLMs across key safety and security dimensions, including hallucination, factual accuracy, bias, and potential harm.
RealHarm
RealHarm is a dataset of problematic interactions with textual AI agents built from a systematic review of publicly reported incidents.
RealPerformance
RealPerformance is a dataset of functional issues in language models, mirroring failure patterns identified through rigorous testing of real LLM agents.