Quickstart
This tutorial walks you through installing the SDK, connecting to the Hub, and running a complete evaluation against an LLM agent — from dataset creation to reading results.
Prerequisites
- Python 3.10 or later
- A running Giskard Hub instance (cloud or self-hosted)
- An API key from the Hub UI (User Settings → API Keys)
1. Install the SDK
```shell
pip install giskard-hub
```

2. Configure authentication
The SDK reads your Hub URL and API key from environment variables. Set them before running any code:
```shell
export GISKARD_HUB_BASE_URL="https://your-hub-instance.example.com"
export GISKARD_HUB_API_KEY="gsk_..."
```

Alternatively, pass them directly to the client constructor:
```python
from giskard_hub import HubClient

hub = HubClient(
    base_url="https://your-hub-instance.example.com",
    api_key="gsk_...",
)
```

3. Create a project
Projects are the top-level container for all your resources. Create one or retrieve an existing one:
```python
# Create a new project
project = hub.projects.create(
    name="Customer Support Bot",
    description="Evaluation project for our support chatbot",
)

# Or list existing projects and pick one
projects = hub.projects.list()
project = projects[0]

print(f"Using project: {project.name} ({project.id})")
```

4. Register an agent
An agent points to your LLM application. The Hub calls this endpoint during evaluations.
```python
agent = hub.agents.create(
    project_id=project.id,
    name="Support Bot v1",
    description="GPT-4o-based customer support chatbot",
    url="https://your-app.example.com/api/chat",
    supported_languages=["en"],
    headers=[{"name": "Authorization", "value": "Bearer <your-app-token>"}],
)

print(f"Agent registered: {agent.id}")
```

5. Run a vulnerability scan
Before building a dataset, run a quick scan to surface security weaknesses in your agent:
```python
import time

scan = hub.scans.create(
    project_id=project.id,
    agent_id=agent.id,
    tags=["gsk:threat-type='prompt-injection'"],
)

print(f"Scan started: {scan.id}")

while scan.status.state == "running":
    time.sleep(10)
    scan = hub.scans.retrieve(scan.id)

print(f"Scan complete. Grade: {scan.grade}")
```

The grade ranges from A (no issues found) to D (critical vulnerabilities detected). See Vulnerability Scanning for the full tag catalogue, KB-grounded scans, and how to review probe results and turn successful attacks into test cases.
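Since grades are single letters from A to D, a plain string comparison is enough to turn a scan into a pass/fail gate, for example in CI. The `grade_passes` helper below is a sketch, not part of the SDK; the B threshold is an arbitrary assumption:

```python
# Hypothetical helper (not part of the SDK): fail a pipeline when the
# scan grade is worse than a chosen threshold. Grades run A (best) to
# D (worst), so lexicographic string comparison orders them correctly.
def grade_passes(grade: str, threshold: str = "B") -> bool:
    return grade.upper() <= threshold.upper()

print(grade_passes("A"))  # A is at least as good as B
print(grade_passes("C"))  # C is worse than B
```

In CI you might call `grade_passes(scan.grade)` after the polling loop and exit non-zero when it returns False.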
6. Create a dataset and add test cases
A dataset is a collection of test cases — conversations with expected outcomes and quality checks.
```python
dataset = hub.datasets.create(
    project_id=project.id,
    name="Core Q&A Suite",
    description="Basic correctness and tone checks",
)

# Add a test case
hub.test_cases.create(
    dataset_id=dataset.id,
    messages=[
        {"role": "user", "content": "What is your return policy?"},
    ],
    demo_output="We offer a 30-day return policy for all items.",
    checks=[
        {
            "identifier": "correctness",
            "params": {
                "type": "correctness",
                "reference": "We offer a 30-day return policy for all items.",
            },
        },
    ],
)
```

The checks field controls which criteria are applied to each agent response — these can be LLM-judge, embedding similarity, or rule-based checks. See Checks & Metrics for the full list of built-in checks and how to define custom ones.
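When seeding many correctness test cases, it can help to build the keyword arguments once and loop over the create call. The `make_correctness_case` helper below is a convenience sketch, not an SDK function; it only assembles the same dictionary shape shown above, and the example Q&A pairs are placeholders:

```python
# Hypothetical helper (not part of the SDK): build the kwargs for
# hub.test_cases.create() from a (question, reference answer) pair,
# mirroring the correctness-check shape used in this step.
def make_correctness_case(dataset_id: str, question: str, reference: str) -> dict:
    return {
        "dataset_id": dataset_id,
        "messages": [{"role": "user", "content": question}],
        "demo_output": reference,
        "checks": [
            {
                "identifier": "correctness",
                "params": {"type": "correctness", "reference": reference},
            }
        ],
    }

# Placeholder Q&A pairs for illustration only
qa_pairs = [
    ("What is your return policy?", "We offer a 30-day return policy for all items."),
    ("How do I reset my password?", "Use the 'Forgot password' link on the login page."),
]
for question, reference in qa_pairs:
    case = make_correctness_case("<dataset-id>", question, reference)
    # hub.test_cases.create(**case)  # uncomment when connected to a Hub
```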
7. Run an evaluation
Now trigger an evaluation that sends every test case to your agent and scores the responses:
```python
import time

evaluation = hub.evaluations.create(
    project_id=project.id,
    agent_id=agent.id,
    criteria={
        "dataset_id": dataset.id,
    },
    name="v1 baseline",
)

print(f"Evaluation started: {evaluation.id}")

# Poll until the evaluation completes
while evaluation.status.state == "running":
    time.sleep(5)
    evaluation = hub.evaluations.retrieve(evaluation.id)
```
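This polling pattern (also used for scans in step 5) can be factored into a small reusable helper. The `poll` function below is a plain-Python sketch, not part of the SDK; `fetch` is any callable returning a fresh copy of the resource and `is_done` decides when to stop:

```python
import time

# Generic polling sketch (not part of the SDK): repeatedly re-fetch a
# resource until is_done(resource) is true or the timeout expires.
def poll(fetch, is_done, interval: float = 5.0, timeout: float = 600.0):
    deadline = time.monotonic() + timeout
    resource = fetch()
    while not is_done(resource):
        if time.monotonic() > deadline:
            raise TimeoutError("resource did not finish in time")
        time.sleep(interval)
        resource = fetch()
    return resource

# Usage against the evaluation above might look like:
# evaluation = poll(
#     fetch=lambda: hub.evaluations.retrieve(evaluation.id),
#     is_done=lambda e: e.status.state != "running",
# )
```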
```python
print("Evaluation complete!")
```

8. Read the results
Once complete, fetch the per-test-case results and inspect the metrics:
```python
evaluation_results = hub.evaluations.results.list(evaluation.id)

for eval_result in evaluation_results:
    print(f"Test case {eval_result.test_case.id}: {eval_result.state}")
    for check_result in eval_result.results:
        print(f"  {check_result.name}: {'✓' if check_result.passed else '✗'}")
```

You can also view the full evaluation with aggregated metrics in the Hub UI.
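If you want a single aggregate number in code rather than in the UI, you can fold the per-check outcomes into a pass rate yourself. The sketch below operates on plain dicts with the same shape as the result objects above; with real SDK objects you would use attribute access (`check_result.passed`) instead:

```python
# Sketch (not an SDK feature): aggregate per-check outcomes into a
# pass rate. Here check results are plain dicts mirroring the SDK objects.
def pass_rate(check_results: list[dict]) -> float:
    if not check_results:
        return 0.0
    passed = sum(1 for r in check_results if r["passed"])
    return passed / len(check_results)

checks = [
    {"name": "correctness", "passed": True},
    {"name": "tone", "passed": False},
    {"name": "safety", "passed": True},
]
print(f"Pass rate: {pass_rate(checks):.0%}")  # prints "Pass rate: 67%"
```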
Next steps
- Local agents: evaluate a Python function directly without an HTTP endpoint — see Evaluations
- Generate test cases automatically: use scenarios or knowledge bases — see Datasets
- Vulnerability scanning: find security weaknesses with Scans
- Schedule recurring runs: see Scheduled Evaluations
- Full API details: see the API Reference