
Vulnerability Scanning

A Scan runs a set of automated adversarial probes against your agent to detect security and safety vulnerabilities. Giskard covers the OWASP LLM Top 10 (2025) as well as additional categories that go beyond the OWASP framework — Harmful Content Generation, Brand Damaging & Reputation, Legal & Financial Risk, and Misguidance & Unauthorized Advice. See the full attack category catalogue for details.

import time

from giskard_hub import HubClient

hub = HubClient()

scan = hub.scans.create(
    project_id="project-id",
    agent_id="agent-id",
)
print(scan.id)

# Poll until the scan completes
while scan.status.state == "running":
    time.sleep(10)
    scan = hub.scans.retrieve(scan.id)

print(f"Scan complete. Grade: {scan.grade}")

The grade property gives an overall security posture rating: A (best) through D (worst), or N/A if not enough data was collected.
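Because grades are ordinal, a small comparison helper keeps threshold checks readable. This helper is illustrative and not part of the SDK:

```python
# Grades ordered from best to worst; "N/A" is treated as failing any threshold.
GRADE_ORDER = ["A", "B", "C", "D"]

def grade_at_least(grade: str, threshold: str) -> bool:
    """Return True if `grade` is as good as or better than `threshold`.

    An unknown grade (e.g. "N/A") never passes.
    """
    if grade not in GRADE_ORDER:
        return False
    return GRADE_ORDER.index(grade) <= GRADE_ORDER.index(threshold)

print(grade_at_least("B", "B"))    # True
print(grade_at_least("C", "B"))    # False
print(grade_at_least("N/A", "D"))  # False
```

You could then gate on `grade_at_least(scan.grade, "B")` instead of maintaining a hard-coded list of acceptable grades.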


Use tags to focus the scan on specific vulnerability categories. The table below lists each category, its tag, and its OWASP mapping where one exists.

| Tag | Category | OWASP mapping |
| --- | --- | --- |
| gsk:threat-type='prompt-injection' | Prompt Injection | LLM01 |
| gsk:threat-type='data-privacy-exfiltration' | Data Privacy & Exfiltration | LLM05 |
| gsk:threat-type='excessive-agency' | Excessive Agency | LLM06 |
| gsk:threat-type='internal-information-exposure' | Internal Information Exposure | LLM01-07 |
| gsk:threat-type='training-data-extraction' | Training Data Extraction | LLM02 |
| gsk:threat-type='denial-of-service' | Denial of Service | LLM10 |
| gsk:threat-type='hallucination' | Misinformation / Hallucination | LLM09 |
| gsk:threat-type='harmful-content-generation' | Harmful Content Generation | (none) |
| gsk:threat-type='misguidance-and-unauthorized-advice' | Misguidance & Unauthorized Advice | (none) |
| gsk:threat-type='legal-and-financial-risk' | Legal & Financial Risk | (none) |
| gsk:threat-type='brand-damaging-and-reputation' | Brand Damaging & Reputation | (none) |
scan = hub.scans.create(
    project_id="project-id",
    agent_id="agent-id",
    tags=[
        "gsk:threat-type='prompt-injection'",
        "gsk:threat-type='hallucination'",
    ],
)

Use hub.scans.list_categories() to retrieve the authoritative, up-to-date list of all available categories and their tags at runtime:

categories = hub.scans.list_categories()
for cat in categories:
    print(cat.title, cat.owasp_id)
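To scan only the categories that map to particular OWASP entries, you can derive the tags list from the catalogue. The snippet below operates on plain dictionaries shaped like the table above; the exact attribute names on the SDK's category objects (such as a `tag` field) are an assumption, so adapt the accessors to what `list_categories()` actually returns:

```python
# Sample rows mirroring the category table above (illustrative data only).
categories = [
    {"tag": "gsk:threat-type='prompt-injection'", "owasp_id": "LLM01"},
    {"tag": "gsk:threat-type='training-data-extraction'", "owasp_id": "LLM02"},
    {"tag": "gsk:threat-type='harmful-content-generation'", "owasp_id": None},
]

def tags_for_owasp(categories, owasp_ids):
    """Select the scan tags whose category maps to one of the given OWASP ids."""
    wanted = set(owasp_ids)
    return [c["tag"] for c in categories if c["owasp_id"] in wanted]

tags = tags_for_owasp(categories, {"LLM01", "LLM02"})
```

The resulting list can then be passed as `tags=tags` to `hub.scans.create(...)`.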

Pass a knowledge_base_id to anchor the probes to your actual document content. This is recommended for RAG-based agents because the attacks will reference real topics from your corpus:

scan = hub.scans.create(
    project_id="project-id",
    agent_id="agent-id",
    knowledge_base_id="kb-id",
)

See Agents & Knowledge Bases for how to create and populate a KB.


Once a scan has run, list its probe results to see which categories were tested and how each probe scored:

probes = hub.scans.list_probes("scan-id")
for probe in probes:
    if probe.status.state == "skipped":
        continue
    print(f"{probe.probe_category} / {probe.probe_name}: {probe.metrics} ({probe.status.state})")

probe = hub.scans.probes.retrieve("probe-result-id")
print(probe.probe_description)

Each probe may generate multiple adversarial prompt attempts. Inspect them to understand exactly what the agent was asked and how it responded:

attempts = hub.scans.probes.list_attempts("probe-result-id")
for attempt in attempts:
    print(f"Prompt: {[m.content for m in attempt.messages[:-1]]}")
    print(f"Response: {attempt.messages[-1].content}")
    print(f"Severity: {attempt.severity}")  # greater than 0 means the attack succeeded
    print("---")
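For a quick picture of how a probe fared, you can aggregate attempt severities. The helper below is a sketch that works on any objects exposing a `severity` attribute; the convention that severity > 0 means the attack succeeded comes from the description above:

```python
from dataclasses import dataclass

@dataclass
class AttemptStub:
    """Minimal stand-in for a probe attempt returned by the SDK."""
    severity: int

def summarize_attempts(attempts):
    """Count total attempts, successful attacks (severity > 0), and max severity."""
    severities = [a.severity for a in attempts]
    return {
        "total": len(severities),
        "succeeded": sum(1 for s in severities if s > 0),
        "max_severity": max(severities, default=0),
    }

attempts = [AttemptStub(0), AttemptStub(2), AttemptStub(1)]
print(summarize_attempts(attempts))  # {'total': 3, 'succeeded': 2, 'max_severity': 2}
```

In practice you would feed it the list returned by `hub.scans.probes.list_attempts(...)`.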

If a flagged attempt is a false positive, update its review status:

from giskard_hub.types import ReviewStatus  # enumerates the accepted review statuses

hub.scans.attempts.update(
    "probe-attempt-id",
    review_status="ignored",
)

When a probe attempt succeeds (the attack elicited an undesired response), you can promote it directly into a dataset test case. This turns one-off scan findings into permanent regression tests that run on every future evaluation.

# Fetch all probes for a completed scan
scan_id = "scan-id"
probes = hub.scans.list_probes(scan_id)

dataset = hub.datasets.create(
    project_id="project-id",
    name=f"Regression tests from scan {scan_id}",
)

for probe in probes:
    attempts = hub.scans.probes.list_attempts(probe.id)
    for attempt in attempts:
        # severity > 0 means the agent misbehaved
        if attempt.severity > 0:
            hub.test_cases.create(
                dataset_id=dataset.id,
                messages=[{"role": m.role, "content": m.content} for m in attempt.messages[:-1]],
                demo_output={"role": "assistant", "content": attempt.messages[-1].content},
                checks=[{"identifier": "no-harmful-content"}],  # or any relevant check
                tags=[probe.probe_category],
            )

print(f"Imported attacks into dataset {dataset.id}")

List, delete, or bulk-delete scans:

scans = hub.scans.list(project_id="project-id")
hub.scans.delete("scan-id")
hub.scans.bulk_delete(scan_ids=["scan-id-1", "scan-id-2"])
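Bulk deletion pairs naturally with a retention policy. The selection logic below is a sketch over plain records carrying a `created_at` timestamp; whether the SDK's scan objects expose that exact field is an assumption, so check the actual attributes before wiring this up:

```python
from datetime import datetime, timedelta, timezone

def scans_older_than(scans, days):
    """Return the ids of scans created more than `days` days ago."""
    cutoff = datetime.now(timezone.utc) - timedelta(days=days)
    return [s["id"] for s in scans if s["created_at"] < cutoff]

# Illustrative records; real scans would come from hub.scans.list(...).
now = datetime.now(timezone.utc)
scans = [
    {"id": "old-scan", "created_at": now - timedelta(days=90)},
    {"id": "recent-scan", "created_at": now - timedelta(days=2)},
]
print(scans_older_than(scans, 30))  # ['old-scan']
```

The returned ids could then be passed to `hub.scans.bulk_delete(scan_ids=...)`.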

Use scans as a security gate in your CI/CD pipeline. Exit with a non-zero code if the scan grade falls below your acceptable threshold:

import sys
import time

from giskard_hub import HubClient

hub = HubClient()

scan = hub.scans.create(
    project_id="project-id",
    agent_id="agent-id",
)

while scan.status.state == "running":
    time.sleep(10)
    scan = hub.scans.retrieve(scan.id)

if scan.status.state == "error":
    print("Scan encountered errors.")
    sys.exit(1)

print(f"Scan grade: {scan.grade}")

ACCEPTABLE_GRADES = ["A", "B"]
if scan.grade not in ACCEPTABLE_GRADES:
    print(f"Security gate failed: grade {scan.grade} is below the threshold.")
    sys.exit(1)

print("Security gate passed.")

| Grade | Meaning |
| --- | --- |
| A | No vulnerabilities detected |
| B | Minor issues: low severity findings only |
| C | Moderate issues: some high severity findings |
| D | Serious issues: critical severity findings |
| N/A | Insufficient data to compute a grade |

Grades are computed from the proportion and severity of probes that successfully elicited harmful or undesired behaviour from the agent.