Few-shot poisoning test
What middleBrick covers
- 18 adversarial LLM probes across Quick, Standard, and Deep tiers
- System prompt extraction and instruction override detection
- Few-shot poisoning and indirect prompt injection analysis
- Encoding bypass detection including base64 and ROT13
- OpenAPI 3.x and Swagger 2.0 parsing with $ref resolution
- Authenticated scanning with header allowlisting
What is a few-shot poisoning test
A few-shot poisoning test evaluates whether an LLM-based service can be compromised by injecting subtle, targeted examples into prompts or tool instructions. The test adds small, carefully crafted perturbations to training or inference data to shift model behavior in favor of an attacker. Typical objectives include extracting system instructions, overriding safety constraints, or inducing data leakage through the model API.
Common mistakes when skipping this test
Teams that skip few-shot poisoning testing assume model guardrails are sufficient against low-volume, carefully placed examples. In practice, attackers can use encoding, translation, or nested instructions to bypass surface-level defenses. Without measurement, you cannot know whether a model will comply with malicious instructions, expose prompts, or propagate tainted outputs to downstream systems.
Workflow for conducting few-shot poisoning tests
Start with a baseline assessment of the model behavior using benign prompts. Then introduce adversarial examples at the prompt or instruction layer, varying encoding, placement, and context. Measure changes in model outputs across multiple tiers, focusing on jailbreak success, data exfiltration indicators, and cost anomalies. Record token counts and response consistency to detect low-and-slow poisoning attempts.
Example workflow using the middleBrick CLI:
middlebrick scan https://api.example.com/openapi.json --llm-scan-tier deep --output json
Use the JSON output to map detected jailbreaks and data leakage indicators to specific prompt templates, then refine detection rules iteratively.
What middleBrick covers out of the box
middleBrick performs 18 adversarial probes across three scan tiers (Quick, Standard, Deep). The LLM security checks cover system prompt extraction, instruction override, DAN and roleplay jailbreaks, data exfiltration, cost exploitation, encoding bypasses (base64, ROT13), translation-embedded injection, few-shot poisoning, markdown injection, multi-turn manipulation, indirect prompt injection, token smuggling, tool-abuse, nested instruction injection, and PII extraction.
The scanner parses OpenAPI 3.0, 3.1, and Swagger 2.0 definitions with recursive $ref resolution and cross-references spec definitions against runtime findings. This surfaces missing security schemes, deprecated operations, and over-exposed fields that may amplify poisoning impact.
Integration into your security program
Use the Web Dashboard to track scan score trends over time and download branded compliance evidence. Add the GitHub Action to CI/CD to fail builds when the LLM security score drops below your threshold. For automated pipelines, call the API client to integrate scanning into existing workflows. Schedule regular rescans with the Pro tier to detect introduced weaknesses after model updates or prompt changes.
Authenticated scanning supports Bearer, API key, Basic auth, and cookies, with domain verification to ensure only your organization runs scans against protected endpoints. Header allowlisting limits forwarded headers to Authorization, X-API-Key, Cookie, and X-Custom-*.