42Crunch for Jailbreak resistance probe battery
What middleBrick covers
- Executes 18 jailbreak adversarial probes across three scan tiers
- Validates system prompt extraction and instruction override attempts
- Tests encoding bypass, token smuggling, and multi-turn manipulation
- Provides risk score grades and prioritized remediation guidance
- Integrates via CLI, dashboard, API client, and CI/CD gates
- Maps findings to OWASP API Top 10 (2023) for review alignment
Jailbreak resistance probe battery overview
A jailbreak resistance probe battery tests how well a model ignores or bypasses system instructions and unsafe content rules. middleBrick surfaces these probes as an LLM security category aligned to OWASP API Top 10, covering adversarial techniques across three scan tiers.
Coverage of adversarial jailbreak techniques
The scanner executes 18 adversarial probes across Quick, Standard, and Deep tiers. Techniques include system prompt extraction, instruction override, DAN and roleplay jailbreaks, data exfiltration, cost exploitation, base64 and ROT13 encoding bypass, translation-embedded injection, few-shot poisoning, markdown injection, multi-turn manipulation, indirect prompt injection, token smuggling, tool-abuse, nested instruction injection, and PII extraction.
Each probe validates whether the model resists manipulation attempts that try to reveal system instructions or produce disallowed outputs. Results highlight which attack vectors succeed and where model guardrails weaken.
Integration with API scanning and constraints
middleBrick operates as a black-box scanner, requiring no agents or code access. It supports URL-based endpoints that accept text payloads, including text-only POST bodies used for LLM probes.
Scan time is under a minute per endpoint. The tool does not perform active SQL injection or command injection, and it does not attempt to fix or remediate findings. It exposes findings with remediation guidance so you can adjust prompts, harden guardrails, or modify model configurations.
Mapping to compliance and limitations
Findings map to OWASP API Top 10 (2023), which helps you prepare for security reviews that reference jailbreak resistance as part of LLM-related controls. middleBrick is a scanner, not an auditor, and it does not certify compliance with any framework.
The scanner does not detect blind SSRF or business logic vulnerabilities that require deep domain understanding. High-stakes audits still require human pentesters to validate jailbreak resistance in the context of your application and data flows.
Workflow integration and output
Use the CLI with middlebrick scan <url> to run a Quick battery and receive a risk score and prioritized findings. The Web Dashboard groups results by probe type, shows score trends, and allows export of branded compliance PDFs.
Programmatic access returns structured data you can integrate into CI/CD gates or monitoring pipelines. Note that authenticated scanning for this workflow requires domain verification and a Starter tier or higher, with only approved headers forwarded to the endpoint.