42Crunch for Model information disclosure check
What middleBrick covers
- 18 LLM adversarial probes across Quick, Standard, and Deep tiers
- Black-box scanning with no agents or SDK dependencies
- Mapping findings to OWASP API Top 10 (2023)
- Authenticated scanning with strict header allowlist
- Web dashboard for tracking score trends and evidence
- CI/CD integration via GitHub Action and API client
Model information disclosure overview
Model information disclosure occurs when an AI assistant exposes system prompts, instructions, or internal behavior that should remain private. This can reveal guardrails, training details, or operational constraints that an attacker may leverage. The scenario is common in LLM interfaces that allow indirect or multi-turn interactions, where crafted inputs can coax the model into repeating forbidden content or leaking meta instructions.
How middleBrick approaches model disclosure checks
middleBrick targets LLM-specific leakage through 18 adversarial probes organized into three scan tiers: Quick, Standard, and Deep. These probes focus on system prompt extraction, instruction override, DAN and roleplay jailbreaks, data exfiltration techniques, cost exploitation, encoding bypasses such as base64 and ROT13, translation-embedded injection, few-shot poisoning, markdown injection, multi-turn manipulation, indirect prompt injection, token smuggling, tool-abuse patterns, nested instruction injection, and PII extraction.
The scanner operates as a black-box security tool. It submits text-only POST payloads for LLM probes and reads back responses without accessing model internals. Findings are mapped to OWASP API Top 10 (2023) and supplemented with detection notes that help you understand how each probe relates to potential disclosure paths.
Scan workflow and integration considerations
To run a disclosure-focused scan, submit the base URL of the LLM endpoint to the middleBrick scanner. The scan completes in under a minute using read-only GET and HEAD methods, with text-only POST allowed for LLM probes. No agents, SDKs, or code access are required, making integration friction low regardless of language or framework.
Authenticated scanning is available at the Starter tier and above, requiring Bearer, API key, Basic auth, or Cookie credentials. Domain verification ensures only the domain owner can scan with credentials, and a strict header allowlist limits forwarded headers to Authorization, X-API-Key, Cookie, and X-Custom-*.
Results appear in the Web Dashboard, where you can track score trends, download compliance PDFs, and review prioritized findings. The CLI provides JSON or text output for automation, and the GitHub Action can enforce a minimum score gate in CI/CD pipelines.
Mapping to compliance and detection context
middleBrick maps findings related to model information disclosure directly to OWASP API Top 10 (2023). This alignment supports audit evidence collection and helps you prepare for security reviews that reference control frameworks such as SOC 2 Type II and PCI-DSS 4.0.
The tool surfaces findings relevant to broader regulatory alignment, including scenarios where disclosure risks intersect with data exposure categories. It does not certify compliance, and it does not replace a human pentester for high-stakes audits that require business logic validation.
Limitations and next steps
middleBrick does not perform active SQL injection or command injection, as those require intrusive payloads outside its scope. It does not detect blind SSRF or business logic vulnerabilities, which often demand deep domain knowledge. Disclosure-specific risks that rely on out-of-band channels or nuanced prompt behavior are also outside detection scope.
If your workflow centers on model information disclosure, use the Deep scan tier to maximize probe coverage, combine results with manual testing, and treat the output as one layer in a broader security strategy. The API client and MCP Server enable programmatic access so you can integrate scans into custom tooling or AI-assisted development workflows.