42Crunch for Model information disclosure check

What middleBrick covers

18 LLM adversarial probes across Quick, Standard, and Deep tiers
Black-box scanning with no agents or SDK dependencies
Mapping findings to OWASP API Top 10 (2023)
Authenticated scanning with strict header allowlist
Web dashboard for tracking score trends and evidence
CI/CD integration via GitHub Action and API client

Model information disclosure overview

Model information disclosure occurs when an AI assistant exposes system prompts, instructions, or internal behavior that should remain private. This can reveal guardrails, training details, or operational constraints that an attacker may leverage. The scenario is common in LLM interfaces that allow indirect or multi-turn interactions, where crafted inputs can coax the model into repeating forbidden content or leaking meta instructions.

How middleBrick approaches model disclosure checks

middleBrick targets LLM-specific leakage through 18 adversarial probes organized into three scan tiers: Quick, Standard, and Deep. These probes focus on system prompt extraction, instruction override, DAN and roleplay jailbreaks, data exfiltration techniques, cost exploitation, encoding bypasses such as base64 and ROT13, translation-embedded injection, few-shot poisoning, markdown injection, multi-turn manipulation, indirect prompt injection, token smuggling, tool-abuse patterns, nested instruction injection, and PII extraction.

The scanner operates as a black-box security tool. It submits text-only POST payloads for LLM probes and reads back responses without accessing model internals. Findings are mapped to OWASP API Top 10 (2023) and supplemented with detection notes that help you understand how each probe relates to potential disclosure paths.

Scan workflow and integration considerations

To run a disclosure-focused scan, submit the base URL of the LLM endpoint to the middleBrick scanner. The scan completes in under a minute using read-only GET and HEAD methods, with text-only POST allowed for LLM probes. No agents, SDKs, or code access are required, making integration friction low regardless of language or framework.

Authenticated scanning is available at the Starter tier and above, requiring Bearer, API key, Basic auth, or Cookie credentials. Domain verification ensures only the domain owner can scan with credentials, and a strict header allowlist limits forwarded headers to Authorization, X-API-Key, Cookie, and X-Custom-*.

Results appear in the Web Dashboard, where you can track score trends, download compliance PDFs, and review prioritized findings. The CLI provides JSON or text output for automation, and the GitHub Action can enforce a minimum score gate in CI/CD pipelines.

Mapping to compliance and detection context

middleBrick maps findings related to model information disclosure directly to OWASP API Top 10 (2023). This alignment supports audit evidence collection and helps you prepare for security reviews that reference control frameworks such as SOC 2 Type II and PCI-DSS 4.0.

The tool surfaces findings relevant to broader regulatory alignment, including scenarios where disclosure risks intersect with data exposure categories. It does not certify compliance, and it does not replace a human pentester for high-stakes audits that require business logic validation.

Limitations and next steps

middleBrick does not perform active SQL injection or command injection, as those require intrusive payloads outside its scope. It does not detect blind SSRF or business logic vulnerabilities, which often demand deep domain knowledge. Disclosure-specific risks that rely on out-of-band channels or nuanced prompt behavior are also outside detection scope.

If your workflow centers on model information disclosure, use the Deep scan tier to maximize probe coverage, combine results with manual testing, and treat the output as one layer in a broader security strategy. The API client and MCP Server enable programmatic access so you can integrate scans into custom tooling or AI-assisted development workflows.

Frequently Asked Questions

Can middleBrick detect all forms of model information disclosure?

No. It covers known prompt-injection and extraction patterns but cannot find business-logic-specific leakage or blind channels that require human expertise.

Does authenticated scanning affect disclosure detection accuracy?

It can increase coverage by exercising authenticated endpoints, but disclosure probes remain read-only and non-intrusive.

How are findings related to compliance frameworks described?

Findings map to OWASP API Top 10 (2023) and help prepare evidence for SOC 2 Type II and PCI-DSS 4.0 controls without claiming certification.

Can I integrate scans into CI/CD for ongoing monitoring?

Yes. The GitHub Action enforces score thresholds, and the API client supports scheduled or triggered scans for continuous monitoring.