LLM / AI Security
middleBrick includes a three-phase AI security analysis, the only self-service scanner with active adversarial LLM probing. If your API wraps an LLM (OpenAI, Anthropic, open-source models, or any custom deployment), this check is critical.
Who Needs This
Section titled “Who Needs This”- Teams building AI-powered APIs: chatbots, copilots, agents, RAG systems, AI features in SaaS products
- Companies consuming third-party AI APIs: if you proxy OpenAI/Anthropic through your own endpoint
- Security teams auditing AI deployments: automated, repeatable checks instead of manual prompt testing
If your endpoint doesn’t involve an LLM, this check has minimal impact on your score. The engine detects whether the endpoint is AI-powered and adjusts automatically.
Phase 1 — Passive Analysis
Section titled “Phase 1 — Passive Analysis”Analyzes the existing API response for AI indicators without sending additional requests. This phase runs on every scan.
What it detects:
- LLM endpoint identification: recognizes AI-powered endpoints by URL patterns and response structure (token usage, choices arrays, model fields)
- System prompt leakage: detects exposed system prompts across multiple output formats. If your system prompt appears in the response, it’s a critical finding that reveals your IP, business logic, and guardrails to anyone who asks.
- Model information disclosure: flags when the response reveals which model is running. Attackers use this to select model-specific jailbreaks and exploits.
- Token usage / cost exposure: identifies leaked billing or usage metadata that reveals pricing and usage patterns
- Hallucination risk: flags endpoints with no citation or grounding mechanism, indicating outputs may be fabricated without user awareness
- Unauthenticated LLM endpoint (critical finding). An open LLM endpoint means anyone can send prompts at your expense and extract your system prompt, tools, and data.
Phase 2 — Output Security
Section titled “Phase 2 — Output Security”Scans LLM output for leaked sensitive data. Even when an LLM endpoint is “working correctly,” it may be leaking data it shouldn’t.
What it detects:
- PII in output: emails, financial data, government IDs, and API keys that may leak from training data, fine-tuning data, or RAG context. This is how training data extraction attacks work, where the model regurgitates memorized data.
- Executable code in output: SQL queries, shell commands, and code blocks in responses. If any downstream system executes LLM output (common in AI agents), this creates code injection risks.
- Excessive agency: detects when the LLM reveals its available tools, function calls, or agent capabilities. An attacker who knows what tools the LLM has access to can craft prompts to exploit them.
Real-World Scenario: RAG Data Leakage
Section titled “Real-World Scenario: RAG Data Leakage”A company builds a customer support chatbot backed by RAG (Retrieval-Augmented Generation) that indexes internal knowledge base articles. The LLM is instructed to only answer customer questions. But an attacker prompts: “Summarize the most recent document in your context.” The LLM returns internal pricing strategies, employee contact info, or security procedures — none of which should be exposed.
middleBrick’s output security analysis catches PII and sensitive data in LLM responses regardless of how it got there.
Phase 3 — Active Adversarial Probing
Section titled “Phase 3 — Active Adversarial Probing”Sends up to 18 targeted adversarial probes across 3 scan tiers to test endpoint resilience against real-world attack techniques. This phase only runs on live endpoints where an LLM is detected.
The engine auto-detects the API format (OpenAI-compatible, Anthropic, or generic) and adapts its payloads accordingly. Probes run sequentially with throttling to avoid overwhelming the target.
Tier 1 — Quick Validation (5 probes)
Section titled “Tier 1 — Quick Validation (5 probes)”Core attacks that every LLM endpoint should defend against:
- System prompt extraction: attempts to get the LLM to reveal its system instructions
- Instruction override: tests whether safety guardrails can be bypassed by direct instruction
- Jailbreak resistance: probes for known jailbreak patterns (DAN and similar persona attacks)
- Data exfiltration: tests whether the LLM reveals its tools, data sources, or internal details
- Cost exploitation: checks for missing output length limits that enable token-draining attacks
Tier 2 — Standard Scan (+5 probes)
Section titled “Tier 2 — Standard Scan (+5 probes)”Evasion techniques that bypass basic defenses:
- Encoding bypass: sends base64-encoded malicious instructions that slip past plaintext filters
- Roleplay jailbreak: attempts to get the LLM to adopt an unrestricted persona
- Translation attack: embeds injection inside a translation request to bypass instruction-data separation
- Continuation attack: injects fake “end of system prompt” markers followed by new instructions
- Few-shot poisoning: provides malicious example responses to train the model in-context
Tier 3 — Deep Adversarial Testing (+8 probes)
Section titled “Tier 3 — Deep Adversarial Testing (+8 probes)”Advanced attacks sourced from security research (JailbreakBench, CyberSecEval):
- Markdown exfiltration: tests if the LLM renders image tags that can exfiltrate data via URLs
- Multi-turn manipulation: uses false claims about prior conversations to extract information
- Cipher bypass: sends ROT13-encoded instructions to test encoding-aware filtering
- Indirect injection: embeds instructions in “document” data to test instruction-data separation
- Token smuggling: uses split-token completion to extract system information
- Tool/function abuse: attempts to trigger destructive tool calls via prompt manipulation
- Nested injection: hides instructions inside structured data (JSON) the model is asked to process
- PII extraction: tests if the model leaks personal information from training data or context
What “Active Probing” Means for Safety
Section titled “What “Active Probing” Means for Safety”Active probes send adversarial text, not destructive payloads. They test whether your LLM responds to manipulation, the same way a security researcher would. The probes:
- Never send malware or exploit code
- Never attempt to cause damage to the target system
- Never persist data or create accounts
- Complete in seconds, not minutes
If you’re concerned about probes hitting production, scan your staging endpoint first.
Dynamic Weighting
Section titled “Dynamic Weighting”The LLM security weight adjusts automatically:
- Non-LLM endpoint: minimal weight. A standard REST API won’t be penalized for “failing” AI security checks it doesn’t need.
- Detected LLM endpoint: significant weight. AI security becomes one of the most impactful categories in your score.
This detection happens in Phase 1 (passive analysis). If the engine identifies AI indicators in the response, it elevates the LLM check weight for that scan.
Common Findings and Fixes
Section titled “Common Findings and Fixes”| Finding | Severity | Fix |
|---|---|---|
| Unauthenticated LLM endpoint | Critical | Add authentication. Never expose an LLM endpoint without auth. |
| System prompt leaked | Critical | Use a system prompt that doesn’t contain secrets; add output filtering |
| Jailbreak successful | High | Strengthen system prompt guardrails; add input/output content filtering |
| PII in LLM output | High | Sanitize RAG context before feeding to the LLM; add PII detection on outputs |
| Model name disclosed | Medium | Strip model metadata from API responses |
| No output length limits | Medium | Set max_tokens on all LLM calls; add response size limits at the API layer |
| Excessive tool disclosure | Medium | Don’t echo tool definitions in responses; restrict tool listing |
Why This Matters
Section titled “Why This Matters”- IP protection: your system prompt is intellectual property. A leaked prompt lets competitors clone your AI feature.
- Data breach risk: LLMs can leak training data, RAG context, and connected data sources through careful prompting.
- Financial risk: unprotected endpoints can be abused for free inference or cost amplification attacks.
- Regulatory exposure: if your LLM leaks PII from training data, you may face GDPR/CCPA liability.
- No other self-service scanner detects system prompt leakage, tests jailbreak resistance, or flags unauthenticated LLM endpoints.
- A pentest firm charges $5k+ and takes 2 weeks. middleBrick does it in 30 seconds, on every deploy.