Multi-turn manipulation audit

What middleBrick covers

  • Multi-turn adversarial probe execution across three scan tiers
  • OpenAPI 3.0/3.1 and Swagger 2.0 parsing with recursive $ref resolution
  • Detection of prompt injection, jailbreak, and PII extraction patterns
  • Identification of encoding bypasses and token smuggling attempts
  • Cross-reference runtime behavior against API specification definitions
  • Programmatic access via CLI, API client, and MCP Server

What multi-turn manipulation audit means

Multi-turn manipulation audit examines how an API-driven LLM agent behaves across a chain of requests. Instead of a single prompt, the test follows a workflow where earlier responses influence later instructions, tool usage, and data exposure. The goal is to detect whether an agent can be steered into executing unintended actions, exfiltrating information, or bypassing guardrails over multiple conversational turns.

Risks of skipping this workflow

Without a structured multi-turn audit, teams miss chained prompt-injection paths that only appear after several interactions. An agent may appear safe in isolation but can be coaxed into roleplay jailbreaks, instruction override, or PII extraction after following a sequence of plausible requests. This exposes surface to indirect prompt injection, token smuggling, and tool-abuse that single-turn tests do not surface, increasing the chance of real-world compromise through conversational interfaces.

A practical audit workflow

Begin with reconnaissance to identify endpoints that accept LLM inputs or generate agentic behavior. Run a baseline quick scan to establish a score, then progress to Standard and Deep tiers that increase chain length and probe complexity. Track findings across turns, correlate indicators such as repeated jailbreak patterns or encoded exfiltration, and document which turns trigger unsafe outcomes. Use the results to tighten system instructions, constrain tool permissions, and add turn-level monitoring.

middlebrick scan https://api.example.com/agent --tier deep --output json

Coverage provided by middleBrick

middleBrick maps findings to OWASP API Top 10 (2023) and supports audit evidence for SOC 2 Type II and PCI-DSS 4.0. It runs 18 adversarial probes across three scan tiers focused on system prompt extraction, instruction override, DAN and roleplay jailbreaks, data exfiltration, cost exploitation, encoding bypasses, translation-embedded injection, few-shot poisoning, markdown injection, multi-turn manipulation, indirect prompt injection, token smuggling, tool-abuse, nested instruction injection, and PII extraction. The scanner parses OpenAPI 3.0, 3.1, and Swagger 2.0 with recursive $ref resolution and cross-references spec definitions against runtime behavior to highlight undefined security schemes and deprecated operations.

Operational considerations and limitations

middleBrick is a scanner and does not fix, patch, or remediate findings. It does not perform active SQL injection or command injection testing, and it does not detect business logic vulnerabilities that require domain context. Blind SSRF that relies on out-of-band infrastructure is out of scope. The tool surfaces findings and remediation guidance; validation and prioritization still require human expertise aligned with your application architecture and threat model.

Frequently Asked Questions

Which scan tier should I use for multi-turn manipulation testing?
Start with Standard to validate common jailbreak patterns, then use Deep to exercise long chains, encoding tricks, and tool-abuse scenarios across many turns.
Does middleBrick store my scan data for model training?
No. Customer scan data is deletable on demand, purged within 30 days of cancellation, and is never used for model training.
Can authenticated scans reduce blind SSRF coverage?
Authentication adds reach but does not bring blind SSRF into scope, because out-of-band infrastructure is not tested.
How does middleBrick relate to compliance frameworks?
It maps findings to OWASP API Top 10 (2023), and helps you prepare for SOC 2 Type II and PCI-DSS 4.0 by providing evidence around specific controls.