Migrating from 42Crunch to middleBrick for Multi-turn manipulation audit
What middleBrick covers
- Deterministic multi-turn probe library with 18 adversarial tests
- JSON output for programmatic analysis and CI integration
- Score tracking and diffing across scheduled scans
- Policy enforcement via GitHub Action gates
- Retention controls and data deletion on demand
- Read-only testing with no destructive payloads
Current limitations with multi-turn manipulation audits
Multi-turn manipulation testing typically depends on interactive sessions that record prompts, model outputs, and token-level traces. These workflows are often manual, scattered across chat logs, and tied to a specific provider account. Without a standardized capture format, it is difficult to replay a chain of turns, share findings with teammates, or integrate results into CI pipelines. The lack of structured artifacts also makes it hard to measure drift over time or to programmatically compare scanner configurations.
How middleBrick structures multi-turn audit evidence
middleBrick runs a deterministic sequence of 18 adversarial probes across three scan tiers: Quick, Standard, and Deep. Each probe is a self-contained turn or turn pair (input, expected failure mode) with a clear pass or fail outcome. Results are stored as a flat list of findings, each including the probe identifier, tier, observed behavior, and a short remediation hint. This structure makes it straightforward to export findings as JSON for downstream analysis or to embed them in compliance artifacts.
Migration workflow and artifact mapping
When migrating from a conversational audit to middleBrick, map your existing chat logs to the corresponding probe identifiers listed in the scan report. Discard raw prompt text that is not covered by a defined probe, and retain only the findings that map to OWASP API Top 10 categories such as LLM01 through LLM18. You can replay the same URL or endpoint with the CLI to regenerate findings, and you can use the JSON output to compare scores across scan dates. The dashboard tracks score drift and shows which findings were resolved, introduced, or unchanged.
middlebrick scan https://api.example.com/openapi.json --output jsonCI integration and policy enforcement
middleBrick can be integrated into CI pipelines so that a failing build blocks deployment when the score drops below your chosen threshold. The GitHub Action reads the scan output, evaluates the list of findings against a configurable allowlist, and fails the job if violations exceed the policy. Because the scan is read-only and does not modify your API, you can run it on every pull request without risking production state. Use the CLI or the API client to tailor thresholds per service or per environment.
Ongoing monitoring and maintenance
With Pro tier, you can schedule rescans at six-hour, daily, weekly, or monthly intervals. Each new scan is diffed against the prior run, highlighting new findings, resolved findings, and score drift. Alerts are rate-limited to one per hour per API and can be delivered by email or through HMAC-SHA256 signed webhooks. Data retention is under your control: findings are deletable on demand and purged within 30 days of cancellation, and customer data is never used for model training.