LLM Data Leakage in Chi with MongoDB
LLM Data Leakage in Chi with MongoDB: how this specific combination creates or exposes the vulnerability
LLM data leakage in a Chi application that uses MongoDB as the primary data store occurs when language model interactions inadvertently expose or infer sensitive information stored in your MongoDB collections. This risk arises from two intersecting factors: how data is structured and retrieved for LLM consumption, and how the LLM itself may reveal information during interaction.
Chi (github.com/go-chi/chi) is a lightweight, idiomatic Go HTTP router often used to build APIs and services. When such services integrate with MongoDB, commonly through the official Go driver (go.mongodb.org/mongo-driver), they query documents and decode them into structs or generic maps before transforming and presenting the data. Those documents may contain sensitive fields (e.g., email, ssn, api_key). If they are passed into prompts or exposed through endpoints used by an LLM, the model may repeat that data in its output, and the application may log it. Both can violate privacy or compliance expectations.
For example, a Chi route that fetches a user document from MongoDB and includes it directly in a system prompt can lead to system prompt leakage. An attacker might use prompt injection techniques to coerce the LLM into repeating or encoding that sensitive document data in its responses. Because MongoDB documents often contain nested fields and arrays, it is easy to inadvertently include more data than intended when constructing prompts or logging LLM outputs.
The LLM/AI Security checks in middleBrick specifically test for this scenario by probing endpoints that interact with MongoDB-backed data sources. It checks for unauthenticated LLM endpoints and performs active prompt injection tests—such as system prompt extraction and data exfiltration—to see whether MongoDB-derived content appears in LLM outputs. The scanner also reviews output for PII, API keys, and executable code that may have originated from MongoDB documents. Because Chi services often serve as APIs for single-page applications or mobile clients, improper handling of MongoDB data in LLM workflows can expose information that should remain on the server or within secure contexts.
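An output check of this kind can be approximated with pattern matching. The following Go sketch is an illustration only and is not middleBrick's implementation. The detector names and regular expressions are assumptions: an email shape, the US SSN format, and a hypothetical `sk_live_`/`sk_test_` key prefix. Replace them with the formats your collections actually hold.

```go
package main

import (
	"regexp"
	"sort"
)

// outputDetectors are illustrative patterns; the sk_live_/sk_test_ key format is
// a hypothetical example, so replace it with the formats your documents use.
var outputDetectors = map[string]*regexp.Regexp{
	"email":   regexp.MustCompile(`[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}`),
	"ssn":     regexp.MustCompile(`\b\d{3}-\d{2}-\d{4}\b`),
	"api_key": regexp.MustCompile(`\bsk_(live|test)_[A-Za-z0-9]{8,}\b`),
}

// scanOutput returns the sorted names of the detectors that match the LLM output.
func scanOutput(output string) []string {
	var hits []string
	for name, re := range outputDetectors {
		if re.MatchString(output) {
			hits = append(hits, name)
		}
	}
	sort.Strings(hits) // map iteration order is random; sort for stable results
	return hits
}
```

Regex detectors produce both false positives and misses, for example values the model has spelled out or base64-encoded. They work best as one signal among several.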
Compliance mappings are relevant here as well. Findings from such leakage scenarios typically map to three frameworks:

- The OWASP API Security Top 10: API3:2023 Broken Object Property Level Authorization, when an endpoint or prompt exposes more object properties than the consumer needs.
- GDPR data minimization and purpose limitation.
- SOC 2 controls around information disclosure.

middleBrick’s per-category breakdowns help identify whether a specific MongoDB query or endpoint is contributing to LLM data leakage, providing prioritized findings with severity and remediation guidance.
MongoDB-Specific Remediation in Chi: concrete code fixes
To mitigate LLM data leakage in a Chi application using MongoDB, focus on ensuring that only necessary, sanitized data is ever presented to the LLM and that sensitive fields are never logged or echoed in responses. Below are concrete patterns and code examples tailored for Chi and MongoDB.
1. Project only required fields from MongoDB documents
When querying MongoDB, use projection to return only the fields needed for business logic and LLM interaction. Avoid returning entire documents.
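```go
import (
	"context"

	"go.mongodb.org/mongo-driver/bson"
	"go.mongodb.org/mongo-driver/mongo"
	"go.mongodb.org/mongo-driver/mongo/options"
)

// PublicUser holds only the fields that are safe to expose or use in a prompt.
type PublicUser struct {
	ID   string `bson:"_id"` // an ObjectID _id is decoded as its hex string
	Name string `bson:"name"`
}

// findPublicUsers, called from a Chi handler, asks MongoDB for _id and name only,
// so fields such as email, ssn, or api_key never leave the database.
func findPublicUsers(ctx context.Context, coll *mongo.Collection) ([]PublicUser, error) {
	safeProjection := bson.D{{Key: "_id", Value: 1}, {Key: "name", Value: 1}}
	cursor, err := coll.Find(ctx, bson.D{}, options.Find().SetProjection(safeProjection))
	if err != nil {
		return nil, err
	}
	var users []PublicUser
	err = cursor.All(ctx, &users) // All decodes every document and closes the cursor
	return users, err
}
```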
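Projection should remain the primary control. Data sometimes reaches application code already decoded as a generic map, though, for example from an aggregation pipeline or a cache. As a defense-in-depth measure, the standard-library Go sketch below applies an in-memory allowlist of dotted paths such as `profile.displayName`, in the spirit of an inclusion projection. `map[string]any` stands in for decoded documents, and `lookup`/`pickPaths` are hypothetical helpers.

```go
package main

import "strings"

// lookup follows a dotted path such as "profile.displayName" through nested maps.
func lookup(doc map[string]any, parts []string) (any, bool) {
	var cur any = doc
	for _, key := range parts {
		m, ok := cur.(map[string]any)
		if !ok {
			return nil, false
		}
		if cur, ok = m[key]; !ok {
			return nil, false
		}
	}
	return cur, true
}

// pickPaths copies only the allowlisted dotted paths into a new map, similar in
// spirit to a MongoDB inclusion projection. Missing paths are skipped, and
// values are copied by reference rather than deep-copied.
func pickPaths(doc map[string]any, paths ...string) map[string]any {
	out := map[string]any{}
	for _, p := range paths {
		parts := strings.Split(p, ".")
		v, ok := lookup(doc, parts)
		if !ok {
			continue
		}
		dst := out
		for _, key := range parts[:len(parts)-1] {
			next, ok := dst[key].(map[string]any)
			if !ok {
				next = map[string]any{}
				dst[key] = next
			}
			dst = next
		}
		dst[parts[len(parts)-1]] = v
	}
	return out
}
```

Anything not named in the allowlist is dropped by default, including fields added to the collection later. That default is what makes an allowlist safer than a denylist here.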
2. Sanitize data before LLM consumption
Never pass raw MongoDB documents into LLM prompts. Build prompt contexts explicitly and remove or mask sensitive keys.
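```go
import (
	"fmt"
	"log/slog"
)

// buildPrompt places only explicitly chosen, non-sensitive fields (PublicUser from step 1) in the prompt.
func buildPrompt(user PublicUser, userContext string) string {
	return fmt.Sprintf("System: You are a helpful assistant.\nUser: id=%s, name=%s, context=%s",
		user.ID, user.Name, userContext)
}

// logPromptBuilt logs metadata about the prompt, never PII or the prompt text itself.
func logPromptBuilt(user PublicUser, userContext string) {
	slog.Info("llm prompt built", "user_id", user.ID, "context_length", len(userContext))
}
```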
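MongoDB documents can nest sensitive keys inside sub-documents and arrays, so masking only top-level keys is not enough when a document must be logged or shown in part. The standard-library Go sketch below walks a tree of `map[string]any` and `[]any` values and masks every listed key at any depth. It assumes the data has already been converted to those plain types. The Go driver's named types, such as `bson.M`, `bson.D`, and `bson.A`, would need extra cases or a conversion first. The function name and the `[REDACTED]` marker are illustrative.

```go
package main

// redact returns a copy of v in which the value of every key listed in
// sensitive is replaced by "[REDACTED]", at any depth, including inside arrays.
// The input is not modified.
func redact(v any, sensitive map[string]bool) any {
	switch t := v.(type) {
	case map[string]any:
		out := make(map[string]any, len(t))
		for k, val := range t {
			if sensitive[k] {
				out[k] = "[REDACTED]"
			} else {
				out[k] = redact(val, sensitive)
			}
		}
		return out
	case []any:
		out := make([]any, len(t))
		for i, val := range t {
			out[i] = redact(val, sensitive)
		}
		return out
	default:
		return v // scalars and unrecognized types pass through unchanged
	}
}
```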
3. Validate and escape outputs that may contain stored data
If your LLM is expected to reference data from MongoDB (e.g., summarizing stored records), validate and escape outputs to prevent injection of unintended content.
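```go
import (
	"errors"
	"net/http"
	"strings"
)

// Coarse heuristic: "--" or `{"` (a JSON object opener) may signal a dumped document.
func validateLLMOutput(output string) (string, error) {
	if strings.Contains(output, "--") || strings.Contains(output, `{"`) {
		return "", errors.New("output contains suspicious patterns")
	}
	return output, nil
}

// In router setup: r is a chi.Router; summarize is a hypothetical LLM helper.
r.Get("/summary", func(w http.ResponseWriter, req *http.Request) {
	out, err := validateLLMOutput(summarize(req))
	if err != nil { // fail closed: return neither the raw text nor the error details
		http.Error(w, "response unavailable", http.StatusBadGateway)
		return
	}
	w.Header().Set("Content-Type", "text/plain; charset=utf-8")
	w.Write([]byte(out))
})
```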
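Pattern checks miss sensitive values that have no recognizable shape, such as a street address or an internal account name. A complementary check, sketched below in standard-library Go, rejects any output that contains one of the exact sensitive values that were in scope for the request. It assumes the handler still holds those values, for example the fields it deliberately kept out of the prompt. The function name and the `minLen` threshold are illustrative choices; `minLen` skips very short values that would match ordinary text.

```go
package main

import "strings"

// containsKnownSecret reports whether output echoes any of the sensitive values
// that were in scope for this request. Matching is case-insensitive, and values
// shorter than minLen are ignored to avoid false positives on tiny strings.
func containsKnownSecret(output string, secrets []string, minLen int) bool {
	lower := strings.ToLower(output)
	for _, s := range secrets {
		if len(s) < minLen {
			continue
		}
		if strings.Contains(lower, strings.ToLower(s)) {
			return true
		}
	}
	return false
}
```

This catches verbatim echoes only. An output in which the model has reformatted, translated, or encoded a value will pass, so this check complements the pattern check in this step rather than replacing it.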
4. Avoid logging full LLM responses that may echo MongoDB content
Configure structured logging to exclude fields that may contain sensitive data originating from MongoDB.
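```go
import (
	"log/slog"
	"os"
)

// Keys whose values may carry MongoDB-derived data or raw LLM output.
var sensitiveLogKeys = map[string]bool{"email": true, "ssn": true, "api_key": true, "llm_output": true}

// newRedactingLogger returns a JSON logger that replaces the value of any
// sensitive key with "[REDACTED]" before the record is written.
func newRedactingLogger() *slog.Logger {
	opts := &slog.HandlerOptions{
		ReplaceAttr: func(groups []string, a slog.Attr) slog.Attr {
			if sensitiveLogKeys[a.Key] {
				return slog.String(a.Key, "[REDACTED]")
			}
			return a
		},
	}
	return slog.New(slog.NewJSONHandler(os.Stdout, opts))
}
```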
5. Use middleware to enforce data boundaries
Implement request/response middleware in Chi to strip or redact sensitive MongoDB fields before they reach the LLM or are returned to the client.
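```go
import (
	"net/http"
	"net/http/httptest"
	"regexp"
)

// Regex approximation, not a JSON parser: values containing escaped quotes are missed.
var sensitivePair = regexp.MustCompile(`"(email|apiKey|api_key)"\s*:\s*"[^"]*"`)

// RedactMongoFields has chi's middleware shape, func(http.Handler) http.Handler.
// It buffers the downstream response (here with httptest.ResponseRecorder),
// redacts sensitive pairs, then writes the result to the real client.
func RedactMongoFields(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		rec := httptest.NewRecorder()
		next.ServeHTTP(rec, r)
		for k, v := range rec.Header() {
			w.Header()[k] = v
		}
		w.Header().Del("Content-Length") // the length changes after redaction
		w.WriteHeader(rec.Code)
		w.Write(sensitivePair.ReplaceAll(rec.Body.Bytes(), []byte(`"${1}": "[REDACTED]"`)))
	})
}
```
Related CWEs: llmSecurity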
| CWE ID | Name | Severity |
|---|---|---|
| CWE-754 | Improper Check for Unusual or Exceptional Conditions | MEDIUM |