Severity: HIGH

LLM Data Leakage in ASP.NET with MongoDB

LLM Data Leakage in ASP.NET with MongoDB — how this specific combination creates or exposes the vulnerability

When an ASP.NET application uses MongoDB as its primary data store and exposes endpoints that interact with language model (LLM) features, the combination can unintentionally leak sensitive data through model inputs, prompts, or responses. LLM data leakage in this context refers to scenarios where private information—such as user identifiers, connection strings, or business data—travels into LLM calls either via constructed prompts or through tool usage patterns that the model can expose via its outputs.

In an ASP.NET + MongoDB stack, data often flows from the database into application objects, which are then serialized into prompts or used to construct dynamic instructions. If these objects contain sensitive fields and are passed into an LLM endpoint without careful filtering or redaction, the data may be surfaced in model responses. For example, embedding a MongoDB document containing a user’s email, internal notes, or API keys into a system prompt can lead to system prompt leakage, where the model reveals that information either through direct repetition or via crafted extraction probes.
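To make the risky pattern concrete, here is a minimal, illustrative sketch of whole-document serialization into a prompt. A plain dictionary stands in for the BsonDocument a real query would return, and the field names and values are invented for the example:

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Illustrative document shape as it might come back from MongoDB
// (in the real app this would be a BsonDocument; a dictionary stands in here).
var userDoc = new Dictionary<string, object>
{
    ["_id"] = "665f1c2e9b1e8a0001a1b2c3",
    ["displayName"] = "Alice",
    ["email"] = "alice@example.com",   // sensitive
    ["apiKey"] = "sk-live-abc123"      // sensitive
};

// ANTI-PATTERN: serializing the whole document into the prompt.
// Every field, including email and apiKey, now travels to the model
// and can be surfaced through repetition or extraction probes.
var prompt = $"The current user is: {JsonSerializer.Serialize(userDoc)}";

Console.WriteLine(prompt.Contains("sk-live-abc123")); // prints True
```

The remediation section below shows the inverse of this pattern: projecting only safe fields and copying them into a dedicated DTO before prompt construction.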

The risk is compounded when the application uses unauthenticated or poorly scoped endpoints, as an attacker may probe the API to discover which data elements are being sent to the LLM. Because MongoDB documents can include nested fields and metadata (such as _id, timestamps, or version keys), these elements may inadvertently become part of the prompt context. Attack patterns like prompt injection or cost exploitation can then coax the model into returning data that should remain confidential, effectively turning the LLM channel into an unintended data exfiltration path.

Moreover, if the application uses tool calls or function calling features to let the LLM interact with backend logic (for instance, to query MongoDB), the schema of those tools—field names, expected parameters, and usage patterns—can leak structural information about the data layer. When combined with system prompt leakage detection, which uses regex patterns tailored to ChatML, Llama 2, Mistral, and Alpaca formats, middleBrick can identify whether sensitive data elements are present in prompts that follow these conventions. This is especially relevant when the ASP.NET app serializes MongoDB Bson documents into JSON for inclusion in prompts, as field names and values may align with known leakage patterns.

Continuous monitoring through the Pro plan can help detect recurring exposure by scanning these integrations over time. The CLI tool allows developers to script scans against specific endpoints, while the GitHub Action can enforce security gates before deployment. Because the scanner tests the unauthenticated attack surface, it can surface risky endpoint behaviors—such as verbose error messages or reflection of submitted data in LLM responses—that indicate potential leakage channels without requiring access credentials.

MongoDB-Specific Remediation in ASP.NET — concrete code fixes

To mitigate LLM data leakage when using MongoDB in ASP.NET, focus on strict data segregation, field-level filtering, and safe prompt construction. Avoid passing entire MongoDB documents into LLM prompts. Instead, explicitly select only the necessary, non-sensitive fields and sanitize values before inclusion.

Use projection to limit the fields retrieved from MongoDB. For example, when querying user profiles, retrieve only display-related fields and exclude internal or sensitive keys:

var filter = Builders<BsonDocument>.Filter.Eq("userId", userId);

// MongoDB projections are either inclusive or exclusive; mixing Include and
// Exclude in one projection is rejected (only "_id" may be excluded from an
// inclusive projection). Include only the safe, display-related fields —
// everything else (email, apiKey, internal notes) is omitted by default.
var projection = Builders<BsonDocument>.Projection
    .Include("displayName")
    .Include("preferences.theme")
    .Exclude("_id");
var userDoc = collection.Find(filter).Project(projection).FirstOrDefault();

When constructing prompts, use a dedicated DTO that omits sensitive properties rather than serializing the BsonDocument directly:

// DTO used only for prompt construction; it deliberately contains no
// sensitive members (no email, no keys, no _id).
public class PromptUserProfile
{
    public string DisplayName { get; set; }
    public string Theme { get; set; }
}

var safeProfile = new PromptUserProfile
{
    DisplayName = userDoc["displayName"].AsString,
    Theme = userDoc["preferences"].AsBsonDocument["theme"].AsString
};

// Use safeProfile in prompt construction
var prompt = $"User {safeProfile.DisplayName} prefers {safeProfile.Theme} theme.";

If the application employs tool calls or function calling, define schemas that avoid exposing internal field names. Instead of using raw MongoDB field names as tool parameters, map them to abstract names and transform values server-side before invoking the LLM:

var toolDefinition = new
{
    type = "function",
    function = new
    {
        name = "get_user_preferences",
        description = "Retrieves non-sensitive user UI preferences",
        parameters = new
        {
            type = "object",
            properties = new
            {
                theme = new { type = "string" },
                language = new { type = "string" }
            },
            required = new[] { "theme" }
        }
    }
};
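The server-side mapping described above can be sketched as a simple translation table. The class, dictionary contents, and the internal field name "preferences.locale" are illustrative assumptions, not part of any middleBrick or driver API:

```csharp
using System;
using System.Collections.Generic;

// Maps the abstract tool-parameter names exposed to the LLM onto the real
// MongoDB field names, so the schema of the data layer never leaves the server.
public static class ToolFieldMap
{
    private static readonly Dictionary<string, string> AbstractToMongo = new()
    {
        ["theme"] = "preferences.theme",
        ["language"] = "preferences.locale" // illustrative internal name
    };

    // Translates a tool argument name; returns null for anything unmapped,
    // so unexpected parameters are dropped rather than forwarded to MongoDB.
    public static string? ToMongoField(string abstractName) =>
        AbstractToMongo.TryGetValue(abstractName, out var field) ? field : null;
}
```

A tool-call handler would translate each argument through ToMongoField and silently drop nulls before building its MongoDB query, so probing the tool with unexpected parameter names reveals nothing about the underlying schema.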

Additionally, implement input validation and output scanning within the ASP.NET layer to ensure that data sent to the LLM does not contain PII or secrets. Use regex-based checks to detect patterns like email addresses or connection strings before they are embedded in prompts. Combine this with runtime monitoring via the middleBrick dashboard to track how often sensitive fields appear in constructed prompts across different endpoints.
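One possible sketch of such a pre-prompt check, using only System.Text.RegularExpressions (the class name, method names, and patterns are illustrative and should be tuned to your data; they are not a middleBrick API):

```csharp
using System;
using System.Text.RegularExpressions;

public static class PromptSanitizer
{
    // Illustrative patterns covering common shapes of email addresses and
    // MongoDB connection strings. Extend with patterns for API keys, etc.
    private static readonly Regex Email =
        new(@"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}", RegexOptions.Compiled);
    private static readonly Regex MongoConnString =
        new(@"mongodb(\+srv)?://\S+", RegexOptions.Compiled);

    // Returns true when the candidate prompt text contains a value that
    // should never reach the LLM.
    public static bool ContainsSensitiveData(string text) =>
        Email.IsMatch(text) || MongoConnString.IsMatch(text);

    // Redacts matches instead of rejecting, for lower-risk contexts.
    public static string Redact(string text)
    {
        text = Email.Replace(text, "[REDACTED_EMAIL]");
        return MongoConnString.Replace(text, "[REDACTED_CONNECTION_STRING]");
    }
}
```

An endpoint can call ContainsSensitiveData before invoking the LLM and either reject the request or substitute the Redact output, depending on how strict the policy needs to be.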

For persistent protection, integrate the middleBrick CLI into development workflows to automate scans against key endpoints. The GitHub Action can fail builds when high-risk fields are detected in prompt templates, while the MCP Server enables real-time review within IDEs. These integrations support proactive remediation without relying on the scanner to fix issues automatically.

Related CWEs

CWE ID     Name                                                    Severity
CWE-754    Improper Check for Unusual or Exceptional Conditions    MEDIUM

Frequently Asked Questions

How can I prevent MongoDB field names from leaking into LLM prompts in ASP.NET?
Use explicit field projection and DTOs to include only safe fields, and avoid serializing BsonDocument directly into prompts.
Does middleBrick automatically block data leakage to LLMs in ASP.NET apps?
middleBrick detects and reports potential leakage patterns; it does not block or fix issues. Apply the remediation guidance to reduce exposure.