Severity: HIGH

LLM Data Leakage in Gorilla Mux with MongoDB

LLM Data Leakage in Gorilla Mux with MongoDB — how this specific combination creates or exposes the vulnerability

LLM data leakage in a Gorilla Mux service that uses MongoDB as a backend can occur when application code unintentionally exposes sensitive data through HTTP handlers registered with the router. Gorilla Mux is a request multiplexer and matcher library; it does not enforce data handling policies, so developers must ensure that route handlers implement proper access controls and output filtering. If handlers query MongoDB without field-level restrictions and then forward raw or overly broad responses to an LLM endpoint or expose them in logs, prompts, or error messages, confidential information can be included in LLM outputs or captured via injection probes.

For example, a handler that retrieves user records from MongoDB and passes the full document to an LLM client for summarization risks leaking PII, API keys, or internal identifiers. This becomes critical when the same service exposes an unauthenticated LLM endpoint or uses the MongoDB query results directly in prompts, as the LLM may reflect sensitive data in completions or error traces. Attackers can probe these endpoints with prompt injection techniques to extract training data, infer schema details, or retrieve credentials embedded in document values. Because Gorilla Mux routes often consolidate multiple business functions, a single overly permissive route can expose aggregated data that would otherwise be segmented across microservices.
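To make the anti-pattern concrete, the following is a minimal sketch of such a handler; the llmClient variable and its Complete method are hypothetical stand-ins for whatever LLM SDK the service uses, and client is assumed to be an initialized *mongo.Client.

func summarizeUser(w http.ResponseWriter, r *http.Request) {
  userID := mux.Vars(r)["id"]
  var user bson.M
  // No projection: the full document, including fields such as
  // password_hash or api_key, is fetched and serialized.
  err := client.Database("appdb").Collection("users").
    FindOne(r.Context(), bson.D{{Key: "_id", Value: userID}}).Decode(&user)
  if err != nil {
    http.Error(w, "user not found", http.StatusNotFound)
    return
  }
  raw, _ := json.Marshal(user)
  // The raw document becomes part of the prompt, so the model can reflect
  // it in completions or an attacker can extract it via prompt injection.
  prompt := "Summarize this user record: " + string(raw)
  resp, _ := llmClient.Complete(r.Context(), prompt) // hypothetical LLM client
  w.Write([]byte(resp))
}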

Compounding the risk, MongoDB documents may contain nested fields, arrays, and metadata that are not intended for external consumption. If the handler does not explicitly project the response to include only safe fields, the LLM interaction surface grows. Combined with insufficient input validation on route parameters, this can lead to path traversal or NoSQL injection that modifies queries and returns unintended records. The LLM then processes these records, increasing the chance that sensitive content appears in outputs or logs. middleBrick’s LLM/AI Security checks detect such leakage by scanning for system prompt exposure, active prompt injection attempts, and PII in model responses, highlighting where route logic and MongoDB access patterns need tightening.
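To make the injection vector concrete before turning to remediation, here is a hedged sketch of the query-construction mistake described above: decoding attacker-controlled JSON straight into a BSON filter lets a client smuggle in operators such as $ne or $regex and widen the result set far beyond a single record.

func searchUsers(w http.ResponseWriter, r *http.Request) {
  var filter bson.M
  // Vulnerable: the request body becomes the MongoDB filter verbatim, so a
  // body like {"role": {"$ne": ""}} matches every document in the collection.
  if err := json.NewDecoder(r.Body).Decode(&filter); err != nil {
    http.Error(w, "bad request", http.StatusBadRequest)
    return
  }
  cursor, err := client.Database("appdb").Collection("users").Find(r.Context(), filter)
  if err != nil {
    http.Error(w, "search failed", http.StatusInternalServerError)
    return
  }
  defer cursor.Close(r.Context())
  var results []bson.M
  if err := cursor.All(r.Context(), &results); err != nil {
    http.Error(w, "failed to decode", http.StatusInternalServerError)
    return
  }
  // Every matched record now flows to the caller and any downstream LLM prompt.
  json.NewEncoder(w).Encode(results)
}

The allow-list pattern in remediation step 2 below closes this hole by never letting user input shape the structure of the filter.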

MongoDB-Specific Remediation in Gorilla Mux — concrete code fixes

Mitigate LLM data leakage by controlling what data leaves MongoDB and reaches Gorilla Mux handlers. Use projection to return only required fields, apply strict allow-lists for query parameters, and avoid passing raw database documents directly into LLM prompts. The following examples assume a Go service using go.mongodb.org/mongo-driver/mongo and github.com/gorilla/mux.

1. Field-level projection to limit exposed data

Explicitly define which fields are returned from MongoDB. This prevents nested PII or internal metadata from reaching the LLM or being logged.

import (
  "encoding/json"
  "net/http"

  "github.com/gorilla/mux"
  "go.mongodb.org/mongo-driver/bson"
  "go.mongodb.org/mongo-driver/mongo"
  "go.mongodb.org/mongo-driver/mongo/options"
)

// client is assumed to be initialized elsewhere, e.g. via mongo.Connect in main.
var client *mongo.Client

func getUserProfile(w http.ResponseWriter, r *http.Request) {
  vars := mux.Vars(r)
  userID := vars["id"]
  ctx := r.Context()

  collection := client.Database("appdb").Collection("users")
  filter := bson.D{{Key: "_id", Value: userID}}
  // MongoDB rejects projections that mix inclusion and exclusion (except
  // for _id), so use an inclusion-only projection: every field not listed
  // here (password_hash, api_key, internal_notes, ...) is omitted.
  projection := bson.D{
    {Key: "_id", Value: 1},
    {Key: "email", Value: 1},
    {Key: "name", Value: 1},
    {Key: "role", Value: 1},
  }
  opts := options.FindOne().SetProjection(projection)
  var user bson.M
  if err := collection.FindOne(ctx, filter, opts).Decode(&user); err != nil {
    http.Error(w, "user not found", http.StatusNotFound)
    return
  }
  w.Header().Set("Content-Type", "application/json")
  json.NewEncoder(w).Encode(user)
}
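For context, here is a hedged sketch of how such a handler might be registered; the route path and the 24-character hex constraint (which assumes ObjectID-style string identifiers) are illustrative, not prescriptive.

func main() {
  r := mux.NewRouter()
  // The regex constraint rejects malformed identifiers at the routing
  // layer, before the handler or MongoDB is ever reached.
  r.HandleFunc("/users/{id:[a-f0-9]{24}}", getUserProfile).Methods(http.MethodGet)
  log.Fatal(http.ListenAndServe(":8080", r))
}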

2. Allow-list route parameters and sanitize inputs

Validate path and query parameters to prevent NoSQL injection that could alter MongoDB queries and return excessive data.

func getProduct(w http.ResponseWriter, r *http.Request) {
  vars := mux.Vars(r)
  category := vars["category"]
  // Allow-list validation
  allowed := map[string]bool{"books": true, "electronics": true, "clothing": true}
  if !allowed[category] {
    http.Error(w, "invalid category", http.StatusBadRequest)
    return
  }
  collection := client.Database("store").Collection("products")
  cursor, err := collection.Find(r.Context(), bson.D{{Key: "category", Value: category}})
  if err != nil {
    http.Error(w, "search failed", http.StatusInternalServerError)
    return
  }
  defer cursor.Close(r.Context())
  var results []bson.M
  if err = cursor.All(r.Context(), &results); err != nil {
    http.Error(w, "failed to decode", http.StatusInternalServerError)
    return
  }
  w.Header().Set("Content-Type", "application/json")
  json.NewEncoder(w).Encode(results)
}
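Gorilla Mux can also enforce the allow-list at the routing layer through pattern constraints, so invalid values never reach the handler. A minimal sketch:

r := mux.NewRouter()
// Only the three known categories match this route; any other value
// falls through and returns 404 without touching MongoDB.
r.HandleFunc("/products/{category:books|electronics|clothing}", getProduct).
  Methods(http.MethodGet)

Keep the in-handler allow-list as well: route constraints are easy to overlook when new routes are added, and the defense in depth costs little here.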

3. Avoid exposing raw documents to LLM prompts

If you must send data to an LLM, transform and sanitize it first. Remove API keys and internal identifiers, and use structured summaries instead of raw documents.

type SafeSummary struct {
  UserName string   `json:"user_name"`
  Actions  []string `json:"recent_actions"`
}

func buildSafeSummary(user bson.M) SafeSummary {
  var s SafeSummary
  // Comma-ok assertions: missing or unexpectedly typed fields are skipped
  // instead of panicking the handler.
  if name, ok := user["name"].(string); ok {
    s.UserName = name
  }
  // The driver decodes BSON arrays into bson.A when the target is bson.M.
  actions, _ := user["recent_actions"].(bson.A)
  for _, a := range actions {
    if str, ok := a.(string); ok {
      s.Actions = append(s.Actions, str)
    }
  }
  return s
}

func handler(w http.ResponseWriter, r *http.Request) {
  vars := mux.Vars(r)
  userID := vars["id"]
  ctx := r.Context()
  collection := client.Database("appdb").Collection("users")
  var raw bson.M
  if err := collection.FindOne(ctx, bson.D{{Key: "_id", Value: userID}}).Decode(&raw); err != nil {
    http.Error(w, "user not found", http.StatusNotFound)
    return
  }
  summary := buildSafeSummary(raw)
  // Pass the sanitized summary to the LLM client instead of the raw document.
  prompt := "Summarize recent activity for user: " + summary.UserName
  // llmClient.ChatCompletion(ctx, prompt) — use sanitized prompt
  w.Write([]byte(prompt))
}

4. Error handling and logging hygiene

Ensure error messages do not include stack traces or database details that could aid attackers. Avoid logging full query responses that may contain sensitive data.

func safeQuery(ctx context.Context, col *mongo.Collection, filter any) (*mongo.Cursor, error) {
  cursor, err := col.Find(ctx, filter)
  if err != nil {
    // Log minimal metadata only; never log the filter or returned documents.
    log.Printf("query failed: %v", err)
    return nil, err
  }
  return cursor, nil
}

Related CWEs

CWE ID    Name                                                   Severity
CWE-754   Improper Check for Unusual or Exceptional Conditions   MEDIUM

Frequently Asked Questions

How can I detect if my Gorilla Mux + MongoDB service is leaking data to LLMs?
Use automated scanning that includes LLM/AI Security checks, such as active prompt injection probes and PII detection in model responses. Instrument handlers to ensure raw MongoDB documents are not passed into LLM prompts, and review logs for sensitive content in completions.
Does middleware in Gorilla Mux help prevent LLM data leakage?
Middleware can enforce authentication and input validation before requests reach handlers, and rewrite responses to strip sensitive fields before they leave the service. Combine middleware with field-level projections and allow-list checks to reduce the attack surface presented to LLMs, as sketched below.
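As a minimal sketch of that layered approach, assuming the Authorization header check below is a placeholder for the service's real authentication logic:

func requireAuth(next http.Handler) http.Handler {
  return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
    // Placeholder check: reject unauthenticated requests before any
    // handler can query MongoDB or build an LLM prompt.
    if r.Header.Get("Authorization") == "" {
      http.Error(w, "unauthorized", http.StatusUnauthorized)
      return
    }
    next.ServeHTTP(w, r)
  })
}

func newRouter() *mux.Router {
  r := mux.NewRouter()
  r.Use(requireAuth) // applies to every route registered on this router
  r.HandleFunc("/users/{id}", getUserProfile).Methods(http.MethodGet)
  return r
}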