Severity: HIGH

LLM Data Leakage in Fiber with Firestore

LLM Data Leakage in Fiber with Firestore: how this specific combination creates or exposes the vulnerability

LLM data leakage occurs when an application unintentionally exposes sensitive information through responses generated or processed by an LLM. In a Fiber application that uses Firestore as its backend data store, the risk arises when the request-handling pipeline passes raw or insufficiently sanitized data, such as user records, API keys, or internal identifiers, into LLM interactions. Firestore documents often contain fields holding personally identifiable information (PII), service account tokens, or operational metadata; if these documents are forwarded to an LLM endpoint without careful filtering, the LLM may include that data verbatim in its output.

In this context, the LLM/AI Security checks provided by middleBrick are particularly relevant because they specifically test for System Prompt Leakage, for PII, keys, or code in LLM outputs, and for unauthenticated LLM endpoints. When Firestore-backed data handled by Fiber routes reaches an LLM, the following chain can occur:

  1. An API endpoint in Fiber retrieves documents from Firestore, potentially including fields such as uid, email, or internalId.
  2. These fields are interpolated into prompts, either directly or via intermediate objects that are serialized into the prompt text or passed as tool input.
  3. An LLM processes the prompt and, depending on the model and configuration, may reflect the sensitive content in completions, function call arguments, or tool call metadata.
  4. If the LLM endpoint is unauthenticated or improperly scoped, or if output scanning is not enforced, the leaked data can be exfiltrated through chat responses or logs accessible to downstream clients.

middleBrick’s detection for System Prompt Leakage uses 27 regex patterns tailored to ChatML, Llama 2, Mistral, and Alpaca formats; this helps identify whether Firestore-derived content has been embedded into system or user messages. Its Active Prompt Injection testing performs five sequential probes—including system prompt extraction and data exfiltration—that can reveal whether a Fiber endpoint inadvertently channels Firestore data into prompts that an LLM can be tricked into revealing. Output scanning checks for PII, API keys, and executable code in LLM responses, which is essential when Firestore documents contain sensitive fields. Because this scanner runs without authentication (black-box style), it can surface issues where endpoints expose Firestore-backed data to LLMs without adequate sanitization or authorization controls.
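To make the leakage surface concrete, the fragment below is a hypothetical sketch: the field names and message wording are illustrative, and the patterns shown are not middleBrick’s actual regexes. It builds a ChatML-style system message with Firestore fields embedded verbatim, which is the kind of structure System Prompt Leakage detection looks for when it reappears in model output.

import "fmt"

// buildSystemMessage takes data as returned by userDoc.Data() and embeds it
// directly into a ChatML system message. ChatML wraps messages in
// <|im_start|>/<|im_end|> markers; embedding raw Firestore values here means
// a successful prompt-extraction attack returns the PII along with the
// instructions.
func buildSystemMessage(data map[string]interface{}) string {
    return fmt.Sprintf(
        "<|im_start|>system\nYou assist %v (email: %v, internal id: %v).<|im_end|>",
        data["name"], data["email"], data["internalId"],
    )
}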

Additionally, Excessive Agency detection looks for patterns such as tool_calls, function_call, or LangChain agent structures; if Fiber applications forward Firestore documents into tool-calling workflows without strict schema validation, the LLM may request or produce operations that expose further sensitive records. Unauthenticated LLM endpoint detection is also critical: if a Fiber route invokes an LLM without requiring tokens or mutual TLS, any party able to reach the route can trigger leakage through crafted inputs that pull Firestore content into prompts.
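The risky tool-calling pattern is easy to reproduce. The sketch below is hypothetical: the tool name, argument shape, and wiring are assumptions rather than any particular framework’s API. It executes whatever lookup the model requests against Firestore, so a prompt-injected tool call can read any document the service account can reach.

import (
    "context"
    "encoding/json"

    "cloud.google.com/go/firestore"
)

// Arguments as produced by the model's function_call / tool_calls output.
type lookupArgs struct {
    Collection string `json:"collection"`
    DocID      string `json:"docId"`
}

// runLookupTool trusts the model-supplied collection and document ID and
// returns the entire document, so any record readable by the service account
// can be pulled into the conversation.
func runLookupTool(ctx context.Context, client *firestore.Client, rawArgs []byte) (map[string]interface{}, error) {
    var args lookupArgs
    if err := json.Unmarshal(rawArgs, &args); err != nil {
        return nil, err
    }
    snap, err := client.Collection(args.Collection).Doc(args.DocID).Get(ctx)
    if err != nil {
        return nil, err
    }
    return snap.Data(), nil
}

A safer design validates the arguments against an allowlist of collections and projects only allowlisted fields before anything is returned to the model.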

To illustrate, consider a route like /users/:id/profile in Fiber that reads a Firestore document and builds a user context for an LLM:

// Inside a Fiber handler for /users/:id/profile; client is a *firestore.Client,
// ctx the request context, and c the *fiber.Ctx.
userDoc, err := client.Collection("users").Doc(c.Params("id")).Get(ctx)
if err != nil {
    return fiber.ErrNotFound
}
data := userDoc.Data()
prompt := fmt.Sprintf("Help user %v (%v) with preferences: %v",
    data["name"], data["email"], data["preferences"])

If this prompt is sent to an LLM without filtering or redaction, the name and email may be reflected in the model’s output, particularly in verbose or debugging configurations where the model echoes its context. middleBrick’s checks would flag this by identifying PII in outputs and by testing whether the system or user prompts contain raw Firestore fields. The remediation is to avoid passing raw sensitive fields into prompts, apply strict output scanning, and enforce authentication on any LLM endpoints used by the Fiber application.

Firestore-Specific Remediation in Fiber: concrete code fixes

Remediation focuses on minimizing data exposure to the LLM and ensuring that only necessary, sanitized data is used in prompts. In Fiber, explicitly select safe fields, redact or omit PII, and enforce authentication on LLM calls. The examples below use the Firestore Go client inside Fiber handlers and demonstrate secure, Firestore-aware patterns.

1. Select only non-sensitive fields and use parameterized prompts

Instead of passing entire Firestore documents, pick only the fields required for the operation and avoid including emails, UIDs, or internal IDs in prompts.

userDoc, err := client.Collection("users").Doc(c.Params("id")).Get(ctx)
if err != nil {
    return fiber.ErrNotFound
}
safeData := map[string]interface{}{
    "userId":     userDoc.Ref.ID,
    "tier":       userDoc.Data()["tier"],
    "lastActive": userDoc.Data()["lastActive"],
}
prompt := fmt.Sprintf("Assist user with tier %v and last activity on %v.",
    safeData["tier"], safeData["lastActive"])

2. Use allowlists for Firestore document fields

Define an allowlist of field names and project the document to that subset before constructing prompts. This prevents accidental inclusion of new fields that may contain PII.

var allowedFields = []string{"tier", "lastActive", "region"}

userDoc, err := client.Collection("users").Doc(c.Params("id")).Get(ctx)
if err != nil {
    return fiber.ErrNotFound
}
// Project the document onto the allowlist before it goes anywhere near a prompt.
projected := map[string]interface{}{}
for _, field := range allowedFields {
    if v, ok := userDoc.Data()[field]; ok {
        projected[field] = v
    }
}
encoded, _ := json.Marshal(projected)
prompt := fmt.Sprintf("Process data for: %s", encoded)

3. Enforce authentication and scope control for LLM endpoints

Ensure that any route invoking an LLM validates sessions or API tokens. In Fiber, you can use middleware to check for a valid Authorization header before forwarding requests to the LLM.

// Require a valid token before any /api/llm route is reached.
app.Use("/api/llm", func(c *fiber.Ctx) error {
    auth := c.Get(fiber.HeaderAuthorization)
    if auth == "" || !isValidToken(auth) {
        return c.Status(fiber.StatusUnauthorized).JSON(fiber.Map{"error": "unauthorized"})
    }
    return c.Next()
})

4. Apply output scanning and normalization before returning responses

Treat LLM outputs as untrusted. Inspect responses for patterns resembling PII or keys, and remove or mask them before sending to the client. While middleBrick’s output scanning identifies issues, implement basic checks in code as a first line of defense.

// Simple redaction for email addresses in LLM output.
var emailPattern = regexp.MustCompile(`[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}`)

func sanitizeOutput(text string) string {
    return emailPattern.ReplaceAllString(text, "[REDACTED]")
}

// Inside the handler, after calling the LLM:
llmResponse, err := callLLM(prompt)
if err != nil {
    return fiber.ErrBadGateway
}
return c.JSON(fiber.Map{"response": sanitizeOutput(llmResponse.Text)})
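Email redaction is only a starting point. Extend sanitizeOutput with patterns for credential formats you know can occur in your data (for example, the AKIA prefix that AWS access key IDs begin with), and consider rejecting rather than redacting responses that contain code blocks, since executable code in output is one of the conditions middleBrick’s output scanning flags.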

5. Secure Firestore rules and minimize token scope

Firestore security rules govern access from client SDKs; server-side access from the Fiber backend’s service account is governed by IAM and bypasses rules entirely. Grant that service account the narrowest practical role (for example, a read-only role scoped to the data the API actually needs), keep client-facing rules strict, and, because rules cannot restrict reads at the level of individual fields, store sensitive fields such as emails or tokens in separate documents or subcollections so the API only reads what it requires.
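As a minimal sketch of the client-facing side (the collection and subcollection names are assumptions), rules like the following let authenticated users read only their own profile document while keeping a sensitive subcollection unreadable from client SDKs entirely:

rules_version = '2';
service cloud.firestore {
  match /databases/{database}/documents {
    // A client may read only its own profile document.
    match /users/{userId} {
      allow read: if request.auth != null && request.auth.uid == userId;
      allow write: if false;
    }
    // Sensitive fields live in a subcollection no client can read;
    // only the backend (governed by IAM, not by these rules) reaches it.
    match /users/{userId}/private/{doc} {
      allow read, write: if false;
    }
  }
}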

By combining these patterns—field selection, allowlists, authenticated LLM endpoints, and output sanitization—you reduce the likelihood that Firestore data will leak into LLM interactions. middleBrick’s checks support this posture by detecting system prompt leakage, unauthenticated LLM endpoints, and PII in outputs, helping you validate that your Fiber + Firestore integration remains within secure boundaries.

Related CWEs

  CWE ID     Name                                                    Severity
  CWE-754    Improper Check for Unusual or Exceptional Conditions    MEDIUM

Frequently Asked Questions

How can I verify that Firestore fields are not leaking into my LLM prompts in a Fiber app?
Use middleBrick’s LLM/AI Security checks, which include System Prompt Leakage detection (27 regex patterns) and Active Prompt Injection testing with data exfiltration probes. Additionally, instrument your Fiber routes to log prompt inputs and inspect them for PII before sending to the LLM; automated tests can assert that only allowed fields are present.
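For example, a minimal test along these lines (buildPrompt is a hypothetical name for whatever function your handler uses to turn the projected Firestore data into prompt text) fails the build if an email ever reaches the prompt:

import (
    "strings"
    "testing"
)

func TestPromptContainsOnlyAllowedFields(t *testing.T) {
    doc := map[string]interface{}{
        "tier":       "pro",
        "lastActive": "2024-01-01",
        "email":      "alice@example.com", // must never reach the prompt
    }
    prompt := buildPrompt(doc)
    if strings.Contains(prompt, "alice@example.com") {
        t.Fatalf("prompt leaked an email address: %q", prompt)
    }
}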
Does output scanning alone prevent data leakage, or do I need to change how I build prompts in Fiber?
Output scanning is important but not sufficient on its own. You must avoid placing sensitive Firestore fields into prompts in the first place by selecting only safe fields, using allowlists, and redacting data. Output scanning then acts as a safety net to catch accidental exposure before it reaches users.