LLM Data Leakage in Fiber with Firestore
LLM Data Leakage in Fiber with Firestore — how this specific combination creates or exposes the vulnerability
LLM data leakage occurs when an application unintentionally exposes sensitive information through responses generated or processed by an LLM. In a Fiber application that integrates Firestore as the backend data store, the risk arises when request-handling pipelines pass raw or insufficiently sanitized data, such as user records, API keys, or internal identifiers, into LLM interactions. Firestore documents often contain fields that may include personally identifiable information (PII), service account tokens, or operational metadata; if these documents are forwarded to an LLM endpoint without careful filtering, the LLM may include that data verbatim in its output.
In this context, the LLM/AI Security checks provided by middleBrick are particularly relevant because they specifically test for System Prompt Leakage, PII/keys/code in LLM outputs, and unauthenticated LLM endpoints. When data retrieved from Firestore flows through Fiber request handlers into an LLM, the following chain can occur:
- An API endpoint in Fiber retrieves documents from Firestore, potentially including fields such as `uid`, `email`, or `internalId`.
- These fields are interpolated into prompts, either directly or via intermediate objects that are serialized into the prompt text or passed as tool input.
- An LLM processes the prompt and, depending on the model and configuration, may reflect the sensitive content in completions, function call arguments, or tool call metadata.
- If the LLM endpoint is unauthenticated or improperly scoped, or if output scanning is not enforced, the leaked data can be exfiltrated through chat responses or logs accessible to downstream clients.
middleBrick’s detection for System Prompt Leakage uses 27 regex patterns tailored to ChatML, Llama 2, Mistral, and Alpaca formats; this helps identify whether Firestore-derived content has been embedded into system or user messages. Its Active Prompt Injection testing performs five sequential probes—including system prompt extraction and data exfiltration—that can reveal whether a Fiber endpoint inadvertently channels Firestore data into prompts that an LLM can be tricked into revealing. Output scanning checks for PII, API keys, and executable code in LLM responses, which is essential when Firestore documents contain sensitive fields. Because this scanner runs without authentication (black-box style), it can surface issues where endpoints expose Firestore-backed data to LLMs without adequate sanitization or authorization controls.
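middleBrick's actual 27 patterns are not public, so the three markers below are illustrative stand-ins, but they sketch how format-specific regex detection of prompt-template leakage works:

```go
package main

import (
	"fmt"
	"regexp"
)

// Illustrative stand-ins for prompt-template markers; these are NOT
// middleBrick's real pattern set, just examples of the same idea.
var promptFormatMarkers = []*regexp.Regexp{
	regexp.MustCompile(`<\|im_start\|>\s*system`), // ChatML
	regexp.MustCompile(`\[INST\]`),                // Llama 2 / Mistral
	regexp.MustCompile(`### Instruction:`),        // Alpaca
}

// leaksPromptStructure reports whether an LLM response echoes
// prompt-template markers, a sign that the system prompt (and any
// Firestore-derived content embedded in it) is leaking into outputs.
func leaksPromptStructure(output string) bool {
	for _, re := range promptFormatMarkers {
		if re.MatchString(output) {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(leaksPromptStructure("<|im_start|>system You are an internal assistant"))
	fmt.Println(leaksPromptStructure("Here is your answer."))
}
```

A scanner running such checks against responses from Fiber routes can flag endpoints whose outputs echo their own prompt scaffolding.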
Additionally, Excessive Agency detection looks for patterns such as tool_calls, function_call, or LangChain agent structures; if Fiber applications forward Firestore documents into tool-calling workflows without strict schema validation, the LLM may request or produce operations that expose further sensitive records. Unauthenticated LLM endpoint detection is also critical: if a Fiber route invokes an LLM without requiring tokens or mutual TLS, any party able to reach the route can trigger leakage through crafted inputs that pull Firestore content into prompts.
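The tool-calling detection idea can be sketched as a recursive key scan over a decoded LLM API response; the OpenAI-style JSON shape and the function name `hasAgencySignals` are illustrative assumptions, not middleBrick's implementation:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// containsKey walks decoded JSON looking for any of the given keys at any depth.
func containsKey(v interface{}, keys map[string]bool) bool {
	switch t := v.(type) {
	case map[string]interface{}:
		for k, child := range t {
			if keys[k] || containsKey(child, keys) {
				return true
			}
		}
	case []interface{}:
		for _, child := range t {
			if containsKey(child, keys) {
				return true
			}
		}
	}
	return false
}

// hasAgencySignals reports whether a raw LLM API response contains
// tool-calling structures such as tool_calls or function_call, the kind
// of signal an Excessive Agency check looks for.
func hasAgencySignals(raw []byte) bool {
	var body interface{}
	if err := json.Unmarshal(raw, &body); err != nil {
		return false
	}
	return containsKey(body, map[string]bool{"tool_calls": true, "function_call": true})
}

func main() {
	resp := []byte(`{"choices":[{"message":{"tool_calls":[{"function":{"name":"delete_user"}}]}}]}`)
	fmt.Println(hasAgencySignals(resp))
	fmt.Println(hasAgencySignals([]byte(`{"choices":[{"message":{"content":"hi"}}]}`)))
}
```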
To illustrate, consider a route like `/users/:id/profile` in Fiber that reads a Firestore document and builds a user context for an LLM. Inside the handler (a Go sketch; `client` is assumed to be an initialized `*firestore.Client` from `cloud.google.com/go/firestore`):

```go
doc, err := client.Collection("users").Doc(c.Params("id")).Get(c.Context())
if err != nil {
	return c.Status(fiber.StatusNotFound).JSON(fiber.Map{"error": "user not found"})
}
data := doc.Data()
// PII from the Firestore document flows straight into the prompt.
prompt := fmt.Sprintf("Help user %v (%v) with preferences: %v",
	data["name"], data["email"], data["preferences"])
```
If this prompt is sent to an LLM without filtering or redaction, the email and name may appear in the model output, especially during debugging or verbose modes. middleBrick’s checks would flag this by identifying PII in outputs and by testing whether the system or user prompts contain raw Firestore fields. The remediation is to avoid passing raw sensitive fields into prompts, apply strict output scanning, and enforce authentication on any LLM endpoints used by the Fiber application.
Firestore-Specific Remediation in Fiber — concrete code fixes
Remediation focuses on minimizing data exposure to the LLM and ensuring that only necessary, sanitized data is used in prompts. In Fiber, you should explicitly select safe fields, redact or omit PII, and enforce authentication on LLM calls. Below are concrete, Firestore-aware code examples that demonstrate secure patterns.
1. Select only non-sensitive fields and use parameterized prompts
Instead of passing entire Firestore documents, pick only the fields required for the operation and avoid including emails, UIDs, or internal IDs in prompts. With the Firestore Go client, `DataTo` projects the document onto a struct (`client` as above):

```go
doc, err := client.Collection("users").Doc(c.Params("id")).Get(c.Context())
if err != nil {
	return c.Status(fiber.StatusNotFound).JSON(fiber.Map{"error": "user not found"})
}
// DataTo fills only the declared fields; email, uid, and any other
// sensitive fields in the document are never read into memory.
var safe struct {
	Tier       string `firestore:"tier"`
	LastActive string `firestore:"lastActive"`
}
if err := doc.DataTo(&safe); err != nil {
	return c.Status(fiber.StatusInternalServerError).JSON(fiber.Map{"error": "bad record"})
}
prompt := fmt.Sprintf("Assist user with tier %s and last activity on %s.", safe.Tier, safe.LastActive)
2. Use allowlists for Firestore document fields
Define an allowlist of field names and project the document to that subset before constructing prompts. This prevents accidental inclusion of new fields that may contain PII.
```go
var allowedFields = []string{"tier", "lastActive", "region"}

doc, err := client.Collection("users").Doc(c.Params("id")).Get(c.Context())
if err != nil {
	return c.Status(fiber.StatusNotFound).JSON(fiber.Map{"error": "user not found"})
}
data := doc.Data()
projected := map[string]interface{}{}
for _, field := range allowedFields {
	if v, ok := data[field]; ok {
		projected[field] = v
	}
}
payload, _ := json.Marshal(projected) // allowlisted fields only
prompt := fmt.Sprintf("Process data for: %s", payload)
```
3. Enforce authentication and scope control for LLM endpoints
Ensure that any route invoking an LLM validates sessions or API tokens. In Fiber, you can use middleware to check for a valid Authorization header before forwarding requests to the LLM.
```go
// Reject any request to LLM-backed routes without a valid token.
app.Use("/api/llm", func(c *fiber.Ctx) error {
	auth := c.Get("Authorization")
	if auth == "" || !isValidToken(auth) {
		return c.Status(fiber.StatusUnauthorized).JSON(fiber.Map{"error": "unauthorized"})
	}
	return c.Next()
})
```
4. Apply output scanning and normalization before returning responses
Treat LLM outputs as untrusted. Inspect responses for patterns resembling PII or keys, and remove or mask them before sending to the client. While middleBrick’s output scanning identifies issues, implement basic checks in code as a first line of defense.
```go
var emailPattern = regexp.MustCompile(`[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}`)

// sanitizeOutput applies a simple redaction pattern for emails.
func sanitizeOutput(text string) string {
	return emailPattern.ReplaceAllString(text, "[REDACTED]")
}

// In the handler, sanitize before responding (callLLM is the
// application's own LLM client wrapper):
llmResponse, err := callLLM(prompt)
if err != nil {
	return c.Status(fiber.StatusBadGateway).JSON(fiber.Map{"error": "llm unavailable"})
}
return c.JSON(fiber.Map{"response": sanitizeOutput(llmResponse.Text)})
```
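Email redaction alone misses API keys. A broadened sketch follows; the key-shaped pattern is illustrative only and should be tuned to the providers actually in use:

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	emailPattern = regexp.MustCompile(`[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}`)
	// Illustrative key-shaped pattern (sk-/pk-/AIza-style prefixes followed
	// by a long token body); not an exhaustive list of key formats.
	keyPattern = regexp.MustCompile(`\b(?:sk|pk|AIza)[A-Za-z0-9_\-]{16,}\b`)
)

// sanitizeOutput masks both emails and key-shaped strings.
func sanitizeOutput(text string) string {
	text = emailPattern.ReplaceAllString(text, "[REDACTED_EMAIL]")
	return keyPattern.ReplaceAllString(text, "[REDACTED_KEY]")
}

func main() {
	fmt.Println(sanitizeOutput("Contact alice@example.com, key sk-AbCdEfGhIjKlMnOpQr12"))
}
```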
5. Secure Firestore rules and minimize token scope
Note that Firestore security rules do not apply to the server SDK a Fiber backend uses; server-side access is governed by IAM. Grant the backend's service account a least-privilege role (for example, read-only access where writes are not needed) rather than broad project-level permissions, and use security rules to constrain any direct client SDK access. Keep in mind that read rules are enforced per document, not per field, so store highly sensitive fields in separate documents or subcollections if clients need partial access.
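Where clients reach Firestore directly through the mobile or web SDKs, a minimal rules sketch limits reads to the owning user (the `users` collection name is assumed):

```
rules_version = '2';
service cloud.firestore {
  match /databases/{database}/documents {
    match /users/{userId} {
      // Each authenticated user may read only their own document.
      allow read: if request.auth != null && request.auth.uid == userId;
      allow write: if false; // writes go through the backend only
    }
  }
}
```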
By combining these patterns—field selection, allowlists, authenticated LLM endpoints, and output sanitization—you reduce the likelihood that Firestore data will leak into LLM interactions. middleBrick’s checks support this posture by detecting system prompt leakage, unauthenticated LLM endpoints, and PII in outputs, helping you validate that your Fiber + Firestore integration remains within secure boundaries.
Related CWEs (llmSecurity category)
| CWE ID | Name | Severity |
|---|---|---|
| CWE-754 | Improper Check for Unusual or Exceptional Conditions | MEDIUM |