Excessive Data Exposure in Firestore
How Excessive Data Exposure Manifests in Firestore
Firestore's document-oriented structure and flexible querying create unique opportunities for excessive data exposure. Unlike traditional SQL databases where you explicitly SELECT columns, Firestore's default behavior of returning entire documents can inadvertently expose sensitive information.
The most common Firestore-specific pattern involves fetching entire documents when only specific fields are needed. Consider this vulnerable pattern:
const userDoc = await db.collection('users').doc(userId).get();
const userData = userDoc.data();
// userData now contains ALL fields, including sensitive ones like passwordHash, SSN, or API keys
Firestore's real-time listeners compound this problem. A listener has no field filter: it delivers the whole document, so the client receives every field again on each update:
// Vulnerable: receives all fields in real-time
db.collection('users').doc(userId).onSnapshot((doc) => {
  const data = doc.data();
  // All fields, including sensitive data, pushed to client
});
Collection queries without field selection create another vector. Firestore queries return entire documents matching criteria, which can include sensitive fields across multiple documents:
// Vulnerable: returns all fields for all matching users
const users = await db.collection('users')
  .where('role', '==', 'admin')
  .get();
users.forEach(doc => {
  const data = doc.data();
  // Contains sensitive admin-specific fields
});
Firestore's security rules can also contribute to excessive exposure. Overly permissive rules that grant read access to entire documents, combined with client-side field filtering, create a false sense of security:
// Vulnerable security rule - grants full read access
allow read: if request.auth != null;
Nested data presents a related challenge. A document read does not include subcollections, but it does return every nested map and array stored in the document, so sensitive values buried in nested fields are exposed along with the rest:
// Vulnerable: returns the entire document, nested fields included
const doc = await db.collection('users').doc(userId).get();
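To see how much a single read carries, the following sketch flattens a document's data into dotted field paths, nested maps included. `listFieldPaths` and the sample object are hypothetical names used for illustration and are not part of the Firestore SDK; the input stands in for whatever `doc.data()` returns.

```javascript
// Hypothetical helper: list every field path a full-document read would expose.
function listFieldPaths(data, prefix = "") {
  const paths = [];
  for (const [key, value] of Object.entries(data)) {
    const path = prefix ? `${prefix}.${key}` : key;
    // Only plain objects are treated as nested maps; arrays, dates, and
    // SDK value types (timestamps, references) are reported as leaf fields.
    const isPlainMap =
      value !== null &&
      typeof value === "object" &&
      !Array.isArray(value) &&
      Object.getPrototypeOf(value) === Object.prototype;
    if (isPlainMap) {
      paths.push(...listFieldPaths(value, path));
    } else {
      paths.push(path);
    }
  }
  return paths;
}

// Stand-in for doc.data() on a user document (assumed shape).
const sampleUser = {
  name: "Ada",
  billing: { cardLast4: "4242", address: { city: "Paris" } },
  roles: ["user"],
};

console.log(listFieldPaths(sampleUser));
// [ 'name', 'billing.cardLast4', 'billing.address.city', 'roles' ]
```

Running this against a few representative documents gives a quick inventory of what a client receives from one `get()` call.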
Firestore-Specific Detection
Detecting excessive data exposure in Firestore requires examining both client-side code patterns and security rules. The most effective approach combines static analysis with runtime scanning.
Client-side detection focuses on identifying Firestore operations that retrieve entire documents without field filtering. Look for these patterns:
// Vulnerable patterns to flag
db.collection('collection').doc('docId').get() // No field selection
db.collection('collection').where('field', '==', value).get() // No field projection
db.collection('collection').onSnapshot() // Real-time without filtering
middleBrick's Firestore-specific scanner analyzes these patterns automatically. It examines your API endpoints that interact with Firestore, identifying calls that retrieve entire documents when only specific fields are needed. The scanner tests unauthenticated endpoints to reveal what data is exposed without proper authentication.
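The client-side patterns listed above can also be flagged with a rough text scan during code review. The sketch below is a heuristic under stated assumptions, not how middleBrick works: `findUnprojectedReads` is a hypothetical helper that splits source text on semicolons and reports statements calling `.get()` or `.onSnapshot()` with no `.select()` or `fieldMask` in the same statement. It will also flag unrelated calls such as `map.get(key)`, so treat its output as a list of places to look.

```javascript
// Heuristic scan: matches source text only, not the parsed syntax tree.
function findUnprojectedReads(source) {
  return source
    .split(";")
    // Collapse whitespace so multi-line method chains become one statement.
    .map((statement) => statement.replace(/\s+/g, " ").trim())
    // Keep statements that read from Firestore...
    .filter((statement) => /\.(get|onSnapshot)\s*\(/.test(statement))
    // ...and drop those that already project fields.
    .filter((statement) => !/\.select\s*\(|fieldMask/.test(statement));
}

const sampleSource = `
const a = await db.collection('users').doc(id).get();
const b = await db.collection('users')
  .select('name', 'email')
  .get();
db.collection('users').doc(id).onSnapshot(render);
`;

for (const statement of findUnprojectedReads(sampleSource)) {
  console.log("review:", statement);
}
```

On the sample input, the first and third statements are reported and the projected query is skipped.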
Security rule analysis is equally important. middleBrick evaluates your Firestore security rules to identify overly permissive read access:
// Rules that grant excessive access
allow read: if true; // Anyone can read
allow read: if request.auth != null; // Any authenticated user
allow read: if request.auth.token.role == 'admin'; // Role-based but still broad
Runtime scanning with middleBrick tests actual Firestore endpoints by sending requests and analyzing the data returned. This black-box approach reveals what an attacker could access without credentials, testing the unauthenticated attack surface.
Field-level analysis tools can examine Firestore documents to identify sensitive fields that are frequently exposed. This includes PII, authentication credentials, API keys, and internal system data that should be protected.
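As a minimal sketch of that kind of field-level analysis, the code below counts how often sensitive-looking field names appear across a sample of documents. The name patterns, `isSensitiveName`, and `countSensitiveFields` are assumptions made for illustration; it checks top-level field names only and does not inspect values, so it will miss PII stored under neutral names.

```javascript
// Hypothetical name patterns; extend them to match your own schema.
const SENSITIVE_NAME_PATTERNS = [/password/i, /ssn/i, /api[_-]?key/i, /token/i, /secret/i];

function isSensitiveName(fieldName) {
  return SENSITIVE_NAME_PATTERNS.some((pattern) => pattern.test(fieldName));
}

// Count how many sampled documents carry each sensitive-looking top-level field.
function countSensitiveFields(documents) {
  const counts = {};
  for (const data of documents) {
    for (const fieldName of Object.keys(data)) {
      if (isSensitiveName(fieldName)) {
        counts[fieldName] = (counts[fieldName] || 0) + 1;
      }
    }
  }
  return counts;
}

// Stand-ins for doc.data() results sampled from a collection or API response.
const sampledDocs = [
  { name: "Ada", email: "ada@example.com", passwordHash: "x" },
  { name: "Lin", apiKey: "k", passwordHash: "y" },
];

console.log(countSensitiveFields(sampledDocs));
// { passwordHash: 2, apiKey: 1 }
```

Fields with the highest counts are the first candidates to move out of client-readable documents.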
middleBrick's LLM/AI security scanning also detects if your Firestore endpoints are exposed to AI models, checking for system prompt leakage and excessive data exposure to AI consumers.
Firestore-Specific Remediation
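Firestore provides several native mechanisms to prevent excessive data exposure. The most fundamental is field projection, which is available in the server SDKs (for example the Node.js Admin SDK) but not in the web and mobile client SDKs. For a single document, pass a fieldMask to getAll():
// Secure: only retrieve needed fields (Node.js server SDK)
const [userDoc] = await db.getAll(
  db.collection('users').doc(userId),
  { fieldMask: ['name', 'email', 'role'] } // Only these fields
);
const userData = userDoc.data();
Real-time listeners cannot project fields; they always deliver the whole document. Attach them only to documents that contain nothing beyond what the client is authorized to see:
// Safer: listen to a document that holds only client-safe fields
db.collection('userProfiles').doc(userId).onSnapshot((doc) => {
  const data = doc.data();
  // Only contains fields the user is authorized to see
});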
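A complementary safeguard, sketched here under assumptions, is to copy only an allowlisted set of fields into whatever object a server sends to the client, so that a newly added sensitive field is dropped by default. `PUBLIC_USER_FIELDS` and `pickFields` are hypothetical names, and the sample object stands in for the result of `doc.data()`.

```javascript
// Hypothetical allowlist of fields that are safe to send to a client.
const PUBLIC_USER_FIELDS = ["name", "email", "role"];

// Copy only allowlisted fields; anything not listed is dropped by default.
function pickFields(data, allowedFields) {
  const result = {};
  for (const field of allowedFields) {
    if (Object.prototype.hasOwnProperty.call(data, field)) {
      result[field] = data[field];
    }
  }
  return result;
}

// Stand-in for the object returned by doc.data().
const fullUser = {
  name: "Ada",
  email: "ada@example.com",
  role: "user",
  passwordHash: "x",
  ssn: "000-00-0000",
};

console.log(pickFields(fullUser, PUBLIC_USER_FIELDS));
// { name: 'Ada', email: 'ada@example.com', role: 'user' }
```

An allowlist is used instead of a denylist because it fails closed: a field nobody remembered to classify never reaches the response.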
Firestore security rules apply to whole documents, not individual fields: a rule either allows a read of the entire document or denies it. For reads, the stored document is available as resource.data (request.resource describes incoming writes), so a rule can refuse to serve any document that contains sensitive fields:
// Secure security rules: deny reads of documents that hold sensitive fields
allow read: if request.auth != null &&
  request.auth.token.role == 'user' &&
  resource.data.keys().hasAll(['name', 'email']) &&
  !resource.data.keys().hasAny(['passwordHash', 'ssn', 'apiKey']);
For collection queries issued through a server SDK, apply field projection with select():
// Secure query with field filtering
const users = await db.collection('users')
  .where('status', '==', 'active')
  .select('name', 'email', 'role') // Only return these fields
  .get();
Splitting data across collections can further reduce exposure. Instead of storing sensitive data in main documents, use separate collections with strict access controls:
// Secure pattern: separate sensitive data
// Public user profile, safe to read from the client
const publicProfile = await db.collection('userProfiles')
  .doc(userId)
  .get();
// Sensitive data in a separate collection, read only by trusted server code
// and never returned to the client directly
const sensitiveData = await db.collection('userSecrets')
  .doc(userId)
  .get();
middleBrick's CLI tool can help verify your remediation efforts:
# Scan your Firestore API endpoints
middlebrick scan https://your-app.com/api/users
# Check for excessive data exposure findings
middlebrick report --format=json
Integrate middleBrick into your CI/CD pipeline to catch excessive data exposure before deployment:
# GitHub Action workflow
jobs:
  security-scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Run middleBrick scan
        run: middlebrick scan https://staging.your-app.com/api --fail-below B
Related CWEs: propertyAuthorization
| CWE ID | Name | Severity |
|---|---|---|
| CWE-915 | Mass Assignment | HIGH |