Severity: HIGH | Tags: path traversal, django, firestore

Path Traversal in Django with Firestore

Path Traversal in Django with Firestore — how this specific combination creates or exposes the vulnerability

Path Traversal occurs when user-controlled input is used to construct file or resource paths without proper validation, allowing an attacker to access files outside the intended directory. In a Django application that uses Google Cloud Firestore as a backend, the risk typically does not involve traditional filesystem traversal because Firestore is a managed NoSQL database and does not expose a POSIX filesystem. Instead, the vulnerability manifests in how application code translates HTTP request parameters into Firestore document paths or collection/entity identifiers.

Django applications often build references to Firestore entities using values from the request, such as document IDs or collection names supplied via URL path segments or query parameters. If these values are used naively to construct document references (for example, by string interpolation), an attacker can supply path-like inputs such as ../../secrets/config, extra slash-separated segments, or reserved names to probe for sensitive data. Firestore enforces access controls through IAM and, for mobile and web clients, Security Rules, but the server client libraries that a Django backend typically uses authenticate with service credentials and bypass Security Rules. The application layer can therefore expose more data than intended when it resolves unsafe paths into queries or document reads.

Consider a view that retrieves a user profile by ID directly from the URL without validation:

from google.cloud import firestore
from django.http import JsonResponse
def get_profile(request, profile_id):
    db = firestore.Client()
    doc_ref = db.collection('profiles').document(profile_id)
    doc = doc_ref.get()
    if doc.exists:
        return JsonResponse(doc.to_dict())
    return JsonResponse({'error': 'Not found'}, status=404)

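If profile_id is taken directly from the request and not validated, an attacker could attempt values like ../private_profile or include encoded slashes, depending on URL routing. Django's default str path converter does not match a slash, but a path converter, a regex route, or a query parameter can let one through. Firestore does not resolve .. as a parent reference, and its documented ID constraints forbid a forward slash as well as the IDs . and .. on their own. A slash in a path string instead acts as a segment separator, so a value such as alice/private/notes may end up addressing a document in a subcollection rather than a document directly inside profiles. The surrounding application logic might also concatenate IDs into collection paths or use the value to select which collection to query, increasing the risk of exposing data across logical boundaries. Additionally, if the application uses the same parameter to determine both collection and document, an input such as users/../../../admin_users could shift the intended query scope when the code does not rigorously sanitize or allowlist inputs.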
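To see why a slash in a supplied value matters, the sketch below models how a slash-delimited string expands into path segments. It is a simplified, standard-library-only illustration, not the Firestore client's own code: `expand_path` is a hypothetical helper, and the exact handling of slashes may differ between client library versions, so it is worth confirming against the version you deploy.

```python
def expand_path(collection: str, user_value: str) -> list[str]:
    """Model a slash-delimited path: join the parts, then split on '/'.

    Hypothetical helper for illustration only. Firestore paths alternate
    collection and document segments, so an even segment count names a document.
    """
    return f"{collection}/{user_value}".split("/")


# A plain ID stays inside the intended collection: two segments.
print(expand_path("profiles", "alice"))

# A value containing slashes yields four segments, which would name a
# document inside a subcollection instead of a document in "profiles".
print(expand_path("profiles", "alice/private/notes"))
```

Under this model the second value would be read as the document `notes` in the subcollection `private` beneath `profiles/alice`, which is why rejecting slashes before building a reference is a sensible default.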

Another scenario involves dynamic collection names derived from user input, such as tenant identifiers in a multi-tenant application. Without strict validation, an attacker might attempt directory traversal patterns to reference collections outside the tenant’s namespace. Because Firestore stores data hierarchically (collections and documents), an unsafe mapping from URL segments to collection names can lead to unauthorized reads if the application does not enforce strict naming conventions and access rules.

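Firestore Security Rules can mitigate some risks for requests that come from the mobile and web client SDKs, but they are not evaluated for requests made through server client libraries such as the google-cloud-firestore package used in the examples here; those requests are governed by IAM. For a Django backend, authorization therefore has to be enforced in application code. Where rules do apply, they should not depend on untrusted input to construct paths. A misconfigured rule or a missing server-side check, combined with unsafe Django code, can allow an authenticated user to read documents they should not access if the document ID is derived directly from manipulated request data.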
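Whatever the rule configuration, the Django view can check ownership itself before returning a document. The sketch below is a minimal illustration using plain dictionaries; the `owner_uid` field and the `can_read_document` helper are assumptions made for this example, not part of the original code or of any Firestore API.

```python
from typing import Any, Mapping, Optional


def can_read_document(user_uid: Optional[str], doc_data: Mapping[str, Any]) -> bool:
    """Return True only when the authenticated user owns the document.

    Assumes each document stores its owner's ID in a hypothetical
    `owner_uid` field; adapt this to your own data model.
    """
    if not user_uid:
        return False  # anonymous or missing identity: deny by default
    return doc_data.get("owner_uid") == user_uid


profile = {"owner_uid": "user-123", "display_name": "Alice"}
print(can_read_document("user-123", profile))  # the owner
print(can_read_document("user-456", profile))  # a different user
```

In a view, a check like this would run after the document is fetched and before `JsonResponse` is built, with the user ID taken from the authenticated session rather than from request parameters.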

Firestore-Specific Remediation in Django — concrete code fixes

To prevent Path Traversal risks when Django interacts with Firestore, validate and sanitize all user input used to construct document references or collection names. Use allowlists, strict type checks, and avoid directly concatenating user input into paths. The following examples demonstrate secure patterns.

1. Validate document IDs against an allowlist or regex

Restrict document IDs to alphanumeric characters and safe symbols. Reject any input containing ., /, or .. to prevent path-like manipulation:

import re

from django.http import JsonResponse
from google.cloud import firestore

# Allow only letters, digits, underscore, and hyphen (1 to 100 characters).
DOCUMENT_ID_PATTERN = re.compile(r'[a-zA-Z0-9_-]{1,100}')


def is_valid_document_id(doc_id):
    # fullmatch avoids the `$` pitfall, where a trailing newline still matches.
    return isinstance(doc_id, str) and DOCUMENT_ID_PATTERN.fullmatch(doc_id) is not None


def get_profile_safe(request, profile_id):
    if not is_valid_document_id(profile_id):
        return JsonResponse({'error': 'Invalid profile ID'}, status=400)
    db = firestore.Client()
    doc_ref = db.collection('profiles').document(profile_id)
    doc = doc_ref.get()
    if doc.exists:
        return JsonResponse(doc.to_dict())
    return JsonResponse({'error': 'Not found'}, status=404)
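An allowlist like the one above is stricter than what Firestore itself accepts. For comparison, the following sketch checks a candidate ID against the constraints listed in Firestore's documentation: no forward slash, not `.` or `..` on its own, no match for `__.*__`, and at most 1,500 bytes of UTF-8. `violates_firestore_id_rules` is a hypothetical helper written for this article, not a library function.

```python
import re

# Reserved pattern from Firestore's documented ID constraints.
_RESERVED = re.compile(r"__.*__", re.DOTALL)
MAX_ID_BYTES = 1500


def violates_firestore_id_rules(doc_id: object) -> bool:
    """Return True if doc_id breaks one of the documented ID constraints."""
    if not isinstance(doc_id, str) or not doc_id:
        return True
    if "/" in doc_id or doc_id in (".", ".."):
        return True
    if _RESERVED.fullmatch(doc_id):
        return True
    try:
        encoded = doc_id.encode("utf-8")
    except UnicodeEncodeError:
        return True  # not representable as valid UTF-8
    return len(encoded) > MAX_ID_BYTES
```

Passing this check only means the value is a syntactically legal ID. It says nothing about whether the caller is allowed to read that document, so the narrower allowlist plus an authorization check remains the safer default.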

2. Use a mapping for multi-tenant collections instead of raw user input

Avoid using raw tenant identifiers as collection names. Map tenant identifiers to predefined, validated collection names:

TENANT_COLLECTIONS = {
    'acme': 'tenant_acme_data',
    'globex': 'tenant_globex_data',
}
def get_tenant_data(request, tenant_slug, doc_id):
    if tenant_slug not in TENANT_COLLECTIONS:
        return JsonResponse({'error': 'Invalid tenant'}, status=403)
    if not is_valid_document_id(doc_id):
        return JsonResponse({'error': 'Invalid document ID'}, status=400)
    collection_name = TENANT_COLLECTIONS[tenant_slug]
    db = firestore.Client()
    doc_ref = db.collection(collection_name).document(doc_id)
    doc = doc_ref.get()
    if doc.exists:
        return JsonResponse(doc.to_dict())
    return JsonResponse({'error': 'Not found'}, status=404)
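A lookup table confirms that a tenant exists, but not that the caller belongs to it. The sketch below adds that step under stated assumptions: `user_tenants` stands in for however your application records a user's tenant memberships, and `resolve_collection` is a hypothetical helper.

```python
from typing import Collection, Mapping, Optional

TENANT_COLLECTIONS: Mapping[str, str] = {
    "acme": "tenant_acme_data",
    "globex": "tenant_globex_data",
}


def resolve_collection(tenant_slug: str, user_tenants: Collection[str]) -> Optional[str]:
    """Return the collection name only if the user is a member of a known tenant."""
    if tenant_slug not in user_tenants:
        return None  # caller is not a member of this tenant
    # .get() returns None for slugs that are not in the server-side mapping.
    return TENANT_COLLECTIONS.get(tenant_slug)


print(resolve_collection("acme", {"acme"}))    # member of a known tenant
print(resolve_collection("globex", {"acme"}))  # known tenant, but not a member
```

A view could treat a `None` result as a 403 or 404 response, depending on whether it should reveal that the tenant exists.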

3. Avoid dynamic collection selection from user input

If your application must work with dynamic collections, enforce a strict allowlist and never concatenate user input directly into collection paths. Accept only collection names that appear in a fixed, server-defined set:

SAFE_COLLECTIONS = ['public_posts', 'public_events']
def get_public_item(request, collection_name, item_id):
    if collection_name not in SAFE_COLLECTIONS:
        return JsonResponse({'error': 'Forbidden collection'}, status=403)
    if not is_valid_document_id(item_id):
        return JsonResponse({'error': 'Invalid item ID'}, status=400)
    db = firestore.Client()
    doc_ref = db.collection(collection_name).document(item_id)
    doc = doc_ref.get()
    if doc.exists:
        return JsonResponse(doc.to_dict())
    return JsonResponse({'error': 'Not found'}, status=404)

4. Enforce Firestore Security Rules and validate server-side

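While Django should validate inputs, complement this with access controls in Firestore itself. Security Rules restrict reads and writes made through the mobile and web client SDKs, and they should not rely on request-derived path components that can be spoofed. Requests made from Django through a server client library bypass Security Rules, so scope the service account's IAM permissions narrowly and enforce authorization in the view. In Django, always treat Firestore responses as untrusted and validate data shapes before use.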
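Validating the shape of a fetched document can be as simple as projecting it onto an explicit set of expected fields and types before serializing it. The sketch below assumes a hypothetical public profile schema; the field names and the `project_public_fields` helper are illustrative only.

```python
from typing import Any, Dict, Mapping

# Hypothetical public schema: field name -> expected Python type.
PUBLIC_PROFILE_FIELDS: Dict[str, type] = {
    "display_name": str,
    "bio": str,
    "follower_count": int,
}


def project_public_fields(doc_data: Mapping[str, Any]) -> Dict[str, Any]:
    """Keep only expected fields whose values have exactly the expected type."""
    return {
        name: doc_data[name]
        for name, expected in PUBLIC_PROFILE_FIELDS.items()
        # Exact type comparison, so a bool is not accepted where an int is expected.
        if name in doc_data and type(doc_data[name]) is expected
    }


raw = {"display_name": "Alice", "bio": 42, "password_hash": "x", "follower_count": 10}
print(project_public_fields(raw))
```

Fields that are missing from the schema or carry an unexpected type are dropped silently here. An application that prefers to fail loudly could raise an error instead.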

5. Use Firestore’s built-in document ID generation

When creating new documents, let Firestore generate IDs instead of accepting client-supplied IDs that may contain traversal patterns:

def create_profile(request):
    data = request.POST.dict()
    db = firestore.Client()
    doc_ref = db.collection('profiles').document()  # Auto-generated ID
    doc_ref.set(data)
    return JsonResponse({'id': doc_ref.id}, status=201)

Related CWEs: input validation

| CWE ID | Name | Severity |
| --- | --- | --- |
| CWE-20 | Improper Input Validation | HIGH |
| CWE-22 | Path Traversal | HIGH |
| CWE-74 | Injection | CRITICAL |
| CWE-77 | Command Injection | CRITICAL |
| CWE-78 | OS Command Injection | CRITICAL |
| CWE-79 | Cross-site Scripting (XSS) | HIGH |
| CWE-89 | SQL Injection | CRITICAL |
| CWE-90 | LDAP Injection | HIGH |
| CWE-91 | XML Injection | HIGH |
| CWE-94 | Code Injection | CRITICAL |

Frequently Asked Questions

Can Path Traversal occur with Firestore if the database has no filesystem?
Yes. In Django, Path Traversal with Firestore emerges from unsafe mapping of user input to document paths, collection names, or query scopes, not from filesystem access. Attackers can exploit poor input validation to reach data the view was never meant to expose.

Do Firestore Security Rules alone prevent Path Traversal in Django?
No. Security Rules are an important layer for mobile and web clients, but they are not evaluated for requests made through the server client libraries that a Django backend uses. The application must not construct document references from untrusted input. Server-side validation and authorization in Django, combined with narrowly scoped IAM permissions, are required to reliably prevent traversal-based data exposure.