HIGH | cache poisoning | django | firestore

Cache Poisoning in Django with Firestore

Cache Poisoning in Django with Firestore — how this specific combination creates or exposes the vulnerability

Cache poisoning in Django with Firestore occurs when untrusted or attacker-controlled data is written into the cache layer and subsequently served as trusted, leading to incorrect application behavior or information disclosure. Firestore, used as a backend cache or data source in Django, can reflect manipulated inputs if the application uses query parameters or document fields to build cache keys without proper validation or normalization.

Consider a Django view that reads a document ID from the request, fetches a Firestore document, and caches the result using a key derived directly from that ID:

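from django.core.cache import cache
from django.http import JsonResponse
from google.cloud import firestore

db = firestore.Client()

def get_item(request):
    # Vulnerable: doc_id comes straight from the query string, unvalidated
    doc_id = request.GET.get('id')
    cache_key = f'item:{doc_id}'
    cached = cache.get(cache_key)
    # Compare against None so an empty cached value still counts as a hit
    if cached is not None:
        return JsonResponse(cached)
    doc_ref = db.collection('items').document(doc_id)
    doc = doc_ref.get()
    if doc.exists:
        data = doc.to_dict()
        cache.set(cache_key, data, timeout=300)
        return JsonResponse(data)
    return JsonResponse({'error': 'not found'}, status=404)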

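If the application does not validate doc_id, an attacker can supply arbitrary strings: path-like values such as ../../../etc/passwd, IDs containing slashes or control characters, or IDs that belong to sensitive or unrelated documents. Firestore document IDs are not filesystem paths, so a traversal string will not read a file, but nothing in this view constrains which document is requested or what ends up in the cache key. Because the key is built directly from the input, each distinct value that resolves to an existing document creates a new cache entry, which can crowd legitimate entries out of a size-limited cache, and any document the attacker can name is fetched, cached, and served. This is a BOLA/IDOR pattern: the cache does not enforce tenant or ownership boundaries, and Firestore document reads reflect the attacker’s chosen key.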

Another scenario involves query-based cache poisoning. If the Django app caches the result of a Firestore query that includes user-controlled ordering or filtering fields, an attacker can manipulate those parameters to change the cached result set. For example, a leaderboard cached by score and user-supplied page parameters could be poisoned to show incorrect rankings or to cause excessive reads against sensitive documents:

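from django.http import JsonResponse

def get_leaderboard(request):
    direction = request.GET.get('dir', 'desc')
    if direction not in ('asc', 'desc'):
        direction = 'desc'
    cache_key = f'leaderboard:{direction}'
    cached = cache.get(cache_key)
    if cached is not None:
        return JsonResponse(cached, safe=False)
    # order_by expects a Query direction constant, not a boolean
    order = firestore.Query.DESCENDING if direction == 'desc' else firestore.Query.ASCENDING
    query = db.collection('scores').order_by('score', direction=order)
    results = [doc.to_dict() for doc in query.stream()]
    cache.set(cache_key, results, timeout=60)
    # safe=False lets JsonResponse serialize a top-level list
    return JsonResponse(results, safe=False)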

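Here, the cache key includes the direction, and the allowlist check limits the key space to two entries. The risk returns if another code path builds the same key from an unchecked value, or if the check is not enforced server-side: poisoned entries can then persist. Django's cache also has no knowledge of changes made in Firestore, so a stale or poisoned entry stays in place until its TTL expires unless the application deletes it explicitly, which amplifies the impact.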
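
One way to shorten the life of a stale or poisoned entry is to delete the affected key whenever the underlying document is written. The following is a minimal sketch, assuming the item:{doc_id} key scheme from the first example. The names item_cache_key and update_item are hypothetical, and db and cache are passed in as parameters standing for a Firestore client and Django's cache object:

```python
def item_cache_key(doc_id):
    """Single place that defines the key layout (assumed: 'item:{doc_id}')."""
    return f'item:{doc_id}'


def update_item(db, cache, doc_id, data):
    """Write a document, then drop its cached copy.

    Assumptions: `db` behaves like google.cloud.firestore.Client and
    `cache` behaves like django.core.cache.cache. `doc_id` is expected
    to have been validated by the caller.
    """
    # Write first, so a reader that misses the cache sees the new data.
    db.collection('items').document(doc_id).set(data)
    key = item_cache_key(doc_id)
    # The next read repopulates the entry from Firestore.
    cache.delete(key)
    return key
```

This does not remove the need for a TTL: a write that bypasses update_item (for example, from another service) leaves the cached copy untouched until it expires.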

LLM/AI Security checks in middleBrick can detect unusual patterns where endpoints interact with external data stores in ways that may expose cacheable surfaces susceptible to poisoning, including unvalidated document IDs or query parameters used to construct cache keys.

Firestore-Specific Remediation in Django — concrete code fixes

Remediation focuses on input validation, canonical cache keys, and separation between cache identity and data identity. Always normalize and validate identifiers before using them in Firestore reads or cache operations. Use a mapping layer (e.g., a deterministic hash or UUID lookup) to decouple cache keys from raw user input.
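
The deterministic-hash idea can be sketched as follows. The helper name make_cache_key, the namespace argument, and the version default are illustrative assumptions, not part of Django or Firestore. Hashing keeps whitespace, control characters, and oversized input out of the cache backend, but it is not a substitute for validating the identifier itself:

```python
import hashlib


def make_cache_key(namespace, *parts, version='v1'):
    """Build a fixed-length cache key from arbitrary input parts.

    Hypothetical helper: the result is always
    '<namespace>:<version>:<64 hex characters>', regardless of what the
    parts contain.
    """
    # Length-prefix each part so ('ab', 'c') and ('a', 'bc') hash differently.
    material = ''.join(f'{len(p)}:{p}|' for p in map(str, parts))
    digest = hashlib.sha256(material.encode('utf-8')).hexdigest()
    return f'{namespace}:{version}:{digest}'
```

A call such as make_cache_key('item', doc_id) would replace the f-string key in the views; bumping version gives a way to abandon every existing entry at once if poisoning is suspected.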

First, validate and normalize the document identifier. Avoid using raw paths or filesystem-like traversals. Use a strict allowlist or pattern check, and convert to a canonical Firestore document ID:

import re
from django.core.cache import cache
from django.http import JsonResponse
from google.cloud import firestore

db = firestore.Client()

VALID_ID_PATTERN = re.compile(r'^[a-zA-Z0-9_-]{1,100}$')

def normalize_doc_id(doc_id):
    # Reject missing or non-string input, then require the whole string to
    # match; fullmatch also rejects a trailing newline that '$' would allow
    if not isinstance(doc_id, str) or not VALID_ID_PATTERN.fullmatch(doc_id):
        raise ValueError('invalid document identifier')
    return doc_id

def get_item_safe(request):
    raw_id = request.GET.get('id')
    try:
        doc_id = normalize_doc_id(raw_id)
    except ValueError:
        return JsonResponse({'error': 'bad request'}, status=400)
    cache_key = f'item:v2:{doc_id}'
    cached = cache.get(cache_key)
    if cached is not None:
        return JsonResponse(cached)
    doc_ref = db.collection('items').document(doc_id)
    doc = doc_ref.get()
    if doc.exists:
        data = doc.to_dict()
        cache.set(cache_key, data, timeout=300)
        return JsonResponse(data)
    return JsonResponse({'error': 'not found'}, status=404)

Second, avoid query-based cache keys that include mutable parameters like sort direction. Instead, canonicalize the query or store precomputed results with stable keys. If caching query results is necessary, include only stable, non-attacker-controlled parameters in the key:

from django.http import JsonResponse

def get_leaderboard_safe(request):
    # canonicalize: only one version of the leaderboard is cached
    cache_key = 'leaderboard:score:desc'
    cached = cache.get(cache_key)
    # An empty leaderboard is still a valid cached result
    if cached is not None:
        return JsonResponse(cached, safe=False)
    # Enforce server-side ordering; do not allow client to control direction
    query = db.collection('scores').order_by('score', direction=firestore.Query.DESCENDING)
    results = [doc.to_dict() for doc in query.stream()]
    cache.set(cache_key, results, timeout=300)
    return JsonResponse(results, safe=False)
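
If the application does need more than one cached variant, the request parameters can be mapped onto a small, fixed set of canonical values before they reach the cache key. The sketch below is illustrative only: canonical_leaderboard_params, the page parameter, and the MAX_PAGE cap are assumptions introduced here, not part of the original views.

```python
ALLOWED_DIRECTIONS = frozenset({'asc', 'desc'})
MAX_PAGE = 50  # hypothetical cap; bounds the number of distinct cache keys


def canonical_leaderboard_params(query_params):
    """Reduce raw query parameters to a bounded set of canonical values.

    `query_params` is any mapping with .get(), such as request.GET.
    Returns (direction, page, cache_key).
    """
    raw_dir = str(query_params.get('dir', 'desc')).lower()
    # Anything outside the allowlist falls back to the default.
    direction = raw_dir if raw_dir in ALLOWED_DIRECTIONS else 'desc'

    try:
        page = int(query_params.get('page', 1))
    except (TypeError, ValueError):
        page = 1
    # Clamp so the key space is at most len(ALLOWED_DIRECTIONS) * MAX_PAGE.
    page = min(max(page, 1), MAX_PAGE)

    cache_key = f'leaderboard:score:{direction}:p{page}'
    return direction, page, cache_key
```

Because the key is assembled only from the canonical values, two requests that differ in casing, junk parameters, or out-of-range pages share the same entry, and an attacker cannot create an unbounded number of keys.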

Third, reduce reliance on long-lived cached entries that could be poisoned: keep timeouts short for sensitive data and, where consistency matters, read from Firestore directly using its built-in features such as transactions or batched reads. Combine cache validation with ownership checks to enforce BOLA protections even when serving cached data:

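from django.http import JsonResponse

def get_item_with_ownership(request, user_id):
    raw_id = request.GET.get('id')
    try:
        doc_id = normalize_doc_id(raw_id)
    except ValueError:
        return JsonResponse({'error': 'bad request'}, status=400)
    # Scope the key to the user so entries are never shared across owners
    cache_key = f'item:owner:{user_id}:{doc_id}'
    cached = cache.get(cache_key)
    if cached is not None:
        return JsonResponse(cached)
    doc_ref = db.collection('items').document(doc_id)
    doc = doc_ref.get()
    if doc.exists:
        data = doc.to_dict()
        # Ensure the item belongs to the requesting user before caching
        if user_id is not None and data.get('owner_id') == user_id:
            cache.set(cache_key, data, timeout=300)
            return JsonResponse(data)
    # Same response for missing and foreign items, so existence is not leaked
    return JsonResponse({'error': 'forbidden'}, status=403)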
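
The view above checks ownership only when it populates the cache. As a defense-in-depth sketch, the same check can be repeated on every cache hit, so that an entry written by another code path, or one whose owner has since changed, is not served. The helper name read_owned_item is hypothetical, and cache is assumed to behave like django.core.cache.cache:

```python
_MISS = object()  # sentinel that distinguishes "no entry" from a falsy value


def read_owned_item(cache, cache_key, user_id):
    """Return the cached item only if it exists and belongs to user_id.

    Returns None on a miss or an ownership mismatch; the caller is then
    expected to fall back to Firestore and repeat the ownership check.
    """
    if user_id is None:
        return None
    cached = cache.get(cache_key, _MISS)
    if cached is _MISS:
        return None
    if not isinstance(cached, dict) or cached.get('owner_id') != user_id:
        # Unexpected shape or wrong owner: treat the entry as poisoned.
        cache.delete(cache_key)
        return None
    return cached
```

Deleting a mismatched entry rather than merely ignoring it means the next legitimate request rebuilds it from Firestore instead of hitting the same bad value again.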

These practices align with OWASP API Top 10 categories such as Broken Object Level Authorization and Excessive Data Exposure. middleBrick’s API security scans can help identify endpoints where cache keys are derived directly from unvalidated inputs or where Firestore queries lack proper parameter constraints.

Frequently Asked Questions

How does cache poisoning differ from traditional injection attacks in Django apps using Firestore?
Cache poisoning specifically targets the caching layer by manipulating cache keys or cached data so that malicious or incorrect data is served. Unlike direct injection, which may alter database writes, poisoning corrupts what users read from the cache. In Django with Firestore, this often arises from building cache keys directly from unchecked request inputs that map to document IDs or query parameters, whereas traditional injection might attempt to alter Firestore queries themselves.
Does middleBrick test for cache poisoning scenarios involving Firestore in Django applications?
middleBrick runs 12 parallel security checks, including Input Validation, Property Authorization, and BOLA/IDOR. While it does not execute exploit payloads that mutate caches, it identifies endpoints where unvalidated inputs influence cache keys or data exposure, which are indicators that could lead to cache poisoning. Findings include severity, descriptions, and remediation guidance to help you harden Django + Firestore integrations.