MEDIUM | unicode normalization | express | firestore

Unicode Normalization in Express with Firestore

Unicode Normalization in Express with Firestore — how this specific combination creates or exposes the vulnerability

Unicode Normalization becomes a security concern in Express applications when user-controlled strings are used as Firestore document keys or field values without canonicalization. Firestore stores strings as UTF-8 and does not enforce a specific normalization form, so equivalent Unicode characters can map to multiple binary representations. For example, the character "é" can be represented as a single code point U+00E9 or as a decomposed sequence "e" + U+0301. If an Express route uses user input directly as a document ID or map key, an attacker can supply a visually identical but differently normalized string to bypass expected equality checks, enumeration guards, or access control logic that compares strings without normalization.
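The two representations can be inspected in plain JavaScript. This is a minimal sketch that uses only the built-in String.prototype.normalize and Buffer; the sample value and the helper name utf8Hex are illustrative, not part of any Firestore API.

```javascript
// Two spellings of "josé" that render identically on screen.
const composed = 'jos\u00E9';    // NFC: é as the single code point U+00E9
const decomposed = 'jose\u0301'; // NFD: "e" followed by U+0301 (combining acute)

// Hypothetical helper: returns the UTF-8 bytes of a string as hex.
function utf8Hex(s) {
  return Buffer.from(s, 'utf8').toString('hex');
}

console.log(composed === decomposed); // false
console.log(utf8Hex(composed));       // 6a6f73c3a9
console.log(utf8Hex(decomposed));     // 6a6f7365cc81

// After normalizing both sides to the same form, the strings compare equal.
console.log(composed.normalize('NFC') === decomposed.normalize('NFC')); // true
```

Any comparison, map lookup, or document ID derived from the raw strings sees two different values until both sides are normalized to the same form.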

Consider an Express endpoint that retrieves a user profile using a document ID derived from a username:

// Vulnerable: the raw path parameter is used as the document ID without normalization.
app.get('/profile/:username', async (req, res) => {
  const username = req.params.username;
  const doc = await db.collection('profiles').doc(username).get();
  res.json(doc.data() || {});
});

If the legitimate user registered with the NFC-normalized username "josé" (U+00E9), a request for "/profile/josé" in NFD form ("e" + combining acute accent, U+0301) addresses a different document ID, because Firestore does not treat canonically equivalent strings as equal. The lookup can therefore miss the legitimate profile or resolve to a second, visually identical document, for example one that an attacker registered under the NFD spelling. The risk grows when downstream code trusts the raw string for authorization decisions, such as comparing it against an owner field, without normalizing both sides first.
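The lookalike-registration scenario can be reproduced without a database. In the sketch below a Map stands in for a Firestore collection, on the assumption that, like Firestore, it compares keys exactly and performs no normalization; the register function and its options are hypothetical.

```javascript
// Assumption: a Map keyed by document ID behaves like a collection that
// compares IDs exactly and never normalizes them.
const profiles = new Map();

// Hypothetical registration routine with an "ID already taken" check.
function register(username, data, { normalize = false } = {}) {
  const id = normalize ? username.normalize('NFC') : username;
  if (profiles.has(id)) return false; // uniqueness check on the exact key
  profiles.set(id, data);
  return true;
}

const nfc = 'jos\u00E9';  // composed spelling
const nfd = 'jose\u0301'; // decomposed spelling, visually identical

const first = register(nfc, { owner: 'legitimate user' });
const second = register(nfd, { owner: 'lookalike' });
console.log(first, second, profiles.size); // true true 2

// With normalization, the lookalike collides with the existing ID and is rejected.
const third = register(nfd, { owner: 'lookalike' }, { normalize: true });
console.log(third); // false
```

Without normalization the uniqueness check passes for both spellings, so two profiles that look the same to a human coexist under different IDs.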

The LLM/AI Security checks in middleBrick specifically test for system prompt leakage and prompt injection, but Unicode Normalization issues fall under Input Validation and Property Authorization. middleBrick runs 12 security checks in parallel, including Input Validation and Property Authorization, which can surface inconsistencies when canonical forms are not enforced. Since middleBrick scans the unauthenticated attack surface and supports OpenAPI/Swagger spec analysis with full $ref resolution, it can highlight endpoints where user-controlled path or query parameters map to Firestore identifiers without normalization, providing prioritized findings and remediation guidance on a 0–100 scoring scale with letter grades A–F.

In an Express + Firestore context, remediation centers on normalizing inputs before using them as keys or query constraints. Choose a normalization form (NFC is a common choice for compatibility) and apply it consistently across storage, retrieval, and comparison logic. middleBrick’s free tier allows a few scans per month to help detect such issues early; the CLI tool (middlebrick scan <url>) can be integrated into development workflows to validate that endpoints handle Unicode safely before deployment.
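One way to apply a single form consistently is to normalize a whole parsed JSON payload, including map keys, before any of it reaches storage or comparison logic. The following is a sketch under that assumption; deepNormalize is a hypothetical helper name, and it relies only on the built-in String.prototype.normalize.

```javascript
// Recursively normalizes every string value and object key in JSON-like data.
// Intended for plain parsed JSON (strings, numbers, booleans, null, arrays,
// plain objects); other object types such as Date are not handled.
function deepNormalize(value, form = 'NFC') {
  if (typeof value === 'string') return value.normalize(form);
  if (Array.isArray(value)) return value.map((item) => deepNormalize(item, form));
  if (value !== null && typeof value === 'object') {
    // Object.fromEntries defines own properties, so a "__proto__" key in the
    // input cannot change the prototype of the result.
    return Object.fromEntries(
      Object.entries(value).map(([key, item]) => [
        key.normalize(form),
        deepNormalize(item, form),
      ])
    );
  }
  return value; // numbers, booleans, null, and undefined pass through
}

const body = { username: 'jose\u0301', labels: { 'cafe\u0301': 'nai\u0308ve' } };
const clean = deepNormalize(body);
console.log(clean.username === 'jos\u00E9');               // true
console.log(Object.keys(clean.labels)[0] === 'caf\u00E9'); // true
console.log(clean.labels['caf\u00E9'] === 'na\u00EFve');   // true
```

In an Express application this could be applied to req.body after the JSON body parser has run. If two keys collapse to the same normalized form, the later one wins, so a stricter variant might reject such payloads instead.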

Firestore-Specific Remediation in Express — concrete code fixes

To mitigate Unicode Normalization issues in Express with Firestore, normalize all user-supplied strings before using them in document paths, map keys, or query filters. Use the built-in String.prototype.normalize() method, available in JavaScript runtimes since ES2015, or a well-tested library such as unorm if you must support older runtimes. Apply the same normalization form at write time and read time to ensure identity checks succeed.
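Normalization can be combined with basic validation in one helper so that every route derives document IDs the same way. The sketch below uses the built-in normalize method rather than unorm; toDocId is a hypothetical name, and the rejection rules are based on Firestore's published document ID constraints (no forward slash, not "." or "..", not of the form __name__, at most 1,500 bytes), which should be confirmed against the current Firestore documentation.

```javascript
// Hypothetical helper: turns user input into a canonical, validated document ID.
function toDocId(input) {
  if (typeof input !== 'string') {
    throw new TypeError('document ID must be a string');
  }
  const id = input.normalize('NFC'); // one canonical form for reads and writes
  if (id.length === 0) throw new RangeError('document ID must not be empty');
  if (id.includes('/')) throw new RangeError('document ID must not contain "/"');
  if (id === '.' || id === '..') throw new RangeError('document ID must not be "." or ".."');
  if (/^__.*__$/.test(id)) throw new RangeError('document ID must not match __.*__');
  // Assumed limit, measured on the normalized UTF-8 encoding.
  if (Buffer.byteLength(id, 'utf8') > 1500) {
    throw new RangeError('document ID exceeds 1500 bytes');
  }
  return id;
}

console.log(toDocId('jose\u0301') === 'jos\u00E9'); // true
```

A route handler would call the helper once and answer with a 400 status when it throws, instead of passing the raw parameter to doc().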

Example: Normalize before creating or fetching a Firestore document by ID:

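const express = require('express');
const admin = require('firebase-admin');
const unorm = require('unorm');

admin.initializeApp();
const db = admin.firestore();

const app = express();
app.use(express.json()); // parse JSON bodies for POST /profile

app.get('/profile/:username', async (req, res) => {
  // Normalize to NFC so the lookup uses the same form as the stored ID
  const username = unorm.nfc(req.params.username);
  const doc = await db.collection('profiles').doc(username).get();
  res.json(doc.data() || {});
});

app.post('/profile', async (req, res) => {
  const user = req.body || {};
  if (typeof user.username !== 'string' || user.username.length === 0) {
    return res.status(400).send('username is required');
  }
  // Apply the same normalization form at write time
  const username = unorm.nfc(user.username);
  await db.collection('profiles').doc(username).set({
    username,
    email: user.email,
    displayName: user.displayName
  });
  res.status(201).send('OK');
});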

For query constraints, normalize the search term as well:

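app.get('/search', async (req, res) => {
  // req.query.tag can be an array or object; accept only a non-empty string
  if (typeof req.query.tag !== 'string' || req.query.tag === '') {
    return res.status(400).send('tag must be a non-empty string');
  }
  const tag = unorm.nfc(req.query.tag);
  const snapshot = await db.collection('posts')
    .where('tags', 'array-contains', tag)
    .get();
  const results = snapshot.docs.map(d => ({ id: d.id, ...d.data() }));
  res.json(results);
});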
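Normalizing the search term only helps if the stored values use the same form, so the tags array should be normalized when a post is written as well. A minimal sketch of that write-side step follows; normalizeTags is a hypothetical helper, not a Firestore or Express API.

```javascript
// Hypothetical helper: normalizes tags to NFC and removes duplicates that only
// differed in normalization form. Non-string and empty entries are dropped.
function normalizeTags(tags) {
  if (!Array.isArray(tags)) return [];
  const unique = new Set();
  for (const tag of tags) {
    if (typeof tag !== 'string') continue;
    const normalized = tag.normalize('NFC');
    if (normalized.length > 0) unique.add(normalized);
  }
  return [...unique];
}

const stored = normalizeTags(['caf\u00E9', 'cafe\u0301', 'tea', 7]);
console.log(stored.length);                // 2
console.log(stored.includes('caf\u00E9')); // true
```

With tags stored this way, an array-contains filter on the NFC form of the search term matches regardless of which spelling the client originally sent.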

If your API accepts identifiers from headers or bodies (e.g., a Firestore document reference provided by the client), normalize before constructing the reference:

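app.post('/update', async (req, res) => {
  const { docPath, updates } = req.body;
  if (typeof docPath !== 'string') {
    return res.status(400).send('docPath must be a string');
  }
  // Assume docPath is like "profiles/josé"; normalize each segment
  const normalizedPath = docPath.split('/').map(seg => unorm.nfc(seg)).join('/');
  // Normalization alone is not access control: verify that the caller may
  // modify this document before applying client-supplied updates.
  const docRef = db.doc(normalizedPath);
  await docRef.update(updates);
  res.send('Updated');
});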
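Because a client-supplied path can point anywhere in the database, segment normalization is usually paired with a structural check. The sketch below assumes the API only ever addresses top-level documents in an allow-listed set of collections; normalizeDocPath and the allow list are hypothetical, and the function does not replace a per-user authorization check.

```javascript
// Hypothetical helper: normalizes a "collection/documentId" path and rejects
// anything outside the expected shape or the allowed collections.
function normalizeDocPath(docPath, allowedCollections) {
  if (typeof docPath !== 'string') {
    throw new TypeError('docPath must be a string');
  }
  const segments = docPath.split('/').map((segment) => segment.normalize('NFC'));
  const invalid = segments.some((s) => s === '' || s === '.' || s === '..');
  if (segments.length !== 2 || invalid) {
    throw new RangeError('docPath must look like "collection/documentId"');
  }
  if (!allowedCollections.includes(segments[0])) {
    throw new RangeError('collection is not allowed');
  }
  return segments.join('/');
}

const safePath = normalizeDocPath('profiles/jose\u0301', ['profiles']);
console.log(safePath === 'profiles/jos\u00E9'); // true
```

The returned string can then be passed to db.doc(), once the handler has confirmed that the authenticated caller is allowed to modify that document.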

Consistency is key: use the same normalization form (NFC or NFD) across your entire stack, including the data already stored in Firestore and any client-side code. Avoid relying on Firestore server-side behavior to implicitly handle equivalence. middleBrick’s continuous monitoring (available in the Pro plan) can help detect regressions by scanning endpoints on a configurable schedule and alerting via Slack or Teams when risky patterns reappear. For CI/CD integration, the GitHub Action can fail builds if a scan’s security score drops below your defined threshold, preventing vulnerable Unicode handling from reaching production.
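Consistency can also be checked mechanically, for instance in a test or a one-off audit of exported documents. The sketch below reports which strings and keys in a JSON-like object are not already in NFC; findNonNfc is a hypothetical helper, and how the documents are exported from Firestore is left out.

```javascript
// Hypothetical audit helper: returns the paths of all string values and object
// keys that would change under NFC normalization.
function findNonNfc(value, path = '') {
  const hits = [];
  if (typeof value === 'string') {
    if (value !== value.normalize('NFC')) hits.push(path || '(root)');
  } else if (Array.isArray(value)) {
    value.forEach((item, i) => hits.push(...findNonNfc(item, `${path}[${i}]`)));
  } else if (value !== null && typeof value === 'object') {
    for (const [key, item] of Object.entries(value)) {
      const childPath = path ? `${path}.${key}` : key;
      if (key !== key.normalize('NFC')) hits.push(`${childPath} (key)`);
      hits.push(...findNonNfc(item, childPath));
    }
  }
  return hits;
}

const exportedDoc = { username: 'jose\u0301', tags: ['caf\u00E9', 'cafe\u0301'] };
console.log(findNonNfc(exportedDoc)); // [ 'username', 'tags[1]' ]
```

An empty result means the document already matches the chosen form; any reported path marks data that would need to be rewritten before lookups by normalized value can succeed.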

Frequently Asked Questions

Why does Unicode Normalization matter when using Firestore document IDs in Express?
Firestore stores strings as UTF-8 and does not enforce a normalization form. Equivalent Unicode strings can have multiple binary representations (e.g., NFC vs NFD). If Express uses raw user input as a document ID or key without canonicalization, attackers can supply visually identical but differently normalized strings to bypass equality checks or access controls, leading to inconsistent reads or unauthorized access.

How can I verify my Express + Firestore integration handles Unicode safely?
Normalize inputs with a library such as unorm before using them as Firestore document IDs or in queries, and apply the same normalization form consistently across reads and writes. Use middleBrick’s CLI (middlebrick scan <url>) to test endpoints for input validation issues; the free tier supports a few scans per month, and the Pro plan provides continuous monitoring and CI/CD integration to catch regressions early.