Unicode Normalization in FeathersJS with MongoDB
Unicode Normalization in FeathersJS with MongoDB — how this specific combination creates or exposes the vulnerability
FeathersJS does not enforce a canonical Unicode form for user-supplied strings before using them in queries against a MongoDB backend. If an attacker can provide identifiers or search parameters that differ only by normalization form (e.g., composed vs. decomposed), the same logical string can have multiple binary representations. MongoDB string comparisons are binary-sensitive by default, so café as the single precomposed code point U+00E9 and café as U+0065 followed by the combining acute accent U+0301 are treated as different values. When a lookup key is not normalized consistently across the stack, this mismatch can bypass authorization checks or enable broken object level authorization (BOLA, also known as IDOR) through inconsistent record matching.
Consider a Feathers service that retrieves user profiles by username. If the client sends a normalized NFC form while the database stores the raw, unnormalized input (NFD), a lookup may fail to match the intended record or, worse, match an unintended one if another record coincidentally matches the binary form. In an authorization context, this can lead to horizontal privilege escalation: User A might inadvertently access User B’s data because their logically identical identifiers compare differently at the byte level. The same issue can affect search filters, index usage, and uniqueness constraints, potentially allowing duplicate entries that differ only by normalization, which can complicate integrity checks and lead to data confusion or leakage.
LLM-related endpoints that accept free-form text are also at risk. An attacker could supply prompt-like inputs with varied Unicode composition to probe normalization handling, attempting to bypass content filters or trigger unexpected behavior in downstream processing. Because FeathersJS typically passes query parameters directly to MongoDB queries, inconsistent normalization can propagate into aggregation pipelines or index lookups, creating subtle discrepancies between runtime behavior and expected policy. These gaps are hard to detect without explicit canonicalization, as the framework does not automatically normalize strings before invoking service methods or building query objects.
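The core mismatch can be demonstrated in plain Node.js, with no framework involved: the two encodings of café compare unequal, and differ even in byte length, until both sides are normalized.

```javascript
// Plain Node.js demonstration of the mismatch, independent of FeathersJS/MongoDB.
const composed = 'caf\u00E9';    // "café" as the precomposed code point U+00E9
const decomposed = 'cafe\u0301'; // "café" as U+0065 + U+0301 (combining acute)

console.log(composed === decomposed);                // false: binary comparison differs
console.log(Buffer.from(composed, 'utf8').length);   // 5 bytes
console.log(Buffer.from(decomposed, 'utf8').length); // 6 bytes

// After canonical normalization both representations collapse to the same
// string, which is why a binary-sensitive MongoDB equality lookup only
// behaves predictably once both sides are in NFC.
console.log(composed.normalize('NFC') === decomposed.normalize('NFC')); // true
```

This is exactly the comparison MongoDB performs on stored bytes, which is why normalization has to happen before the query is built.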
MongoDB-Specific Remediation in FeathersJS — concrete code fixes
To mitigate Unicode normalization issues, normalize incoming string identifiers to a consistent form (NFC is typical) before any database operation. Use the built-in String.prototype.normalize available in Node.js (the legacy unorm package is only needed as a polyfill on very old runtimes), and apply normalization at the service layer so all paths through your FeathersJS application are covered. The following examples show a Feathers service that normalizes an id parameter before performing a MongoDB find by _id, and a hook that normalizes a searchable text field before query construction.
// utils/unicode.js
export const normalizeNFC = (value) => {
  if (typeof value !== 'string') return value;
  return value.normalize('NFC');
};
// services/users/users.class.js
// Extend the MongoDB adapter's base class. In Feathers v5 ("Dove") this is
// MongoDBService from @feathersjs/mongodb; older setups use the Service
// class exported by the feathers-mongodb package instead.
import { MongoDBService } from '@feathersjs/mongodb';
import { normalizeNFC } from '../utils/unicode';

export class Users extends MongoDBService {
  async find(params) {
    // Normalize an incoming ID if present
    if (params.query && params.query._id) {
      params.query._id = normalizeNFC(String(params.query._id));
    }
    return super.find(params);
  }

  async get(id, params) {
    const normalizedId = normalizeNFC(String(id));
    return super.get(normalizedId, params);
  }
}
For text fields that must remain searchable and index-friendly, normalize before building query objects. This ensures that a search for café matches stored data regardless of the client’s chosen Unicode form. If you rely on MongoDB collation for locale-aware comparisons, note that collation changes how strings are compared, not how they are stored; application-level normalization remains necessary for consistent matching across operations.
// services/search/search.class.js
// MongoDBService is the MongoDB adapter base class in Feathers v5
// (@feathersjs/mongodb); adjust the import for older adapter versions.
import { MongoDBService } from '@feathersjs/mongodb';
import { normalizeNFC } from '../utils/unicode';

// Escape regex metacharacters so user input cannot inject pattern syntax
const escapeRegExp = (s) => s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');

export class SearchService extends MongoDBService {
  async find(params) {
    const query = params.query || {};
    if (query.name) {
      const name = normalizeNFC(String(query.name));
      query.name = { $regex: escapeRegExp(name), $options: 'i' };
    }
    // Preserve other query properties and pass through
    params.query = query;
    return super.find(params);
  }
}
In hooks, normalize fields that affect routing or authorization, such as identifiers used in BOLA checks. This prevents a mismatch between the value used for ownership verification and the value stored in MongoDB. Combine normalization with strict equality checks and avoid relying on case-insensitive collation as a substitute for canonicalization, since collation rules vary by locale and may not address decomposition differences.
// hooks/normalize-ids.hook.js
import { normalizeNFC } from '../utils/unicode';

export default function normalizeIdsHook() {
  return async context => {
    if (context.params.query && context.params.query.userId) {
      context.params.query.userId = normalizeNFC(String(context.params.query.userId));
    }
    return context;
  };
}
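Because a Feathers hook is just an async function of a context object, it can be exercised directly with a mock context, which is a convenient way to verify the normalization behavior without a running app. The sketch below restates the normalizeNFC utility inline so it runs standalone.

```javascript
// Standalone sketch: invoke the hook directly with a mock context.
// normalizeNFC is restated inline so this block is self-contained.
const normalizeNFC = (v) => (typeof v === 'string' ? v.normalize('NFC') : v);

const normalizeIdsHook = () => async (context) => {
  if (context.params.query && context.params.query.userId) {
    context.params.query.userId = normalizeNFC(String(context.params.query.userId));
  }
  return context;
};

// Mock context carrying a decomposed userId ("café" as e + combining acute)
const ctx = { params: { query: { userId: 'cafe\u0301' } } };
normalizeIdsHook()(ctx).then(() => {
  console.log(ctx.params.query.userId === 'caf\u00E9'); // true: now in NFC
});
```

In a real application the hook would be registered under the service's before hooks for find and get, so every read path sees the normalized value.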
For new documents, normalize text fields before insertion to maintain consistency across reads and writes. This is particularly important for unique indexes, where binary differences can allow duplicate entries that appear identical to users but differ in code points. By normalizing at ingestion and query time, you reduce the attack surface for IDOR and BOLA-style confusion in a FeathersJS + MongoDB deployment.
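One possible shape for the ingestion side is a before-create hook that canonicalizes a configurable list of string fields in context.data before the document reaches MongoDB. The field names ('username', 'displayName') are illustrative assumptions, not part of any service above.

```javascript
// Sketch of a before-create hook that normalizes selected string fields on
// write. Field names here are hypothetical examples.
const normalizeNFC = (v) => (typeof v === 'string' ? v.normalize('NFC') : v);

const normalizeFieldsHook = (fields) => async (context) => {
  const data = context.data || {};
  for (const field of fields) {
    if (typeof data[field] === 'string') {
      data[field] = normalizeNFC(data[field]);
    }
  }
  return context;
};

// Direct invocation with a mock context (username arrives decomposed)
const createCtx = { data: { username: 'cafe\u0301', age: 30 } };
normalizeFieldsHook(['username', 'displayName'])(createCtx).then(() => {
  console.log(createCtx.data.username === 'caf\u00E9'); // true: stored in NFC
  console.log(createCtx.data.age);                      // 30: non-strings untouched
});
```

Applying the same hook on create, update, and patch keeps writes consistent with the normalized reads, which is what makes unique indexes on these fields trustworthy.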