
Unicode Normalization in Express with Bearer Tokens

Unicode Normalization in Express with Bearer Tokens — how this specific combination creates or exposes the vulnerability

Unicode normalization inconsistencies become significant in Express APIs when bearer tokens are handled as strings subject to normalization by comparison or storage layers. An access token like éxample-token can be represented in multiple Unicode forms (NFC, NFD, NFKC, NFKD). If your Express application normalizes incoming token strings differently than the identity provider or token storage, an attacker can supply a visually identical but differently normalized token and bypass intended access controls.
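To make the equivalence concrete, here is a minimal sketch (with an illustrative token value) showing that two visually identical strings built from different Unicode sequences are not equal as raw strings, but become equal once both are normalized to the same form:

```javascript
// Two visually identical tokens built from different Unicode sequences
const tokenNFC = '\u00E9xample-token';   // "é" as a single precomposed code point (NFC)
const tokenNFD = 'e\u0301xample-token';  // "é" as "e" + combining acute accent (NFD)

console.log(tokenNFC === tokenNFD);
// false: raw comparison sees different code point sequences

console.log(tokenNFC.normalize('NFC') === tokenNFD.normalize('NFC'));
// true: both collapse to the same NFC form
```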

Consider an Express route that validates bearer tokens by direct string comparison or by looking up a normalized value in a database or cache without consistent normalization. An attacker could register or obtain a valid token in one normalization form and then send an equivalent but differently normalized form in the Authorization header:

Authorization: Bearer éxample-token

If the server normalizes the header value before lookup but the token store does not (or uses a different Unicode normalization form), the comparison may succeed incorrectly or fail unexpectedly, leading to either privilege confusion or authentication bypass. This is especially relevant when tokens are derived from user-controlled identifiers (such as emails or usernames) that are subject to normalization at registration time.
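The mismatch described above can be sketched as follows. In this hypothetical scenario, a token is stored raw in whatever form it arrived (here NFD), while the server normalizes incoming headers to NFC before lookup, so a visually identical token never matches the store:

```javascript
// Hypothetical store holding a raw, non-normalized token as issued at registration
const issuedToken = 'e\u0301xample-token';  // NFD form, as it happened to arrive
const tokenStore = new Set([issuedToken]);

// The server normalizes the incoming Authorization value to NFC before lookup
const incoming = '\u00E9xample-token';      // visually identical NFC form from the client
const normalized = incoming.normalize('NFC');

console.log(tokenStore.has(incoming));    // false: different code points than the stored value
console.log(tokenStore.has(normalized));  // false: the store itself was never normalized
```

The lookup only succeeds if both sides are canonicalized to the same form, which is exactly the remediation shown below.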

In the context of middleBrick’s 12 security checks, Unicode normalization issues may surface under Input Validation and Property Authorization. The scanner tests whether differently normalized bearer tokens produce different authorization outcomes, exposing BOLA/IDOR-like confusion or authentication weaknesses. Because bearer tokens are high-value secrets, inconsistent normalization can unintentionally grant access to resources an attacker should not reach, violating the principle of secure string comparison and canonicalization.

Another subtle risk involves logging or error handling: if Express logs the raw Authorization header without normalization, tokens that normalize differently may appear as distinct entries, complicating audit trails and making suspicious activity harder to detect. Similarly, tokens stored in cookies or local storage by client-side code may be normalized by the browser differently than the server expects, leading to intermittent authentication failures that an attacker can exploit by manipulating the encoding of submitted credentials.

Bearer Token-Specific Remediation in Express — concrete code fixes

To mitigate Unicode normalization issues with bearer tokens in Express, enforce a single normalization form for all token handling and comparison operations. Normalize incoming token strings with the built-in String.prototype.normalize before any lookup or comparison, and ensure the same normalization is applied when storing or indexing tokens.

The following Express middleware demonstrates a robust approach, normalizing both the incoming Authorization header and the stored tokens to NFC before comparison:

const express = require('express');

const app = express();
const PORT = 3000;

// Simulated token store where tokens are stored in NFC
const tokenStore = new Set([
  'éxample-token'.normalize('NFC'),
  'user-session-abc'.normalize('NFC')
]);

function bearerAuth(req, res, next) {
  const authHeader = req.headers.authorization || '';
  const [scheme, token] = authHeader.split(' ');

  if (scheme !== 'Bearer' || !token) {
    return res.status(401).json({ error: 'Unauthorized' });
  }

  // Normalize the incoming token to NFC before comparison
  const normalizedToken = token.normalize('NFC');

  if (!tokenStore.has(normalizedToken)) {
    return res.status(403).json({ error: 'Forbidden' });
  }

  req.user = { token: normalizedToken };
  next();
}

app.use(bearerAuth);

app.get('/profile', (req, res) => {
  res.json({ message: 'Access granted', token: req.user.token });
});

app.listen(PORT, () => {
  console.log(`Server running on port ${PORT}`);
});

Key points in this remediation:

  • Normalize both stored tokens and incoming tokens to the same Unicode form (NFC in this example) before any equality check.
  • Reject tokens that do not use the expected scheme rather than attempting to recover from malformed Authorization headers.
  • Apply normalization consistently across authentication, authorization, and any logging mechanisms to avoid discrepancies that could be leveraged by attackers.

For production deployments, pair this approach with security tests that verify equivalence classes of normalized tokens. middleBrick’s CLI can be used to scan your Express endpoints and surface inconsistencies in how bearer tokens are handled under the hood. By integrating middlebrick scan <url> into development workflows, teams can detect normalization-related findings before they reach production.

Frequently Asked Questions

Why does Unicode normalization matter for bearer tokens in Express?
Because tokens may be represented in different Unicode forms (NFC, NFD, etc.). If incoming tokens are normalized differently than stored tokens, an attacker can supply a visually identical but differently encoded token to bypass authentication or authorization checks.
Can I rely on built-in Node.js string methods for normalization?
Node.js provides String.prototype.normalize, but you must explicitly choose and enforce a single normalization form (e.g., NFC) for all token operations and ensure consistent application across storage, comparison, and logging.
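A short illustration of why the choice of form matters, using an illustrative string: the canonical forms (NFC/NFD) preserve compatibility characters such as the "ﬁ" ligature, while the compatibility forms (NFKC/NFKD) expand them, so mixing canonical and compatibility forms across components produces different canonical strings.

```javascript
const s = '\uFB01xed-\u00E9token'; // "ﬁ" ligature (U+FB01) + precomposed "é"

console.log(s.normalize('NFC'));   // canonical form: the ligature is preserved
console.log(s.normalize('NFKC'));  // compatibility form: the ligature expands to "fi"

console.log(s.normalize('NFC') === s.normalize('NFKC')); // false: the two forms disagree
```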