Unicode Normalization in JWT Tokens
How Unicode Normalization Issues Manifest in JWT Tokens
Unicode normalization vulnerabilities in JWT tokens create a subtle but dangerous attack surface that many developers overlook. A JWT is made of base64url-encoded JSON segments containing claims about the authenticated user. When Unicode characters appear in claims like sub, aud, or custom fields, improper handling can lead to authentication bypasses and authorization flaws.
The core issue stems from how Unicode characters can have multiple valid representations. The character 'é' can be encoded as U+00E9 (precomposed) or as U+0065 U+0301 (e + combining acute accent). These should be treated as identical, but if a JWT library or backend system doesn't normalize Unicode before processing, an attacker can craft tokens that bypass security checks.
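The two representations are easy to inspect with Python's standard library (`unicodedata` is stdlib; no third-party packages assumed):

```python
import unicodedata

precomposed = "\u00e9"   # 'é' as a single code point
decomposed = "e\u0301"   # 'e' + combining acute accent

# The strings render identically but compare unequal byte-for-byte
print(precomposed == decomposed)          # False
print(len(precomposed), len(decomposed))  # 1 2

# After NFC normalization both collapse to the precomposed form
nfc = unicodedata.normalize("NFC", decomposed)
print(nfc == precomposed)                 # True
```

This is exactly the mismatch that bites JWT pipelines: two byte sequences that every renderer displays the same way, yet every naive `==` treats as different.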
```javascript
// Vulnerable JWT processing
const jwt = require('jsonwebtoken');

// Attacker creates a token with a combining character
const maliciousPayload = { sub: "admin\u0301" };
const maliciousToken = jwt.sign(maliciousPayload, 'secret');

const payload = jwt.verify(maliciousToken, 'secret');
console.log(payload.sub); // "admiń" - visually similar to "admin" but different bytes
```

This becomes critical when comparing JWT claims against database values or authorization rules. If your backend stores the username 'admin' but receives 'admiń' in a JWT, string comparisons may fail, leading to unexpected behavior or bypasses.
Another manifestation occurs in JWT audience claims (aud). An attacker could craft tokens with visually similar Unicode characters that appear to target the correct audience but actually contain different bytes, potentially allowing tokens to be accepted by unintended services.
```python
# Example of Unicode audience manipulation
import jwt  # PyJWT

# Service A expects audience 'example.com'
payload = {'aud': 'examp\u013ae.com'}  # 'ĺ' (l with acute) in place of 'l'
token = jwt.encode(payload, 'secret', algorithm='HS256')

# Service B receives the token but doesn't normalize;
# visual inspection suggests 'example.com' but the bytes differ
```

Property authorization checks are particularly vulnerable. When JWT tokens contain arrays or objects with Unicode keys, inconsistent normalization can lead to authorization bypasses where certain properties are incorrectly treated as missing or present.
```java
// Property authorization vulnerability
// Token contains Unicode property names
Map<String, Object> claims = jwtParser.parse(token);
if (!claims.containsKey("admin")) { // check misses a token carrying "admiń"
    throw new AccessDeniedException();
}
```

JWT-Specific Detection
Detecting Unicode normalization issues in JWT tokens requires both static analysis of your token processing code and dynamic testing of your authentication flows. Start by examining how your JWT libraries handle Unicode characters throughout the token lifecycle.
Code review should focus on these critical areas:
```javascript
// What to look for in your codebase

// 1. Direct string comparisons without normalization
if (jwtSub !== storedSub) { /* vulnerable */ }

// 2. Database queries using non-normalized strings
const user = await db.query('SELECT * FROM users WHERE username = ?', [jwtSub]); // vulnerable

// 3. Authorization checks without normalization
if (claims.role !== 'admin') { /* vulnerable */ }
```

Dynamic testing involves creating JWT tokens with Unicode variations and observing how your system handles them. Use tools like middleBrick to automate this scanning process.
middleBrick's JWT security scanning specifically tests for Unicode normalization issues by:
- Generating tokens with combining characters in critical claims
- Testing property authorization with Unicode property names
- Verifying that authentication bypasses don't occur with visually similar Unicode
- Checking that audience validation works correctly with Unicode variations
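The first two checks above can also be approximated by hand. A minimal sketch (plain Python, no scanner assumed; the helper name is illustrative) that derives normalization variants of a claim value for use in test tokens:

```python
import unicodedata

def unicode_variants(value: str) -> set[str]:
    """Return the distinct normalization forms of a string for fuzzing JWT claims."""
    forms = ("NFC", "NFD", "NFKC", "NFKD")
    return {unicodedata.normalize(form, value) for form in forms}

# 'admiń' with a precomposed n-acute yields a decomposed twin under NFD
variants = unicode_variants("admi\u0144")
print(len(variants))  # 2: the precomposed form and the decomposed form
```

Sign a test token with each variant in the target claim and confirm your API treats all of them identically.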
Manual testing should include:
```bash
# Generate test tokens with Unicode variations
node -e "const jwt = require('jsonwebtoken'); const token = jwt.sign({sub: 'admiń'}, 'secret'); console.log(token);"

# Test token processing
curl -H "Authorization: Bearer $TOKEN" https://yourapi.com/protected
```

Log analysis can reveal Unicode issues when you see unexpected claim values or authorization failures that seem to occur randomly. Look for claims containing combining marks such as U+0301 (combining acute accent) or U+0308 (combining diaeresis).
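For log review, a small helper can flag claim values containing combining marks (a sketch using only the standard library `unicodedata`; the claims dict is illustrative):

```python
import unicodedata

def suspicious_claims(claims: dict) -> list[str]:
    """Return the names of claims whose string values contain combining marks."""
    flagged = []
    for name, value in claims.items():
        # unicodedata.combining() is nonzero for combining characters
        if isinstance(value, str) and any(unicodedata.combining(ch) for ch in value):
            flagged.append(name)
    return flagged

print(suspicious_claims({"sub": "admin\u0301", "aud": "example.com"}))  # ['sub']
```

Running this over decoded token payloads in your logs surfaces claims worth a closer look.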
JWT-Specific Remediation
Remediating Unicode normalization vulnerabilities in JWT tokens requires a systematic approach to ensure consistent handling throughout your authentication and authorization pipeline. The solution involves normalizing Unicode at the boundaries of your system.
The most effective approach is to normalize all JWT claims immediately after token verification and before any processing occurs:
```javascript
// Secure JWT processing with Unicode normalization
const jwt = require('jsonwebtoken');

function verifyAndNormalize(token, secret) {
  const decoded = jwt.verify(token, secret);
  const normalized = {};
  for (const [key, value] of Object.entries(decoded)) {
    if (typeof value === 'string') {
      // Built-in String.prototype.normalize; NFC is the composed form
      normalized[key] = value.normalize('NFC');
    } else {
      normalized[key] = value;
    }
  }
  return normalized;
}

// Usage
const claims = verifyAndNormalize(token, 'secret');
if (claims.sub === 'admin') {
  // Safe comparison - both sides normalized
}
```

For database operations, ensure consistent Unicode handling by normalizing both the stored values and the query parameters:
```python
# Database-safe Unicode handling
import unicodedata
import jwt  # PyJWT

# Normalize before database comparison
def get_user_by_jwt_sub(token, secret, db):
    decoded = jwt.decode(token, secret, algorithms=['HS256'])
    normalized_sub = unicodedata.normalize('NFC', decoded['sub'])
    # Compare against a column that stores NFC-normalized usernames
    user = db.execute(
        'SELECT * FROM users WHERE normalized_username = ?',
        [normalized_sub],
    ).fetchone()
    return user
```

Property authorization systems need special attention. Always normalize property names when checking permissions:
```java
// Property authorization with Unicode normalization
import java.text.Normalizer;

public boolean hasPermission(Map<String, Object> claims, String requiredRole) {
    String normalizedRole = Normalizer.normalize(requiredRole, Normalizer.Form.NFC);
    String userRole = Normalizer.normalize((String) claims.get("role"), Normalizer.Form.NFC);
    return normalizedRole.equals(userRole);
}
```

Audience validation should also normalize Unicode characters to prevent spoofing:
```go
// Audience validation with Unicode normalization
import (
    "golang.org/x/text/unicode/norm"
    // jwt types as used below, e.g. "github.com/golang-jwt/jwt/v5"
)

func validateAudience(token *jwt.Token, expectedAudience string) bool {
    claims, ok := token.Claims.(jwt.MapClaims)
    if !ok {
        return false
    }
    audience, ok := claims["aud"].(string)
    if !ok {
        return false
    }
    normalizedAudience := norm.NFC.String(audience)
    normalizedExpected := norm.NFC.String(expectedAudience)
    return normalizedAudience == normalizedExpected
}
```