Unicode Normalization in JWT Tokens
How Unicode Normalization Issues Manifest in JWT Tokens
Unicode normalization vulnerabilities in JWT tokens create a subtle but dangerous attack surface that many developers overlook. A JWT is made of base64url-encoded JSON segments containing claims about the authenticated user. When Unicode characters appear in claims like sub, aud, or custom fields, improper handling can lead to authentication bypasses and authorization flaws.
The core issue stems from how Unicode characters can have multiple valid representations. The character 'é' can be encoded as U+00E9 (precomposed) or as U+0065 U+0301 (e + combining acute accent). These should be treated as identical, but if a JWT library or backend system doesn't normalize Unicode before processing, an attacker can craft tokens that bypass security checks.
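The two representations are easy to inspect with Python's standard library (`unicodedata` is stdlib; no third-party packages assumed):

```python
import unicodedata

precomposed = "\u00e9"   # 'é' as a single code point
decomposed = "e\u0301"   # 'e' + combining acute accent

# The strings render identically but compare unequal byte-for-byte
print(precomposed == decomposed)          # False
print(len(precomposed), len(decomposed))  # 1 2

# After NFC normalization both collapse to the precomposed form
nfc = unicodedata.normalize("NFC", decomposed)
print(nfc == precomposed)                 # True
```

This is exactly the mismatch that bites JWT pipelines: two byte sequences that every renderer displays the same way, yet every naive `==` treats as different.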
```javascript
// Vulnerable JWT processing
const jwt = require('jsonwebtoken');

// Attacker creates a token with a combining character
const maliciousPayload = { sub: "admin\u0301" };
const maliciousToken = jwt.sign(maliciousPayload, 'secret');

const payload = jwt.verify(maliciousToken, 'secret');
console.log(payload.sub); // "admiń" - visually similar to "admin" but different bytes
```

This becomes critical when comparing JWT claims against database values or authorization rules. If your backend stores the username 'admin' but receives 'admiń' in a JWT, string comparisons may fail, leading to unexpected behavior or bypasses.
Another manifestation occurs in JWT audience claims (aud). An attacker could craft tokens with visually similar Unicode characters that appear to target the correct audience but actually contain different bytes, potentially allowing tokens to be accepted by unintended services.
```python
# Example of Unicode audience manipulation
import jwt  # PyJWT

# Service A expects audience 'example.com'
payload = {'aud': 'examp\u013ae.com'}  # 'ĺ' (l with acute) in place of 'l'
token = jwt.encode(payload, 'secret', algorithm='HS256')

# Service B receives the token but doesn't normalize;
# visual inspection suggests 'example.com' but the bytes differ
```

Property authorization checks are particularly vulnerable. When JWT tokens contain arrays or objects with Unicode keys, inconsistent normalization can lead to authorization bypasses where certain properties are incorrectly treated as missing or present.
```java
// Property authorization vulnerability
// Token contains Unicode property names
Map<String, Object> claims = jwtParser.parse(token);
if (!claims.containsKey("admin")) { // check misses a token carrying "admiń"
    throw new AccessDeniedException();
}
```

JWT-Specific Detection
Detecting Unicode normalization issues in JWT tokens requires both static analysis of your token processing code and dynamic testing of your authentication flows. Start by examining how your JWT libraries handle Unicode characters throughout the token lifecycle.
Code review should focus on these critical areas:
```javascript
// What to look for in your codebase

// 1. Direct string comparisons without normalization
if (jwtSub !== storedSub) { /* vulnerable */ }

// 2. Database queries using non-normalized strings
const user = await db.query('SELECT * FROM users WHERE username = ?', [jwtSub]); // vulnerable

// 3. Authorization checks without normalization
if (claims.role !== 'admin') { /* vulnerable */ }
```

Dynamic testing involves creating JWT tokens with Unicode variations and observing how your system handles them. Use tools like middleBrick to automate this scanning process.
middleBrick's JWT security scanning specifically tests for Unicode normalization issues by:
- Generating tokens with combining characters in critical claims
- Testing property authorization with Unicode property names
- Verifying that authentication bypasses don't occur with visually similar Unicode
- Checking that audience validation works correctly with Unicode variations
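The first two checks above can also be approximated by hand. A minimal sketch (plain Python, no scanner assumed; the helper name is illustrative) that derives normalization variants of a claim value for use in test tokens:

```python
import unicodedata

def unicode_variants(value: str) -> set[str]:
    """Return the distinct normalization forms of a string for fuzzing JWT claims."""
    forms = ("NFC", "NFD", "NFKC", "NFKD")
    return {unicodedata.normalize(form, value) for form in forms}

# 'admiń' with a precomposed n-acute yields a decomposed twin under NFD
variants = unicode_variants("admi\u0144")
print(len(variants))  # 2: the precomposed form and the decomposed form
```

Sign a test token with each variant in the target claim and confirm your API treats all of them identically.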
Manual testing should include:
```bash
# Generate test tokens with Unicode variations
node -e "const jwt = require('jsonwebtoken'); const token = jwt.sign({sub: 'admiń'}, 'secret'); console.log(token);"

# Test token processing
curl -H "Authorization: Bearer $TOKEN" https://yourapi.com/protected
```

Log analysis can reveal Unicode issues when you see unexpected claim values or authorization failures that seem to occur randomly. Look for claims containing combining marks such as U+0301 (combining acute accent) or U+0308 (combining diaeresis).
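For log review, a small helper can flag claim values containing combining marks (a sketch using only the standard library `unicodedata`; the claims dict is illustrative):

```python
import unicodedata

def suspicious_claims(claims: dict) -> list[str]:
    """Return the names of claims whose string values contain combining marks."""
    flagged = []
    for name, value in claims.items():
        # unicodedata.combining() is nonzero for combining characters
        if isinstance(value, str) and any(unicodedata.combining(ch) for ch in value):
            flagged.append(name)
    return flagged

print(suspicious_claims({"sub": "admin\u0301", "aud": "example.com"}))  # ['sub']
```

Running this over decoded token payloads in your logs surfaces claims worth a closer look.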
JWT-Specific Remediation
Remediating Unicode normalization vulnerabilities in JWT tokens requires a systematic approach to ensure consistent handling throughout your authentication and authorization pipeline. The solution involves normalizing Unicode at the boundaries of your system.
The most effective approach is to normalize all JWT claims immediately after token verification and before any processing occurs:
```javascript
// Secure JWT processing with Unicode normalization
const jwt = require('jsonwebtoken');

function verifyAndNormalize(token, secret) {
  const decoded = jwt.verify(token, secret);
  const normalized = {};
  for (const [key, value] of Object.entries(decoded)) {
    if (typeof value === 'string') {
      // Built-in String.prototype.normalize; NFC is the composed form
      normalized[key] = value.normalize('NFC');
    } else {
      normalized[key] = value;
    }
  }
  return normalized;
}

// Usage
const claims = verifyAndNormalize(token, 'secret');
if (claims.sub === 'admin') {
  // Safe comparison - both sides normalized
}
```

For database operations, ensure consistent Unicode handling by normalizing both the stored values and the query parameters:
```python
# Database-safe Unicode handling
import unicodedata
import jwt  # PyJWT

# Normalize before database comparison
def get_user_by_jwt_sub(token, secret, db):
    decoded = jwt.decode(token, secret, algorithms=['HS256'])
    normalized_sub = unicodedata.normalize('NFC', decoded['sub'])
    # Compare against a column that stores NFC-normalized usernames
    user = db.execute(
        'SELECT * FROM users WHERE normalized_username = ?',
        [normalized_sub],
    ).fetchone()
    return user
```

Property authorization systems need special attention. Always normalize property names when checking permissions:
```java
// Property authorization with Unicode normalization
import java.text.Normalizer;

public boolean hasPermission(Map<String, Object> claims, String requiredRole) {
    String normalizedRole = Normalizer.normalize(requiredRole, Normalizer.Form.NFC);
    String userRole = Normalizer.normalize((String) claims.get("role"), Normalizer.Form.NFC);
    return normalizedRole.equals(userRole);
}
```

Audience validation should also normalize Unicode characters to prevent spoofing:
```go
// Audience validation with Unicode normalization
import (
    "golang.org/x/text/unicode/norm"
    // jwt types as used below, e.g. "github.com/golang-jwt/jwt/v5"
)

func validateAudience(token *jwt.Token, expectedAudience string) bool {
    claims, ok := token.Claims.(jwt.MapClaims)
    if !ok {
        return false
    }
    audience, ok := claims["aud"].(string)
    if !ok {
        return false
    }
    normalizedAudience := norm.NFC.String(audience)
    normalizedExpected := norm.NFC.String(expectedAudience)
    return normalizedAudience == normalizedExpected
}
```