
Unicode Normalization in Express

How Unicode Normalization Manifests in Express

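Unicode normalization attacks in Express applications exploit the gap between where input is validated and where it is normalized. Express matches routes with path-to-regexp and percent-decodes route parameters, but neither Express nor Node.js's path module applies Unicode normalization. The risk appears when application code, a downstream service, or a normalizing filesystem transforms a value after a security check has already approved it.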

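The most common attack vector uses Unicode characters that normalize to dangerous path components. For example, the fullwidth full stop (U+FF0E) and fullwidth solidus (U+FF0F) become . and / under NFKC, so a request to /api/users/%EF%BC%8E%EF%BC%8E%EF%BC%8F%EF%BC%8E%EF%BC%8E%EF%BC%8Fetc%EF%BC%8Fpasswd can slip past a check for ../ and turn into a traversal path if the handler normalizes the parameter afterwards. Here's what happens:

// Vulnerable Express route: validates first, normalizes afterwards
app.get('/api/users/:id', (req, res) => {
  if (req.params.id.includes('..')) {
    return res.status(400).json({ error: 'Invalid id' });
  }
  // NFKC runs after the check and folds fullwidth characters to ASCII
  const id = req.params.id.normalize('NFKC');
  const filePath = path.join(__dirname, 'data', id);
  fs.readFile(filePath, (err, data) => {
    if (err) return res.status(404).json({ error: 'Not found' });
    res.json({ data: data.toString() });
  });
});

// Attack request: /api/users/%EF%BC%8E%EF%BC%8E%EF%BC%8F%EF%BC%8E%EF%BC%8E%EF%BC%8Fetc%EF%BC%8Fpasswd
// The parameter decodes to ．．／．．／etc／passwd, passes the check, and normalizes to ../../etc/passwd

Neither Express nor Node.js's path module performs Unicode normalization: path.join and path.normalize only collapse . and .. segments and duplicate separators. The gap opens when normalization runs after validation, so the route guard inspects one string and the filesystem call receives another.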
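To make the gap concrete, the following sketch uses only built-in string methods to show how a percent-encoded fullwidth sequence passes a naive check and turns into ../ only after compatibility normalization. The payload and variable names are illustrative assumptions, not taken from a real exploit.

```javascript
// Fullwidth full stop (U+FF0E) and fullwidth solidus (U+FF0F), percent-encoded as UTF-8.
const encoded = '%EF%BC%8E%EF%BC%8E%EF%BC%8Fetc';

// Route parameters reach a handler percent-decoded, similar to this call.
const decoded = decodeURIComponent(encoded);

// A naive guard finds no ASCII dots or slashes in the decoded value.
const passesNaiveCheck = !decoded.includes('..') && !decoded.includes('/');

// NFC leaves compatibility characters alone; NFKC folds them to ASCII.
const nfc = decoded.normalize('NFC');
const nfkc = decoded.normalize('NFKC');

console.log(passesNaiveCheck); // true
console.log(nfc === decoded);  // true
console.log(nfkc);             // ../etc
```

The same string is harmless or dangerous depending on which normalization form is applied, which is why the order of normalization and validation matters.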

Another manifestation occurs with filename comparisons. Consider this vulnerable code:

// Vulnerable file access
app.post('/upload', (req, res) => {
  const filename = req.body.filename;
  if (filename.includes('..')) {
    return res.status(400).json({ error: 'Invalid filename' });
  }

  // Normalizing after the check reintroduces the sequence it rejected:
  // '‥' (U+2025) is not '..' above, but NFKC turns it into '..' here
  const filePath = path.join(__dirname, 'uploads', filename.normalize('NFKC'));
  fs.writeFileSync(filePath, req.body.content);
  res.json({ success: true });
});

The two dot leader ‥ (U+2025) resembles a pair of periods but does not match the .. check. Node.js never treats it as a dot or a path separator on its own; the bypass works only because the NFKC step converts it to .. after validation has finished.

Property name attacks occur when an Express handler filters JSON keys as raw strings and a later step normalizes them, so a key carrying combining marks ends up mapped onto a sensitive property:

// Vulnerable property access
app.post('/api/data', (req, res) => {
  const data = req.body;
  // JavaScript compares property names code unit by code unit, so this key
  // is distinct from 'password' and slips past a check for that name
  const secret = data['ṗassẅord']; // 'password' with marks on the p and the w
  // If a later step strips or normalizes the marks, the value lands under 'password'
});

These attacks are particularly dangerous because they bypass simple string-based security checks, and the payload only takes its harmful form once a later normalization step, a downstream service, or a normalizing filesystem processes it.
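As a rough illustration of the property name case, the sketch below flags keys that differ from a sensitive name as raw strings but collapse onto it once combining marks are stripped. The helper names and the SENSITIVE_KEYS list are hypothetical and exist only for this example.

```javascript
// Hypothetical list of property names an application treats as sensitive.
const SENSITIVE_KEYS = new Set(['password', 'admin', 'role']);

function foldKey(key) {
  // NFKD splits precomposed letters into base letter + combining marks;
  // the regex then removes every character in the Mark category.
  return key.normalize('NFKD').replace(/\p{M}/gu, '').toLowerCase();
}

function findShadowKeys(obj) {
  // A shadow key is not equal to a sensitive name but folds onto one.
  return Object.keys(obj).filter((key) => {
    const folded = foldKey(key);
    return folded !== key && SENSITIVE_KEYS.has(folded);
  });
}

// The JSON escapes produce 'p' + U+0307 and 'w' + U+0308 inside the key.
const body = JSON.parse('{"p\\u0307assw\\u0308ord": "x", "name": "ok"}');

console.log('password' in body);          // false
console.log(findShadowKeys(body).length); // 1
```

A check like this is most useful as a rejection rule: a request containing a shadow key is more likely an attack than a legitimate client.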

Express-Specific Detection

Detecting Unicode normalization vulnerabilities in Express requires both static analysis and runtime testing. middleBrick's Express-specific scanning includes these detection methods:

Path Normalization Testing - middleBrick sends requests with Unicode path sequences to your Express endpoints:

# Example of Unicode path traversal test
curl -X GET "http://yourapi.com/api/users/%EF%BC%8E%EF%BC%8E%EF%BC%8F%EF%BC%8E%EF%BC%8E%EF%BC%8Fetc%EF%BC%8Fpasswd"
# Fullwidth ．．／ sequences are one of several Unicode representations of ../ that can be tested

Property Name Fuzzing - The scanner tests JSON payloads with Unicode property names that normalize to sensitive keys:

{
  "ṗassẅord": "test123",
  "admin̈": true,
  "user_id̈": 1
}

Route Handler Analysis - middleBrick analyzes your Express route definitions to identify patterns vulnerable to Unicode attacks:

// middleBrick detects this pattern as risky
app.get('/files/:filename', (req, res) => {
  const filePath = path.join(__dirname, 'files', req.params.filename);
  // No Unicode normalization or validation
});

Middleware Inspection - The scanner checks for missing Unicode validation in Express middleware:

// Vulnerable middleware that middleBrick flags
app.use((req, res, next) => {
  // No Unicode normalization of req.params or req.body
  next();
});

Response Analysis - middleBrick examines API responses for unintended data exposure that might result from Unicode bypass:

{
  "risk_score": "C",
  "findings": [
    {
      "severity": "high",
      "category": "Input Validation",
      "title": "Unicode Path Traversal Possible",
      "remediation": "Normalize and validate all path parameters using path.normalize()"
    }
  ]
}

The scanner's Express-specific checks include testing against common Unicode normalization vectors like:

  • Precomposed vs decomposed Unicode forms
  • Unicode characters that look like ASCII but have different code points
  • Zero-width characters and control sequences
  • Right-to-left override characters
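The vector families listed above can be told apart with built-in string normalization and a few code point ranges. The classifier below is a minimal sketch under that assumption; the function name and labels are hypothetical, and it does not describe how middleBrick implements its checks. Cross-script look-alikes such as Cyrillic letters are not affected by normalization, so they would need a separate confusables check.

```javascript
// Hypothetical classifier for suspicious Unicode in a single input string.
function classifyUnicodeRisks(input) {
  const risks = [];
  // Precomposed vs decomposed: the string is not stable under NFC.
  if (input.normalize('NFC') !== input) risks.push('non-nfc');
  // Compatibility look-alikes: NFKC rewrites the string (e.g. fullwidth forms).
  if (input.normalize('NFKC') !== input.normalize('NFC')) risks.push('compatibility');
  // Zero-width characters, word joiner, BOM, and C0/C1 control characters.
  if (/[\u200B-\u200D\u2060\uFEFF\u0000-\u001F\u007F-\u009F]/.test(input)) risks.push('invisible');
  // Bidirectional embeddings, overrides, and isolates.
  if (/[\u202A-\u202E\u2066-\u2069]/.test(input)) risks.push('bidi');
  return risks;
}

console.log(classifyUnicodeRisks('e\u0301'));        // [ 'non-nfc' ]
console.log(classifyUnicodeRisks('\uFF0E\uFF0E'));   // [ 'compatibility' ]
console.log(classifyUnicodeRisks('ad\u200Bmin'));    // [ 'invisible' ]
console.log(classifyUnicodeRisks('a\u202Eb'));       // [ 'bidi' ]
```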

Express-Specific Remediation

Express applications need specific remediation patterns for Unicode normalization vulnerabilities. Here are Express-native solutions:

Path Parameter Normalization - Always normalize and validate path parameters before use:

const express = require('express');
const fs = require('fs');
const path = require('path');
const app = express();

const DATA_DIR = path.join(__dirname, 'data');

// Normalize first, then validate the normalized value
function normalizeAndValidate(value) {
  // NFKC folds compatibility characters (fullwidth forms, dot leaders) to ASCII
  const normalized = value.normalize('NFKC');

  // Reject traversal sequences, separators, and anything outside printable ASCII
  if (normalized.includes('..') ||
      /[\/\\]/.test(normalized) ||
      /[^ -~]/.test(normalized)) {
    throw new Error('Invalid path parameter');
  }

  return normalized;
}

// app.param runs once a route has matched, when req.params is populated;
// a plain app.use() middleware would see an empty req.params
app.param('id', (req, res, next, value) => {
  try {
    req.params.id = normalizeAndValidate(value);
    next();
  } catch (err) {
    res.status(400).json({ error: err.message });
  }
});

// Secure route using normalized parameters
app.get('/api/users/:id', (req, res) => {
  const safePath = path.join(DATA_DIR, req.params.id);

  // Double-check path is within allowed directory; the trailing separator
  // keeps sibling directories such as data-backup from matching
  if (!safePath.startsWith(DATA_DIR + path.sep)) {
    return res.status(400).json({ error: 'Invalid path' });
  }

  fs.readFile(safePath, (err, data) => {
    if (err) return res.status(404).json({ error: 'Not found' });
    res.json({ data: data.toString() });
  });
});
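The containment check can also be written without string prefixes by comparing resolved paths with path.relative. The helper below is a sketch of that approach; the name resolveInside and the choice to return null on failure are assumptions made for this example.

```javascript
const path = require('path');

// Hypothetical helper: resolve `name` under `baseDir` and return null when the
// result is the base directory itself or lies outside it. Normalization runs
// first so the check sees the same string the filesystem call would receive.
function resolveInside(baseDir, name) {
  const normalized = String(name).normalize('NFKC');
  if (normalized.includes('\0')) return null; // reject embedded NUL bytes

  const base = path.resolve(baseDir);
  const target = path.resolve(base, normalized);
  const relative = path.relative(base, target);

  const escapes =
    relative === '' ||                       // target is the base directory
    relative === '..' ||                     // target is the parent directory
    relative.startsWith('..' + path.sep) ||  // target is above the base
    path.isAbsolute(relative);               // different root or drive
  return escapes ? null : target;
}

console.log(resolveInside('/srv/data', '\uFF0E\uFF0E\uFF0Fetc\uFF0Fpasswd')); // null
```

Because the comparison works on path segments rather than characters, a file legitimately named ..foo is still accepted while ../foo is rejected.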

JSON Body Validation - Use Express middleware to validate and normalize JSON bodies:

const express = require('express');
const app = express();

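// Secure body parser with Unicode validation
app.use(express.json({
  verify: (req, res, buf) => {
    const body = buf.toString('utf8');

    // Reject raw combining marks (U+0300-U+036F). This is a coarse first pass:
    // it also rejects decomposed accents in values, and JSON \uXXXX escapes are
    // not decoded yet, so parsed keys are still normalized in the handler
    if (/[\u0300-\u036f]/.test(body)) {
      throw new Error('Combining characters detected');
    }
  }
}));

// Property access with normalization
app.post('/api/data', (req, res) => {
  let normalizedData;
  try {
    // Normalize property names before access
    normalizedData = normalizeObjectProperties(req.body);
  } catch (err) {
    return res.status(400).json({ error: err.message });
  }

  // Safe property access
  const password = normalizedData.password;
  const isAdmin = normalizedData.admin;

  res.json({ success: true });
});

function normalizeObjectProperties(obj) {
  const normalized = {};
  for (const key of Object.keys(obj)) {
    // NFKD separates base letters from combining marks so the marks can be removed
    const cleanKey = key.normalize('NFKD').replace(/[\u0300-\u036f]/g, '');
    // Two keys that collapse to the same name indicate a look-alike key
    if (Object.prototype.hasOwnProperty.call(normalized, cleanKey)) {
      throw new Error('Duplicate property name after normalization');
    }
    normalized[cleanKey] = obj[key];
  }
  return normalized;
}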
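Normalizing only top-level keys leaves nested objects unchecked. Where an API accepts nothing but ASCII identifiers as property names, a stricter option is to reject offending keys at any depth. The recursive walker below is a sketch of that policy; the function name and the identifier pattern are assumptions and would need adjusting to the key names an API really accepts.

```javascript
// Hypothetical validator: collect dotted paths of keys that are not plain
// ASCII identifiers, walking nested objects and arrays.
function findUnsafeKeys(value, prefix = '') {
  if (value === null || typeof value !== 'object') return [];
  const unsafe = [];
  for (const [key, child] of Object.entries(value)) {
    const here = prefix ? `${prefix}.${key}` : key;
    // Array indexes are generated by the parser, so only object keys are tested.
    if (!Array.isArray(value) && !/^[A-Za-z_][A-Za-z0-9_]*$/.test(key)) {
      unsafe.push(here);
    }
    unsafe.push(...findUnsafeKeys(child, here));
  }
  return unsafe;
}

const payload = {
  user: { 'adm\u0131n': true, name: 'a' }, // dotless i (U+0131) instead of i
  list: [{ 'p\u0307w': 1 }],               // p followed by combining dot above
};
console.log(findUnsafeKeys(payload).length); // 2
```

A handler could respond with status 400 whenever the returned list is non-empty, which avoids having to decide what a rewritten key should mean.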

Filename Validation - Implement strict filename validation in Express:

// Secure file upload handler
app.post('/upload', (req, res) => {
  const filename = req.body.filename;

  // Strict ASCII allow-list; it also rejects every non-ASCII character,
  // so no separate Unicode check is needed
  if (typeof filename !== 'string' || !/^[a-zA-Z0-9_.-]+$/.test(filename)) {
    return res.status(400).json({ error: 'Invalid filename' });
  }

  // The allow-list permits dots, so reject the special names explicitly
  if (filename === '.' || filename === '..') {
    return res.status(400).json({ error: 'Invalid filename' });
  }

  const uploadsDir = path.join(__dirname, 'uploads');
  const filePath = path.join(uploadsDir, filename);

  // Verify path is within uploads directory
  if (!filePath.startsWith(uploadsDir + path.sep)) {
    return res.status(400).json({ error: 'Path traversal detected' });
  }

  fs.writeFileSync(filePath, req.body.content);
  res.json({ success: true });
});

Route Guard Middleware - Add Express middleware to protect against Unicode attacks:

// Security middleware for Express
function unicodeSecurityMiddleware(req, res, next) {
  // req.url is still percent-encoded (always ASCII), so decode it before testing
  let decodedUrl;
  try {
    decodedUrl = decodeURIComponent(req.url);
  } catch (err) {
    return res.status(400).json({ error: 'Malformed URL encoding' });
  }

  // Check URL for Unicode path traversal
  if (/[^ -~]/.test(decodedUrl)) {
    return res.status(400).json({ error: 'Invalid characters in URL' });
  }

  // Check query parameters
  for (const key in req.query) {
    if (/[^ -~]/.test(key + req.query[key])) {
      return res.status(400).json({ error: 'Invalid query parameter' });
    }
  }

  // Check body for Unicode property attacks
  if (req.body && typeof req.body === 'object') {
    for (const key in req.body) {
      if (/[^ -~]/.test(key)) {
        return res.status(400).json({ error: 'Invalid property name' });
      }
    }
  }

  next();
}

// Apply security middleware globally, after express.json() so req.body is parsed
app.use(unicodeSecurityMiddleware);

Frequently Asked Questions

Why doesn't Express automatically handle Unicode normalization?
Express is designed to be unopinionated and lightweight, leaving character encoding decisions to developers based on their specific use cases. Node.js provides String.prototype.normalize() for the NFC, NFD, NFKC, and NFKD forms, but choosing which form to apply, and at which point in the request lifecycle, is an application-specific policy. Express trusts developers to implement appropriate validation for their context, whether that's strict ASCII-only policies or controlled Unicode support.

Can middleBrick automatically fix Unicode normalization issues in my Express app?
No, middleBrick detects and reports Unicode normalization vulnerabilities but does not modify your code. The scanner provides specific findings with severity levels and remediation guidance, but applying fixes requires developer intervention. middleBrick's role is to identify security issues you might miss, not to alter your application's behavior.