Unicode Normalization in Express
How Unicode Normalization Manifests in Express
Unicode normalization attacks in Express applications exploit how the framework handles character encoding and path resolution. Express's routing system uses Node.js's path module under the hood, which can be vulnerable to path traversal when Unicode normalization isn't properly handled.
The most common attack vector involves using Unicode characters that normalize to dangerous paths. For example, a request to /api/users/..%C3%82%2F..%C3%82%2Fetc%2Fpasswd can bypass Express route handlers. Here's what happens:
// Vulnerable Express route
app.get('/api/users/:id', (req, res) => {
const filePath = path.join(__dirname, 'data', req.params.id);
fs.readFile(filePath, (err, data) => {
res.json({ data: data.toString() });
});
});
// Attack request: /api/users/..%C3%82%2F..%C3%82%2Fetc%2Fpasswd
// ..%C3%82%2F normalizes to ../ when decoded and normalized
Express doesn't automatically normalize Unicode paths, but Node.js's path module will normalize them during path resolution. This creates a gap where malicious Unicode sequences can bypass route guards.
Another manifestation occurs with filename comparisons. Consider this vulnerable code:
// Vulnerable file access
app.post('/upload', (req, res) => {
const filename = req.body.filename;
if (filename.includes('..')) {
return res.status(400).json({ error: 'Invalid filename' });
}
const filePath = path.join(__dirname, 'uploads', filename);
// Attack bypasses the .. check using Unicode: ⸮ instead of ..
fs.writeFileSync(filePath, req.body.content);
});
The ⸮ character (U+2E2E) looks like a period but doesn't match the .. check, yet Node.js's path resolution will treat it as a path separator when combined with other Unicode sequences.
Property access attacks also occur when Express parses JSON bodies with Unicode property names that normalize to sensitive properties:
// Vulnerable property access
app.post('/api/data', (req, res) => {
const data = req.body;
// Unicode property that normalizes to 'password'
const secret = data.ṗassẅord; // ṗ̈ = 'password' with combining marks
// This might bypass property checks if not normalized
});
These attacks are particularly dangerous because they often bypass simple string-based security checks while still functioning correctly in Node.js's Unicode-aware filesystem and object property resolution.
Express-Specific Detection
Detecting Unicode normalization vulnerabilities in Express requires both static analysis and runtime testing. middleBrick's Express-specific scanning includes these detection methods:
Path Normalization Testing - middleBrick sends requests with Unicode path sequences to your Express endpoints:
# Example of Unicode path traversal test
curl -X GET "http://yourapi.com/api/users/%C2%AE%C2%AE%2F%C2%AE%C2%AE%2Fetc%2Fpasswd"
# Tests various Unicode representations of path traversal
Property Name Fuzzing - The scanner tests JSON payloads with Unicode property names that normalize to sensitive keys:
{
"ṗassẅord": "test123",
"admin̈": true,
"user_id̈": 1
}
Route Handler Analysis - middleBrick analyzes your Express route definitions to identify patterns vulnerable to Unicode attacks:
// middleBrick detects this pattern as risky
app.get('/files/:filename', (req, res) => {
const filePath = path.join(__dirname, 'files', req.params.filename);
// No Unicode normalization or validation
});
Middleware Inspection - The scanner checks for missing Unicode validation in Express middleware:
// Vulnerable middleware that middleBrick flags
app.use((req, res, next) => {
// No Unicode normalization of req.params or req.body
next();
});
Response Analysis - middleBrick examines API responses for unintended data exposure that might result from Unicode bypass:
{
"risk_score": "C",
"findings": [
{
"severity": "high",
"category": "Input Validation",
"title": "Unicode Path Traversal Possible",
"remediation": "Normalize and validate all path parameters using path.normalize()"
}
]
}
The scanner's Express-specific checks include testing against common Unicode normalization vectors like:
- Precomposed vs decomposed Unicode forms
- Unicode characters that look like ASCII but have different code points
- Zero-width characters and control sequences
- Right-to-left override characters
Express-Specific Remediation
Express applications need specific remediation patterns for Unicode normalization vulnerabilities. Here are Express-native solutions:
Path Parameter Normalization - Always normalize and validate path parameters before use:
const express = require('express');
const path = require('path');
const app = express();
// Secure middleware for path parameters
app.use((req, res, next) => {
// Normalize and validate all path parameters
Object.keys(req.params).forEach(key => {
req.params[key] = normalizeAndValidate(req.params[key]);
});
next();
});
function normalizeAndValidate(value) {
// Normalize to NFC form
const normalized = value.normalize('NFC');
// Check for path traversal patterns
if (normalized.includes('..') ||
normalized.includes('⸮') || // Unicode dot
normalized.match(/[^ -~]/)) { // Non-ASCII chars
throw new Error('Invalid path parameter');
}
return normalized;
}
// Secure route using normalized parameters
app.get('/api/users/:id', (req, res) => {
const userId = req.params.id;
const safePath = path.join(__dirname, 'data', userId);
// Double-check path is within allowed directory
if (!safePath.startsWith(path.join(__dirname, 'data'))) {
return res.status(400).json({ error: 'Invalid path' });
}
fs.readFile(safePath, (err, data) => {
if (err) return res.status(404).json({ error: 'Not found' });
res.json({ data: data.toString() });
});
});
JSON Body Validation - Use Express middleware to validate and normalize JSON bodies:
const express = require('express');
const app = express();
// Secure body parser with Unicode validation
app.use(express.json({
verify: (req, res, buf) => {
const body = buf.toString();
// Check for Unicode property name attacks
if (/̇|̈|⸮/.test(body)) {
throw new Error('Unicode property name detected');
}
}
}));
// Property access with normalization
app.post('/api/data', (req, res) => {
const data = req.body;
// Normalize property names before access
const normalizedData = normalizeObjectProperties(data);
// Safe property access
const password = normalizedData.password;
const isAdmin = normalizedData.admin;
res.json({ success: true });
});
function normalizeObjectProperties(obj) {
const normalized = {};
for (const key in obj) {
// Remove combining characters and normalize
const cleanKey = key.normalize('NFC').replace(/[̀-ͯ]/g, '');
normalized[cleanKey] = obj[key];
}
return normalized;
}
Filename Validation - Implement strict filename validation in Express:
// Secure file upload handler
app.post('/upload', (req, res) => {
const filename = req.body.filename;
// Strict validation
if (!/^[a-zA-Z0-9_.-]+$/.test(filename)) {
return res.status(400).json({ error: 'Invalid filename' });
}
// Additional Unicode check
if (/[^ -~]/.test(filename)) {
return res.status(400).json({ error: 'Unicode not allowed' });
}
const filePath = path.join(__dirname, 'uploads', filename);
// Verify path is within uploads directory
if (!filePath.startsWith(path.join(__dirname, 'uploads'))) {
return res.status(400).json({ error: 'Path traversal detected' });
}
fs.writeFileSync(filePath, req.body.content);
res.json({ success: true });
});
Route Guard Middleware - Add Express middleware to protect against Unicode attacks:
// Security middleware for Express
function unicodeSecurityMiddleware(req, res, next) {
// Check URL for Unicode path traversal
if (/[^ -~]/.test(req.url)) {
return res.status(400).json({ error: 'Invalid characters in URL' });
}
// Check query parameters
for (const key in req.query) {
if (/[^ -~]/.test(key + req.query[key])) {
return res.status(400).json({ error: 'Invalid query parameter' });
}
}
// Check body for Unicode property attacks
if (req.body && typeof req.body === 'object') {
for (const key in req.body) {
if (/[^ -~]/.test(key)) {
return res.status(400).json({ error: 'Invalid property name' });
}
}
}
next();
}
// Apply security middleware globally
app.use(unicodeSecurityMiddleware);