Unicode Normalization on AWS
How Unicode Normalization Manifests in AWS
Unicode normalization vulnerabilities in AWS applications often emerge as authentication bypasses and data integrity issues. The problem occurs when an application handles user input inconsistently across different normalization forms, creating opportunities for attackers to craft payloads that bypass security controls.
A common manifestation appears in password validation systems. Consider an AWS application using bcrypt for password hashing. If the application normalizes passwords to NFC before hashing but compares raw input at login, equivalent Unicode spellings of the same password behave inconsistently. An attacker can register with "pàssword" in decomposed form, where the 'à' is written as U+0061 'a' followed by U+0300 COMBINING GRAVE ACCENT. During registration, NFC normalization converts it to the precomposed form (U+00E0), and that form is what gets hashed. If normalization isn't applied at login, the decomposed spelling the user originally typed no longer matches, while the precomposed spelling does: the application treats two encodings of the same password differently, and an attacker can probe that discrepancy.
# Vulnerable AWS authentication flow
import bcrypt
import unicodedata

def register_user(username, password):
    # Bug: normalization applied here but not at login
    normalized = unicodedata.normalize('NFC', password)
    hashed = bcrypt.hashpw(normalized.encode('utf-8'), bcrypt.gensalt())
    db.store(username, hashed)

def authenticate_user(username, password):
    stored_hash = db.retrieve(username)
    # Bug: compares the raw, unnormalized input against the hash
    return bcrypt.checkpw(password.encode('utf-8'), stored_hash)

# Attacker registers with decomposed "pàssword" (a + U+0300 combining grave)
# Registration hashes the NFC (precomposed) form
# At login, no normalization is applied, so the decomposed spelling fails
# while the precomposed spelling succeeds — equivalent inputs diverge

Another AWS-specific scenario involves API key validation. AWS applications often use API keys for authentication, and if those keys contain Unicode characters, normalization discrepancies can allow key enumeration or bypass. An attacker might discover that "API KEY" with a regular space (U+0020) and "API KEY" with a no-break space (U+00A0) are treated as equivalent during validation but not during generation, enabling brute-force attacks with normalized variants.
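The space-equivalence behavior above is easy to demonstrate: NFKC maps compatibility characters such as U+00A0 NO-BREAK SPACE to a plain space, while NFC leaves them alone. A minimal, framework-independent sketch:

```python
import unicodedata

regular = "API KEY"        # U+0020 regular space
nbsp = "API\u00a0KEY"      # U+00A0 no-break space

# NFC preserves the distinction; the two keys stay different
print(unicodedata.normalize("NFC", regular) == unicodedata.normalize("NFC", nbsp))    # False

# NFKC folds the no-break space into U+0020; the keys collide
print(unicodedata.normalize("NFKC", regular) == unicodedata.normalize("NFKC", nbsp))  # True
```

If one code path validates keys under NFKC (or any compatibility folding) while another generates or stores them under NFC, exactly this kind of collision appears.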
Database indexing in AWS applications presents another attack vector. When an AWS application uses PostgreSQL or MySQL with Unicode columns, inconsistent collation settings can cause authentication bypasses. If a username index uses the utf8mb4_unicode_ci collation but application logic compares with utf8mb4_bin, an attacker can register usernames that the database treats as equal to existing ones while the application treats them as distinct.
-- Vulnerable AWS database schema
CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    username VARCHAR(50) COLLATE utf8mb4_unicode_ci UNIQUE,
    password_hash TEXT
);

-- Attack scenario:
-- Attacker registers as "useŕname" (r followed by U+0301 combining acute)
-- Queries using utf8mb4_unicode_ci treat "useŕname" and "username" as equal
-- (the collation ignores accents and combining marks at its comparison strength)
-- Application logic elsewhere compares the two as distinct binary strings
-- The mismatch lets lookups for one username resolve to the other account

File path handling in AWS applications also suffers from normalization issues. When AWS applications process file uploads or enforce path-based access control, Unicode characters in paths can bypass security checks. An attacker might embed a directional control character such as U+202E RIGHT-TO-LEFT OVERRIDE in a filename so that traversal sequences like ".." render differently than they parse, causing reviewers and naive string filters to interpret the path differently than the filesystem does.
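A minimal sketch of the safe ordering for path checks (the function name and upload directory are illustrative, not from any particular AWS service): normalize first, reject directional control characters, then resolve the final path and confirm it stays under the allowed root.

```python
import os
import unicodedata

UPLOAD_ROOT = "/var/app/uploads"  # hypothetical upload directory

def safe_path(user_path):
    # Normalize Unicode before any string-based checks
    normalized = unicodedata.normalize("NFC", user_path)
    # Reject bidi control characters that can visually disguise traversal sequences
    if any(ch in normalized for ch in ("\u202e", "\u202d", "\u200e", "\u200f")):
        raise ValueError("directional control characters not allowed")
    # Resolve the final path and confirm it stays under the root
    root = os.path.realpath(UPLOAD_ROOT)
    resolved = os.path.realpath(os.path.join(root, normalized))
    if not resolved.startswith(root + os.sep):
        raise ValueError("path escapes upload root")
    return resolved
```

The key design choice is that validation happens on the normalized, fully resolved path, never on the raw user string.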
AWS-Specific Detection
Detecting Unicode normalization vulnerabilities in AWS applications requires systematic testing across the entire authentication and data processing pipeline. The first step is analyzing how your application handles Unicode input at each processing stage.
Automated scanning with middleBrick can identify normalization-related security issues in AWS APIs. The scanner tests for authentication bypasses by submitting Unicode variants of known credentials and checking whether the application treats them as equivalent. For example, middleBrick would test whether the NFC form "pâssword" (precomposed U+00E2), the equivalent NFD form ('a' followed by U+0302 combining circumflex), and the plain ASCII "password" all authenticate interchangeably when they shouldn't.
# Using middleBrick CLI to scan an AWS API for normalization issues
npm install -g middlebrick
middlebrick scan https://api.yourapp.com/auth --tests=authentication,bola,idor

# middleBrick performs Unicode variant testing:
# - Tests NFC, NFD, NFKC, NFKD forms of input
# - Checks if equivalent Unicode sequences bypass authentication
# - Identifies inconsistent handling across API endpoints
# - Reports findings with severity and remediation guidance

Manual testing should focus on authentication endpoints, password reset flows, and any functionality that processes user identifiers. Test with characters from different Unicode blocks: Latin with combining diacritics, Cyrillic lookalikes, and full-width/half-width variants. For each endpoint, submit the same logical input in different Unicode forms and verify the application's response consistency.
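The variant submission described above is easy to script. A hedged sketch (the probe value is arbitrary, and the commented request call is a placeholder for your own test harness) that generates all four normalization forms of an input so each can be sent to an endpoint and the responses compared:

```python
import unicodedata

def unicode_variants(value):
    """Return each normalization form of a probe value, keyed by form name."""
    forms = ("NFC", "NFD", "NFKC", "NFKD")
    return {form: unicodedata.normalize(form, value) for form in forms}

# Example probe: "pássword" with a precomposed U+00E1
variants = unicode_variants("p\u00e1ssword")
for form, variant in variants.items():
    # Submit each variant to the endpoint under test and record the response,
    # e.g. requests.post("https://api.yourapp.com/auth", json={"password": variant})
    print(form, [hex(ord(ch)) for ch in variant])
```

Any endpoint that accepts some variants and rejects others for the same logical credential is handling normalization inconsistently.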
Database-level detection involves examining collation settings and normalization behavior. Query your AWS application's database to identify columns where Unicode characters might cause equivalence issues. For PostgreSQL, check for citext columns or case-insensitive collations. For MySQL, examine utf8mb4_unicode_ci versus utf8mb4_bin collations.
-- AWS database analysis for normalization vulnerabilities
-- Check collation settings
SELECT
    table_name,
    column_name,
    collation_name
FROM information_schema.columns
WHERE table_schema = 'your_app'
    AND collation_name LIKE '%unicode%'
    AND collation_name NOT LIKE '%bin%';

-- Test equivalence: precomposed 'à' (U+00E0) vs decomposed 'a' + U+0300
-- (0x61CC80 is the UTF-8 encoding of the decomposed sequence)
-- unicode_eq compares equal; binary_eq does not
SELECT
    'à' = CONVERT(0x61CC80 USING utf8mb4) COLLATE utf8mb4_unicode_ci AS unicode_eq,
    'à' COLLATE utf8mb4_bin = CONVERT(0x61CC80 USING utf8mb4) AS binary_eq;

-- Identify potential bypass candidates
SELECT * FROM users
WHERE username = 'existinguser'
    OR username = 'ex́istinguser'; -- 'x' followed by U+0301 combining acute

Code review should examine all input processing paths in your AWS application. Look for places where user input is normalized, compared, or stored without consistent handling. Pay special attention to authentication middleware, authorization checks, and any code that generates or validates tokens, API keys, or session identifiers.
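One way to start that review is an inventory of where normalization already happens. A rough helper (the source root and the focus on Python files are assumptions; extend the glob for other languages) that maps each file to the normalization forms it invokes, so mixed-form files stand out:

```python
import re
from pathlib import Path

# Matches calls like unicodedata.normalize('NFC', ...) and captures the form
NORMALIZE_CALL = re.compile(r"normalize\(\s*['\"](NFC|NFD|NFKC|NFKD)['\"]")

def audit_normalization(src_root):
    """Map each source file under src_root to the normalization forms it uses."""
    findings = {}
    for path in Path(src_root).rglob("*.py"):
        forms = set(NORMALIZE_CALL.findall(path.read_text(encoding="utf-8")))
        if forms:
            findings[str(path)] = forms
    return findings
```

Files that mix NFC and NFD, or input-handling files that appear nowhere in the findings, are the first candidates for closer inspection.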
AWS-Specific Remediation
Remediating Unicode normalization vulnerabilities in AWS applications requires implementing consistent normalization throughout the entire data processing pipeline. The fundamental principle is to normalize all user input to a single form before any processing, comparison, or storage occurs.
For authentication systems in AWS applications, pre-normalize all credentials before hashing or comparison. Use NFC normalization consistently across registration, authentication, and password reset flows. Store only the normalized form in your database, never the raw user input.
# AWS authentication with consistent Unicode normalization
import bcrypt
import unicodedata

def normalize_input(data):
    # Normalize to NFC form before any processing
    return unicodedata.normalize('NFC', data)

def register_user(username, password):
    # Normalize before hashing
    normalized_username = normalize_input(username)
    normalized_password = normalize_input(password)
    # Hash the normalized password
    hashed = bcrypt.hashpw(normalized_password.encode('utf-8'), bcrypt.gensalt())
    # Store only the normalized forms
    db.store_user({
        'username': normalized_username,
        'password_hash': hashed
    })

def authenticate_user(username, password):
    normalized_username = normalize_input(username)
    normalized_password = normalize_input(password)
    stored_user = db.retrieve_user(normalized_username)
    if not stored_user:
        return False
    return bcrypt.checkpw(normalized_password.encode('utf-8'), stored_user['password_hash'])

# All input flows through normalize_input() before processing

For database schemas in AWS applications, use binary collations for columns that require exact matching, or implement application-level normalization before database operations. Avoid case-insensitive collations for security-sensitive columns such as usernames or API keys.
-- AWS database schema with security-conscious collation
CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    username VARCHAR(50) COLLATE utf8mb4_bin UNIQUE, -- binary collation for exact matching
    email VARCHAR(100) COLLATE utf8mb4_bin,
    password_hash TEXT
);

-- Safe insertion: normalize_input() runs in application code
-- before the values are bound to the placeholders
INSERT INTO users (username, email, password_hash)
VALUES (?, ?, ?); -- bind normalize_input(username), normalize_input(email), hash

-- Always normalize in the application before comparison
SELECT * FROM users
WHERE username = ?; -- bind normalize_input(username)

-- For case-insensitive search where needed, lowercase explicitly
SELECT * FROM users
WHERE LOWER(username) = ?; -- bind normalize_input(username).lower()

API key validation in AWS applications should follow a strict normalization policy. Decide whether API keys may contain Unicode characters and, if so, enforce consistent normalization. Better still, restrict API keys to ASCII characters only to eliminate normalization complexity entirely.
# AWS API key validation with normalization
import re
import unicodedata

def validate_api_key_ascii(api_key):
    # Option 1: restrict keys to ASCII, eliminating normalization issues entirely
    return bool(re.match(r'^[A-Za-z0-9]{32,64}$', api_key))

def validate_api_key_unicode(api_key):
    # Option 2: allow Unicode, but reject any key that is not already in NFC,
    # so no equivalent alternate spelling of a valid key exists
    return unicodedata.normalize('NFC', api_key) == api_key

def authenticate_api(api_key):
    normalized_key = unicodedata.normalize('NFC', api_key)
    stored_key = db.get_api_key(normalized_key)
    return stored_key and stored_key['active']

# Always normalize before storage and comparison

Input validation middleware in AWS applications should normalize all incoming requests before routing them to handlers. This ensures consistent handling across the entire application stack.
// AWS Express middleware for Unicode normalization
const normalizeMiddleware = (req, res, next) => {
  // Normalize query parameters
  if (req.query) {
    Object.keys(req.query).forEach(key => {
      req.query[key] = normalizeInput(req.query[key]);
    });
  }
  // Normalize body parameters
  if (req.body) {
    normalizeObject(req.body);
  }
  // Normalize headers (be cautious: some headers are byte-sensitive)
  if (req.headers) {
    Object.keys(req.headers).forEach(key => {
      req.headers[key] = normalizeInput(req.headers[key]);
    });
  }
  next();
};

function normalizeInput(input) {
  if (typeof input === 'string') {
    return input.normalize('NFC');
  }
  return input;
}

function normalizeObject(obj) {
  Object.keys(obj).forEach(key => {
    if (obj[key] !== null && typeof obj[key] === 'object') {
      // Recurse into nested objects and arrays; skip null
      normalizeObject(obj[key]);
    } else if (typeof obj[key] === 'string') {
      obj[key] = normalizeInput(obj[key]);
    }
  });
}

// Apply the middleware globally
app.use(normalizeMiddleware);

Testing your AWS application's normalization remediation is crucial. Implement comprehensive test suites that verify normalization consistency across all authentication and authorization paths. Test with various Unicode characters, including combining sequences, full-width characters, and characters from different scripts.
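A starting point for such a suite, sketched against the normalize_input() helper shown earlier (the test data is illustrative; extend both lists with characters relevant to your user base):

```python
import unicodedata

def normalize_input(data):
    # The application-wide normalization helper from the remediation examples
    return unicodedata.normalize("NFC", data)

# Pairs of logically equivalent spellings that must collapse to one stored form
EQUIVALENT_PAIRS = [
    ("p\u00e0ssword", "pa\u0300ssword"),  # precomposed à vs a + U+0300
    ("caf\u00e9", "cafe\u0301"),          # precomposed é vs e + U+0301
]

# Pairs that look alike but must stay distinct under NFC
DISTINCT_PAIRS = [
    ("admin", "\u0430dmin"),              # Latin 'a' vs Cyrillic U+0430
    ("API KEY", "API\u00a0KEY"),          # space vs no-break space
]

def test_equivalent_forms_collapse():
    for left, right in EQUIVALENT_PAIRS:
        assert normalize_input(left) == normalize_input(right)

def test_lookalikes_stay_distinct():
    for left, right in DISTINCT_PAIRS:
        assert normalize_input(left) != normalize_input(right)
```

The second test matters as much as the first: NFC must not merge confusable characters from different scripts, so homoglyph defenses need their own controls beyond normalization.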