Unicode Normalization in Django with HMAC Signatures
Unicode Normalization in Django with HMAC Signatures: how this specific combination creates or exposes the vulnerability
Django’s signing utilities, such as django.core.signing.dumps and django.core.signing.loads (and the Signer and TimestampSigner classes beneath them), compute the HMAC over the encoded value exactly as supplied. They do not apply Unicode normalization. When user-controlled input containing non-ASCII characters is included in the signed payload, differences in normalization form can therefore cause signature mismatches or, combined with normalization elsewhere in the application, bypasses. For example, the é in café can be represented either as the single code point U+00E9 (LATIN SMALL LETTER E WITH ACUTE) or as the two code points e + U+0301 (COMBINING ACUTE ACCENT). The two representations are canonically equivalent, but they encode to different UTF-8 byte sequences (63 61 66 C3 A9 versus 63 61 66 65 CC 81) and therefore produce different HMAC values unless normalization is applied consistently.
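A minimal standard-library sketch makes the byte-level difference visible. The key is a hypothetical placeholder, and hmac.new with SHA-256 stands in for Django’s signer, which likewise computes its HMAC over the encoded bytes of the value.

```python
import hashlib
import hmac
import unicodedata

KEY = b"hypothetical-secret-key"  # placeholder, not a real Django SECRET_KEY

def digest(text: str) -> str:
    # HMAC over the UTF-8 bytes of the string, exactly as given
    return hmac.new(KEY, text.encode("utf-8"), hashlib.sha256).hexdigest()

composed = "caf\u00e9"     # 'é' as one code point (NFC)
decomposed = "cafe\u0301"  # 'e' + COMBINING ACUTE ACCENT (NFD)

print(composed == decomposed)                                # False
print(composed.encode("utf-8"), decomposed.encode("utf-8"))  # b'caf\xc3\xa9' b'cafe\xcc\x81'
print(digest(composed) == digest(decomposed))                # False

# After normalizing to NFC, the bytes and therefore the HMACs agree
print(digest(unicodedata.normalize("NFC", decomposed)) == digest(composed))  # True
```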
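An attacker can exploit this when the application verifies the signature over one form of an identifier but acts on another. If the HMAC is checked against the raw value while a later stage (for example, a user lookup) normalizes it, or vice versa, two byte-distinct values that each carry a valid signature can resolve to the same canonical identity. For instance, an attacker who registers the NFD spelling of a victim’s NFC username (possible when registration stores names without normalizing) receives a genuinely signed token for their own account, and a lookup that normalizes only after verification resolves that token to the victim’s account. No signature is forged, yet the integrity check is effectively bypassed, which can lead to privilege escalation via confused tokens or unauthorized access to user-specific resources.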
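The sketch below models that confusion with the standard library hmac module. The key, the USERS table, and the helpers sign and verify_then_lookup are all hypothetical; the assumption is that registration stored both spellings as distinct accounts and that the lookup normalizes only after the integrity check.

```python
import hashlib
import hmac
import unicodedata

KEY = b"hypothetical-secret-key"  # placeholder key for illustration

# Hypothetical user table: usernames were stored without normalization,
# so two visually identical names exist as distinct accounts.
USERS = {
    "caf\u00e9": "victim",     # NFC spelling
    "cafe\u0301": "attacker",  # NFD spelling
}

def _mac(value: str) -> str:
    return hmac.new(KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()

def sign(value: str) -> str:
    return f"{value}:{_mac(value)}"

def verify_then_lookup(token: str) -> str:
    value, _, mac = token.rpartition(":")
    # Compare as bytes so non-ASCII input cannot raise TypeError
    if not hmac.compare_digest(mac.encode("utf-8"), _mac(value).encode("ascii")):
        raise ValueError("bad signature")
    # Bug: normalization happens only after the integrity check
    return USERS[unicodedata.normalize("NFC", value)]

# The application legitimately issues a token for the attacker's own account
attacker_token = sign("cafe\u0301")
print(verify_then_lookup(attacker_token))  # victim
```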
Another scenario involves URL path segments or query parameters that include Unicode characters. Django does not sign request paths itself, but when application code computes an HMAC over the raw request path while the router, a view, or a database lookup works on a decoded or normalized form, an attacker can submit a canonically equivalent but differently encoded path that passes verification at one layer yet is interpreted differently at another. The vulnerability is not in HMAC itself but in the inconsistent application of Unicode normalization across the stack before the digest is produced or verified.
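One way to keep the signer, the verifier, and downstream consumers aligned is to derive a single canonical string from the path and use it everywhere. The sketch below is a standard-library illustration, not a Django API: the key and the helper names (canonical_path, sign_path, verify_path) are hypothetical. canonical_path percent-decodes the path as UTF-8 (raising UnicodeDecodeError on malformed escapes), normalizes it to NFC, and re-encodes it.

```python
import hashlib
import hmac
import unicodedata
from urllib.parse import quote, unquote

KEY = b"hypothetical-secret-key"  # placeholder key for illustration

def canonical_path(path: str) -> str:
    # Decode %XX escapes as UTF-8, normalize to NFC, then re-encode
    decoded = unquote(path, encoding="utf-8", errors="strict")
    return quote(unicodedata.normalize("NFC", decoded), safe="/")

def sign_path(path: str) -> str:
    canonical = canonical_path(path)  # always ASCII after quote()
    return hmac.new(KEY, canonical.encode("ascii"), hashlib.sha256).hexdigest()

def verify_path(path: str, mac: str) -> bool:
    # The verifier canonicalizes exactly as the signer did
    return hmac.compare_digest(sign_path(path).encode("ascii"), mac.encode("utf-8"))

nfc_path = "/files/caf%C3%A9"   # 'é' as U+00E9
nfd_path = "/files/cafe%CC%81"  # 'e' + U+0301
```

Both spellings verify against the same MAC. That is safe only if downstream code routes and looks up canonical_path(path) rather than the raw path, so the value that was checked is the value that is used.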
A related real-world pattern is CVE-2019-19844, in which Django’s password reset form matched a supplied email address to an existing account after Unicode case transformation, so a crafted address that collided with a victim’s could be sent the victim’s reset token. Although that issue involves case mapping rather than HMAC, the underlying principle applies: identity confusion via normalization can weaken integrity checks when canonicalization is incomplete. The risk is particularly relevant when tokens are used for password reset links, email verification, or API authentication, where a valid HMAC is expected only for a specific canonical form.
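Case mapping collides in a similar way to normalization. This standard-library check shows two well-known cases, and a hypothetical match_account helper (with a hypothetical ACCOUNTS list) illustrates acting on the stored value after a case-insensitive match rather than on the supplied one.

```python
from typing import Optional

ACCOUNTS = ["mike@example.org"]  # hypothetical stored addresses

stored = "mike@example.org"
supplied = "m\u0131ke@example.org"  # U+0131 LATIN SMALL LETTER DOTLESS I

print(stored == supplied)                  # False: different code points
print(stored.upper() == supplied.upper())  # True: both are MIKE@EXAMPLE.ORG
print("stra\u00dfe".upper())               # STRASSE

def match_account(supplied_email: str) -> Optional[str]:
    # Case-insensitive match that returns the *stored* address, so any
    # follow-up (such as sending a reset link) goes to the real owner
    for candidate in ACCOUNTS:
        if candidate.upper() == supplied_email.upper():
            return candidate
    return None
```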
To detect this class of issue, scans check whether Unicode normalization is applied at both ends of the HMAC signing and verification path. User-controlled data in signed payloads without normalization, or normalization applied at different stages (for example, after verification but before a lookup), is flagged as a finding. Remediation requires ensuring that all data contributing to the HMAC is normalized to a single canonical form, typically NFC, before both signing and verification.
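A lightweight way to check one signing path in your own tests is a property-style assertion: sign the NFC and NFD spellings of the same text and require identical signatures. In the sketch below, sign_identifier is a hypothetical stand-in for whatever your code signs, the key is a placeholder, and unicodedata.is_normalized (Python 3.8+) flags raw inputs that are not already NFC.

```python
import hashlib
import hmac
import unicodedata

KEY = b"hypothetical-secret-key"  # placeholder key for illustration

def sign_identifier(value: str) -> str:
    # Hypothetical code under test: normalizes to NFC, then signs
    canonical = unicodedata.normalize("NFC", value)
    return hmac.new(KEY, canonical.encode("utf-8"), hashlib.sha256).hexdigest()

def is_normalization_stable(sign, samples) -> bool:
    # Every canonically equivalent spelling must yield the same signature
    for text in samples:
        forms = {unicodedata.normalize(f, text) for f in ("NFC", "NFD")}
        if len({sign(form) for form in forms}) != 1:
            return False
    return True

def non_nfc_inputs(values):
    # Inputs that would be signed differently if normalization were skipped
    return [v for v in values if not unicodedata.is_normalized("NFC", v)]
```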
HMAC Signature-Specific Remediation in Django: concrete code fixes
Remediation centers on normalizing all user-controlled or external data before it is included in the HMAC computation and before signature verification. Django’s signing module does not normalize for you, so use Python’s standard library function unicodedata.normalize to enforce NFC consistently.
Example of a vulnerable pattern:
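from django.core.signing import BadSignature, TimestampSigner
signer = TimestampSigner()
def issue_token(request):
    # User input may arrive in any Unicode form (NFC, NFD, ...)
    user_value = request.GET.get('token', '')
    # HMAC computed over the raw input: the exact code points are signed
    return signer.sign(user_value)
def check_token(signed_value, expected):
    # 'expected' may have been normalized elsewhere, e.g. by a user lookup
    try:
        value = signer.unsign(signed_value, max_age=3600)
    except BadSignature:  # SignatureExpired is a subclass of BadSignature
        return False
    # Raw comparison: 'cafe\u0301' (NFD) != 'caf\u00e9' (NFC), so the
    # outcome depends on which form each side happened to use
    return value == expected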
Correct approach using explicit normalization:
import unicodedata
from django.core.signing import BadSignature, TimestampSigner
from django.http import HttpResponse, HttpResponseBadRequest
from django.utils.encoding import force_str
def normalize_and_sign(value: str) -> str:
    # Normalize to NFC before signing so the HMAC covers one canonical form
    normalized = unicodedata.normalize('NFC', force_str(value))
    signer = TimestampSigner()
    return signer.sign(normalized)
def verify_normalized_signature(signed_value: str, max_age: int = 3600) -> str:
    signer = TimestampSigner()
    # unsign() checks the HMAC and timestamp and raises BadSignature
    # (or its subclass SignatureExpired) on failure
    value = signer.unsign(signed_value, max_age=max_age)
    # Defensive: a no-op for values produced by normalize_and_sign
    return unicodedata.normalize('NFC', value)
# Usage inside a view
def token_view(request):
    user_input = request.GET.get('token', '')
    try:
        payload = verify_normalized_signature(user_input)
    except BadSignature:
        # Invalid, tampered, or expired signature
        return HttpResponseBadRequest('invalid token')
    # payload is verified and in NFC; use it for lookups from here on
    return HttpResponse('ok')
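When using Django’s signing module directly, apply unicodedata.normalize('NFC', ...) to the plain value before passing it to Signer.sign() or TimestampSigner.sign(). Because signing.dumps() serializes its argument before signing it, normalize the strings inside the object before calling dumps(), and normalize the object returned by signing.loads() before comparing it. If you rely on higher-level constructs such as JSON Web Tokens or custom HMAC-based authentication, apply normalization at the point where the raw string is serialized for signing and again before comparison after verification.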
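For structured payloads, a small recursive helper can normalize every string in a JSON-like object before it reaches signing.dumps(). The name normalize_payload is hypothetical, and the sketch assumes the payload contains only dicts, lists, tuples, strings, and scalars.

```python
import unicodedata
from typing import Any

def normalize_payload(obj: Any, form: str = "NFC") -> Any:
    # Recursively normalize every str in a JSON-like object, including dict keys.
    # Note: keys that become equal after normalization collapse; the last wins.
    if isinstance(obj, str):
        return unicodedata.normalize(form, obj)
    if isinstance(obj, dict):
        return {normalize_payload(k, form): normalize_payload(v, form)
                for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        # Tuples become lists, matching what JSON serialization produces anyway
        return [normalize_payload(item, form) for item in obj]
    return obj  # numbers, booleans, and None pass through unchanged
```

In a Django project the result would be passed as django.core.signing.dumps(normalize_payload(data)), and the object returned by signing.loads() can go through the same helper before comparison.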
For broader protection, apply normalization (for example in a custom middleware or in form cleaning) to any user-supplied data that participates in integrity checks. Use the same normalization form throughout the application, including in database lookups and cache keys. NFC is a sensible default for most text; note, however, that Django’s AbstractBaseUser.normalize_username() applies NFKC, so where usernames enter signed payloads, match the form the user model stores. This prevents discrepancies between stored signatures and computed ones caused by mixed Unicode representations.
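Django does not ship a normalization middleware, so the following is a hypothetical one (NormalizeQueryMiddleware) written in Django’s get_response middleware style. It assumes request.GET is a QueryDict, whose copy() returns a mutable copy; keys are left untouched and only values are normalized.

```python
import unicodedata

class NormalizeQueryMiddleware:
    """Hypothetical middleware: normalize query-string values to one form."""

    def __init__(self, get_response, form: str = "NFC"):
        self.get_response = get_response
        self.form = form

    def __call__(self, request):
        query = request.GET.copy()  # QueryDict.copy() returns a mutable copy
        for key in list(query.keys()):
            values = query.getlist(key)
            query.setlist(key, [unicodedata.normalize(self.form, v) for v in values])
        request.GET = query
        return self.get_response(request)
```

POST data, JSON bodies, and headers would need the same treatment wherever they feed into signed values; form-level cleaning is often a more targeted place to do it.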
In summary, the fix is to normalize all inputs to a single canonical form before HMAC computation and to ensure that verification compares normalized values. Concrete steps include using unicodedata.normalize explicitly, auditing any place where signed values are constructed or parsed, and validating that third-party libraries also adhere to the chosen normalization form.