HIGH unicode normalizationfastapibasic auth

Unicode Normalization in Fastapi with Basic Auth

Unicode Normalization in Fastapi with Basic Auth — how this specific combination creates or exposes the vulnerability

When Fastapi applications use HTTP Basic Authentication, credentials are transmitted as an Authorization header containing a base64-encoded string of username:password. While base64 decoding itself is deterministic, the security boundary can be undermined if the application compares decoded strings without normalizing Unicode. Unicode normalization is the process of converting text to a canonical form (NFC, NFD, NFKC, NFKD) so that visually identical characters with different binary representations are treated as equal. Without explicit normalization, an attacker can supply a username that appears identical to an authorized user but uses different code points, bypassing comparison logic and potentially gaining access.

Consider an account with username admin (standard Latin characters). An attacker could register or supply credentials using a visually identical homoglyph such as аdmin (Cyrillic small letter a). If Fastapi decodes the base64 value and compares it against stored credentials using a simple string equality check, the comparison may fail to recognize that these are visually the same, allowing the attacker to authenticate as admin depending on how the backend processes and stores usernames. This issue is especially relevant when usernames are normalized at registration but not at authentication, creating an inconsistency that attackers can exploit via crafted Unicode input.

In Fastapi, this risk is realized when route dependencies decode Basic Auth headers and perform string comparisons or hashing without prior normalization. For example, using security.HTTPBasic() provides the raw header; developers must explicitly decode and normalize before matching against stored values. Without normalization, inputs that bypass authentication may also affect downstream security checks, such as ownership or authorization logic tied to the username, increasing the likelihood of Broken Object Level Authorization (BOLA) or Insecure Direct Object References (IDOR) when access controls rely on username identity.

Additionally, normalization inconsistencies can interact with logging and monitoring systems. If usernames are normalized differently at ingestion versus query time, audit trails may not accurately reflect authentication events, complicating incident response. The combination of Fastapi’s dependency injection for authentication and the variability of Unicode input creates a scenario where security boundaries are only as strong as the normalization applied across the stack. Developers should treat normalization as a mandatory preprocessing step for any user-controlled identifier used in security decisions.

Basic Auth-Specific Remediation in Fastapi — concrete code fixes

To secure Basic Authentication in Fastapi against Unicode-based bypass, normalize usernames and passwords before comparison using a consistent form, such as NFC. This ensures that visually identical characters are reduced to a single canonical representation before any equality checks or cryptographic operations. Below is a complete, working example that demonstrates secure handling of Basic Auth with Unicode normalization.

from fastapi import Fastapi, Depends, HTTPException, status
from fastapi.security import HTTPBasic, HTTPBasicCredentials
import unicodedata
import base64

app = Fastapi()
security = HTTPBasic()

# Simulated user store with normalized usernames
USERS = {
    "admin": {"password": "s3cur3p@ss!"},
    "usér": {"password": "p@ssw0rd"},
}

def normalize_unicode(value: str) -> str:
    """Normalize string to NFC form for consistent comparison."""
    return unicodedata.normalize("NFC", value)

def get_user(credentials: HTTPBasicCredentials = Depends(security)):
    """Decode and normalize credentials, then validate."""
    if credentials is None:
        raise HTTPException(
            status_code=status.HTTP_401_UNAUTHORIZED,
            detail="Missing credentials",
            headers={"WWW-Authenticate": "Basic"},
        )
    decoded_bytes = base64.b64decode(credentials.credentials)
    decoded_str = decoded_bytes.decode("utf-8")
    username, password = decoded_str.split(":", 1)
    username = normalize_unicode(username)
    password = normalize_unicode(password)

    user = USERS.get(username)
    if user is None or user["password"] != password:
        raise HTTPException(
            status_code=status.HTTP_401_UNAUTHORIZED,
            detail="Incorrect username or password",
            headers={"WWW-Authenticate": "Basic"},
        )
    return username

@app.get("/secure")
def read_secure(username: str = Depends(get_user)):
    return {"message": f"Authenticated as {username}"}

This approach decodes the base64 credentials, splits the username and password, normalizes both using NFC, and then performs the comparison against stored values. By normalizing before lookup and comparison, you mitigate homoglyph and normalization-based bypass risks. Ensure that any persistent storage also stores normalized usernames to maintain consistency between registration and authentication stages.

For production use, combine this normalization with strict character-set policies and transport-layer encryption. While Fastapi provides the primitives for secure authentication, the developer is responsible for applying consistent preprocessing across the identity lifecycle. This pattern aligns with secure coding practices for handling user-controlled input in authentication flows and reduces the likelihood of bypass via Unicode manipulation.

Frequently Asked Questions

Why is Unicode normalization necessary for Basic Auth in Fastapi?
Without normalization, visually identical characters with different code points (e.g., Latin 'a' vs Cyrillic 'a') can bypass string comparisons, allowing authentication as a different user.
Does normalization affect stored passwords in Basic Auth?
Passwords should be normalized before comparison, but not stored in normalized form unless your security model accounts for reversible normalization; typically, store a salted hash of the original input and normalize only during comparison.