Unicode Normalization in Fastapi
How Unicode Normalization Manifests in Fastapi
Unicode normalization attacks in Fastapi applications often exploit the framework's handling of international characters and the way Python's string comparison works under the hood. Fastapi, built on Starlette and Pydantic, processes incoming JSON and form data through its dependency injection system, creating multiple opportunities for normalization-based bypasses.
Consider a user authentication endpoint where usernames are compared case-insensitively:
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
app = FastAPI()
class LoginRequest(BaseModel):
username: str
password: str
@app.post("/login")
async def login(request: LoginRequest):
stored_user = "admin"
if request.username.lower() == stored_user.lower():
# Authentication bypass possible
return {"message": "Welcome!"}
raise HTTPException(status_code=403, detail="Invalid credentials")This code appears secure, but Unicode normalization attacks can bypass it. The Turkish dotless 'i' (ı) and dotted 'i' (i) create a classic vulnerability. In Turkish locale, 'i'.lower() == 'ı', but 'I'.lower() == 'i'. An attacker could use a username like "ADMİN" (with dotted capital I) that normalizes differently than "admin".
Fastapi's Pydantic models compound this issue by automatically validating and converting input data. When Pydantic processes the request body, it doesn't normalize Unicode characters, passing them directly to your business logic:
class User(BaseModel):
username: str
email: str
@app.post("/create-user")
async def create_user(user: User):
# Unicode variations of same character may pass validation
if user.username == "admin":
raise HTTPException(status_code=400, detail="Username taken")
return {"message": "User created"}An attacker could register "admin" (full-width characters) or "аdmin" (Cyrillic 'a') which visually appear identical but are different Unicode code points. Fastapi's validation passes these through because they're valid UTF-8 strings.
Path parameter matching in Fastapi is another vulnerable area. The framework uses Starlette's routing, which performs exact string matching on URL paths:
@app.get("/users/me")
async def get_current_user():
return {"message": "This is your profile"}
@app.get("/users/{}")
async def get_user(user_id: str):
if user_id == "me":
# Unicode "me" could bypass this check
return await get_current_user()
return {"message": f"Profile for {user_id}"}An attacker could access another user's profile by using Unicode variations of "me" that normalize to the same value but aren't caught by the exact string comparison.
Query parameter handling suffers similar issues. Fastapi's dependency injection system passes raw query parameters to endpoint functions:
@app.get("/search")
async def search(
q: str = Query(..., description="Search query"),
case_sensitive: bool = Query(default=False)
):
if not case_sensitive:
q = q.lower() # Vulnerable to Unicode normalization
# Search logic here
return {"results": []}The lowercase conversion without Unicode normalization allows attackers to craft queries that bypass case-insensitive searches.
Fastapi-Specific Detection
Detecting Unicode normalization vulnerabilities in Fastapi applications requires examining both the code patterns and the runtime behavior. The framework's structure creates specific signatures that security scanners can identify.
Static analysis should look for these Fastapi-specific patterns:
# Vulnerable pattern: direct string comparison without normalization
@app.post("/sensitive")
async def sensitive_endpoint(username: str):
if username == "admin": # No normalization applied
return {"access": "granted"}
return {"access": "denied"}
# Vulnerable pattern: case conversion without Unicode awareness
@app.post("/login")
async def login(username: str, password: str):
if username.lower() == "admin": # Locale-dependent behavior
return {"authenticated": True}
return {"authenticated": False}
# Vulnerable pattern: Pydantic model comparisons
class UserModel(BaseModel):
username: str
@app.post("/register")
async def register(user: UserModel):
if user.username == "admin": # Direct comparison
raise HTTPException(400, "Username taken")
return {"status": "ok"}
Dynamic detection during runtime scanning reveals how Fastapi actually processes requests. When middleBrick scans a Fastapi application, it tests Unicode edge cases by sending requests with:
- Full-width Latin characters (e.g., "admin" vs "admin")
- Cyrillic homoglyphs (e.g., "аdmin" with Cyrillic 'a')
- Turkish dotless/dotted 'i' variations
- Combining character sequences (e.g., "é" vs "é")
- Zero-width characters and control sequences
The scanner analyzes Fastapi's response patterns to identify normalization bypasses. For authentication endpoints, it attempts login with Unicode variations and checks if any bypass the security checks. For user registration, it tries to register Unicode versions of reserved usernames.
middleBrick's Fastapi-specific scanning includes:
{
"fastapi_detection": {
"unicode_bypasses": [
{
"endpoint": "/login",
"attack_vector": "case_insensitive_bypass",
"payload": "ADMİN" (Turkish dotted I),
"result": "authentication_bypass_detected"
}
],
"pydantic_vulnerabilities": [
{
"model": "UserModel",
"field": "username",
"issue": "unicode_homoglyph_acceptance",
"severity": "high"
}
]
}
}The scanner also examines Fastapi's OpenAPI schema generation, which can reveal parameter validation weaknesses. If the generated schema shows loose validation rules for string parameters, this indicates potential Unicode acceptance issues.
Rate limiting bypass detection is particularly relevant for Fastapi applications using middleware. Unicode variations can create different cache keys:
# Vulnerable rate limiting
@app.middleware("http")
async def rate_limit_middleware(request, call_next):
key = request.client.host + request.path
# Unicode variations create different keys
cache.set(key, current_count)
# ... rate limiting logic
middleBrick tests whether Unicode variations allow attackers to bypass rate limits by creating distinct cache keys for visually identical requests.
Fastapi-Specific Remediation
Fastapi provides several native approaches to address Unicode normalization vulnerabilities, leveraging Pydantic's validation system and Fastapi's dependency injection framework.
The most effective remediation is implementing Unicode normalization at the Pydantic model level using custom validators:
import unicodedata
from fastapi import FastAPI
from pydantic import BaseModel, validator
class NormalizedUser(BaseModel):
username: str
email: str
@validator('username', 'email', pre=True)
def normalize_unicode(cls, value):
# NFC: Canonical Decomposition, followed by Canonical Composition
return unicodedata.normalize('NFC', value)
@validator('username', 'email')
def check_homoglyphs(cls, value):
# Check for Cyrillic/Latin confusables
if any('а' <= char <= 'я' for char in value.lower()):
raise ValueError('Cyrillic characters not allowed')
return value
app = FastAPI()
@app.post("/register")
async def register(user: NormalizedUser):
if user.username == "admin":
raise HTTPException(400, "Username taken")
return {"status": "ok"}
This approach ensures all incoming data is normalized before reaching your business logic. The NFC form is generally recommended as it composes characters where possible, reducing the attack surface.
For authentication endpoints, implement case-insensitive comparison with Unicode awareness:
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
import unicodedata
class LoginRequest(BaseModel):
username: str
password: str
app = FastAPI()
def normalize_case_insensitive(value: str) -> str:
# Casefold is more aggressive than lower() and handles Unicode better
return unicodedata.normalize('NFC', value.casefold())
@app.post("/login")
async def login(request: LoginRequest):
stored_user = "admin"
if normalize_case_insensitive(request.username) == normalize_case_insensitive(stored_user):
return {"message": "Welcome!"}
raise HTTPException(status_code=403, detail="Invalid credentials")
Fastapi's dependency injection system allows creating reusable normalization utilities:
from fastapi import Depends, HTTPException
from fastapi.security import OAuth2PasswordBearer
oauth2_scheme = OAuth2PasswordBearer(tokenUrl="token")
def get_normalized_username(username: str = Depends()) -> str:
normalized = unicodedata.normalize('NFC', username.casefold())
if len(normalized) != len(username):
raise HTTPException(400, "Unicode normalization changed string length")
return normalized
@app.post("/secure-endpoint")
async def secure_endpoint(
username: str = Depends(get_normalized_username),
password: str
):
# username is now normalized and safe for comparison
pass
For API endpoints that must accept international characters but prevent homograph attacks, implement character set validation:
import re
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel, validator
class SafeStringModel(BaseModel):
safe_text: str
@validator('safe_text')
def validate_safe_characters(cls, value):
# Allow only basic Latin characters, digits, and common symbols
if not re.match(r'^[ -~]+$', value):
raise ValueError('Only ASCII characters allowed')
return value
app = FastAPI()
@app.post("/safe-input")
async def safe_input(data: SafeStringModel):
return {"processed": data.safe_text}
Fastapi's middleware system can implement global normalization for all requests:
@app.middleware("http")
async def normalize_request_middleware(request: Request, call_next):
# Normalize query parameters
normalized_params = {
k: unicodedata.normalize('NFC', v)
if isinstance(v, str) else v
for k, v in request.query_params.multi_items()
}
# Normalize JSON body if present
body = await request.json()
if body:
body = {k: unicodedata.normalize('NFC', v) if isinstance(v, str) else v for k, v in body.items()}
# Create new request with normalized data
new_request = Request(
request.scope,
content=body
)
response = await call_next(new_request)
return response
For production deployments, combine these approaches with middleBrick's continuous monitoring to ensure Unicode normalization vulnerabilities don't reappear as your Fastapi application evolves.