Unicode Normalization in Fastapi with Cockroachdb
Unicode Normalization in Fastapi with Cockroachdb — how this specific combination creates or exposes the vulnerability
Unicode normalization inconsistencies between Fastapi request handling and Cockroachdb storage can lead to authentication bypass, data integrity issues, and information exposure. Fastapi typically receives UTF-8 encoded input, but depending on form parsing, JSON decoding, and header processing, the same logical string may have multiple binary representations (e.g., composed vs. decomposed). Cockroachdb, like most SQL databases, stores text as Unicode code points but does not automatically normalize unless explicitly configured. If an application compares user-supplied identifiers directly without normalization, visually identical strings that differ in code point composition may match in application logic but fail to match stored values, or worse, match an unintended account.
Consider an authentication flow where Fastapi receives a username or email, normalizes it to NFC, and uses it to query Cockroachdb. If the database contains a decomposed form (NFD) of the same visual string, the query returns no user. An attacker can exploit this by registering with a deliberately decomposed email to gain access to an account whose stored email is in composed form. Similar risks appear in BOLA/IDOR checks where resource identifiers are derived from user-controlled input that bypasses normalization before lookup. The risk is compounded when OpenAPI/Swagger specs define parameters as strings without normalization guidance, allowing non-normalized payloads to reach Cockroachdb and bypass authorization checks mapped to OWASP API Top 10 and GDPR data integrity expectations.
middleBrick scans this attack surface by testing unauthenticated endpoints and comparing runtime behavior against OpenAPI definitions, including $ref resolution across 2.0, 3.0, and 3.1 specs. It flags inconsistencies where input validation does not enforce a canonical normalization form and where downstream SQL behavior may diverge from application expectations. These findings align with compliance frameworks such as PCI-DSS and SOC2 controls that require consistent handling of identity data. By detecting missing normalization early, teams can prevent subtle identity confusion without relying on runtime blocking or firewall rules.
Cockroachdb-Specific Remediation in Fastapi — concrete code fixes
Remediation requires canonical normalization at the boundary where Fastapi accepts user input and before any data is persisted or compared against Cockroachdb. Use Python’s unicodedata module to normalize incoming strings to NFC (or consistent KD) across all endpoints, query parameters, path segments, and JSON bodies. Apply normalization before constructing SQL statements or ORM models that map to Cockroachdb tables. The following example demonstrates a Fastapi dependency that normalizes email and username fields and a repository function that safely interacts with Cockroachdb using parameterized queries to avoid injection while preserving normalized values.
import unicodedata
from typing import Optional
from fastapi import Depends, Fastapi, HTTPException, status
from pydantic import BaseModel, EmailStr
import asyncpg
app = Fastapi()
# Normalization helper
def normalize_unicode(value: str) -> str:
return unicodedata.normalize('NFC', value)
class UserCreate(BaseModel):
email: EmailStr
username: str
display_name: str
async def get_db_pool():
# Assume pool is initialized elsewhere
return asyncpg.create_pool(user='app', password='secret',
database='appdb', host='cockroachdb-host')
async def create_user(user: UserCreate, pool: asyncpg.Pool = Depends(get_db_pool)):
email_nfc = normalize_unicode(user.email)
username_nfc = normalize_unicode(user.username)
display_nfc = normalize_unicode(user.display_name)
async with pool.acquire() as conn:
# Use parameterized queries; Cockroachdb handles Unicode natively
await conn.execute(
'INSERT INTO users (email, username, display_name) VALUES ($1, $2, $3) ON CONFLICT (email) DO NOTHING',
email_nfc, username_nfc, display_nfc
)
# Verify insertion
row = await conn.fetchrow('SELECT id, email, username FROM users WHERE email = $1', email_nfc)
if row is None:
raise HTTPException(status_code=status.HTTP_409_CONFLICT, detail='User already exists or could not be created')
return {'id': row['id'], 'email': row['email'], 'username': row['username']}
@app.post('/register', status_code=status.HTTP_201_CREATED)
async def register(user: UserCreate, create_fn=create_user):
return await create_fn(user)
# Example query with normalization for authentication
async def authenticate(email: str, pool: asyncpg.Pool = Depends(get_db_pool)):
email_nfc = normalize_unicode(email)
async with pool.acquire() as conn:
row = await conn.fetchrow('SELECT id, email, username, password_hash FROM users WHERE email = $1', email_nfc)
return row
For existing data, plan a one-time migration to normalize stored values in Cockroachdb. Use a batched UPDATE that applies the same NFC normalization, ensuring that future queries remain consistent. Combine this with input validation layers in Fastapi (Pydantic validators or dependencies) to reject non-normalized payloads when strict compliance is required. middleBrick’s LLM/AI Security checks can verify that system prompt handling and output encoding do not reintroduce non-normalized data paths, especially when AI-assisted endpoints expose user-controlled strings to Cockroachdb.
In API specs, document the normalization requirement in parameter descriptions and request body schemas so that consumers understand canonical forms. middleBrick’s OpenAPI/Swagger analysis resolves $ref definitions and cross-references spec definitions with runtime findings, highlighting where documentation diverges from enforced normalization. This supports compliance mappings to OWASP API Top 10 and GDPR by demonstrating controlled identity data handling across Fastapi and Cockroachdb.