MEDIUM · unicode normalization · chi · cockroachdb

Unicode Normalization in Chi with CockroachDB

Unicode Normalization in Chi with CockroachDB: how this specific combination creates or exposes the vulnerability

Unicode normalization inconsistencies between HTTP request handling in Chi and storage behavior in CockroachDB can lead to authentication bypass, data exposure, and duplicate records. Chi matches routes against the request path (normally the percent-decoded r.URL.Path), but neither Chi nor Go's net/url applies Unicode normalization. If normalization is applied inconsistently, or not at all, between the router layer and the database layer, canonically equivalent identifiers are different byte sequences to both the application and CockroachDB.
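The sketch below shows that percent-decoding alone does not reconcile two encodings of the same visible name. It uses only Go's standard library; decodeSegment is a hypothetical helper standing in for the decoding step that happens before a route parameter reaches a handler, not a Chi API.

```go
package main

import (
	"fmt"
	"net/url"
)

// decodeSegment percent-decodes one path segment, roughly what a handler sees
// after net/url has parsed the request path. It does NOT normalize Unicode.
func decodeSegment(raw string) (string, error) {
	return url.PathUnescape(raw)
}

func main() {
	// Two encodings of "café": precomposed U+00E9 versus "e" + U+0301 (combining acute).
	composed, _ := decodeSegment("caf%C3%A9")
	decomposed, _ := decodeSegment("cafe%CC%81")

	fmt.Println(composed == decomposed)         // false: different byte sequences
	fmt.Println(len(composed), len(decomposed)) // 5 6
}
```

Both strings render as "café", yet any byte-wise equality check, in Go or in a SQL WHERE clause, treats them as different values.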

For example, a user handle containing an accented character can arrive in precomposed form (é, U+00E9) from one client and in decomposed form (e followed by U+0301 COMBINING ACUTE ACCENT) from another. Chi matches the route in both cases, but if the application does not normalize before writing to or querying CockroachDB, the two forms are stored and compared as distinct values. A lookup with one form then misses a row created with the other, access-control checks that compare identifiers by string equality can be bypassed, and a second, visually identical record can be inserted without tripping the intended uniqueness constraint.

middleBrick checks such as Input Validation and Property Authorization highlight these risks by comparing runtime values against expected canonical forms. Without normalization, an API that relies on string equality in CockroachDB WHERE clauses may return multiple rows or no rows depending on how the client encodes the data, leading to information leakage or privilege escalation (BOLA/IDOR).

SSRF and Data Exposure checks can also be triggered when normalization issues cause the application to resolve internal identifiers inconsistently, potentially exposing backend references or paths that should remain internal. Because CockroachDB stores string values as provided (subject to collation rules), normalization must be enforced at the application layer before values are used in queries or compared to user input.

CockroachDB-Specific Remediation in Chi: concrete code fixes

Remediation requires normalizing Unicode input in Chi handlers before using values in SQL queries against CockroachDB. Use one normalization form consistently (NFC is a common choice for storage) so that comparisons and constraints behave deterministically.

Example: a Chi route that accepts a user handle and retrieves a profile from CockroachDB.

```go
// Before: vulnerable to normalization mismatch.
// Assumes chi (github.com/go-chi/chi/v5) is imported, a router is defined, db is a
// *pgxpool.Pool, and Profile is a struct with ID and DisplayName fields.
func getUserProfile(w http.ResponseWriter, r *http.Request) {
    handle := chi.URLParam(r, "handle")
    var profile Profile
    // The query is parameterized, but the handle is used exactly as received, so the
    // precomposed and decomposed spellings of one name are looked up as different keys.
    row := db.QueryRow(r.Context(), "SELECT id, display_name FROM profiles WHERE handle = $1", handle)
    if err := row.Scan(&profile.ID, &profile.DisplayName); err != nil {
        http.Error(w, "not found", http.StatusNotFound)
        return
    }
    json.NewEncoder(w).Encode(profile)
}
```

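```go
// After: normalize before database interaction.
// db (assumed to be a *pgxpool.Pool) and the Profile type are defined elsewhere.
import (
    "encoding/json"
    "net/http"
    "strings"

    "github.com/go-chi/chi/v5"
    "golang.org/x/text/unicode/norm"
)

// normalizationForm records the storage form; normalizeUnicode must stay in sync with it.
const normalizationForm = "NFC"

// normalizeUnicode converts s to NFC and trims surrounding whitespace.
func normalizeUnicode(s string) string {
    return strings.TrimSpace(norm.NFC.String(s))
}

func getUserProfileSafe(w http.ResponseWriter, r *http.Request) {
    // Normalize the route parameter before it is bound to the query.
    handle := normalizeUnicode(chi.URLParam(r, "handle"))
    var profile Profile
    row := db.QueryRow(r.Context(), "SELECT id, display_name FROM profiles WHERE handle = $1", handle)
    if err := row.Scan(&profile.ID, &profile.DisplayName); err != nil {
        http.Error(w, "not found", http.StatusNotFound)
        return
    }
    json.NewEncoder(w).Encode(profile)
}
```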

For bulk operations or upserts, apply normalization to all incoming identifiers before binding them as query parameters.

```go
// Uses normalizeUnicode from the previous example; db is assumed to be a *pgxpool.Pool.
func upsertProfile(w http.ResponseWriter, r *http.Request) {
    var req struct {
        Handle   string `json:"handle"`
        FullName string `json:"full_name"`
    }
    if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
        http.Error(w, "bad request", http.StatusBadRequest)
        return
    }
    // Normalize every identifier and text field before binding it as a parameter.
    handle := normalizeUnicode(req.Handle)
    fullName := normalizeUnicode(req.FullName)
    // $1 is the handle and $2 the display name (the original bound $1 to both columns).
    _, err := db.Exec(r.Context(), `
        INSERT INTO profiles (handle, display_name)
        VALUES ($1, $2)
        ON CONFLICT (handle) DO UPDATE SET display_name = $2`, handle, fullName)
    if err != nil {
        http.Error(w, "server error", http.StatusInternalServerError)
        return
    }
    w.WriteHeader(http.StatusOK)
}
```

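Additionally, make sure the columns that carry indexes and unique constraints in CockroachDB contain only normalized values, so the database enforces uniqueness on the canonical form. If legacy data exists in mixed normalization forms, backfill it with a migration that applies the same normalization function, and resolve any rows whose values collide after normalization before creating or validating a unique index.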

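```sql
-- Backfill existing handles in CockroachDB. Run from the migration for each row whose
-- handle changes, binding $2 to the value produced by the application's normalizeUnicode.
UPDATE profiles SET handle = $2 WHERE id = $1;
-- Then create or verify a unique index on the normalized column.
CREATE UNIQUE INDEX IF NOT EXISTS idx_profiles_handle_normalized ON profiles (handle);
```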

middleBrick’s LLM/AI Security and Input Validation checks can be used to verify that normalization is consistently applied across endpoints and that no raw, unnormalized input is accepted by the database layer.

Frequently Asked Questions

Why does Unicode normalization matter when using Chi and Cockroachdb?
Chi performs route pattern matching on decoded URL path segments, while CockroachDB stores string values as provided (subject to collation); neither normalizes Unicode on its own. Without normalization, semantically identical strings can have different binary representations, causing mismatches in equality comparisons, bypassing uniqueness constraints, and enabling authentication bypass or duplicate records.
Can normalization be enforced at the database level instead of in Chi?
CockroachDB does not provide built-in automatic Unicode normalization for incoming values. Normalization must be applied in the application layer (e.g., in Chi handlers) before values are sent to CockroachDB. Database-side constraints can then rely on consistently normalized values for indexes and unique checks.