Unicode Normalization in Chi with JWT Tokens
Unicode Normalization in Chi with JWT Tokens: how this specific combination creates or exposes the vulnerability
Chi is an HTTP client for Elixir that is commonly used to call external APIs, including OAuth2 introspection and JWT validation endpoints. When Chi is used to fetch or verify JSON Web Tokens, subtle encoding differences in Unicode identifiers can lead to inconsistent parsing and unexpected behavior. Unicode normalization becomes relevant when a JWT contains claims or headers with non-ASCII characters: email addresses with accents, usernames drawn from the Latin-1 Supplement block, or directory names in Asian scripts. A character such as é illustrates the problem, since it can be encoded either as the precomposed code point U+00E9 or as the letter e followed by the combining acute accent U+0301; the two forms render identically but are different byte sequences. If the application compares such a claim against a precomputed value without first converting both sides to a canonical form, the comparison may succeed or fail depending on which representation Chi or the underlying library delivers. This inconsistency can expose an authentication bypass or token confusion risk when business logic treats the normalized and non-normalized forms as equivalent but cryptographic verification and exact string matching do not.
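A minimal sketch of the mismatch, using the standard library's String.normalize/2:
# Precomposed form: é as a single code point (U+00E9)
precomposed = "\u00E9"
# Decomposed form: e (U+0065) followed by the combining acute accent (U+0301)
decomposed = "e\u0301"

precomposed == decomposed
# => false, even though both render as é

String.normalize(precomposed, :nfc) == String.normalize(decomposed, :nfc)
# => true once both sides use the same canonical form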
JWT parsing libraries in Elixir, such as Joken or hand-rolled JOSE decoders, operate on binaries. If Chi retrieves a JWK Set or introspects a token over HTTPS and passes the JSON body directly to a parser, Unicode normalization applied only on the client side creates a mismatch between expected and actual claims. For instance, a token issued with claims in normalized form (NFC) might be compared against a database key stored in NFD, producing false negatives during claim validation. Attackers can also exploit lax matching by registering or manipulating identities with homoglyphs, characters that look alike but have different code points, to slip past allowlists that rely on exact string comparison. Because Chi is often used in backend microservices that trust the JWT once it has been verified, inconsistent handling of normalization across the stack weakens the effective security boundary, especially when input validation and identity checks do not normalize values before comparing them.
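A minimal sketch of that NFC/NFD false negative, with a hypothetical email identifier; the fix is to normalize both sides before comparing:
# Identifier as persisted at registration time (decomposed, NFD)
stored_email = String.normalize("josé@example.com", :nfd)
# The same identifier as it arrives in a verified token claim (precomposed, NFC)
claim_email = String.normalize("josé@example.com", :nfc)

claim_email == stored_email
# => false: exact binary matching rejects a legitimate identity

String.normalize(claim_email, :nfc) == String.normalize(stored_email, :nfc)
# => true: normalizing both sides removes the discrepancy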
Additionally, header fields like kid (Key ID) may include non-ASCII metadata in some custom implementations, and if Chi forwards or logs these values without normalization, it can lead to inconsistent audit trails or injection through encoding mismatches. Although JWT specifications recommend treating the token as an opaque string for signature verification, the claims set is often processed programmatically. If normalization is applied selectively—such as only on email or username claims but not on roles or scopes—an attacker can craft a token where a critical claim bypasses authorization checks due to a normalization discrepancy. MiddleBrick’s checks for Input Validation and Property Authorization are designed to surface such inconsistencies by correlating runtime behavior with schema expectations, helping teams identify whether normalization is handled consistently across the API surface that interacts with JWTs.
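As an illustration of applying the same normalization step to authorization-relevant claims rather than only to identity fields, here is a sketch; the module name, claim name, and allowed scopes are hypothetical.
defmodule MyApp.ScopeCheck do
  @allowed_scopes MapSet.new(["admin:read", "admin:write"])

  # Normalize the scope claim exactly like identity claims before checking it
  # against the allowlist, so a differently encoded scope value cannot slip
  # past an exact-match comparison.
  def permitted?(claims) when is_map(claims) do
    claims
    |> Map.get("scope", "")
    |> String.normalize(:nfc)
    |> String.split(" ", trim: true)
    |> Enum.any?(&MapSet.member?(@allowed_scopes, &1))
  end
end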
JWT-Specific Remediation in Chi
To remediate Unicode normalization issues when using Chi with JWTs, enforce canonical normalization before any comparison, storage, or logging. Normalize strings to a single form, typically NFC or NFKC, using Elixir's built-in String.normalize/2 or the :unicode functions that ship with Erlang/OTP, and do so before validation. Apply normalization consistently across all claims that are subject to access control, including email, username, roles, scopes, and any custom identifiers, and avoid relying on exact binary equality for user-controlled values that may contain international characters.
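For reference, these are the standard-library normalization entry points in Elixir and Erlang/OTP; no third-party package is required:
value = "José"

# Elixir standard library: canonical (NFC/NFD) and compatibility (NFKC/NFKD) forms
String.normalize(value, :nfc)
String.normalize(value, :nfkc)

# Erlang/OTP equivalents from the :unicode module (OTP 20 and later)
:unicode.characters_to_nfc_binary(value)
:unicode.characters_to_nfkc_binary(value)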
Code example: Normalizing claims in Chi-based JWT validation
# Normalization uses Elixir's built-in String.normalize/2, so no extra dependency is required.
# Normalize incoming JWT claims before comparison.
defmodule MyApp.JwtValidator do
  def validate_claims(token, expected_claims) do
    with {:ok, claims} <- extract_claims(token),
         normalized_claims = normalize_claims(claims),
         true <- compare_claims(normalized_claims, expected_claims) do
      {:ok, normalized_claims}
    else
      _ -> {:error, :invalid_claims}
    end
  end

  defp extract_claims(token) do
    # A JWT is three base64url-encoded segments, not raw JSON; decode the payload segment.
    # This only peeks at the claims; verify the signature with Joken or JOSE before trusting them.
    with [_header, payload, _signature] <- String.split(token, "."),
         {:ok, json} <- Base.url_decode64(payload, padding: false),
         {:ok, claims} <- Jason.decode(json) do
      {:ok, claims}
    else
      _ -> {:error, :invalid_token}
    end
  end

  defp normalize_claims(claims) do
    Enum.into(claims, %{}, fn {k, v} -> {k, normalize_value(v)} end)
  end

  defp normalize_value(value) when is_binary(value), do: String.normalize(value, :nfc)
  defp normalize_value(value) when is_list(value), do: Enum.map(value, &normalize_value/1)
  defp normalize_value(value) when is_map(value), do: normalize_claims(value)
  defp normalize_value(value), do: value

  defp compare_claims(actual, expected) do
    # Compare only the claims the caller cares about, so unrelated fields such as
    # exp or iat do not break the check; normalize the expected side as well.
    expected
    |> normalize_claims()
    |> Enum.all?(fn {k, v} -> Map.get(actual, k) == v end)
  end
end
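Usage might look like the following sketch; the token string and expected claim value are placeholders, and the token's signature must already have been verified upstream.
# Placeholder token; in practice this comes from the Authorization header
# and has already passed signature verification.
token = "<verified JWT>"
expected = %{"email" => String.normalize("josé@example.com", :nfc)}

case MyApp.JwtValidator.validate_claims(token, expected) do
  {:ok, claims} -> claims
  {:error, reason} -> {:error, reason}
end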
When using Chi to fetch JWK Sets or introspect tokens, ensure that any string values extracted from the JSON response are normalized before being used in authorization decisions. For example, if you retrieve a JWK with a kid that contains non-ASCII metadata, normalize it before lookup:
# Fetching the JWK Set and normalizing kid values before lookup
# (the HTTP call is shown with Tesla; adapt it to your Chi client setup)
jwks_url = "https://auth.example.com/.well-known/jwks.json"
{:ok, response} = Tesla.get(jwks_url)

jwks =
  case Jason.decode(response.body) do
    {:ok, %{"keys" => keys}} ->
      Enum.map(keys, fn key ->
        normalized_kid = String.normalize(key["kid"] || "", :nfc)
        Map.put(key, "kid", normalized_kid)
      end)

    _ ->
      []
  end
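Downstream, the key lookup should use the same canonical form; the header map below is a hypothetical stand-in for the token's decoded JOSE header.
# Hypothetical decoded JOSE header carrying a non-ASCII kid
token_header = %{"kid" => "clé-2024"}

incoming_kid = String.normalize(token_header["kid"] || "", :nfc)
signing_key = Enum.find(jwks, fn key -> key["kid"] == incoming_kid end)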
In your validation pipeline, integrate normalization as an early step so that tokens with homoglyphs are treated consistently. Combine this with schema-based checks for the claims set and enforce allowlists using normalized values. Complement these code-level fixes with runtime verification using tools like MiddleBrick’s scans, which can detect inconsistencies between your OpenAPI contract and actual behavior, including how non-ASCII inputs are handled in authentication flows.
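A minimal sketch of such an allowlist check, assuming a hypothetical email claim and allowlist entries; values are normalized once at build time so every comparison happens in the same canonical form.
defmodule MyApp.Allowlist do
  @allowed_emails MapSet.new(
                    Enum.map(["josé@example.com", "admin@example.com"], &String.normalize(&1, :nfc))
                  )

  # Reject any identity whose normalized email claim is not on the allowlist.
  def authorized?(claims) when is_map(claims) do
    email = String.normalize(Map.get(claims, "email", ""), :nfc)
    MapSet.member?(@allowed_emails, email)
  end
end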