
Unicode Normalization in Axum

How Unicode Normalization Manifests in Axum

Unicode normalization vulnerabilities in Axum typically emerge through path traversal and authentication bypass attempts. When Axum extracts path parameters using Path or Query extractors, Unicode characters can be represented in multiple equivalent forms, creating opportunities for bypassing security checks.
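To make the ambiguity concrete, here is a minimal std-only Rust sketch showing that the precomposed and decomposed spellings of "é" are distinct strings even though they render identically:

```rust
fn main() {
    let precomposed = "\u{00E9}"; // "é" as a single code point (NFC form)
    let decomposed = "e\u{0301}"; // "e" + combining acute accent (NFD form)

    // Visually identical, but different strings to any exact comparison
    assert_ne!(precomposed, decomposed);
    assert_eq!(precomposed.chars().count(), 1);
    assert_eq!(decomposed.chars().count(), 2);
}
```

Any security check that compares these byte-for-byte will treat them as different values, while a user (or a downstream normalizing component) sees the same text.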

The most common manifestation occurs with path traversal attacks. Consider an endpoint that validates file paths:

use axum::{extract::Path, response::IntoResponse};

async fn download_file(
    Path((username, filename)): Path<(String, String)>,
) -> impl IntoResponse {
    // Builds a filesystem path directly from user input -- no normalization or validation yet
    let safe_path = format!("/data/{}/{}", username, filename);
    // ... file serving logic
}

An attacker can bypass path validation using Unicode characters that are compatibility-equivalent to ASCII path syntax. For example, the fullwidth full stop "．" (U+FF0E) and fullwidth solidus "／" (U+FF0F) map to "." and "/" under NFKC normalization. A traversal check that runs before normalization will not flag "．．／", but any downstream component that applies compatibility normalization, or a filesystem layer that normalizes names, will treat it as "../", allowing directory traversal. Note that canonical normalization (NFC/NFD) never maps accented letters to their base forms: "é" does not become "e". The danger comes from compatibility normalization (NFKC/NFKD) and from validating input before normalizing it rather than after.
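A std-only sketch of that validation gap: a traversal check that runs before compatibility normalization never sees ASCII dots or slashes, because the fullwidth characters are entirely different code points:

```rust
fn main() {
    // Fullwidth full stop (U+FF0E) and fullwidth solidus (U+FF0F),
    // which are NFKC-equivalent to "." and "/"
    let disguised = "．．／etc／passwd";

    // A naive pre-normalization check passes the malicious input
    assert!(!disguised.contains(".."));
    assert!(!disguised.contains('/'));
    assert_eq!('．' as u32, 0xFF0E);
    assert_eq!('／' as u32, 0xFF0F);
}
```

If any component after this check applies NFKC, the string becomes "../etc/passwd" and the check has already been bypassed.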

Authentication bypass through Unicode normalization is another critical vector. Many Axum applications use case-insensitive username matching:

async fn authenticate(
    Json(credentials): Json<Credentials>,
) -> Result<impl IntoResponse> {
    let user = User::find_by_username(&credentials.username)?;
    // ... authentication logic
}

An attacker can also craft usernames that are visually identical to legitimate ones but compare as different strings. The username "admin" can be written with the Cyrillic "а" (U+0430) in place of the Latin "a" (U+0061): the two strings render identically but fail exact matching, which can let an attacker register a lookalike account. Separately, Unicode case folding and compatibility normalization can make distinct inputs collide after processing; for example, the Kelvin sign "K" (U+212A) case-folds to "k", so case-insensitive matching may equate strings the developer never intended to treat as equal.
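A std-only sketch of the homoglyph problem: the Cyrillic lookalike has the same character count and identical rendering, yet it is a different string with a different UTF-8 byte length:

```rust
fn main() {
    let latin = "admin";           // Latin "a" (U+0061)
    let cyrillic = "\u{0430}dmin"; // Cyrillic "а" (U+0430)

    // Same number of characters, visually identical, yet distinct strings
    assert_eq!(latin.chars().count(), cyrillic.chars().count());
    assert_ne!(latin, cyrillic);

    // Byte lengths differ: the Cyrillic letter is two bytes in UTF-8
    assert_eq!(latin.len(), 5);
    assert_eq!(cyrillic.len(), 6);
}
```

Exact string matching (and most database unique constraints) will happily accept both as separate accounts.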

API endpoints that process internationalized domain names (IDN) are particularly vulnerable. Axum applications handling IDN URLs must properly normalize Unicode to prevent homograph attacks where visually similar characters represent different Unicode code points.

Axum-Specific Detection

Detecting Unicode normalization issues in Axum applications requires both static analysis and runtime testing. middleBrick's black-box scanning approach is particularly effective for Axum APIs because it tests the actual HTTP endpoints without requiring source code access.

For path traversal detection, middleBrick tests Axum endpoints with Unicode variants of common traversal patterns:

use reqwest::Client;

// Hypothetical base URL for illustration
const BASE_URL: &str = "https://api.example.com";

async fn test_unicode_traversal(client: &Client) {
    // (expected path, Unicode variant probed by the scanner)
    let test_cases = vec![
        ("/etc/passwd", "/étc/passwd"),
        ("/../secret", "/．．/secret"), // fullwidth full stops (U+FF0E)
        ("/admin", "/ádmin"),
    ];

    for (_original, unicode_variant) in test_cases {
        let url = format!("{}{}", BASE_URL, unicode_variant);
        let _response = client.get(&url).send().await;
        // Check whether the response differs from expected behavior
    }
}

middleBrick's LLM/AI security module also detects Unicode-related prompt injection attempts in Axum applications that serve AI endpoints. The scanner tests for Unicode variations in system prompts and instruction injection payloads.

Property authorization issues in Axum often involve Unicode field names. middleBrick's OpenAPI analysis cross-references endpoint definitions with runtime testing to identify properties that accept Unicode characters but lack proper validation:

#[derive(Deserialize)]
struct UserData {
    #[serde(rename = "émail")]
    email: String,
    // ... other fields
}

The scanner verifies whether these Unicode-renamed fields are properly authorized and validated, as improper handling can lead to data exposure or privilege escalation.
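The field name itself is normalization-sensitive: serde compares JSON object keys byte-for-byte, so a client that sends the NFD spelling of "émail" will not match an NFC rename, and the field is silently absent. A std-only sketch of the key mismatch (assuming the rename above uses the precomposed NFC form):

```rust
fn main() {
    let nfc_key = "\u{00E9}mail";  // "émail" with precomposed é (NFC)
    let nfd_key = "e\u{0301}mail"; // "émail" with combining accent (NFD)

    // JSON keys are compared byte-for-byte: these never match
    assert_ne!(nfc_key, nfd_key);
}
```

If deserialization defaults apply when the key is missing, the two visually identical payloads produce different parsed values, which is exactly the kind of divergence a scanner probes for.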

Axum-Specific Remediation

Remediating Unicode normalization vulnerabilities in Axum requires a multi-layered approach. The first layer is consistent Unicode normalization using Rust's unicode-normalization crate:

use axum::{extract::Path, http::StatusCode, response::IntoResponse};
use unicode_normalization::UnicodeNormalization;

async fn safe_path_handler(
    Path((username, filename)): Path<(String, String)>,
) -> impl IntoResponse {
    // Normalize to NFC form before any validation or use
    let normalized_username: String = username.nfc().collect();
    let normalized_filename: String = filename.nfc().collect();

    let safe_path = format!("/data/{}/{}", normalized_username, normalized_filename);

    // Validate only after normalization
    if normalized_filename.contains("..") {
        return (StatusCode::BAD_REQUEST, "Invalid filename").into_response();
    }

    // ... file serving logic
}
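Substring checks like contains("..") are a useful first gate, but a structural check over path components is more robust. A minimal std-only sketch that accepts only plain relative segments, rejecting parent-directory references and absolute paths outright:

```rust
use std::path::{Component, Path};

/// Returns true only when every component is a normal segment:
/// no "..", no leading "/", no "." components.
fn is_safe_relative(p: &str) -> bool {
    !p.is_empty()
        && Path::new(p)
            .components()
            .all(|c| matches!(c, Component::Normal(_)))
}

fn main() {
    assert!(is_safe_relative("notes/readme.txt"));
    assert!(!is_safe_relative("../secret"));
    assert!(!is_safe_relative("/etc/passwd"));
    assert!(!is_safe_relative(""));
}
```

Run this check on the already-normalized string so that compatibility-equivalent characters cannot slip past it.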

For authentication endpoints, case-fold usernames with a Unicode-aware crate such as caseless (shown below; unicode-casefold is an alternative, and the exact API varies by crate). Passwords are a different matter: per NIST SP 800-63B, apply a stable normalization such as NFKC before hashing, but never case-fold passwords, since that silently reduces their entropy:

use axum::{http::StatusCode, response::IntoResponse, Json};
use unicode_normalization::UnicodeNormalization;

async fn authenticate(
    Json(credentials): Json<Credentials>,
) -> impl IntoResponse {
    // Case-fold the username for case-insensitive lookup
    let normalized_username = caseless::default_case_fold_str(&credentials.username);
    let Ok(user) = User::find_by_username_casefold(&normalized_username) else {
        return (StatusCode::UNAUTHORIZED, "Invalid credentials").into_response();
    };

    // NFKC-normalize the password before hash comparison; do NOT case-fold it
    let normalized_password: String = credentials.password.nfkc().collect();
    if user.password_hash_matches(&normalized_password) {
        // ... authentication success
    }

    (StatusCode::UNAUTHORIZED, "Invalid credentials").into_response()
}

Axum's middleware system provides an excellent place to implement global Unicode normalization:

use axum::{
    extract::Request,
    http::{uri::Uri, StatusCode},
    middleware::Next,
    response::{IntoResponse, Response},
};
use unicode_normalization::UnicodeNormalization;

async fn unicode_normalization_middleware(
    mut req: Request,
    next: Next,
) -> Response {
    // NFC-normalize the path, and the query string if present
    let normalized_path: String = req.uri().path().nfc().collect();
    let path_and_query = match req.uri().query() {
        Some(q) => format!("{}?{}", normalized_path, q.nfc().collect::<String>()),
        None => normalized_path,
    };

    // Rebuild the request URI with the normalized components
    let mut builder = Uri::builder();
    if let Some(scheme) = req.uri().scheme() {
        builder = builder.scheme(scheme.as_str());
    }
    if let Some(authority) = req.uri().authority() {
        builder = builder.authority(authority.as_str());
    }
    match builder.path_and_query(path_and_query.as_str()).build() {
        Ok(new_uri) => *req.uri_mut() = new_uri,
        Err(_) => return StatusCode::BAD_REQUEST.into_response(),
    }

    next.run(req).await
}

For API endpoints handling internationalized data, implement comprehensive validation using libraries like idna for domain names and chardet for character encoding detection:

use axum::{extract::Path, http::StatusCode, response::IntoResponse};
use idna::domain_to_ascii;

async fn handle_idn_request(
    Path(domain): Path<String>,
) -> impl IntoResponse {
    match domain_to_ascii(&domain) {
        Ok(ascii_domain) => {
            // Process the validated ASCII (Punycode) form of the domain
            (StatusCode::OK, ascii_domain).into_response()
        }
        Err(_) => (StatusCode::BAD_REQUEST, "Invalid domain name").into_response(),
    }
}

Frequently Asked Questions

How does Unicode normalization affect Axum's path parameter extraction?
Axum's path parameter extraction doesn't automatically normalize Unicode characters, which can lead to security issues. When a client requests "/étc/passwd" and your endpoint expects "/etc/passwd", Axum will extract different string values. This breaks authentication and authorization logic that relies on exact string matching. Always normalize path parameters to NFC form immediately after extraction to ensure consistent handling.
Can middleBrick detect Unicode normalization vulnerabilities in my Axum API?
Yes, middleBrick's black-box scanning tests Axum endpoints with Unicode variants of common attack patterns. The scanner sends requests with precomposed and decomposed Unicode characters to identify path traversal, authentication bypass, and data exposure vulnerabilities. middleBrick also analyzes your OpenAPI spec to identify properties that accept Unicode input but lack proper validation, then tests these endpoints with Unicode payloads to verify security controls.