MEDIUM | unicode normalization | gin | firestore

Unicode Normalization in Gin with Firestore

Unicode Normalization in Gin with Firestore — how this specific combination creates or exposes the vulnerability

Unicode normalization inconsistencies arise when Gin handles user-supplied strings that are sent to Firestore. Firestore stores strings as UTF-8 but does not enforce a canonical normalization form. If Gin routes or queries based on user input without normalizing to a consistent form (NFC, NFD, NFKC, or NFKD), equivalent strings may map to different Firestore document IDs or field values, enabling bypasses of access control or data leakage.

For example, consider a Gin endpoint that retrieves a Firestore document using a user-provided identifier such as a username or key. The path /users/:username might look up a document ID directly derived from the parameter. One request can send café with a precomposed é (U+00E9) while another sends café spelled as e (U+0065) followed by a combining acute accent (U+0301). The two render identically and are canonically equivalent, but they are distinct byte sequences. Firestore treats them as separate document IDs or index keys, so authorization checks tied to one representation may not apply to the other. This mismatch can lead to BOLA/IDOR-like access across user boundaries when document IDs or Firestore security rules rely on unnormalized input.
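The difference between the two spellings is easy to confirm with the Go standard library alone. The following is a minimal sketch; the constant names and the sameBytes helper are illustrative and not part of Gin or Firestore.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// Two spellings of "café" that render identically.
const (
	precomposed = "caf\u00e9"  // é as a single code point, U+00E9
	decomposed  = "cafe\u0301" // e (U+0065) followed by combining acute, U+0301
)

// sameBytes (illustrative helper) reports whether two strings are
// byte-for-byte identical, which is how an unnormalized ID is compared.
func sameBytes(a, b string) bool {
	return a == b
}

func main() {
	fmt.Println(sameBytes(precomposed, decomposed)) // false
	fmt.Println(len(precomposed), len(decomposed))  // 5 6 (UTF-8 bytes)
	fmt.Println(utf8.RuneCountInString(precomposed),
		utf8.RuneCountInString(decomposed)) // 4 5 (code points)
}
```

Used directly as document IDs, these two values would address two different documents.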

In Firestore, document ID and index comparisons operate on the encoded bytes, not on canonical equivalence. An index that includes a string field therefore treats normalized and non-normalized variants as distinct values, so a query for one form silently misses documents stored in the other. An attacker could exploit this by submitting crafted Unicode variants to reach data they should not see, or to slip past validation logic that does not account for normalization equivalence. The Gin layer, which binds path and query parameters, must normalize inputs before using them for Firestore document lookups, key generation, or rule evaluation.

Additionally, Firestore security rules evaluate string comparisons as exact matches. If a UID or tenant identifier arrives in a non-normalized form, a rule written against the normalized ID stored in Firestore will not match it. Depending on how the rule is written, that either blocks a legitimate user or lets a look-alike identifier slip past a check that was meant to exclude it. Regular expressions and prefix matches in rules can also behave differently under varying normalization forms, further widening the gap between intended and actual enforcement. Rules also apply only to requests from the mobile and web client SDKs; the Go server client authenticates with service credentials and bypasses them, so the Gin handler has to enforce authorization itself. Applying a canonical normalization (typically NFC) in Gin before constructing Firestore references is therefore essential to align user input with stored data and rule conditions.

Real-world impact resembles the identifier-confusion bypasses documented for OAuth and IDOR scenarios, where equivalent-looking identifiers break authorization boundaries. While not a classic injection, a normalization mismatch can enable privilege escalation or information disclosure when combined with Firestore’s exact-match comparisons. middleBrick’s checks for BOLA/IDOR and Input Validation highlight these risks by correlating runtime behavior with specification definitions, including how path and query parameters interact with Firestore document references.

Firestore-Specific Remediation in Gin — concrete code fixes

To remediate Unicode normalization issues in Gin when working with Firestore, normalize all incoming string parameters before using them to build document references, keys, or rule-evaluated values. Use a standard Unicode normalization library to convert inputs to a canonical form such as NFC. This ensures consistent handling across requests and alignment with Firestore’s storage and indexing behavior.

Below is a concrete example in Go using the golang.org/x/text/unicode/norm package to normalize a username parameter before looking up a Firestore document. The snippet shows how to integrate normalization safely within a Gin route handler:

import (
    "net/http"
    "strings"

    "cloud.google.com/go/firestore"
    "github.com/gin-gonic/gin"
    "golang.org/x/text/unicode/norm"
    "google.golang.org/grpc/codes"
    "google.golang.org/grpc/status"
)

func GetUserHandler(client *firestore.Client) gin.HandlerFunc {
    return func(c *gin.Context) {
        // Normalize to NFC so equivalent spellings map to one document ID,
        // then trim surrounding whitespace.
        normalized := strings.TrimSpace(norm.NFC.String(c.Param("username")))
        // Reject values that cannot be a single document ID.
        if normalized == "" || strings.Contains(normalized, "/") {
            c.AbortWithStatusJSON(http.StatusBadRequest, gin.H{"error": "invalid username"})
            return
        }
        // Get returns a snapshot; its fields are read with Data().
        snap, err := client.Collection("users").Doc(normalized).Get(c.Request.Context())
        if status.Code(err) == codes.NotFound {
            c.AbortWithStatusJSON(http.StatusNotFound, gin.H{"error": "not found"})
            return
        }
        if err != nil {
            // Any other failure (network, permissions) is not a 404.
            c.AbortWithStatusJSON(http.StatusInternalServerError, gin.H{"error": "lookup failed"})
            return
        }
        c.JSON(http.StatusOK, snap.Data())
    }
}

This approach ensures that requests using the precomposed and the decomposed spelling of café resolve to the same Firestore document path. It also pairs well with Firestore security rules that compare against the same normalized value, reducing the risk of bypass due to representation differences.

For query parameters that filter Firestore collections, apply normalization before constructing query constraints. In scenarios where Firestore field values are expected to match normalized input, ensure data is stored in a normalized form or perform normalization on stored values during reads. The following example demonstrates normalizing a query token used in a where clause:

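// Requires one more import: "google.golang.org/api/iterator"
func SearchItemsHandler(client *firestore.Client) gin.HandlerFunc {
    return func(c *gin.Context) {
        // Normalize the tag so it matches values stored in NFC.
        normalizedTag := norm.NFC.String(c.Query("tag"))
        iter := client.Collection("items").
            Where("tags", "array-contains", normalizedTag).
            Limit(10).Documents(c.Request.Context())
        defer iter.Stop()
        // Start with an empty slice so the response is [] rather than null.
        results := []map[string]interface{}{}
        for {
            doc, err := iter.Next()
            if err == iterator.Done {
                break // all results consumed
            }
            if err != nil {
                // Do not return a partial list as if the query succeeded.
                c.AbortWithStatusJSON(500, gin.H{"error": "query failed"})
                return
            }
            results = append(results, doc.Data())
        }
        c.JSON(200, results)
    }
}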

When designing Firestore security rules, align rule logic with the normalization strategy used in Gin. Because rules match strings exactly, they should compare against values that were already normalized before being written, instead of being expected to reconcile different representations themselves. For example, if usernames are stored in NFC, every identifier that an ownership rule compares against them should also reach Firestore in NFC. middleBrick’s scans for Property Authorization and Input Validation can surface gaps between rule expectations and runtime behavior, especially when Firestore paths or rule conditions depend on raw user input.

In environments using OpenAPI specifications, ensure path and query parameter schemas indicate normalization expectations or validate format post-normalization. While Firestore does not enforce schema-level normalization, documenting and enforcing NFC at the API layer reduces inconsistency. The CLI tool (middlebrick scan <url>) can be used to validate that endpoints handling Firestore-bound parameters include normalization logic, and the GitHub Action can enforce this as part of CI/CD gates.
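Alongside an external scan, the same property can be asserted in the project's own tests: two canonically equivalent paths should produce the same response. The following sketch uses only net/http/httptest; sameResponse and rawEcho are hypothetical names, and rawEcho merely stands in for a handler that uses the path without normalizing it.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// sameResponse (hypothetical helper) sends a GET for each path to the
// handler and reports whether status code and body are identical.
func sameResponse(h http.Handler, pathA, pathB string) bool {
	recA, recB := httptest.NewRecorder(), httptest.NewRecorder()
	h.ServeHTTP(recA, httptest.NewRequest(http.MethodGet, pathA, nil))
	h.ServeHTTP(recB, httptest.NewRequest(http.MethodGet, pathB, nil))
	return recA.Code == recB.Code && recA.Body.String() == recB.Body.String()
}

// rawEcho writes the hex bytes of the decoded path, so any difference in
// encoding between two requests shows up in the response body.
var rawEcho = http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
	fmt.Fprintf(w, "%x", r.URL.Path)
})

func main() {
	// Percent-encoded NFC and NFD spellings of /users/café.
	nfc, nfd := "/users/caf%C3%A9", "/users/cafe%CC%81"
	fmt.Println(sameResponse(rawEcho, nfc, nfd)) // false
}
```

Since *gin.Engine implements http.Handler, a configured router can be passed in place of rawEcho; a handler that normalizes its input would be expected to make the check return true.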

Frequently Asked Questions

Why does Unicode normalization matter when using Gin with Firestore?

Firestore stores strings as UTF-8 and treats different byte representations of canonically equivalent text as distinct. If Gin does not normalize inputs (e.g., NFC) before using them for document IDs, keys, or rule evaluations, equivalent usernames or identifiers can map to different Firestore entries, leading to authorization bypasses or inconsistent query results.

How can I verify my Gin endpoints normalize inputs for Firestore?

Use middleBrick’s CLI to scan your endpoints: run middlebrick scan <your-api-url>. The scan tests input handling and can highlight inconsistencies where equivalent Unicode inputs produce different Firestore references, supporting BOLA/IDOR and Input Validation checks.
