Unicode Normalization in AdonisJS with CockroachDB
Unicode Normalization in AdonisJS with CockroachDB — how this specific combination creates or exposes the vulnerability
Unicode normalization inconsistencies between AdonisJS string handling and CockroachDB collation can create security-relevant behavior, particularly around identifier comparison and canonicalization. When an AdonisJS application receives user input containing Unicode characters (e.g., usernames, API keys, or identifiers) and persists them to CockroachDB, the database may store a different binary representation depending on the normalization form used at insert time. CockroachDB does not automatically normalize strings to a canonical form, and with the default collation STRING comparisons are effectively byte-wise, while AdonisJS code may normalize strings in the application layer using Node.js built-in APIs such as String.prototype.normalize. This mismatch can lead to multiple logically equivalent strings being stored as distinct values, enabling bypasses that rely on equivalence confusion, such as duplicate accounts, inconsistent authorization checks, or injection-like behavior in search or lookup logic.
Consider authentication and identifier-based authorization checks. If an access control check in AdonisJS normalizes an identifier before comparison (e.g., to NFC), but the stored CockroachDB value is in a different form (e.g., NFD), the comparison may incorrectly match the wrong entity or fail to match the intended one. In the context of BOLA/IDOR, this can allow horizontal privilege escalation when a user can access another user’s resource by supplying a visually identical but differently normalized identifier. Attackers can craft inputs that exploit normalization differences to bypass uniqueness constraints or enumeration protections, potentially enabling account creation under a normalized-equivalent string that appears identical to the user but differs at the byte level in CockroachDB.
Input validation and property authorization checks are also affected. AdonisJS validation rules may enforce format constraints using regexes that assume a single canonical representation, while CockroachDB stores the raw submitted value. This can lead to situations where an input passes application-side validation but matches stored data unexpectedly, complicating secure filtering and querying. In data exposure checks, queries that rely on string equality without explicit normalization can return more rows than intended if multiple normalization forms exist, increasing the risk of inadvertently exposing sensitive records. For example, searching for a username using a normalized form may not match a stored non-normalized or differently normalized variant, causing the application to fall back to broader lookup flows that expose more data than expected.
SSRF and unsafe consumption patterns can be influenced as well if user-controlled strings containing Unicode are used to construct URLs or file paths that are stored in CockroachDB and later consumed by backend services. An attacker could submit a specially crafted string that normalizes to a semantically different sequence after being processed by the application and then stored, leading to unexpected resolution paths when the value is later used. Similarly, inventory management and unsafe consumption checks that rely on exact string matching without normalization can treat equivalent entries as distinct, undermining consistency checks and allowing duplicate or conflicting entries to exist across the system.
CockroachDB-Specific Remediation in AdonisJS — concrete code fixes
To mitigate normalization-related issues when using AdonisJS with CockroachDB, enforce a single normalization form consistently at the point of input and before any comparison or storage. Use JavaScript’s built-in String.prototype.normalize to convert all incoming string values to a canonical form (NFC is commonly recommended) before validation, storage, and comparison. This ensures that regardless of how the client submits the data, the application and database operate on a consistent representation.
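One way to enforce this at a single choke point is a small helper that recursively normalizes every string in the parsed request body, registered as global middleware. The helper below is a sketch: `normalizeDeep` is a hypothetical name, and the middleware wiring via `request.updateBody`/`request.body` assumes a recent AdonisJS release, so check your version’s request API before adopting it.

```typescript
// Hypothetical helper: recursively NFC-normalize every string in a payload.
function normalizeDeep<T>(value: T): T {
  if (typeof value === 'string') {
    return value.normalize('NFC') as T
  }
  if (Array.isArray(value)) {
    return value.map(normalizeDeep) as unknown as T
  }
  if (value !== null && typeof value === 'object') {
    return Object.fromEntries(
      Object.entries(value as Record<string, unknown>).map(([k, v]) => [k, normalizeDeep(v)])
    ) as T
  }
  return value // numbers, booleans, null, undefined pass through untouched
}

console.log(normalizeDeep({ name: 'e\u0301' }))  // { name: 'é' } (now NFC)

// Sketch of AdonisJS middleware usage (illustrative, not standalone):
// export default class NormalizeUnicodeMiddleware {
//   async handle({ request }: HttpContext, next: NextFn) {
//     request.updateBody(normalizeDeep(request.body()))
//     await next()
//   }
// }
```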
When storing strings in CockroachDB, apply normalization in AdonisJS before passing values to the ORM. For example, normalize user-provided identifiers and usernames before creating or querying records:
// Normalize to NFC before validation and persistence.
// String.prototype.normalize is built into JavaScript; no import is needed
// (there is no 'node:unicodedata' module in Node.js).
const normalizedUsername = usernameInput.normalize('NFC')

const user = await User.create({
  username: normalizedUsername,
  email: emailInput,
})
For queries that involve user-supplied identifiers, normalize the lookup value to the same form used during insertion to ensure accurate matches:
// Guard against a missing input before normalizing
const normalizedSearchKey = String(request.input('key') ?? '').normalize('NFC')

const result = await User.query()
  .where('username', normalizedSearchKey)
  .first()
When using raw SQL in AdonisJS (e.g., via the Lucid ORM, which is built on Knex) with CockroachDB, normalization can in principle be applied in-database, but support varies: PostgreSQL 13+ ships a normalize() SQL function, and you should verify whether your CockroachDB version offers an equivalent before relying on one. It is generally safer to normalize in the application layer, which avoids collation-dependent behavior and remains portable and predictable across environments.
For authorization and equality checks, avoid relying on database collation to handle equivalence. Instead, normalize both sides of the comparison. If you maintain a mapping table (e.g., aliases or identifiers), store normalized keys and normalize inputs at lookup time:
const normalizedAlias = aliasInput.normalize('NFC')

const mapping = await UserAlias.query()
  .where('alias_normalized', normalizedAlias)
  .first()
In validation schemas, ensure that rules do not inadvertently accept multiple normalization forms if uniqueness is required. Combine normalization with uniqueness constraints at the application level and, where supported, enforce canonical forms in database constraints. This reduces the risk of bypasses via visually identical but differently encoded strings. Regularly audit stored values for unexpected normalization variants and reconcile them if necessary to maintain consistency and prevent privilege escalation or data exposure edge cases.
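For uniqueness in particular, it can help to store a separate canonical key alongside the display value and place the unique constraint on that column. The `canonicalKey` helper below is a hypothetical example of one such policy (NFKC plus lowercasing, which also folds compatibility characters such as fullwidth letters); the exact folding rules are a product decision, not a universal standard:

```typescript
// Hypothetical canonicalization policy for uniqueness keys:
// NFKC folds compatibility variants (e.g. fullwidth letters), then lowercase.
function canonicalKey(identifier: string): string {
  return identifier.normalize('NFKC').toLowerCase()
}

console.log(canonicalKey('jos\u00e9'))   // "josé" (NFC input)
console.log(canonicalKey('jose\u0301'))  // "josé" (NFD input, same key)
console.log(canonicalKey('\uFF21dmin'))  // "admin" (fullwidth 'A' folded)
```

Store the raw submitted value for display if needed, but perform every uniqueness check, lookup, and authorization comparison against the canonical key column.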