Unicode Normalization in Fiber with MongoDB
Unicode Normalization in Fiber with MongoDB — how this specific combination creates or exposes the vulnerability
Unicode normalization becomes significant in Fiber applications that accept user-supplied strings for queries, identifiers, or document keys and then use those values to construct MongoDB queries. In a Fiber-based API, route or query parameters such as username or email may be forwarded directly to MongoDB operations. If the application does not normalize input and stored data consistently, visually identical strings can have different binary representations. For example, the character é can be represented as the single code point U+00E9 or as the two-code-point sequence U+0065 (e) followed by U+0301 (combining acute accent). Without normalization, a query using one representation will not match a stored document using the other, leading to authentication bypass or incomplete data retrieval.
In a security context, inconsistent normalization can be abused in authentication or ID lookup paths. An attacker can supply a specially crafted Unicode string that bypasses a login check if the comparison layer normalizes differently from the database index. If your Fiber routes rely on string-based lookups in MongoDB, such as users.FindOne(ctx, bson.M{"email": c.Params("email")}), and do not enforce a canonical normalization form, attackers may exploit normalization mismatches to gain unintended access or enumerate users.
Additionally, if your application stores user-controlled data in MongoDB and later renders that data in an HTML context or logs it without output encoding, normalization inconsistencies can contribute to injection-like behaviors or data corruption. For example, search filters that compare normalized input to unnormalized indexed values may behave erratically, causing unexpected filtering results or exposing sensitive entries. Because Fiber does not inherently normalize strings, developers must explicitly apply normalization before any MongoDB operation to ensure canonical representation across queries, indexes, and stored content.
MiddleBrick scans help surface these risks by checking how your API handles input validation and data exposure across endpoints that interact with databases. When scanning a Fiber endpoint that performs MongoDB lookups, the tool can detect whether normalization is applied consistently and flag findings related to authentication mismatches or data exposure.
MongoDB-Specific Remediation in Fiber — concrete code fixes
To mitigate Unicode normalization issues in Fiber with MongoDB, normalize all user-supplied strings before using them in queries or keys. Choose a canonical normalization form—NFC is commonly used in web applications—and apply it consistently in both incoming requests and when preparing data for storage or comparison.
Example: normalizing input in a Fiber route before a MongoDB query. Fiber is a Go framework, so the sketch below assumes Fiber v2 (github.com/gofiber/fiber/v2), the official MongoDB Go driver (go.mongodb.org/mongo-driver), and golang.org/x/text/unicode/norm for normalization; the connection string, database name (mydb), and users collection are illustrative.

package main

import (
	"context"

	"github.com/gofiber/fiber/v2"
	"go.mongodb.org/mongo-driver/bson"
	"go.mongodb.org/mongo-driver/mongo"
	"go.mongodb.org/mongo-driver/mongo/options"
	"golang.org/x/text/unicode/norm"
)

// normalizeUnicode canonicalizes a string to NFC so that visually
// identical inputs share one binary representation.
func normalizeUnicode(s string) string {
	return norm.NFC.String(s)
}

func main() {
	// Connect once at startup and reuse the client, rather than per request.
	client, err := mongo.Connect(context.Background(),
		options.Client().ApplyURI("mongodb://localhost:27017"))
	if err != nil {
		panic(err)
	}
	defer client.Disconnect(context.Background())
	users := client.Database("mydb").Collection("users")

	app := fiber.New()

	app.Get("/user/:username", func(c *fiber.Ctx) error {
		// Use the normalized value in the query to ensure consistent
		// matching with stored (normalized) data.
		username := normalizeUnicode(c.Params("username"))

		var user bson.M
		err := users.FindOne(c.Context(), bson.M{"username": username}).Decode(&user)
		if err == mongo.ErrNoDocuments {
			return c.Status(fiber.StatusNotFound).JSON(fiber.Map{"error": "not_found"})
		}
		if err != nil {
			return c.SendStatus(fiber.StatusInternalServerError)
		}
		return c.JSON(user)
	})

	// Also normalize when inserting to maintain canonical form.
	app.Post("/user", func(c *fiber.Ctx) error {
		var payload struct {
			Username string `json:"username"`
			Email    string `json:"email"`
		}
		if err := c.BodyParser(&payload); err != nil {
			return c.SendStatus(fiber.StatusBadRequest)
		}
		res, err := users.InsertOne(c.Context(), bson.M{
			"username": normalizeUnicode(payload.Username),
			"email":    normalizeUnicode(payload.Email),
		})
		if err != nil {
			return c.SendStatus(fiber.StatusInternalServerError)
		}
		return c.Status(fiber.StatusCreated).JSON(fiber.Map{"_id": res.InsertedID})
	})

	app.Listen(":3000")
}
For Node.js services using Mongoose or other ODMs, apply normalization in pre-save hooks and also before constructing queries, since save hooks do not run on finds:

const mongoose = require('mongoose');

const userSchema = new mongoose.Schema({
	username: String,
	email: String
});

// Normalize on save so stored documents are always in NFC form.
// Guard with a type check so missing fields do not throw.
userSchema.pre('save', function (next) {
	if (typeof this.username === 'string') {
		this.username = this.username.normalize('NFC');
	}
	if (typeof this.email === 'string') {
		this.email = this.email.normalize('NFC');
	}
	next();
});

const User = mongoose.model('User', userSchema);

// Pre-save hooks do not run on queries, so normalize lookup values as well:
// const user = await User.findOne({ username: input.normalize('NFC') });
When indexing or searching, collation rules that account for normalization are an option only if you cannot rewrite the stored data; the preferred approach remains storing and querying in a single normalized form. MiddleBrick’s scans include checks for input validation and data exposure, highlighting whether endpoints consistently normalize identifiers and whether findings map to compliance frameworks such as the OWASP API Security Top 10.
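Where re-storing data is not feasible, a MongoDB collation with its normalization option enabled asks the server to normalize text before comparison. A minimal sketch with the official Go driver, assuming a reachable local MongoDB; the locale, database, and collection names are assumptions, and the same collation must be set on both the index and any query that should use it:

```go
package main

import (
	"context"
	"fmt"

	"go.mongodb.org/mongo-driver/bson"
	"go.mongodb.org/mongo-driver/mongo"
	"go.mongodb.org/mongo-driver/mongo/options"
)

func main() {
	ctx := context.Background()
	client, err := mongo.Connect(ctx,
		options.Client().ApplyURI("mongodb://localhost:27017"))
	if err != nil {
		panic(err)
	}
	defer client.Disconnect(ctx)
	users := client.Database("mydb").Collection("users")

	// Normalization: true tells MongoDB to check whether text requires
	// normalization and to perform it before comparison.
	coll := &options.Collation{Locale: "en", Normalization: true}

	// Build the index with the collation...
	_, err = users.Indexes().CreateOne(ctx, mongo.IndexModel{
		Keys:    bson.D{{Key: "username", Value: 1}},
		Options: options.Index().SetCollation(coll),
	})
	if err != nil {
		panic(err)
	}

	// ...and specify the same collation at query time, or the index is not used.
	var user bson.M
	err = users.FindOne(ctx,
		bson.M{"username": "café"},
		options.FindOne().SetCollation(coll)).Decode(&user)
	fmt.Println(user, err)
}
```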
In CI/CD workflows, the MiddleBrick GitHub Action can be configured to fail builds if security scores drop due to input validation or data exposure findings, helping catch normalization-related regressions before deployment. The CLI allows you to scan endpoints from the terminal and review JSON output for precise guidance.