
Excessive Data Exposure in MongoDB

How Excessive Data Exposure Manifests in MongoDB

Excessive Data Exposure in MongoDB-backed APIs occurs when an endpoint returns more data than the client requires, often including sensitive fields that should be restricted. This is a common implementation flaw, distinct from authorization bypasses—the user may have legitimate access to a resource, but the response payload unnecessarily leaks additional data.

In MongoDB applications, this typically manifests through several patterns:

  • Unprojected Queries: Using find() or findOne() without a projection document, returning all fields from the document. For example, a user profile endpoint that executes db.users.findOne({_id: userId}) will return the entire user document, including fields like passwordHash, ssn, or apiKeys even if the client only needs username and email.
  • Aggregation Pipeline Leaks: Aggregation stages such as $lookup or $unwind can inadvertently expose data from joined collections. A query that joins orders with users might return a user's internal role or permissions field to a client that only needs order totals.
  • ORM/ODM Over-fetching: Libraries like Mongoose often default to returning full documents. A route like GET /api/users/:id that uses User.findById(req.params.id) without a .select() clause will serialize every schema field to JSON, including any sensitive fields that are not marked select: false in the schema.
  • GraphQL Resolvers: While not MongoDB-specific, GraphQL APIs backed by MongoDB can suffer from field-level over-fetching if resolvers fetch entire documents and let the GraphQL engine filter fields, rather than projecting only requested fields at the database layer.
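For the GraphQL case above, one mitigation is to build a MongoDB projection from the fields the query actually requested instead of fetching whole documents. A simplified sketch (the function name is illustrative; real resolvers typically derive the field list from the resolver's info argument, e.g. with a helper such as graphql-fields):

```javascript
// Build a MongoDB projection from the list of requested field names,
// so the resolver fetches only what the GraphQL query asked for.
function toProjection(requestedFields) {
  const projection = { _id: 0 }; // exclude _id unless explicitly requested
  for (const field of requestedFields) projection[field] = 1;
  return projection;
}

// e.g. for `query { user { username email } }`:
toProjection(['username', 'email']); // { _id: 0, username: 1, email: 1 }
```

The resulting object can be passed as the projection argument of findOne(), keeping the filtering at the database layer rather than in the GraphQL engine.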

These patterns violate the principle of least privilege at the data layer. An attacker who can call the endpoint—even with proper authentication—receives a superset of intended data, increasing the attack surface for data theft, credential stuffing, or social engineering.

Consider this vulnerable Express.js + Mongoose example:

app.get('/api/profile/:userId', async (req, res) => {
  const user = await User.findById(req.params.userId); // Returns ALL fields
  res.json(user);
});

If the User schema includes passwordResetTokens or billingAddress, these are serialized to the response regardless of client need.

MongoDB-Specific Detection

Detecting Excessive Data Exposure requires comparing the intended data contract (e.g., an OpenAPI specification) against actual runtime responses. middleBrick performs this analysis automatically during its 12 parallel security checks, specifically under its Data Exposure category.

The scanner works as follows:

  • Specification Analysis: If an OpenAPI/Swagger spec (2.0, 3.0, 3.1) is provided or discoverable, middleBrick parses the response schemas for each endpoint. It resolves all $ref pointers to understand which fields are documented as part of the API contract.
  • Runtime Response Inspection: The scanner sends requests to the target API (black-box, no credentials) and captures JSON responses. It then compares the actual response fields against the expected fields from the spec.
  • Anomaly Detection: Fields present in the response but absent from the spec—or fields marked as sensitive in common patterns (e.g., *password*, *secret*, *token*, *key*)—are flagged as potential excessive data exposure. The scanner also checks for common MongoDB-specific field names like __v (Mongoose version key), salt, or hash that may leak implementation details.

For example, if an OpenAPI spec defines a User response with only id, username, and email, but the actual API returns an extra passwordHash field, middleBrick will flag this as a high-severity finding.

You can reproduce this detection manually by:

  1. Documenting expected response fields in an OpenAPI spec.
  2. Using a tool like curl or Postman to capture a real response.
  3. Diffing the response keys against the spec's schema.
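The diffing step above can be sketched as a small Node.js helper. The function name and the sensitive-name regex below are illustrative, not part of middleBrick:

```javascript
// Flag response fields that are absent from the spec's schema, plus any
// field whose name matches common sensitive patterns (password, token, etc.).
const SENSITIVE = /password|secret|token|key|salt|hash/i;

function findExcessFields(specFields, response) {
  const expected = new Set(specFields);
  return Object.keys(response).filter(
    (field) => !expected.has(field) || SENSITIVE.test(field)
  );
}

// Example: the spec documents id, username, email -- the response leaks more.
const spec = ['id', 'username', 'email'];
const actual = {
  id: 1,
  username: 'alice',
  email: 'alice@example.com',
  passwordHash: '$2b$10$abc',
  __v: 0,
};

console.log(findExcessFields(spec, actual)); // ['passwordHash', '__v']
```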

However, middleBrick automates this at scale and correlates it with other risk categories (like Authentication or Input Validation) to produce a prioritized A–F score. The scanner's CLI tool (middlebrick scan <url>) and GitHub Action enable you to integrate this check into development workflows, catching over-fetching before deployment.

MongoDB-Specific Remediation

Remediation focuses on enforcing field-level restrictions at the database query layer. The goal is to fetch only the fields required by the client, using MongoDB's projection mechanisms and ORM/ODM features.

1. Use Projections in Native MongoDB Queries

Always include a projection document in find(), findOne(), and aggregation pipelines. The projection specifies which fields to include (1) or exclude (0).

// VULNERABLE: Returns all fields
db.users.findOne({ _id: ObjectId('...') });

// SECURE: Project only needed fields
db.users.findOne(
  { _id: ObjectId('...') },
  { username: 1, email: 1, _id: 0 } // Include only username, email; exclude _id
);

In aggregation, use $project to reshape documents early in the pipeline:

db.users.aggregate([
  { $match: { _id: ObjectId('...') } },
  { $project: { username: 1, email: 1, role: 1 } } // Explicitly list allowed fields
]);
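Building on the $lookup leak pattern described earlier, here is a hedged sketch of a join that never pulls the joined user's role or permissions into the result. It assumes MongoDB 5.0+ (which allows combining localField/foreignField with a sub-pipeline); the collection and field names are illustrative:

```javascript
// Join orders with users, but project away everything except the buyer's
// username before the joined documents enter the result set.
const orderSummaryPipeline = [
  { $match: { status: 'complete' } },
  {
    $lookup: {
      from: 'users',
      localField: 'userId',
      foreignField: '_id',
      as: 'buyer',
      // Projecting inside the sub-pipeline means role/permissions
      // are never materialized in the join output.
      pipeline: [{ $project: { username: 1, _id: 0 } }],
    },
  },
  { $unwind: '$buyer' },
  // Final shape: only the fields the client actually needs.
  { $project: { total: 1, 'buyer.username': 1 } },
];
```

Run it as db.orders.aggregate(orderSummaryPipeline); the early sub-pipeline projection keeps the joined documents minimal, and the final $project acts as an explicit allow-list.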

2. Enforce Projections in Mongoose

Mongoose provides the .select() method to define projections. Combine it with query execution:

// VULNERABLE: Full document
const user = await User.findById(userId);

// SECURE: Only selected fields
const user = await User.findById(userId).select('username email role');

// Alternatively, use second argument in findById
const user = await User.findById(userId, 'username email role');

You can also set select: false for sensitive fields in the schema to exclude them by default:

const userSchema = new mongoose.Schema({
  username: String,
  email: String,
  passwordHash: { type: String, select: false } // Excluded unless explicitly selected
});
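As additional defense in depth, a toJSON transform can strip sensitive fields at serialization time, so even an unprojected query cannot leak them through res.json(). The field list below is illustrative, not middleBrick guidance:

```javascript
// Strip sensitive fields whenever a document is serialized to JSON,
// regardless of how the query that fetched it was projected.
const SENSITIVE_FIELDS = ['passwordHash', 'passwordResetTokens', '__v'];

function stripSensitive(doc, ret) {
  for (const field of SENSITIVE_FIELDS) delete ret[field];
  return ret;
}
```

Wire it up with userSchema.set('toJSON', { transform: stripSensitive }). When a hidden field is genuinely needed (e.g. verifying a password at login), Mongoose lets you re-include a select: false field explicitly with .select('+passwordHash').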

3. Apply Field-Level Encryption

For highly sensitive fields (e.g., SSN, credit cards), use MongoDB's Client-Side Field Level Encryption. Encrypted fields are stored as binary and only decrypted by clients with the appropriate key, preventing accidental exposure even if the field is included in a projection.

4. Implement Role-Based Access Control at the Database Level

Define MongoDB roles that restrict find or aggregate actions to specific collections and fields. While this is a defense-in-depth measure, application-layer projections remain primary.

5. Validate Against OpenAPI Specs

Use middleBrick's OpenAPI spec analysis to ensure your implementation matches the documented response schemas. The scanner resolves $refs and flags any runtime fields not present in the spec, helping you catch over-fetching early. Integrate the GitHub Action to fail PRs if new excessive fields appear.

Remediation guidance from middleBrick includes specific, actionable steps like "Add projection to this endpoint" with code snippets tailored to your stack (Express, Mongoose, etc.).

FAQ

  • How does middleBrick detect Excessive Data Exposure without credentials?

    middleBrick sends unauthenticated requests to public endpoints and analyzes the JSON responses. It compares the actual fields returned against the expected fields defined in an OpenAPI spec (if available) or against known sensitive field patterns (e.g., password, token, secret). Any field that is either unexpected or matches a sensitive pattern is flagged as potential excessive exposure.

  • What's the difference between Excessive Data Exposure and an Injection vulnerability in MongoDB?

    Excessive Data Exposure is about returning too much data in legitimate responses (over-fetching), while Injection (e.g., NoSQL Injection) involves manipulating queries to access unauthorized data. middleBrick checks both: Injection looks for unsanitized input altering query logic; Data Exposure checks response payloads for unnecessary sensitive fields. Both can lead to data leaks but require different fixes—input validation vs. query projections.

Related CWEs

  • CWE-915: Mass Assignment (Severity: HIGH)