XML External Entities in MongoDB

How XML External Entities Manifest in MongoDB

XML External Entity (XXE) attacks in MongoDB contexts typically occur when XML data flows through an application's processing pipeline on its way into or out of MongoDB. MongoDB stores data as BSON and does not parse XML itself, so XXE vulnerabilities arise in the application layer, when XML input is parsed before storage or after retrieval.

The most common MongoDB XXE scenario involves applications that accept XML configuration files or user-provided XML data that gets parsed by a vulnerable XML parser before being stored in MongoDB. For example, a document management system might accept XML metadata for documents stored in MongoDB:

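// XML parsing before MongoDB storage with no input hardening
const xml2js = require('xml2js');
const { MongoClient } = require('mongodb');

async function storeDocumentMetadata(xmlData) {
  const parser = new xml2js.Parser({
    // Dangerous: untrusted XML reaches the parser with no DOCTYPE check.
    // Note: xml2js builds on sax-js, which does not fetch external entities
    // itself; the same flow is directly exploitable with a parser that does
    // resolve them (for example, libxml2 bindings with entity substitution on).
    xmlns: true,
    explicitRoot: false
  });

  const parsedData = await parser.parseStringPromise(xmlData);

  const client = await MongoClient.connect('mongodb://localhost:27017');
  try {
    const db = client.db('documents');
    await db.collection('metadata').insertOne({
      xmlContent: parsedData,
      timestamp: new Date()
    });
  } finally {
    // Close the connection even if the insert fails
    await client.close();
  }
}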

In this vulnerable pattern, an attacker could craft XML containing external entity declarations:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE foo [
  <!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<root>
  <content>&xxe;</content>
</root>

When this XML is handled by a parser that resolves external entities, the contents of /etc/passwd are substituted into the document and stored in MongoDB, where any later read of the record exposes them. The same technique enables SSRF when the external entity points to an internal service:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE foo [
  <!ENTITY xxe SYSTEM "http://internal-service:8080/secret">
]>
<root>
  <content>&xxe;</content>
</root>
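To see what a payload like the two above asks a parser to fetch, without fetching anything, the entity declarations can be listed directly. The following is a minimal sketch using Python's standard `xml.parsers.expat` module, which reports declarations through a callback and, with no external-entity handler installed, does not load the referenced resource. The function name `list_external_entities` is hypothetical.

```python
from xml.parsers import expat


def list_external_entities(xml_text):
    """Return (name, system_id) for each external entity declared in the DTD."""
    declared = []

    def on_entity_decl(name, is_parameter, value, base, system_id, public_id, notation):
        # External entities carry a system identifier instead of an inline value
        if system_id is not None:
            declared.append((name, system_id))

    parser = expat.ParserCreate()
    parser.EntityDeclHandler = on_entity_decl
    # No ExternalEntityRefHandler is set, so nothing is fetched while parsing
    parser.Parse(xml_text, True)
    return declared
```

Run against the file-read payload, this reports the `xxe` entity together with its `file:///etc/passwd` target, which makes it a convenient way to triage captured request bodies.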

Another MongoDB-specific XXE vector occurs in applications using XML-based data import/export with MongoDB. Consider a migration tool that processes XML dumps:

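import xml.etree.ElementTree as ET
import pymongo

def import_xml_to_mongodb(xml_file, db_name):
    # Risky: untrusted XML is parsed with no hardening against entity attacks
    tree = ET.parse(xml_file)
    root = tree.getroot()

    client = pymongo.MongoClient('mongodb://localhost:27017')
    db = client[db_name]

    for record in root.findall('record'):
        data = {child.tag: child.text for child in record}
        # pymongo exposes collections by subscript or attribute, not db.collection()
        db['imported'].insert_one(data)

    client.close()

This code parses untrusted XML with no hardening. According to the Python documentation, xml.etree.ElementTree does not expand external entities, but depending on the underlying Expat version it may remain exposed to entity expansion attacks such as billion laughs, and the same import pattern is fully vulnerable to XXE when the parser in use does resolve external entities.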

MongoDB-Specific Detection

Detecting XXE vulnerabilities in MongoDB-integrated applications requires examining both the XML processing code and the data flow patterns. The first detection step is identifying XML parsing operations that occur before or after MongoDB interactions.

Static code analysis should flag these dangerous patterns:

# Search for XML parsing call sites that need review
# Node.js: list xml2js usage, then check how untrusted input reaches the parser
grep -rn "xml2js" . --include="*.js"

# Python: ElementTree usage that is not routed through defusedxml
grep -rn "ElementTree" . --include="*.py" | grep -v "defusedxml"

# Java: files that use DocumentBuilderFactory but never call setFeature
# (-r is the GNU xargs flag that skips the run when no files match)
grep -rl "DocumentBuilderFactory" . --include="*.java" | xargs -r grep -L "setFeature"

Runtime detection focuses on identifying XML processing in application logs and monitoring for unusual patterns. Tools like middleBrick can scan API endpoints that accept XML data, testing for XXE vulnerabilities by attempting controlled external entity injections.

middleBrick's XXE detection methodology includes:

  • Testing XML endpoints with crafted payloads containing external entities
  • Monitoring for successful entity resolution or error messages that reveal parser behavior
  • Checking for SSRF-like behavior when external entities point to internal services
  • Analyzing response times that might indicate file access attempts
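The first two steps in this list can be illustrated with a harmless probe. This is a minimal sketch of the general idea and not middleBrick's implementation: it declares two internal entities whose values only form the canary token when the server expands them, so an endpoint that merely echoes the raw request body does not produce a false positive. The helper names `build_entity_probe` and `entities_were_expanded` are hypothetical.

```python
import secrets


def build_entity_probe():
    """Return (token, payload); the token appears only after entity expansion."""
    first = "canary" + secrets.token_hex(4)
    second = secrets.token_hex(4)
    payload = (
        '<?xml version="1.0" encoding="UTF-8"?>'
        f'<!DOCTYPE probe [<!ENTITY a "{first}"><!ENTITY b "{second}">]>'
        "<root><content>&a;&b;</content></root>"
    )
    return first + second, payload


def entities_were_expanded(response_body, token):
    # The joined token is absent from the request, so seeing it in the
    # response suggests the server processed the DTD and expanded entities
    return token in response_body
```

A positive result only shows that DTD processing is enabled. Whether external entities are also resolved still needs to be confirmed separately, in an environment where you are authorized to test.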

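For MongoDB-specific detection, examine the application's data ingestion pipelines. Look for handlers that store raw XML without validation, because every consumer that later reads the document must then parse it safely:

// Dangerous: raw XML is stored without validation
// (assumes an Express app and an initialized MongoDB `collection`)
app.post('/upload-xml', async (req, res) => {
  const xmlData = req.body.xml;
  // No validation, no sanitization
  await collection.insertOne({ xml: xmlData });
  res.sendStatus(201);
});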

Network monitoring can also detect XXE attempts by observing outbound connections from your application servers when processing XML data. Unusual DNS queries or HTTP requests to unexpected destinations may indicate successful XXE exploitation.

Database-level detection involves monitoring MongoDB collections for suspicious data patterns. XXE attacks often result in unexpected data structures or content that deviates from normal application behavior. Implement alerting on:

  • Large insertions of XML data from single sources
  • Unexpected file path patterns in stored data
  • Anomalous access patterns to MongoDB collections that handle XML data
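As a rough illustration of the second item, stored documents can be walked recursively and checked for content that commonly shows up after a successful file read. This is a minimal sketch: the pattern list is an assumption chosen for illustration, not an exhaustive signature set, and `find_suspicious_values` is a hypothetical helper that operates on plain dictionaries such as those a MongoDB driver returns.

```python
import re

# Illustrative patterns only; tune these to your own data
SUSPICIOUS_PATTERNS = {
    "passwd-style entry": re.compile(r"^[\w.-]+:[^:\n]*:\d+:\d+:", re.MULTILINE),
    "entity declaration": re.compile(r"<!ENTITY", re.IGNORECASE),
    "file URI": re.compile(r"file://", re.IGNORECASE),
    "private key header": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
}


def find_suspicious_values(value, path=""):
    """Return (field_path, label) pairs for string values matching a pattern."""
    hits = []
    if isinstance(value, dict):
        for key, child in value.items():
            child_path = f"{path}.{key}" if path else str(key)
            hits.extend(find_suspicious_values(child, child_path))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            hits.extend(find_suspicious_values(child, f"{path}[{index}]"))
    elif isinstance(value, str):
        for label, pattern in SUSPICIOUS_PATTERNS.items():
            if pattern.search(value):
                hits.append((path, label))
    return hits
```

A periodic job could run this over recently inserted records in the collections that receive XML-derived data and raise an alert on any hit.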

MongoDB-Specific Remediation

Remediating XXE vulnerabilities in MongoDB-integrated applications requires securing XML processing before data reaches the database. The most effective approach is disabling external entity processing at the parser level.

For Node.js applications using xml2js:

const xml2js = require('xml2js');
const { MongoClient } = require('mongodb');

async function secureStoreDocumentMetadata(xmlData) {
  // Reject any document that declares a DTD, which is where entities are defined
  if (/<!DOCTYPE/i.test(xmlData)) {
    return { success: false, error: 'DOCTYPE declarations are not allowed' };
  }

  const parser = new xml2js.Parser({
    xmlns: false,
    explicitRoot: false,
    // Strict parsing rejects malformed XML; it does not control entity handling
    strict: true,
    // Output-shaping options (not security controls)
    mergeAttrs: true,
    explicitArray: false
  });

  let parsedData;
  try {
    parsedData = await parser.parseStringPromise(xmlData);
  } catch (error) {
    // Do not echo parser details back to the caller
    return { success: false, error: 'Invalid XML format' };
  }

  const client = await MongoClient.connect('mongodb://localhost:27017');
  try {
    await client.db('documents').collection('metadata').insertOne({
      xmlContent: parsedData,
      timestamp: new Date()
    });
    return { success: true };
  } finally {
    // Close the connection even if the insert fails
    await client.close();
  }
}

For Python applications using xml.etree.ElementTree:

import pymongo
from defusedxml.ElementTree import parse

def secure_import_xml_to_mongodb(xml_file, db_name):
    # Use defusedxml to prevent XXE
    tree = parse(xml_file)
    root = tree.getroot()

    client = pymongo.MongoClient('mongodb://localhost:27017')
    db = client[db_name]

    for record in root.findall('record'):
        data = {child.tag: child.text for child in record}
        # pymongo exposes collections by subscript or attribute, not db.collection()
        db['imported'].insert_one(data)

    client.close()

The defusedxml library provides hardened XML parsers that disable external entity resolution by default. This is the recommended approach for Python applications.
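Where adding a dependency is not possible, a similar effect can be approximated with the standard library by refusing any document that carries a DOCTYPE declaration, since that is where entities are defined. This is a minimal sketch under that assumption, not a replacement for defusedxml. `DoctypeForbidden` and `parse_without_doctype` are hypothetical names, and the input is parsed twice (once to check, once to build the tree).

```python
import xml.etree.ElementTree as ET
from xml.parsers import expat


class DoctypeForbidden(ValueError):
    """Raised when the input declares a DTD."""


def parse_without_doctype(xml_text):
    def reject_doctype(name, system_id, public_id, has_internal_subset):
        # Abort before any entity declared in the DTD can be used
        raise DoctypeForbidden(f"DOCTYPE '{name}' is not allowed")

    # First pass: a bare expat parser whose only job is to spot a DOCTYPE
    checker = expat.ParserCreate()
    checker.StartDoctypeDeclHandler = reject_doctype
    checker.Parse(xml_text, True)

    # Second pass: build the tree from input known to contain no DTD
    return ET.fromstring(xml_text)
```

Malformed input surfaces as `xml.parsers.expat.ExpatError` from the first pass, so callers should treat both exception types as a rejected upload.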

For Java applications using DocumentBuilderFactory:

import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.MongoDatabase;

public class SecureXmlProcessor {
    public void processAndStoreXml(String xmlContent) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();

        // Secure configuration: forbid DTDs and all external entity loading
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        factory.setFeature("http://xml.org/sax/features/external-general-entities", false);
        factory.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
        factory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
        factory.setXIncludeAware(false);
        factory.setExpandEntityReferences(false);

        DocumentBuilder builder = factory.newDocumentBuilder();
        Document document = builder.parse(new InputSource(new StringReader(xmlContent)));

        // try-with-resources closes the client even if processing fails
        try (MongoClient mongoClient = MongoClients.create("mongodb://localhost:27017")) {
            MongoDatabase database = mongoClient.getDatabase("documents");
            // Fully qualified to avoid a clash with org.w3c.dom.Document
            MongoCollection<org.bson.Document> collection = database.getCollection("metadata");

            // Convert DOM to MongoDB document and insert
            // ... processing logic ...
        }
    }
}

Additional measures for MongoDB-backed applications include validating XML input before parsing (for example, size limits and content-type checks), restricting outbound network access from application servers so that a resolved entity cannot reach internal services, and rate limiting XML processing endpoints to slow automated XXE attempts.

For applications that must process XML from trusted sources, consider XML schema validation to ensure only expected structures are processed, combined with strict content sanitization before storage in MongoDB.
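Full XML Schema validation needs a schema-aware library, but the underlying idea of accepting only expected structures can be sketched with an element allowlist applied while converting each record into the document that will be inserted. The field names in `ALLOWED_FIELDS` and both function names are hypothetical. The example calls `ET.fromstring` only to stay self-contained and assumes the input has already passed a hardened parsing step.

```python
import xml.etree.ElementTree as ET

# Hypothetical allowlist: the only fields this importer expects per <record>
ALLOWED_FIELDS = {"title", "author", "created"}


def record_to_document(record):
    """Convert one <record> element to a flat dict, rejecting unexpected shape."""
    doc = {}
    for child in record:
        if child.tag not in ALLOWED_FIELDS:
            raise ValueError(f"unexpected element: {child.tag}")
        if len(child) > 0:
            raise ValueError(f"nested content not allowed in: {child.tag}")
        # Store trimmed text only; attributes and markup are discarded
        doc[child.tag] = (child.text or "").strip()
    return doc


def documents_from_xml(xml_text):
    root = ET.fromstring(xml_text)
    return [record_to_document(record) for record in root.findall("record")]
```

Each returned dictionary contains only allowlisted keys with plain string values, so it can be handed to an insert call without carrying unexpected structure into the collection.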

Frequently Asked Questions

Can XXE attacks extract data directly from MongoDB?
No, XXE attacks don't directly extract data from MongoDB databases. XXE vulnerabilities occur in XML parsers that process XML data before it reaches MongoDB. However, XXE can be used to exfiltrate data from the application server or internal services, which might include MongoDB connection details or query results processed by the application.

How does middleBrick detect XXE vulnerabilities in MongoDB-integrated applications?
middleBrick scans API endpoints that accept XML data by submitting controlled XXE payloads and monitoring responses. It tests for successful entity resolution, SSRF behavior, and unusual response patterns. The scanner doesn't require database credentials or access to your MongoDB instance. It tests the unauthenticated attack surface by submitting XML through your API endpoints and analyzing the responses for XXE indicators.