HIGH Input Validation

XPath Injection in APIs

What is XPath Injection?

XPath injection is a code injection vulnerability that occurs when an attacker can manipulate the XPath query used to retrieve data from an XML document. Similar to SQL injection but targeting XML data stores, XPath injection allows attackers to bypass authentication, access unauthorized data, or manipulate application logic by injecting malicious XPath syntax into user inputs.

The vulnerability arises when user-supplied data is concatenated directly into XPath expressions without proper sanitization. Since XPath queries can contain quotes, parentheses, and operators, attackers can craft inputs that alter the query's logic. For example, if an application constructs an XPath query like:

doc.evaluate('/users/user[username="' + inputUsername + '"]', doc)

An attacker could submit a username like:

admin" or 1=1 or "a"="a

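This input closes the string literal early and appends conditions that are always true, so the predicate matches every user node. Depending on how the application uses the result, this can bypass authentication or return all user records.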
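
To make the effect concrete, the following minimal sketch reproduces the string concatenation from the example and prints the expression that results. The helper name buildUserQuery is hypothetical and exists only for illustration; no XPath engine is involved, only the string handling.

```javascript
// Hypothetical helper that mirrors the vulnerable concatenation shown earlier.
function buildUserQuery(inputUsername) {
  return '/users/user[username="' + inputUsername + '"]';
}

// Benign input: the predicate compares username to a single literal value.
const benign = buildUserQuery('alice');
console.log(benign);
// /users/user[username="alice"]

// Malicious input: the embedded quote closes the string literal early,
// so the rest of the input is parsed as XPath operators and operands.
const malicious = buildUserQuery('admin" or 1=1 or "a"="a');
console.log(malicious);
// /users/user[username="admin" or 1=1 or "a"="a"]
```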

How XPath Injection Affects APIs

In API contexts, XPath injection typically affects endpoints that query XML data sources or use XML for data exchange. Common vulnerable scenarios include authentication endpoints that validate XML-based credentials, search APIs that query XML documents, and configuration APIs that parse XML files.

Attackers can exploit XPath injection to achieve several malicious outcomes:

Data extraction: Retrieve sensitive information from XML documents, including passwords, personal data, or configuration details
Authentication bypass: Log in as any user by manipulating authentication queries
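Denial of service: Craft queries that consume excessive resources or trigger XPath evaluation errors
Logic manipulation: XPath is a read-only query language, so injection cannot modify or delete XML data directly, but altered query results can change how the application behaves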

For instance, consider an API endpoint that authenticates users by querying an XML user store:

GET /api/authenticate?username=admin&password=wrong

If the backend constructs an XPath query like:

/users/user[username="admin" and password="wrong"]

An attacker could submit: password=wrong" or "a"="a to bypass authentication entirely.
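
The sketch below shows how such a request could turn into the expression the backend evaluates. It assumes a hypothetical handler that reads username and password from the query string with URLSearchParams and concatenates them; the function name buildAuthQuery is illustrative and not taken from a real codebase.

```javascript
// Hypothetical vulnerable query builder for the authentication endpoint.
function buildAuthQuery(queryString) {
  const params = new URLSearchParams(queryString);
  const username = params.get('username');
  const password = params.get('password');
  return '/users/user[username="' + username +
         '" and password="' + password + '"]';
}

// URL-encoded form of: password=wrong" or "a"="a
const attack = 'username=admin&password=wrong%22%20or%20%22a%22%3D%22a';
const injected = buildAuthQuery(attack);
console.log(injected);
// /users/user[username="admin" and password="wrong" or "a"="a"]
```

Because "and" binds more tightly than "or" in XPath, the predicate groups as (username="admin" and password="wrong") or "a"="a". The right-hand side is always true, so every user node matches regardless of the credentials supplied.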

How to Detect XPath Injection

Detecting XPath injection requires examining how user inputs are incorporated into XPath queries. Key indicators include:

Direct string concatenation of user inputs into XPath expressions
Lack of input validation or sanitization for special characters like quotes, parentheses, and operators
Dynamic XPath construction based on user parameters
XML data sources used for authentication or authorization

middleBrick scans for XPath injection vulnerabilities by testing API endpoints with malicious XPath payloads. The scanner attempts to inject XPath syntax into all string parameters and examines responses for signs of successful injection, such as:

Unexpected data exposure
Authentication bypass
Application errors containing XPath syntax
Timing differences in query responses

The scanner tests common XPath injection patterns including boolean logic manipulation, comment injection (XPath 2.0 and later support (: ... :) comments; XPath 1.0 has none), and union-style attacks using the | operator. For APIs that accept XML input, middleBrick also tests for XML External Entity (XXE) vulnerabilities, which often coexist with XPath injection issues.
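
As a rough illustration of boolean-based probing, the sketch below compares the response to a payload that should make the predicate always true with the response to one that should make it always false, and also looks for XPath error signatures. This is not middleBrick's actual implementation: the payload strings, the function names, and the comparison rule are all assumptions chosen for the example, and the caller is assumed to have already sent the requests and collected the response bodies.

```javascript
// Hypothetical payload pair for a value embedded in a double-quoted XPath literal.
const PROBES = {
  alwaysTrue: 'x" or "1"="1',
  alwaysFalse: 'x" and "1"="2',
};

// Exception names from common XPath implementations (.NET, Java, lxml).
// The list is an example, not an exhaustive catalog.
const ERROR_HINTS = [/XPathException/, /XPathExpressionException/, /XPathEvalError/];

// Pure comparison step: if the two payloads produce different responses,
// the input was probably interpreted as query syntax rather than as data.
function classifyProbe({ whenTrue, whenFalse }) {
  return whenTrue !== whenFalse ? 'possible-injection' : 'no-signal';
}

// True when a response body leaks an XPath-related error message.
function hasXPathError(body) {
  return ERROR_HINTS.some((pattern) => pattern.test(body));
}

console.log(classifyProbe({ whenTrue: '3 users found', whenFalse: '0 users found' }));
console.log(hasXPathError('javax.xml.xpath.XPathExpressionException: unexpected token'));
```

An endpoint that echoes its input back would produce different responses for any two payloads, so a real scanner would need to normalize responses before comparing them.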

Prevention & Remediation

Preventing XPath injection requires a defense-in-depth approach:

Use parameterized XPath queries: Many XPath libraries support variable binding (for example $username), which keeps data separate from query logic, much like prepared statements in SQL.
Input validation and sanitization: Validate all user inputs against expected patterns. XPath 1.0 has no escape sequence for quotes inside a string literal, so reject inputs that contain quotes or build the literal with concat() rather than relying on escaping.
Least privilege principle: Restrict XML document access permissions to minimize potential data exposure.
Avoid XML for sensitive data: Consider using more secure data storage formats for authentication and authorization data.
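
XPath 1.0 string literals have no escape sequence for quote characters, so when variable binding is unavailable a value has to be wrapped in whichever quote type it does not contain, or assembled with concat() when it contains both. The helper below is a minimal sketch of that approach; the name toXPathLiteral is hypothetical, and the function is a fallback rather than a replacement for variable binding.

```javascript
// Turn an arbitrary string into a valid XPath 1.0 string literal.
function toXPathLiteral(value) {
  if (!value.includes('"')) return '"' + value + '"';
  if (!value.includes("'")) return "'" + value + "'";
  // Contains both quote types: split on double quotes and rejoin the
  // pieces with concat(), inserting each double quote as the literal '"'.
  const parts = value.split('"').map((part) => '"' + part + '"');
  return 'concat(' + parts.join(", '\"', ") + ')';
}

// The payload from the earlier example now stays inside one literal.
const payload = 'admin" or 1=1 or "a"="a';
const query = '/users/user[username=' + toXPathLiteral(payload) + ']';
console.log(query);
// /users/user[username='admin" or 1=1 or "a"="a']
```

With the value confined to a single literal, the predicate compares username against the whole payload string, which matches no user.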

Here's an example of secure XPath query construction using parameterized queries:

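const xpath = require('xpath');
// @xmldom/xmldom is the maintained successor to the older xmldom package
const { DOMParser } = require('@xmldom/xmldom');

// Assumption: xmlSource holds the XML user store as a string
const doc = new DOMParser().parseFromString(xmlSource, 'text/xml');

// Secure: the expression is parsed once, and user input is bound as
// variables, so it is never interpreted as XPath syntax
const evaluator = xpath.parse(
  '/users/user[username=$username and password=$password]'
);
const matches = evaluator.select({
  node: doc,
  variables: {
    username: inputUsername,
    password: inputPassword
  }
});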

If parameterized queries aren't available, implement strict input validation:

function validateUsername(username) {
  // Allow only alphanumeric characters and underscores
  const regex = /^[a-zA-Z0-9_]+$/;
  return regex.test(username);
}
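
As a usage sketch, the validator can sit in front of the query builder so that invalid input is rejected before any expression is constructed. The wrapper name buildUserQueryChecked is hypothetical. The extra typeof check is an addition not present in the original function: regex.test() converts non-string arguments to strings, so a missing parameter (undefined) would otherwise pass as the text "undefined".

```javascript
// Same allow-list check as in the article, repeated so this snippet runs on its own.
function validateUsername(username) {
  const regex = /^[a-zA-Z0-9_]+$/;
  return regex.test(username);
}

// Hypothetical guard: refuse to build a query at all for invalid input.
function buildUserQueryChecked(username) {
  if (typeof username !== 'string' || !validateUsername(username)) {
    throw new Error('Invalid username');
  }
  return '/users/user[username="' + username + '"]';
}

console.log(buildUserQueryChecked('alice_01'));
// /users/user[username="alice_01"]

try {
  buildUserQueryChecked('admin" or 1=1 or "a"="a');
} catch (err) {
  console.log(err.message);
  // Invalid username
}
```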

For APIs that must handle XML input, also implement XML External Entity (XXE) protection and consider using XML schema validation to restrict input structure.

Real-World Impact

XPath injection vulnerabilities have been documented in various applications and APIs, and the weakness class is cataloged as CWE-643 (Improper Neutralization of Data within XPath Expressions). Although the weakness is not exclusive to APIs, any service that builds XPath expressions from request data is exposed to it.

In 2015, a vulnerability in the Axis2 administration console allowed attackers to perform XPath injection attacks to retrieve sensitive information from the Axis2 configuration files. This affected SOAP-based web services that used Axis2 for XML processing.

More recently, several open-source authentication libraries that used XML-based credential stores have been found vulnerable to XPath injection, allowing attackers to bypass authentication mechanisms entirely. These vulnerabilities often score high on CVSS (Common Vulnerability Scoring System) due to their potential for data exposure and authentication bypass.

The financial impact of XPath injection can be significant. Beyond immediate data theft, successful exploitation can lead to regulatory penalties under frameworks like GDPR or PCI-DSS if personal or financial data is exposed. Additionally, the reputational damage from a security breach can far exceed the technical remediation costs.

Frequently Asked Questions

How is XPath injection different from SQL injection?

Both are code injection vulnerabilities, but they target different data stores. SQL injection affects relational databases using SQL queries, while XPath injection targets XML documents using XPath queries. The syntax and exploitation techniques differ, but the underlying principle—injecting malicious code through untrusted inputs—is the same. Both can lead to data exposure, authentication bypass, and denial of service.

Can XPath injection occur in JSON-based APIs?

Pure JSON APIs are not directly vulnerable to XPath injection since they don't use XPath queries. However, if a JSON API processes XML data internally, transforms JSON to XML, or uses XML for configuration/authentication, XPath injection vulnerabilities could exist in those XML processing components. Additionally, if an API accepts XML input that gets converted to JSON, the XML parsing stage could be vulnerable to related XML attacks like XXE.

How quickly can middleBrick detect XPath injection vulnerabilities?

middleBrick can detect XPath injection vulnerabilities in 5-15 seconds per API endpoint. The scanner automatically tests all string parameters with malicious XPath payloads and analyzes responses for injection indicators. For comprehensive testing, middleBrick evaluates multiple injection patterns and reports findings with severity levels and remediation guidance, helping developers quickly identify and fix vulnerabilities before deployment.