XPath Injection in Django with CockroachDB
XPath Injection in Django with CockroachDB: how this specific combination creates or exposes the vulnerability
XPath Injection occurs when untrusted data is concatenated into an XPath expression without escaping or parameterization. In Django, this typically arises when developers use lxml or the standard-library xml package to evaluate dynamic XPath queries over XML that the application keeps in CockroachDB. CockroachDB is PostgreSQL-wire compatible, but it plays no part in how XPath expressions are constructed in application code; its role is limited to storing and returning the XML, typically as text.
Consider a Django view that searches XML documents stored as text, for example in a STRING column or as a string value inside a JSON column. An unsafe implementation might look like this:
from lxml import etree
from django.http import HttpResponse

def search_documents(request):
    user_input = request.GET.get('category', '')
    # Unsafe: directly interpolating user input into XPath
    xpath_expr = f"//document[category='{user_input}']"
    # Assume xml_data is an XML string retrieved from CockroachDB
    root = etree.fromstring(xml_data)
    results = root.xpath(xpath_expr)
    return HttpResponse(f'Found {len(results)} documents')
In this scenario, an attacker can supply a value like ' or '1'='1, which turns the predicate into category='' or '1'='1' and therefore matches every document. The risk is not introduced by CockroachDB but by the unsafe construction of the XPath string: CockroachDB may store the XML or related metadata, yet the injection happens at the XPath evaluation layer in Python, independent of the database engine. If user-defined functions or stored procedures in the database assemble XPath strings by concatenation, the same flaw applies to whichever component later evaluates those strings.
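The effect of that payload can be reproduced outside Django. The following is a minimal sketch, assuming a small hand-written XML sample in place of data loaded from CockroachDB; the sample document and the function name count_matches are illustrative and do not come from any real schema.

```python
from lxml import etree

# Hypothetical sample data; in the scenario above this text would come from CockroachDB.
XML_DATA = (
    "<library>"
    "<document><category>public</category><title>Release notes</title></document>"
    "<document><category>internal</category><title>Salary review</title></document>"
    "<document><category>internal</category><title>Incident report</title></document>"
    "</library>"
)


def count_matches(category):
    """Build the XPath by string interpolation, as the unsafe view does."""
    root = etree.fromstring(XML_DATA)
    xpath_expr = f"//document[category='{category}']"
    return len(root.xpath(xpath_expr))


# A normal value matches only the documents in that category.
benign = count_matches("public")

# The payload closes the string literal and appends an always-true condition,
# so the predicate becomes: category='' or '1'='1'
injected = count_matches("' or '1'='1")

print(benign, injected)
```

With this sample, the benign request matches one document while the injected request matches all three, even though no document has an empty category.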
Another vector involves Django management commands or scripts that run XPath over data exported from CockroachDB. If these scripts embed user input into XPath without sanitization, the attack surface extends to backend data processing pipelines. Because XPath operates on the structure of the XML, malicious input can traverse to, and exfiltrate, nodes that should be restricted. The key takeaway is that pairing Django application code with CockroachDB storage does not by itself create XPath Injection; the integration pattern does, specifically dynamic XPath construction.
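Traversal to restricted nodes can be illustrated with the XPath union operator. This is a sketch under stated assumptions: the export layout, the credentials element, and the function titles_for are all hypothetical, chosen only to show how an injected expression can reach nodes the query never meant to expose.

```python
from lxml import etree

# Hypothetical export: a document list plus a node that should never be returned.
XML_DATA = (
    "<export>"
    "<document><category>public</category><title>Release notes</title></document>"
    "<credentials><api_key>hypothetical-key-123</api_key></credentials>"
    "</export>"
)


def titles_for(category):
    """Unsafe helper: interpolates the category into the XPath expression."""
    root = etree.fromstring(XML_DATA)
    nodes = root.xpath(f"//document[category='{category}']/title")
    return [node.text for node in nodes]


normal = titles_for("public")

# The payload ends the original path early and unions in an unrelated node set.
# Resulting expression:
#   //document[category='x']/title | //api_key | //document[category='x']/title
payload = "x']/title | //api_key | //document[category='x"
leaked = titles_for(payload)

print(normal, leaked)
```

Here the injected request returns the text of the api_key element instead of a document title, because the union operator lets the attacker select from anywhere in the tree.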
CockroachDB-Specific Remediation in Django: concrete code fixes
Remediation focuses on keeping user input out of the XPath string. Either use parameterized XPath expressions (lxml accepts XPath variables such as $name, bound through keyword arguments) or filter results programmatically after retrieving nodes. Below is a secure pattern that selects candidate nodes with a constant XPath expression and compares the user's value in Python, so the input never becomes part of the expression.
from lxml import etree
from django.http import HttpResponse

def search_documents_safe(request):
    user_input = request.GET.get('category', '')
    # Retrieve XML data from CockroachDB safely (e.g., using Django ORM)
    # xml_data = MyModel.objects.get(pk=1).xml_field  # Assume stored as text
    root = etree.fromstring(xml_data)
    # Constant XPath expression: user input never reaches the XPath engine
    results = root.xpath('//document[category]')
    # Compare in Python against every <category> child of each candidate
    filtered = [
        r for r in results
        if any(c.text == user_input for c in r.findall('category'))
    ]
    return HttpResponse(f'Found {len(filtered)} documents')
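Parameterized XPath can be sketched with lxml's XPath variables, where a $name placeholder in the expression is bound from a keyword argument of xpath(). The sample XML and the function name find_by_category below are assumptions made for illustration.

```python
from lxml import etree

# Hypothetical sample data standing in for XML text loaded from CockroachDB.
XML_DATA = (
    "<library>"
    "<document><category>public</category><title>Release notes</title></document>"
    "<document><category>internal</category><title>Salary review</title></document>"
    "<document><category>internal</category><title>Incident report</title></document>"
    "</library>"
)


def find_by_category(root, category):
    """Select documents by category using an XPath variable.

    $cat is bound from the keyword argument, so the value is compared as a
    plain string and is never parsed as XPath syntax.
    """
    return root.xpath("//document[category=$cat]", cat=category)


root = etree.fromstring(XML_DATA)
safe_hits = find_by_category(root, "internal")
attack_hits = find_by_category(root, "' or '1'='1")

print(len(safe_hits), len(attack_hits))
```

Because the payload is now just an unusual category name, it matches nothing, while the legitimate value still matches both internal documents.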
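If you must build the predicate as a string, be aware that XPath 1.0, which lxml implements, has no escape sequence for a quote inside a string literal; doubling quotes only works in XPath 2.0 and later. The simplest defensive option is to reject values that contain quote characters before interpolating them, though this is less robust than filtering in Python:
if "'" in user_input or '"' in user_input:
    raise ValueError("quote characters are not allowed in category")
xpath_expr = f"//document[category='{user_input}']"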
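When quote characters must be accepted, a commonly used workaround in XPath 1.0 is to assemble the literal with the concat() function, since the language offers no way to escape a quote inside a string literal. The helper below is a sketch of that idea; the name xpath_string_literal is hypothetical and is not part of lxml.

```python
def xpath_string_literal(value):
    """Return value written as an XPath 1.0 string expression.

    XPath 1.0 literals cannot contain their own delimiter, so a value with
    both quote types is split on single quotes and rebuilt with concat().
    """
    if "'" not in value:
        return f"'{value}'"
    if '"' not in value:
        return f'"{value}"'
    pieces = []
    for index, part in enumerate(value.split("'")):
        if index > 0:
            # A lone single quote, delimited by double quotes.
            pieces.append('"\'"')
        if part:
            # Parts contain no single quote, so single-quote delimiters are safe.
            pieces.append(f"'{part}'")
    return "concat(" + ", ".join(pieces) + ")"


literal = xpath_string_literal("' or '1'='1")
xpath_expr = f"//document[category={literal}]"
print(xpath_expr)
```

The returned text already carries its own delimiters, so it is placed into the predicate without surrounding quotes. Binding the value as an XPath variable remains the simpler option wherever the library supports it.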
When interacting with CockroachDB, ensure any XML or JSON extraction is performed using parameterized queries via Django's ORM or database cursors. For example, if storing XML fragments in a STRING column, retrieve the raw data first, then process XPath in the application layer:
import psycopg2
from django.conf import settings

db = settings.DATABASES['default']
conn = psycopg2.connect(
    dbname=db['NAME'],
    user=db['USER'],
    password=db['PASSWORD'],
    host=db['HOST'],
    port=db['PORT'],
)
# doc_id is assumed to be a primary key value supplied by the caller
with conn, conn.cursor() as cur:
    # The %s placeholder lets the driver bind doc_id; never format it into the SQL
    cur.execute("SELECT xml_content FROM documents WHERE id = %s", [doc_id])
    row = cur.fetchone()
# fetchone() returns None when no row matches
xml_data = row[0] if row else None
# Then process with lxml as shown above
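The two layers can be combined in one helper: bind the identifier in SQL, then bind the search value in XPath. This is a minimal sketch, assuming the documents table and xml_content column from the query above; the function name count_documents_in_category is hypothetical, and the cursor argument is any DB-API cursor that uses the %s placeholder style, as psycopg2 and Django cursors do.

```python
from lxml import etree


def count_documents_in_category(cursor, doc_id, category):
    """Fetch one stored XML document by id and count nodes in a category.

    Neither doc_id nor category is ever formatted into a query string:
    the database driver binds doc_id, and lxml binds category.
    """
    # Layer 1: SQL parameter binding.
    cursor.execute("SELECT xml_content FROM documents WHERE id = %s", [doc_id])
    row = cursor.fetchone()
    if row is None:
        return 0

    # Layer 2: XPath variable binding in the application layer.
    root = etree.fromstring(row[0])
    return len(root.xpath("//document[category=$cat]", cat=category))
```

In a Django view this could be called inside a `with connection.cursor() as cur:` block, where connection is imported from django.db, so the project's configured database connection is reused rather than opened by hand.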
For applications using Django's built-in XML handling or third-party packages, validate and sanitize all inputs before they reach XPath evaluation. Regular security scans with tools like middleBrick can detect XPath Injection patterns in your codebase, especially when scanning endpoints that process XML data from CockroachDB. The middleBrick Web Dashboard helps track such findings over time, while the CLI tool allows you to integrate checks directly into development workflows.
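The input validation mentioned above can be sketched as an allowlist check that runs before any XPath evaluation. The policy here (short names made of letters, digits, hyphens, and underscores) is an assumption for illustration, as are the names CATEGORY_PATTERN and validate_category; a real application would derive the rule from its own data.

```python
import re

# Hypothetical policy: a category is 1 to 64 letters, digits, hyphens, or underscores.
CATEGORY_PATTERN = re.compile(r"[A-Za-z0-9_-]{1,64}")


def validate_category(raw):
    """Return raw unchanged if it matches the allowlist, otherwise raise ValueError."""
    if not isinstance(raw, str) or CATEGORY_PATTERN.fullmatch(raw) is None:
        raise ValueError("invalid category")
    return raw
```

Validation of this kind works best as a complement to parameterized XPath, not a replacement for it: it rejects malformed input early, while the binding step keeps any accepted value from being parsed as XPath syntax.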