XML External Entities in Rails with Firestore
XML External Entities in Rails with Firestore: how this specific combination creates or exposes the vulnerability
XML External Entity (XXE) injection occurs when an application parses XML input that references external entities, allowing an attacker to make the parser read local files, issue requests to internal network services, or return sensitive data in the response. In a Ruby on Rails application that integrates with Google Cloud Firestore, the risk arises when user-supplied XML (uploaded configuration files, batch imports, or SOAP request bodies) is parsed without disabling external entity resolution.
New Rails applications do not parse XML request bodies by default, but developers often add XML handling with gems such as rexml or nokogiri (builder, sometimes listed alongside them, only generates XML and does not parse it). If the parser is configured to resolve external entities, an XXE vector exists. If the parsed data is later used to construct or query Firestore documents, such as mapping XML fields to document fields or using XML-derived identifiers to look up records, an attacker can exploit this to probe internal services, retrieve credentials from the filesystem, or interact with metadata endpoints that lead to broader cloud exposure.
The Firestore client libraries for Ruby do not parse XML directly; an XXE vulnerability is instead introduced at the application layer, when Rails processes XML before passing extracted values to Firestore operations. For example, an attacker might submit an XML payload that defines an entity pointing to /etc/passwd or to the instance metadata service at http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default. If the Rails code uses the extracted content to build a document ID or a query filter, the resolved file or response content may be reflected in responses, error messages, or logs in ways that disclose sensitive information.
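To make the payload described above concrete, the following Ruby sketch holds two illustrative XXE documents as strings, plus a benign one, and shows a cheap textual check for DTD markup. The element names (`import`, `title`) and the helper `declares_dtd?` are hypothetical and chosen only for illustration, and the payloads use a general entity for simplicity. The check is a coarse pre-filter under these assumptions, not a substitute for configuring the parser safely.

```ruby
# Illustrative XXE payloads; the element names are hypothetical.
FILE_READ_PAYLOAD = <<~XML
  <?xml version="1.0"?>
  <!DOCTYPE import [
    <!ENTITY xxe SYSTEM "file:///etc/passwd">
  ]>
  <import><title>&xxe;</title></import>
XML

METADATA_PAYLOAD = <<~XML
  <?xml version="1.0"?>
  <!DOCTYPE import [
    <!ENTITY xxe SYSTEM "http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default">
  ]>
  <import><title>&xxe;</title></import>
XML

BENIGN_PAYLOAD = "<import><title>Quarterly report</title></import>"

# Hypothetical helper: true when the raw text declares a DTD or an entity.
def declares_dtd?(xml_text)
  xml_text.to_s.match?(/<!(DOCTYPE|ENTITY)/i)
end
```

If a parser with entity substitution enabled handled either malicious document, the `title` element would carry the resolved content, which is exactly the value the application would then pass on to Firestore.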
Because Firestore operations often include structured data with nested fields, developers may inadvertently map XML elements directly to document properties. This mapping becomes dangerous when XML external entities are resolved, as sensitive server-side data can be exfiltrated through error messages or side channels. In environments where the Rails app runs on Google Cloud, the combination of unrestricted XML parsing and Firestore access increases the potential impact, as internal metadata services are reachable from within the same network.
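One way to avoid mapping XML elements straight onto document properties is to pass extracted values through an explicit allowlist first. The sketch below assumes the values have already been pulled out of the XML as name/value pairs; `ALLOWED_FIELDS`, its field names and length limits, and `build_document_fields` are all hypothetical.

```ruby
# Hypothetical allowlist: XML element name => maximum accepted length.
ALLOWED_FIELDS = { "title" => 200, "category" => 50 }.freeze

# Builds a Firestore-ready hash from already-extracted name/value pairs,
# dropping unknown element names and non-string, empty, or over-long values.
def build_document_fields(extracted)
  extracted.each_with_object({}) do |(name, value), fields|
    max_length = ALLOWED_FIELDS[name.to_s]
    next if max_length.nil? # element is not on the allowlist
    next unless value.is_a?(String)

    text = value.strip
    next if text.empty? || text.length > max_length

    fields[name.to_sym] = text
  end
end

fields = build_document_fields({
  "title" => "  Quarterly report ",
  "role" => "admin",       # not allowlisted, so it is dropped
  "category" => "x" * 500  # exceeds the length limit, so it is dropped
})
```

With this shape, a resolved entity that expands to a large file body would most likely exceed the length limit and never reach the document, although a length cap alone does not replace safe parser configuration.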
An example scenario involves an import endpoint that accepts an XML file describing a resource and writes it to Firestore. If the XML parser processes DOCTYPE declarations and resolves external entities, an attacker can cause the parser to read arbitrary files or trigger network requests. The extracted data may not directly modify Firestore permissions, but it can inform further attacks, such as crafting authenticated requests or exploiting misconfigured IAM roles. Mitigating XXE in this context therefore requires disabling external entity resolution during XML parsing and validating every XML-derived value before it is sent to Firestore.
Firestore-Specific Remediation in Rails: concrete code fixes
To prevent XXE when Rails applications interact with Firestore, ensure XML parsing is configured to ignore external entities. Use safe parsing libraries or explicitly disable DTD and external entity resolution. Below are concrete examples using Nokogiri and REXML, followed by secure Firestore document operations.
Secure Nokogiri parsing
When parsing XML with Nokogiri in a Rails controller or service object, set the parse options explicitly: keep NONET so the parser cannot make network requests, and do not enable NOENT (entity substitution) or DTDLOAD (external DTD loading).
require 'nokogiri'
require 'google/cloud/firestore'
require 'securerandom'
require 'time'

# Safe parsing: strict mode, no network access, and neither NOENT nor
# DTDLOAD enabled (the original options turned DTDLOAD on by mistake)
xml_input = params[:xml_data].to_s
document = Nokogiri::XML(xml_input) do |config|
  config.strict.nonet.noblanks
end

# Reject documents that declare a DTD at all
raise ArgumentError, 'DOCTYPE declarations are not allowed' if document.internal_subset

# Extract only expected fields, then validate before using them
title = document.at_css('title')&.text.to_s.strip
raise ArgumentError, 'Missing title' if title.empty?

# Firestore write using the google-cloud-firestore gem
firestore = Google::Cloud::Firestore.new(project_id: 'your-project-id')
# Document paths are relative to the database root ("collection/document");
# use a generated ID rather than user-controlled text in the path
firestore_doc_ref = firestore.doc("imports/#{SecureRandom.uuid}")
firestore_doc_ref.set({
  title: title,
  created_at: Time.now.utc.iso8601
})
Secure REXML parsing
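REXML is not known to fetch external resources on its own, so the main concerns are entity expansion (for example, "billion laughs" payloads) and documents that declare a DTD at all. Cap expansion through REXML::Security and reject any document that carries a DOCTYPE.
require 'rexml/document'
require 'google/cloud/firestore'
require 'securerandom'
require 'time'

# Cap entity expansion; REXML has no entity_resolver setting to switch off.
# The limits below are illustrative and should be tuned for your data.
REXML::Security.entity_expansion_limit = 100
REXML::Security.entity_expansion_text_limit = 4096

xml_input = params[:xml_data].to_s
document = REXML::Document.new(xml_input)
# Reject any document that declares a DTD
raise ArgumentError, 'DOCTYPE declarations are not allowed' if document.doctype
raise ArgumentError, 'Empty XML document' if document.root.nil?

# Extract data safely
fields = document.root.get_elements('record/field')
safe_data = fields.map { |e| e.text.to_s.strip }.join('-')

# Firestore batch write; the block form commits when the block returns
firestore = Google::Cloud::Firestore.new
firestore.batch do |batch|
  batch.set("imports/#{SecureRandom.uuid}", {
    data: safe_data,
    processed_at: Time.now.utc.iso8601
  })
end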
Validation and allowlisting
In addition to safe parsing, validate extracted values against an allowlist of expected patterns. For Firestore document IDs, restrict characters to alphanumerics, hyphens, and underscores to prevent injection into document paths.
# Returns true only for short IDs made of alphanumerics, hyphens, and underscores
def safe_document_id?(value)
  value.is_a?(String) && value.match?(/\A[a-zA-Z0-9_-]{1,100}\z/)
end

doc_id = params[:derived_id]
if safe_document_id?(doc_id)
  firestore.collection("user_data").doc(doc_id).set({ xml_imported: true })
else
  raise ArgumentError, "Invalid document identifier"
end
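When an identifier has to be derived from XML content that cannot be constrained to the allowlisted pattern, an alternative is to hash the value so the untrusted text never appears in the document path. This is a minimal sketch of that idea; `derived_document_id` and the `import` namespace prefix are hypothetical names, and SHA-256 is just one reasonable choice of digest.

```ruby
require "digest"

# Hypothetical helper: derives a fixed-length, path-safe document ID from
# arbitrary untrusted text. The output contains only the namespace prefix,
# a hyphen, and 64 lowercase hexadecimal characters.
def derived_document_id(untrusted_text, namespace: "import")
  "#{namespace}-#{Digest::SHA256.hexdigest(untrusted_text.to_s)}"
end

id = derived_document_id("../../users/admin")
```

The derivation is deterministic, so re-importing the same value addresses the same document. The original text can still be stored as an ordinary field if it is needed for display.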
Middleware and parameter sanitization
Consider adding a request-level middleware that rejects XML payloads if they contain DOCTYPE declarations or entity definitions, further reducing the attack surface before data reaches Firestore operations.
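require 'json'
require 'stringio'

class XmlEntityFilter
  def initialize(app)
    @app = app
  end

  def call(env)
    request = Rack::Request.new(env)
    if request.post? && request.content_type.to_s.include?('xml')
      body = request.body&.read.to_s
      # Reject any payload that declares a DTD or an entity
      if body.include?('<!DOCTYPE') || body.include?('<!ENTITY')
        return [400, { 'content-type' => 'application/json' }, [{ error: 'External entities are not allowed' }.to_json]]
      end
      # Restore the consumed body so downstream code can read it again
      env['rack.input'] = StringIO.new(body)
    end
    @app.call(env)
  end
end

# config/application.rb
config.middleware.use XmlEntityFilter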