HIGH Input Validation

Regex DoS in APIs

What Is Regex DoS?

Regular expression denial of service (Regex DoS or ReDoS) is a vulnerability where an attacker can cause a server to hang or crash by submitting specially crafted input to a regex pattern. This happens when a regex engine uses backtracking and encounters input that forces exponential time complexity.

Consider this common regex pattern used for email validation:

^([a-zA-Z0-9._%+-]+)@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$

This pattern appears harmless, but if an attacker submits input like:

[email protected]

For a vulnerable pattern, crafted input can keep the regex engine busy for seconds or minutes, depending on the pattern's structure and the engine's implementation. In a web API, this ties up server resources and can lead to a denial of service.

The vulnerability stems from how regex engines handle certain patterns. When a regex contains nested quantifiers or ambiguous patterns, the engine may need to explore many possible match paths. With malicious input, this exploration can grow exponentially, consuming CPU cycles and blocking the thread.
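To see this growth in practice, one can time a nested-quantifier pattern against near-matching inputs of increasing length. This is a minimal sketch for Node.js; the pattern `/^(a+)+$/`, the helper name `timeMatchMs`, and the chosen lengths are illustrative assumptions, and absolute timings will vary by machine and engine version.

```javascript
// Time a classic nested-quantifier pattern against inputs that almost match.
// The trailing '!' forces the overall match to fail, so a backtracking engine
// tries every way of splitting the run of 'a's between the inner and outer '+'.
const pattern = /^(a+)+$/;

function timeMatchMs(length) {
  const input = 'a'.repeat(length) + '!';
  const start = process.hrtime.bigint();
  pattern.test(input);
  const elapsedNs = process.hrtime.bigint() - start;
  return Number(elapsedNs) / 1e6; // nanoseconds to milliseconds
}

// Lengths are kept small on purpose: each extra character roughly doubles the work.
for (const length of [10, 14, 18, 22]) {
  console.log(`length=${length} took ${timeMatchMs(length).toFixed(3)} ms`);
}
```

On a backtracking engine the reported times should climb steeply as the length grows, which is why even modest inputs of a few dozen characters can stall a request handler.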

How Regex DoS Affects APIs

In API endpoints, regex patterns are commonly used for input validation, authentication, and data processing. When these patterns are vulnerable to ReDoS, an attacker can exploit them without authentication or special privileges.

Common API scenarios include:

Authentication endpoints validating usernames or tokens
Search APIs processing query parameters
File upload APIs validating filenames
Configuration APIs processing user-defined patterns

The impact can be severe. A single malicious request can consume 100% of a CPU core for seconds or minutes. If an attacker sends multiple such requests concurrently, they can exhaust all available CPU resources, making the API unresponsive to legitimate users.

Consider a login API that validates JWT tokens with a regex pattern. An attacker can submit crafted tokens that cause the regex to enter catastrophic backtracking, consuming server resources and blocking other authentication attempts. Legitimate users may be unable to log in while the attack continues, and the service as a whole can degrade.

The vulnerability is particularly dangerous because it requires no authentication. Any user can potentially exploit it, making it a low-effort, high-impact attack vector for API security.

How to Detect Regex DoS

Detecting ReDoS vulnerabilities requires analyzing regex patterns for problematic constructs and testing them with malicious input. Key indicators include:

Nested quantifiers like (a+)+ or (b*)*
Overlapping patterns that create ambiguity
Unbounded repetition with alternations
Complex patterns with multiple backtracking points

Static analysis tools can flag suspicious patterns, but dynamic testing is essential. This involves feeding crafted input to regex patterns and measuring execution time. If a pattern takes significantly longer than expected with certain inputs, it may be vulnerable.
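As a rough illustration of static flagging, the sketch below checks a pattern's source for a quantified group whose body also contains a quantifier. The helper `hasNestedQuantifier` is hypothetical and deliberately naive: it ignores escapes, character classes, `{m,n}` quantifiers, and nested groups, so it can miss real problems and raise false positives. Treat it as a triage aid, not a substitute for a dedicated analyzer or for timing tests.

```javascript
// Naive static check: flag a group such as (a+)+ or (b*)* where a quantified
// group contains another '+' or '*'. Assumption: pattern sources are short,
// developer-written strings, never attacker-controlled input.
const NESTED_QUANTIFIER = /\([^()]*[+*][^()]*\)[+*]/;

function hasNestedQuantifier(regex) {
  return NESTED_QUANTIFIER.test(regex.source);
}

const candidates = [/^(a+)+$/, /(b*)*/, /^[a-z]+$/, /^(a{1,100})$/];
for (const candidate of candidates) {
  const verdict = hasNestedQuantifier(candidate) ? 'suspicious' : 'no nested quantifier found';
  console.log(`${candidate.source}: ${verdict}`);
}
```

Patterns flagged this way are good candidates for the timing-based tests described above.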

middleBrick's approach to detecting ReDoS vulnerabilities includes runtime analysis of API endpoints. The scanner submits crafted input to regex patterns found in API validation logic, measuring execution time and resource consumption. It specifically tests for exponential backtracking scenarios that could lead to denial of service.

The scanner also analyzes OpenAPI specifications to identify regex patterns used in parameter validation, schema definitions, and security requirements. By cross-referencing these patterns with runtime behavior, middleBrick can identify APIs vulnerable to ReDoS attacks without requiring source code access.

For LLM/AI security, middleBrick includes additional checks for regex patterns that might be used in prompt processing or output filtering, as these can also be vulnerable to denial of service attacks when processing malicious input.

Prevention & Remediation

Preventing ReDoS requires both careful regex design and runtime safeguards. Here are concrete strategies:

1. Use safer regex patterns:

// Vulnerable pattern (exponential backtracking possible)
// (a+)+ matches one or more 'a', repeated one or more times
// This can cause catastrophic backtracking
const vulnerable = /^([a]+)+$/;
// Safer alternatives
const safer = /^[a]+$/; // Simple repetition, no nesting
const safer2 = /^(a{1,100})$/; // Bounded repetition, no ambiguity
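Before swapping a pattern, it helps to confirm that the rewrite accepts the same strings as the original. The following is a small sketch of that check, assuming the vulnerable and simple-repetition patterns above and a hand-picked sample list (both illustrative, not exhaustive).

```javascript
const vulnerable = /^(a+)+$/;
const safer = /^a+$/;

// Short samples only: the vulnerable pattern is slow on long non-matching input.
const samples = ['', 'a', 'aaa', 'aab', 'b', 'aaaa!', 'a'.repeat(12)];

// Collect every sample on which the two patterns give different answers.
const disagreements = samples.filter(
  (sample) => vulnerable.test(sample) !== safer.test(sample)
);

console.log(
  disagreements.length === 0 ? 'patterns agree on all samples' : disagreements
);
```

The bounded variant is intentionally not equivalent: it rejects runs longer than 100 characters, which is often the desired behavior for validation.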

2. Implement time limits:

// Illustrative only: 'regex-match-indeterminate' and its match/TimeoutError exports
// stand in for whichever timeout-capable matching library you use.
import { match, TimeoutError } from 'regex-match-indeterminate';

const input = req.body.username;
const pattern = /^([a-zA-Z0-9._%+-]+)@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/;
try {
  const result = match(pattern, input, { timeout: 100 }); // 100ms timeout
  if (!result) {
    return res.status(400).json({ error: 'Invalid input' });
  }
} catch (e) {
  if (e instanceof TimeoutError) {
    return res.status(500).json({ error: 'Server error processing request' });
  }
  throw e; // Do not swallow unrelated errors
}
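In Node.js, one way to bound matching time without a third-party package is to run the match through the built-in `vm` module, whose `timeout` option aborts long-running synchronous code. The sketch below is one possible approach under that assumption; `testWithTimeout` is a hypothetical helper name, and creating a context per call adds overhead that may matter on hot paths.

```javascript
const vm = require('node:vm');

// Hypothetical helper: run regex.test(input) inside a vm context with a deadline.
// Returns true or false for a completed match, or null if the deadline passed.
function testWithTimeout(regex, input, timeoutMs) {
  const sandbox = { regex, input, result: false };
  try {
    vm.runInNewContext('result = regex.test(input);', sandbox, {
      timeout: timeoutMs,
    });
    return sandbox.result;
  } catch (err) {
    // Node reports an exceeded vm deadline with this error code.
    if (err && err.code === 'ERR_SCRIPT_EXECUTION_TIMEOUT') {
      return null;
    }
    throw err; // Unrelated failures should still surface.
  }
}

console.log(testWithTimeout(/^a+$/, 'aaa', 100));
console.log(testWithTimeout(/^(a+)+$/, 'a'.repeat(40) + '!', 50));
```

A caller would typically treat `null` as a rejected input and log it, since a timeout on validation is a strong hint that either the pattern or the request deserves a closer look.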

3. Use regex engines with safeguards:

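// Node.js with timeout option
// Illustrative only: 'safe-regex-engine' stands in for an engine that supports a
// per-pattern timeout; check that the package you choose really offers this option.
const { RegExp: SafeRegExp } = require('safe-regex-engine'); // Alias avoids shadowing the global RegExp
const safePattern = new SafeRegExp('^([a]+)+$', { timeout: 500 }); // 500ms timeout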

4. Input length limits:

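app.post('/api/login', (req, res) => {
  const { token } = req.body;
  // Reject missing or non-string tokens before reading .length
  if (typeof token !== 'string') {
    return res.status(400).json({ error: 'Invalid input' });
  }
  if (token.length > 1000) { // Arbitrary but reasonable limit
    return res.status(400).json({ error: 'Input too long' });
  }
  // Continue processing with regex validation
});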

5. Regular expression complexity analysis:

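// Illustrative only: 'regex-analyzer' and isPotentiallyUnsafe() stand in for
// the static analyzer you actually use.
import { isPotentiallyUnsafe } from 'regex-analyzer';

const pattern = /^([a]+)+$/;
if (isPotentiallyUnsafe(pattern)) {
  console.warn('This regex pattern may be vulnerable to ReDoS');
}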

The most effective approach combines multiple strategies: use safe patterns, implement timeouts, validate input length, and monitor API performance for unusual CPU spikes that might indicate exploitation attempts.

Real-World Impact

Regex DoS vulnerabilities have caused real-world service disruptions. In 2019, a popular JavaScript validation library was found vulnerable to ReDoS in its email validation function. Attackers could submit crafted email addresses that caused the validation function to enter catastrophic backtracking, consuming significant CPU resources.

Cloud services have also been affected. A 2021 incident involved a cloud provider's API gateway that used vulnerable regex patterns for request validation. Attackers discovered that certain crafted input could cause the gateway to become unresponsive, affecting thousands of customer applications. The incident required emergency patching and rate limiting to mitigate.

Open source projects frequently discover and fix ReDoS vulnerabilities. The Node.js ecosystem has seen multiple instances where popular packages contained vulnerable regex patterns. For example, CVE-2021-23315 affected a widely-used URL parsing library, where crafted URLs could cause denial of service in applications using the library for request validation.

Financial services APIs have been targeted for ReDoS attacks as a smokescreen for other attacks. By causing service degradation through regex vulnerabilities, attackers can distract security teams while attempting other exploits. This multi-vector approach makes ReDoS particularly dangerous in production environments.

The OWASP API Security Top 10 addresses this class of issue under Unrestricted Resource Consumption (API4:2023), and input validation and fuzzing are the practices that most directly catch ReDoS. Organizations are increasingly including regex security in their API security testing, recognizing that even simple validation logic can become a critical vulnerability.

Frequently Asked Questions

How can I tell if my API is vulnerable to Regex DoS?

Look for regex patterns with nested quantifiers like (a+)+ or (b*)* in your validation logic. Test these patterns with crafted input that includes long sequences of the quantified character. If processing takes more than a few milliseconds, you may have a vulnerability. middleBrick can automatically detect these patterns in your API endpoints and test them with malicious input, providing a security risk score and specific findings.

Are certain programming languages more vulnerable to ReDoS?

The vulnerability depends more on the regex engine than on the language itself. JavaScript's built-in regex engine, Python's re module, and Java's Pattern class all use backtracking engines that can be vulnerable. Engines such as Google's RE2, Go's regexp package, and Rust's regex crate guarantee linear-time matching, at the cost of features like backreferences. The key is understanding your engine's behavior and implementing appropriate safeguards regardless of the language.

Can ReDoS affect AI/ML APIs or LLM endpoints?

Yes, ReDoS can affect AI/ML APIs, particularly those processing user input for validation or filtering. LLM endpoints are especially vulnerable because they often use complex regex patterns for prompt sanitization, output filtering, or system prompt validation. middleBrick includes specialized checks for AI security, testing regex patterns used in LLM contexts with crafted input designed to trigger backtracking. This is particularly important as AI APIs often have high resource requirements, making denial of service more impactful.