HIGH | regex dos | flask | dynamodb

Regex DoS in Flask with DynamoDB

Regex DoS in Flask with DynamoDB: how this specific combination creates or exposes the vulnerability

Regex DoS (ReDoS) occurs when a regular expression has patterns that can cause catastrophic backtracking on certain inputs. In a Flask application that uses Amazon DynamoDB, ReDoS can be introduced in two primary ways: (1) directly in Python code that validates request parameters before issuing DynamoDB operations, and (2) indirectly via attacker-controlled data stored in DynamoDB that is later used in regex operations. The combination of Flask routing and DynamoDB access patterns can amplify the impact because input validation is often applied to path or query parameters that map to DynamoDB keys or scan filters.

Consider a Flask route that uses a regex to validate a user-supplied user_id before querying DynamoDB:

import re
from flask import Flask, request
import boto3

app = Flask(__name__)
dynamodb = boto3.resource('dynamodb', region_name='us-east-1')
table = dynamodb.Table('Users')

# Vulnerable regex: exponential matching time on certain inputs
USER_ID_REGEX = re.compile(r'^(a+)+$')

@app.route('/user/<user_id>')
def get_user(user_id):
    if not USER_ID_REGEX.match(user_id):
        return {'error': 'invalid user_id'}, 400
    response = table.get_item(Key={'user_id': user_id})
    return response.get('Item', {})

The regex (a+)+$ is vulnerable to ReDoS because both quantifiers are greedy and can overlap, causing exponential backtracking on inputs like aaaa...aaaaX. An attacker can craft a long string that triggers high CPU usage in the Flask process during validation, leading to a denial of service. Because DynamoDB calls are typically made only after regex validation, the expensive operation occurs before any database interaction, but it still degrades service availability.
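The growth in matching time can be observed with a small timing sketch. The input lengths below are illustrative assumptions chosen to keep the run short, and absolute timings will vary with the machine and Python version.

```python
import re
import time

VULNERABLE = re.compile(r'^(a+)+$')

def time_match(n):
    """Return the seconds spent matching n 'a' characters followed by 'X'."""
    payload = 'a' * n + 'X'  # the trailing 'X' forces the overall match to fail
    start = time.perf_counter()
    VULNERABLE.match(payload)
    return time.perf_counter() - start

if __name__ == '__main__':
    # Each extra 'a' roughly doubles the work the regex engine has to do
    for n in (10, 14, 18, 20):
        print(f'n={n:2d}  {time_match(n):.4f}s')
```

Keep the lengths small when experimenting: a few dozen characters are already enough to occupy a worker for a very long time.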

Another scenario involves using untrusted data from DynamoDB in regex operations. If your application stores user-supplied patterns or text in DynamoDB and later applies them with re.match or similar, an attacker who can inject a malicious pattern into DynamoDB can affect all users who trigger that regex. For example, a stored pattern like (a+)+$ applied to incoming data can cause ReDoS across multiple requests, turning a single compromised item into a persistent DoS vector.

Flask’s routing can also introduce risks if path converters are combined with custom regex. While Flask’s built-in converters are generally safe, combining them with manual regex validation on the same parameter increases complexity and the chance of an overlapping, inefficient pattern. Additionally, scan filters passed to DynamoDB that include regex-like behavior (e.g., using BEGINS_WITH or CONTAINS for simple prefix/substring checks) are safe, but if you implement client-side filtering using regex on large result sets, you risk high CPU usage on the client side.

middleBrick’s LLM/AI Security checks cover active prompt injection testing and system prompt leakage detection. To determine whether your application is at risk of application-level DoS, audit the regex patterns that touch external data sources. Look for quantifiers that can match the same text in more than one way, especially nested quantifiers such as (a+)+, (a|aa)+, or (a*)*.
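As a starting point for such an audit, the following is a rough heuristic sketch, not a complete ReDoS detector. The helper name looks_risky and the sample list are assumptions made for illustration; a hit only means the pattern deserves manual review.

```python
import re

# Heuristic only: flags a group that contains a quantifier or an alternation
# and is itself followed by + or *. It can miss risky patterns and can flag
# safe ones (for example (a|b)+), so treat a hit as a prompt for review.
NESTED_QUANTIFIER = re.compile(r'\([^()]*[+*|][^()]*\)[+*]')

def looks_risky(pattern: str) -> bool:
    """Return True if the pattern text contains a nested-quantifier shape."""
    return NESTED_QUANTIFIER.search(pattern) is not None

# Sample patterns taken from this article
PATTERNS_TO_AUDIT = [r'^(a+)+$', r'(a|aa)+', r'(a*)*', r'^user_\d{1,10}$']

for p in PATTERNS_TO_AUDIT:
    print(p, '->', 'review' if looks_risky(p) else 'ok')
```

The check works on the pattern text alone, so it does not understand escapes or character classes; it is meant to narrow down which patterns to read carefully.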

DynamoDB-Specific Remediation in Flask: concrete code fixes

Remediation focuses on replacing vulnerable regex patterns with safe alternatives and ensuring DynamoDB interactions remain efficient and safe. Below are concrete, working examples for Flask applications.

1. Replace vulnerable regex with safe validation

Instead of using overlapping quantifiers, use a regex that cannot cause catastrophic backtracking. For validating a format like user_123, use a non-overlapping pattern:

import re
from flask import Flask, request
import boto3

app = Flask(__name__)
dynamodb = boto3.resource('dynamodb', region_name='us-east-1')
table = dynamodb.Table('Users')

# Safe regex: no nested quantifiers
USER_ID_REGEX = re.compile(r'^user_\d{1,10}$')

@app.route('/user/<user_id>')
def get_user(user_id):
    if not USER_ID_REGEX.match(user_id):
        return {'error': 'invalid user_id'}, 400
    response = table.get_item(Key={'user_id': user_id})
    return response.get('Item', {})

This pattern avoids quantifier overlap and bounds the length of the numeric portion, preventing ReDoS while still enforcing a basic format.
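The same check can be tightened a little further. The sketch below assumes a length cap of 15 characters (the prefix plus ten digits) and a helper named is_valid_user_id, neither of which appears in the original code. It rejects oversized input before the regex runs and uses fullmatch, because '$' combined with match() also accepts a single trailing newline.

```python
import re

MAX_USER_ID_LEN = 15  # assumed cap: len('user_') plus up to 10 digits
# re.ASCII restricts \d to 0-9 instead of every Unicode digit
USER_ID_REGEX = re.compile(r'user_\d{1,10}', re.ASCII)

def is_valid_user_id(user_id: str) -> bool:
    # Cheap length check first, so oversized input never reaches the regex engine
    if len(user_id) > MAX_USER_ID_LEN:
        return False
    # fullmatch requires the whole string to match the pattern
    return USER_ID_REGEX.fullmatch(user_id) is not None
```

In the route, `if not is_valid_user_id(user_id)` would take the place of the direct `USER_ID_REGEX.match` call.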

2. Use simple string operations when possible

For prefix checks, prefer built-in string methods over regex:

@app.route('/item/<item_id>')
def get_item(item_id):
    if not item_id.startswith('item_'):
        return {'error': 'invalid item_id'}, 400
    response = table.get_item(Key={'item_id': item_id})
    return response.get('Item', {})
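If the identifier has more structure than a prefix, plain string methods can often still cover it. The sketch below assumes, purely for illustration, that item IDs are 'item_' followed by ASCII digits and are at most 32 characters long; the helper name is_valid_item_id is hypothetical.

```python
def is_valid_item_id(item_id: str, max_len: int = 32) -> bool:
    """Validate an assumed 'item_<digits>' format without using a regex."""
    prefix = 'item_'
    if len(item_id) > max_len or not item_id.startswith(prefix):
        return False
    suffix = item_id[len(prefix):]
    # isascii() rules out non-ASCII digits that str.isdigit() would accept;
    # isdigit() is False for an empty suffix, so a bare 'item_' is rejected
    return suffix.isascii() and suffix.isdigit()
```

Every step here is a single linear pass over the string, so the cost stays proportional to the input length.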

3. Avoid regex on untrusted data from DynamoDB

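If you must use patterns stored in DynamoDB, treat them as untrusted input. Pre-compiling a pattern does not make it safe, so reject patterns with user-controlled quantifiers or evaluate them in a sandbox with a time limit. A safer approach is to store or accept only the name of a format and map it to a fixed set of allowed patterns defined in code: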

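from boto3.dynamodb.conditions import Attr

# Fixed, developer-controlled patterns; callers can only select one by name
ALLOWED_FORMATS = {
    'email': r'^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$',
    'slug': r'^[A-Za-z0-9_-]+$',
}

# Hypothetical route and 'value' parameter, added so the snippet is a complete handler
@app.route('/widgets')
def list_widgets():
    pattern_key = request.args.get('format', 'slug')
    if pattern_key not in ALLOWED_FORMATS:
        return {'error': 'unsupported format'}, 400

    user_input = request.args.get('value', '')
    regex = re.compile(ALLOWED_FORMATS[pattern_key])
    if not regex.match(user_input):
        return {'error': 'invalid input'}, 400

    response = table.scan(
        FilterExpression=Attr('category').eq('widgets')
    )
    # Process items safely without applying user patterns to raw data
    return {'items': response.get('Items', [])}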
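When a stored value only needs to appear literally inside a larger, fixed pattern, re.escape is one way to keep it from being interpreted as regex syntax. The sketch below assumes a hypothetical stored prefix (for example, an attribute read from a DynamoDB item) followed by a dash and up to six digits; the helper name build_matcher is illustrative.

```python
import re

def build_matcher(stored_prefix: str) -> re.Pattern:
    """Compile '<literal prefix>-<1 to 6 digits>' from an untrusted prefix."""
    # re.escape turns every metacharacter in the stored value into a literal,
    # so a value such as '(a+)+' cannot introduce groups or quantifiers
    return re.compile(rf'{re.escape(stored_prefix)}-\d{{1,6}}', re.ASCII)

matcher = build_matcher('(a+)+')  # malicious-looking stored value
print(matcher.fullmatch('(a+)+-123') is not None)
print(matcher.fullmatch('aaaa-123') is not None)
```

Only the fixed template around the escaped value contributes regex behavior, and that template is written by the developer.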

4. Use middleBrick for ongoing validation guidance

In production, integrate middleBrick’s CLI to scan your endpoints and detect ReDoS risks in regex usage:

# Scan your Flask API endpoint from the terminal
middlebrick scan https://api.example.com/openapi.json

The scan results include prioritized findings with severity and remediation guidance, helping you identify regex patterns that may lead to DoS. For continuous protection, use the Pro plan’s GitHub Action to fail builds if risk scores exceed your threshold, or the MCP Server to validate APIs directly from your IDE.

Related CWEs: inputValidation

| CWE ID | Name | Severity |
|---|---|---|
| CWE-20 | Improper Input Validation | HIGH |
| CWE-22 | Path Traversal | HIGH |
| CWE-74 | Injection | CRITICAL |
| CWE-77 | Command Injection | CRITICAL |
| CWE-78 | OS Command Injection | CRITICAL |
| CWE-79 | Cross-site Scripting (XSS) | HIGH |
| CWE-89 | SQL Injection | CRITICAL |
| CWE-90 | LDAP Injection | HIGH |
| CWE-91 | XML Injection | HIGH |
| CWE-94 | Code Injection | CRITICAL |

Frequently Asked Questions

Can ReDoS in Flask be triggered through DynamoDB scan results alone?
Not directly. ReDoS requires a regex operation on attacker-controlled input. If scan results from DynamoDB are used to construct regex patterns dynamically without validation, that could introduce risk, but the vulnerability is in the regex usage, not DynamoDB itself.

Does middleBrick fix regex DoS vulnerabilities automatically?
middleBrick detects and reports potential ReDoS findings with remediation guidance, but it does not automatically fix code. Developers should apply the suggested pattern changes, such as replacing nested quantifiers with bounded, non-overlapping expressions.