
Regex DoS in Django with DynamoDB

Regex DoS in Django with DynamoDB: how this specific combination creates or exposes the vulnerability

Regular expressions can become a vector for denial of service (ReDoS) when patterns contain nested or overlapping quantifiers that allow exponential backtracking on untrusted input; anchoring a pattern does not prevent this. In Django applications that use Amazon DynamoDB as a backend, regex validation in Python runs alongside DynamoDB queries, and the risk grows when the input being matched, or the pattern itself, is derived from database attributes.

Consider a Django view that validates a search parameter with a vulnerable pattern and then uses it to scan a DynamoDB table:

import re

import boto3
from boto3.dynamodb.conditions import Attr
from django.http import JsonResponse

def search_view(request):
    term = request.GET.get('q', '')
    # Potentially dangerous regex applied to user-controlled input
    if not re.match(r'^(a+)+$', term):
        return JsonResponse({'error': 'invalid'}, status=400)
    dynamodb = boto3.resource('dynamodb')
    table = dynamodb.Table('Items')
    # The scan only runs after the regex check above has finished
    response = table.scan(FilterExpression=Attr('name').contains(term))
    return JsonResponse({'results': response.get('Items', [])})

The regex (a+)+$ is vulnerable to catastrophic backtracking on strings like "a" * 25 + "!". Even though DynamoDB performs the scan, the application layer still evaluates the regex on attacker-controlled input before issuing the request. This keeps the ReDoS risk in the Django process, independent of DynamoDB’s performance. An attacker can tie up worker processes with a single crafted request, leading to elevated latency or service unavailability.
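The growth in matching time can be observed directly. The following is a minimal sketch, not part of the original view: the helper name time_match and the chosen input lengths are assumptions made for illustration, and the timings depend on the machine and Python version.

```python
import re
import time

# The vulnerable pattern from the view above.
VULNERABLE = re.compile(r'^(a+)+$')

def time_match(pattern, text):
    """Return (matched, seconds) for a single match attempt (hypothetical helper)."""
    start = time.perf_counter()
    matched = pattern.match(text) is not None
    return matched, time.perf_counter() - start

if __name__ == '__main__':
    # Each input is a run of "a" followed by "!", so the match must fail
    # after the engine has explored the possible groupings of the run.
    for n in (14, 16, 18, 20):
        matched, seconds = time_match(VULNERABLE, 'a' * n + '!')
        print(f'n={n:2d} matched={matched} seconds={seconds:.4f}')
```

Keeping n small makes the script finish quickly; the point is the trend from one line to the next, not the absolute numbers.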

Another scenario arises when a DynamoDB attribute is used directly in regex operations. For example, if your table stores user-supplied patterns or free-text fields, retrieving those values into Django and applying additional regex parsing can compound the issue:

def process_item(request, item_id):
    table = boto3.resource('dynamodb').Table('Config')
    resp = table.get_item(Key={'id': item_id})
    item = resp.get('Item', {})
    pattern = item.get('regex_pattern', '')
    user_input = request.GET.get('data', '')
    # Using a DynamoDB-stored pattern on untrusted input increases blast radius
    if re.fullmatch(pattern, user_input):
        return JsonResponse({'match': True})
    return JsonResponse({'match': False})

Here, the pattern itself is stored in DynamoDB and may have been created without regex safety considerations. If an attacker can influence or poison that stored pattern, they can cause the Django app to exhibit ReDoS when evaluating subsequent requests. Because DynamoDB does not validate regex patterns, the responsibility shifts entirely to the application to ensure patterns are safe and to avoid using untrusted input as regex source.

The interaction between Django’s request/response cycle and DynamoDB’s eventually consistent reads does not mitigate ReDoS; the expensive regex work still occurs synchronously in the Django process. Therefore, the primary attack surface is user input that reaches regex engines, regardless of whether DynamoDB is the persistent store. Mitigations must focus on regex construction, input constraints, and runtime protections to prevent pathological execution paths.
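One way to act on the "regex construction" point is to screen patterns for the nested-quantifier shape before they are compiled or stored. The sketch below is a rough heuristic under stated assumptions: the function name looks_redos_prone and the length cap are hypothetical, it only recognizes a quantified group that directly contains a quantifier, and it will miss other risky shapes such as overlapping alternations.

```python
import re

# Illustrative cap (assumption): keeps the screening regex itself cheap to run.
MAX_PATTERN_LENGTH = 256

# A group with no nested parentheses that contains a quantifier (+, *, or a
# closing brace) and is itself followed by a quantifier, as in (a+)+ or (\d+)*.
_NESTED_QUANTIFIER = re.compile(r'\([^()]*[+*}][^()]*\)[+*{]')

def looks_redos_prone(pattern: str) -> bool:
    """Heuristic check: True if the pattern is too long or has a quantified group
    that directly contains another quantifier."""
    if len(pattern) > MAX_PATTERN_LENGTH:
        return True
    return bool(_NESTED_QUANTIFIER.search(pattern))
```

A check like this could run before a pattern is written to the Config table or before it is passed to re.compile. It complements, rather than replaces, input length limits and allowlists.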

DynamoDB-Specific Remediation in Django: concrete code fixes

To reduce ReDoS risk when using Django with DynamoDB, validate and constrain input before regex evaluation and avoid using untrusted data as regex patterns. Prefer bounded, safe patterns and perform length and structure checks.

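Replace vulnerable patterns like (a+)+$ with equivalents that contain no nested quantifiers. Python's re module also supports atomic groups and possessive quantifiers, but only from Python 3.11 onward. For the earlier example, a safe alternative uses a single bounded repetition: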

from boto3.dynamodb.conditions import Attr

def safe_search_view(request):
    term = request.GET.get('q', '')
    # Safe: one bounded repetition, no nested quantifiers; {1,100} keeps the
    # original pattern's requirement of at least one character
    if not re.fullmatch(r'a{1,100}', term):
        return JsonResponse({'error': 'invalid'}, status=400)
    dynamodb = boto3.resource('dynamodb')
    table = dynamodb.Table('Items')
    response = table.scan(FilterExpression=Attr('name').contains(term))
    return JsonResponse({'results': response.get('Items', [])})

If you must use dynamic patterns sourced from DynamoDB, enforce strict allowlists and avoid user-controlled quantifiers:

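import re

MAX_PATTERN_LENGTH = 64  # illustrative cap; long alternation chains can still be slow

def safe_pattern_match(pattern_str, user_input):
    # Allow only lowercase alphanumerics, spaces, and | & ( ); quantifiers
    # (* + ? {}) and all other metacharacters are rejected, so no repetition is possible
    if len(pattern_str) > MAX_PATTERN_LENGTH:
        raise ValueError('pattern too long')
    if not re.fullmatch(r'[a-z0-9|&() ]+', pattern_str):
        raise ValueError('unsafe pattern')
    try:
        compiled = re.compile(pattern_str, re.IGNORECASE)
    except re.error as exc:
        # Unbalanced parentheses pass the allowlist but fail to compile
        raise ValueError('invalid regex') from exc
    return bool(compiled.fullmatch(user_input))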

def use_dynamodb_pattern(request):
    table = boto3.resource('dynamodb').Table('Config')
    resp = table.get_item(Key={'id': 'safe_pattern'})
    pattern = resp.get('Item', {}).get('regex_pattern', '')
    user_data = request.GET.get('data', '')
    try:
        matched = safe_pattern_match(pattern, user_data)
    except ValueError:
        # A missing or rejected stored pattern is a configuration problem, not a match
        return JsonResponse({'error': 'stored pattern rejected'}, status=500)
    return JsonResponse({'match': matched})

Additionally, limit input length and complexity at the Django view layer to reduce the chance of pathological behavior:

def length_limited_view(request):
    term = request.GET.get('q', '')
    if len(term) > 200:
        return JsonResponse({'error': 'too long'}, status=400)
    # Further validation as needed
    ...

DynamoDB filter and condition expressions have no regex operator; they offer bounded string functions such as begins_with and contains, so keep any matching done on the DynamoDB side limited to those. Use ProjectionExpression to limit returned attributes and keep processing predictable. Combining input validation, pattern allowlists, and bounded repetition in Django addresses the ReDoS risk while continuing to use DynamoDB for storage and retrieval.
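As a rough illustration of that last point, the scan arguments can be assembled from a length-checked term using only a plain substring function and a projection. This is a sketch under assumptions: the helper name build_scan_kwargs, the default limit of 50, and the presence of an id attribute on the Items table are all hypothetical.

```python
MAX_TERM_LENGTH = 200  # mirrors the 200-character check in length_limited_view

def build_scan_kwargs(term, limit=50):
    """Build keyword arguments for Table.scan (hypothetical helper)."""
    if not term or len(term) > MAX_TERM_LENGTH:
        raise ValueError(f'term must be 1 to {MAX_TERM_LENGTH} characters')
    return {
        # contains() is a plain substring test, not a regex match
        'FilterExpression': 'contains(#n, :term)',
        # "name" is a DynamoDB reserved word, so it is referenced through an alias
        'ExpressionAttributeNames': {'#n': 'name', '#i': 'id'},
        'ExpressionAttributeValues': {':term': term},
        # Return only the attributes the view needs
        'ProjectionExpression': '#i, #n',
        # Caps the number of items evaluated per request, before filtering
        'Limit': limit,
    }
```

With a boto3 table resource, the result could be passed as table.scan(**build_scan_kwargs(term)). Building the arguments in a separate function keeps the validation testable without a DynamoDB connection.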

Related CWEs: Input Validation

CWE ID    Name                          Severity
CWE-20    Improper Input Validation     HIGH
CWE-22    Path Traversal                HIGH
CWE-74    Injection                     CRITICAL
CWE-77    Command Injection             CRITICAL
CWE-78    OS Command Injection          CRITICAL
CWE-79    Cross-site Scripting (XSS)    HIGH
CWE-89    SQL Injection                 CRITICAL
CWE-90    LDAP Injection                HIGH
CWE-91    XML Injection                 HIGH
CWE-94    Code Injection                CRITICAL

Frequently Asked Questions

Can DynamoDB-stored regex patterns be safely used in Django?
Only if the patterns come from a trusted source and are strictly validated before use. Avoid using user-influenced data as regex patterns; enforce a character allowlist and reject quantifiers that enable nested repetition.

Does DynamoDB’s scan or query performance affect ReDoS risk in Django?
No. ReDoS risk resides in the regex evaluation inside the Django process. DynamoDB operations may add latency, but the expensive backtracking occurs locally, so input validation and safe pattern design remain essential regardless of backend performance.