Regex DoS in Django with Bearer Tokens
Regex DoS in Django with Bearer Tokens: how this specific combination creates or exposes the vulnerability
A Regular Expression Denial of Service (Regex DoS) occurs when an attacker provides input that causes a regex engine to exhibit catastrophic backtracking, consuming excessive CPU time. In Django, this risk is heightened when Bearer Tokens are handled in authentication paths that rely on Python’s re module without safeguards. Common patterns include using unanchored or overly permissive regexes to validate, parse, or capture tokens, especially when token formats allow long strings with repetitive or ambiguous subpatterns.
Consider a scenario where a developer writes a view that extracts a Bearer Token from the Authorization header using a broad regex like ^Bearer (.*)$. That pattern is not dangerous in itself, but it accepts a token of any length and content. If the token is then passed into further regex-based operations (e.g., scope parsing, signature validation, or custom claim extraction), and those regexes contain nested quantifiers or optional groups with overlapping matches, an attacker can craft a malicious token that triggers exponential backtracking. For example, a token made of a few dozen a characters followed by a single non-matching character such as ! forces a pattern like (a+)+$ to try every way of splitting the run before it can report failure, and each additional a roughly doubles the work. A token made only of a characters would match immediately, so the trailing mismatch is what makes the input expensive. In Django, this often manifests as a slow or unresponsive authentication endpoint: each such request ties up a worker, achieving a denial of service without needing to bypass authentication or read any data.
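The effect is easy to reproduce outside Django. The sketch below is illustrative only: `time_search` is a hypothetical helper, and the input sizes are deliberately small so the script finishes quickly. It times the nested-quantifier pattern `(a+)+$` against runs of `a` followed by `!`, which makes the match fail only after the engine has tried every way of dividing the run.

```python
import re
import time

NESTED = re.compile(r'(a+)+$')  # nested quantifier: prone to catastrophic backtracking


def time_search(pattern: re.Pattern, text: str) -> float:
    """Return the seconds spent on a single pattern.search(text) call."""
    start = time.perf_counter()
    pattern.search(text)
    return time.perf_counter() - start


if __name__ == '__main__':
    for n in (8, 12, 16, 20):
        # The trailing '!' makes the match fail, so the engine retries
        # every possible way of dividing the run of 'a' characters.
        payload = 'a' * n + '!'
        print(f'n={n:2d}  {time_search(NESTED, payload):.4f}s')
```

On CPython's backtracking engine the reported time should grow roughly geometrically with n, which is why a header only a few dozen characters long can be enough to stall a worker.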
Because middleBrick tests unauthenticated attack surfaces, it can detect such risky regex patterns during API scans, especially when OpenAPI specs describe authentication schemes using Bearer tokens. The scanner identifies endpoints where Authorization header validation appears permissive and flags related regex-based input validation checks as potential security concerns. This is distinct from traditional injection; the threat is resource exhaustion rather than data access, but the impact on availability is severe.
Another subtlety arises when Django REST framework or custom decorators use regex to validate token-related claims or scopes. For instance, splitting a token by . and then applying additional regex checks on each part can reintroduce backtracking risks if the token payload contains long repetitive segments. Even if the initial extraction uses a safe pattern, downstream processing can still expose the vulnerability. middleBrick’s checks for Input Validation and Property Authorization include analysis of regex usage around token handling, helping to surface these hidden paths.
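A minimal sketch of segment handling that needs no regex at all: it caps the total length, splits on `.`, and checks each segment against a fixed character set. The limit `MAX_TOKEN_LENGTH`, the three-segment requirement, and the helper name `split_token_segments` are assumptions for illustration, not values taken from any standard or library.

```python
import string
from typing import List, Optional

MAX_TOKEN_LENGTH = 4096  # assumed cap; tune to the token sizes your issuer produces
# base64url alphabet plus '=' padding
ALLOWED_CHARS = frozenset(string.ascii_letters + string.digits + '-_=')


def split_token_segments(token: str) -> Optional[List[str]]:
    """Return the three dot-separated segments, or None if the token is malformed.

    Every step is a single linear pass, so there is nothing to backtrack.
    """
    if not token or len(token) > MAX_TOKEN_LENGTH:
        return None
    segments = token.split('.')
    if len(segments) != 3:
        return None
    for segment in segments:
        # Reject empty segments and any character outside the allowed set
        if not segment or not ALLOWED_CHARS.issuperset(segment):
            return None
    return segments
```

Because the length is checked first, any later per-segment processing works on input whose size is already bounded.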
Bearer Token-Specific Remediation in Django: concrete code fixes
Remediation focuses on avoiding regex for token validation where possible and ensuring any regex used is safe and bounded. Prefer structured parsing and strict string operations over pattern matching for Bearer Tokens. Below are concrete, realistic examples.
Safe Bearer Token Extraction Without Regex
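Use simple string operations to extract the token. The extraction step then involves no backtracking at all and is clearer in intent. Any format check that follows is a separate step and still needs a bounded input and a pattern without nested or adjacent overlapping quantifiers.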
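import re

from django.http import HttpRequest, JsonResponse
from django.views import View

MAX_TOKEN_LENGTH = 4096  # assumed upper bound; adjust to your token issuer


class SafeTokenView(View):
    def dispatch(self, request: HttpRequest, *args, **kwargs):
        auth = request.META.get('HTTP_AUTHORIZATION', '')
        # Plain prefix check: no regex involved in extraction
        if not auth.startswith('Bearer '):
            return JsonResponse({'error': 'Unauthorized'}, status=401)
        token = auth[7:]  # len('Bearer ') == 7
        if not token:
            return JsonResponse({'error': 'Token missing'}, status=401)
        # Reject oversized tokens before any pattern matching
        if len(token) > MAX_TOKEN_LENGTH:
            return JsonResponse({'error': 'Invalid token format'}, status=401)
        # Flat pattern: two or three dot-separated segments. The optional third
        # segment is grouped with its dot so no two quantifiers sit side by side
        # over the same characters (that adjacency causes quadratic backtracking).
        if not re.fullmatch(r'[A-Za-z0-9\-_=]+\.[A-Za-z0-9\-_=]+(?:\.[A-Za-z0-9\-_=]*)?', token):
            return JsonResponse({'error': 'Invalid token format'}, status=401)
        request.token = token
        return super().dispatch(request, *args, **kwargs)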
Using Compiled Regex with Anchors and No Nested Quantifiers
If regex is necessary (for example, to validate JWT structure), use a compiled pattern with explicit start/end anchors and avoid nested quantifiers. Pre-compile the pattern at module load to reduce overhead.
import re

from django.http import HttpRequest, JsonResponse

# Safe: anchored, no nested quantifiers, limited character classes.
# \Z is used instead of $ because $ also matches just before a trailing newline.
SAFE_JWT_REGEX = re.compile(
    r'^[A-Za-z0-9\-_=]+\.[A-Za-z0-9\-_=]+\.[A-Za-z0-9\-_=]+\Z',
    re.ASCII,
)


def validate_token_format(token: str) -> bool:
    """Return True only if token matches a safe JWT-like structure."""
    return bool(SAFE_JWT_REGEX.match(token))


# Usage as middleware (a view would perform the same checks in dispatch)
class TokenValidationMiddleware:
    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request: HttpRequest):
        auth = request.META.get('HTTP_AUTHORIZATION', '')
        if not auth.startswith('Bearer '):
            return JsonResponse({'error': 'Unauthorized'}, status=401)
        token = auth[7:].strip()
        if not token or not validate_token_format(token):
            return JsonResponse({'error': 'Invalid token'}, status=401)
        request.token = token
        # Token format is acceptable: hand the request to the next layer
        return self.get_response(request)
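One way to gain confidence in a pattern like the one above is a small regression test that feeds it long, awkward input and checks the time taken. The sketch below is an illustration under stated assumptions: `FLAT_JWT` repeats the three-segment pattern locally so the snippet is self-contained (using `fullmatch` instead of anchors), and the 0.5 second budget and input sizes are arbitrary choices.

```python
import re
import time

# Local copy of the flat three-segment pattern so this snippet runs on its own.
FLAT_JWT = re.compile(r'[A-Za-z0-9\-_=]+\.[A-Za-z0-9\-_=]+\.[A-Za-z0-9\-_=]+')

TIME_BUDGET_SECONDS = 0.5  # arbitrary budget for illustration


def check_within_budget(pattern: re.Pattern, text: str) -> bool:
    """Return True if one fullmatch call on text finishes inside the budget."""
    start = time.perf_counter()
    pattern.fullmatch(text)
    return time.perf_counter() - start < TIME_BUDGET_SECONDS


ADVERSARIAL_INPUTS = [
    'a' * 100_000,                              # no dots at all
    'a' * 100_000 + '!',                        # long run, invalid last character
    'a.' * 50_000,                              # far too many segments
    'a' * 50_000 + '.' + 'b' * 50_000 + '.!',   # fails only at the very end
]

if __name__ == '__main__':
    for text in ADVERSARIAL_INPUTS:
        print(len(text), check_within_budget(FLAT_JWT, text))
```

Timing assertions can be flaky on loaded CI runners, so a generous budget is safer than a tight one; the goal is to catch exponential or quadratic blow-ups, not small regressions.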
Avoiding Catastrophic Backtracking in Scopes
If token payloads include scope claims validated via regex, ensure patterns are non-overlapping and bounded. For example, prefer exact matches or whitelist checks over open-ended repetition.
from typing import Set

VALID_SCOPES: Set[str] = {'read', 'write', 'admin'}


def validate_scope(scope_str: str) -> bool:
    """Avoid regex for scope validation; use a set lookup."""
    scopes = scope_str.split()
    return all(s in VALID_SCOPES for s in scopes)
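Where scopes cannot be enumerated up front, a bounded pattern applied to each scope is one option. The sketch below assumes a hypothetical `resource:action` scope shape and arbitrary limits (`MAX_SCOPES`, `MAX_SCOPE_STRING`, 32-character parts); none of these come from the original code.

```python
import re

MAX_SCOPES = 20          # assumed cap on the number of scopes per token
MAX_SCOPE_STRING = 1024  # assumed cap on the raw scope string length

# Flat pattern: bounded repetition, no nesting, no overlapping alternatives.
SCOPE_PATTERN = re.compile(r'[a-z]{1,32}(?::[a-z]{1,32})?')


def validate_structured_scopes(scope_str: str) -> bool:
    """Check space-separated scopes shaped like 'resource' or 'resource:action'."""
    if len(scope_str) > MAX_SCOPE_STRING:
        return False
    scopes = scope_str.split()
    # Unlike a bare all(...), an empty scope list is rejected explicitly here
    if not scopes or len(scopes) > MAX_SCOPES:
        return False
    return all(SCOPE_PATTERN.fullmatch(s) is not None for s in scopes)
```

Checking the raw length and the scope count before matching keeps the total work bounded regardless of what the token payload contains.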
Integrating with middleBrick and CI/CD
With the middleBrick CLI, you can scan your Django API endpoints to surface risky regex patterns in the Authentication and Input Validation checks. In your CI/CD pipeline, the GitHub Action can fail builds if the security score drops below your chosen threshold, preventing deployments that rely on unsafe token handling. The MCP Server lets you trigger scans directly from AI coding assistants within your IDE, providing rapid feedback during development.
Related CWEs: inputValidation
| CWE ID | Name | Severity |
|---|---|---|
| CWE-20 | Improper Input Validation | HIGH |
| CWE-22 | Path Traversal | HIGH |
| CWE-74 | Injection | CRITICAL |
| CWE-77 | Command Injection | CRITICAL |
| CWE-78 | OS Command Injection | CRITICAL |
| CWE-79 | Cross-site Scripting (XSS) | HIGH |
| CWE-89 | SQL Injection | CRITICAL |
| CWE-90 | LDAP Injection | HIGH |
| CWE-91 | XML Injection | HIGH |
| CWE-94 | Code Injection | CRITICAL |