Regex DoS in Flask with Bearer Tokens
Regex DoS in Flask with Bearer Tokens — how this specific combination creates or exposes the vulnerability
A Regex Denial-of-Service (ReDoS) occurs when a regular expression has patterns that can cause catastrophic backtracking on certain inputs. In Flask APIs that use Bearer tokens, this typically arises in two places: (1) route definitions that validate tokens with complex regex-based path or header rules, and (2) custom token-parsing logic that uses non-anchored or overly permissive patterns on user-supplied Authorization header values.
Consider a Flask route that attempts to validate a Bearer token inline using a regex pattern with nested quantifiers:
```python
import re

@app.route('/data/<string:token>')
def get_data(token):
    # Risky: nested quantifiers in the pattern can cause exponential backtracking
    pattern = r'(a+)+$'
    if re.fullmatch(pattern, token):
        return {'ok': True}
    return {'error': 'invalid'}, 401
```
If an attacker sends a crafted token like aaaa...x (many repeated characters), the regex engine can enter exponential backtracking, consuming high CPU and causing service delay or outage. Even if token validation is moved to middleware, similar issues arise when patterns are applied to the Authorization header value directly.
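The blow-up is easy to observe locally with the pathological pattern above. A minimal sketch (timings are machine-dependent, but each extra character roughly doubles the work):

```python
import re
import time

# '(a+)+' can partition a run of 'a's in exponentially many ways; the trailing
# 'x' guarantees failure, forcing the engine to try every partition.
pattern = re.compile(r'(a+)+$')

for n in (10, 14, 18):
    payload = 'a' * n + 'x'
    start = time.perf_counter()
    result = pattern.fullmatch(payload)  # always None, but slower each time
    elapsed = time.perf_counter() - start
    print(f'n={n}: {elapsed:.4f}s, matched={result is not None}')
```

Extending `n` past the low twenties already pushes the match time into seconds, which is all an attacker needs to tie up a worker process.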
Another common scenario involves using regex to extract or validate tokens from headers in a way that compounds risk across multiple checks. For example:
```python
import re

from flask import request

auth_header = request.headers.get('Authorization', '')
# Fragile: multiple passes over the same header with overlapping patterns
m1 = re.match(r'^Bearer\s+(\S+)$', auth_header)
m2 = re.match(r'^([A-Za-z0-9\-_]+\.[A-Za-z0-9\-_]+\.?[A-Za-z0-9\-_]*)', m1.group(1) if m1 else '')
```
Each additional regex pass increases the attack surface. If any pattern contains unanchored quantifiers or ambiguous groupings, an attacker can send long, specially crafted strings that trigger backtracking across passes, effectively chaining multiple expensive operations. This is especially dangerous in unauthenticated scanning contexts where an endpoint expects no authentication but regex-based guards still operate on untrusted input.
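The two passes above can be collapsed into one anchored check. A sketch of a small helper (the character class is an assumption, sized for JWT-style tokens; adjust it to your token format):

```python
import re
from typing import Optional

# One anchored pattern, one pass: a literal prefix, then a token drawn from a
# fixed character class. No nested quantifiers, so matching time is linear.
_BEARER_RE = re.compile(r'^Bearer ([A-Za-z0-9\-_.~+/=]+)$')

def extract_bearer_token(auth_header: str) -> Optional[str]:
    """Return the token from an Authorization header, or None if malformed."""
    m = _BEARER_RE.match(auth_header)
    return m.group(1) if m else None
```

For example, `extract_bearer_token('Bearer abc.def.ghi')` returns `'abc.def.ghi'`, while a `Basic` header or a bare `Bearer ` prefix returns `None`.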
Because middleBrick scans the unauthenticated attack surface and tests input validation among its 12 parallel checks, it can surface ReDoS risks in the regex handling of Bearer tokens. Findings typically highlight patterns with nested quantifiers, missing anchors, or repeated subexpressions that can lead to disproportionate CPU consumption on malicious inputs.
Bearer Token-Specific Remediation in Flask — concrete code fixes
To mitigate Regex DoS in Flask when working with Bearer tokens, simplify and anchor patterns, avoid nested quantifiers, and move validation to a single, efficient check. Prefer standard token-handling libraries or fixed string checks where feasible, and ensure patterns run in linear time.
Remediation example 1 — use a simple prefix check instead of a complex regex:
```python
from flask import request, jsonify

def get_data():
    auth_header = request.headers.get('Authorization', '')
    # Safe: a plain prefix check involves no regex and no backtracking
    if not auth_header.startswith('Bearer '):
        return jsonify({'error': 'invalid_auth'}), 401
    token = auth_header.split(' ', 1)[1]
    if not token:
        return jsonify({'error': 'invalid_token'}), 401
    # proceed with token validation (e.g., verify via auth service)
    return jsonify({'ok': True})
```
Remediation example 2 — if regex is required, keep it simple, anchored, and non-backtracking:
```python
import re

from flask import request, jsonify

# Safe: anchored, no nested quantifiers, compiled once at module load
TOKEN_RE = re.compile(r'^Bearer [A-Za-z0-9\-._~+/=]+$')

def get_data():
    auth_header = request.headers.get('Authorization', '')
    # Single pass over the header with a linear-time pattern
    if not TOKEN_RE.match(auth_header):
        return jsonify({'error': 'invalid_auth'}), 401
    token = auth_header.split(' ', 1)[1]
    # proceed with token validation
    return jsonify({'ok': True})
```
Remediation example 3 — use a dedicated token validation library and avoid ad-hoc regex for JWTs:
```python
from flask import request, jsonify
import jwt  # PyJWT

SECRET_KEY = '...'  # your signing key, e.g. from app.config

def get_data():
    auth_header = request.headers.get('Authorization', '')
    if not auth_header.startswith('Bearer '):
        return jsonify({'error': 'invalid_auth'}), 401
    token = auth_header.split(' ', 1)[1]
    try:
        # Decode and verify with a library; avoid custom regex on JWT structure.
        # Always verify the signature in production.
        decoded = jwt.decode(token, SECRET_KEY, algorithms=['HS256'])
        # further authorization checks
        return jsonify({'ok': True, 'user': decoded})
    except jwt.InvalidTokenError:
        return jsonify({'error': 'invalid_token'}), 401
```
In all cases, ensure that regex patterns are linear, anchored, and tested against worst-case inputs. For production, prefer standard libraries for token parsing and validation. middleBrick can help identify risky regex patterns in your OpenAPI spec and runtime checks, and its CLI and dashboard let you track these findings across scans. With the Pro plan, continuous monitoring can alert you if new risky patterns appear in future commits, and the GitHub Action can fail builds when security scores drop below your chosen threshold.
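One way to test patterns against worst-case inputs is to run each candidate regex in a child process with a time budget, so a backtracking pattern is killed rather than hanging the test suite. A minimal sketch (the `fork` start method is Unix-only; `matches_within` and its budget are illustrative names, not part of any library):

```python
import multiprocessing
import re

def _try_match(pattern: str, text: str, q) -> None:
    q.put(re.fullmatch(pattern, text) is not None)

def matches_within(pattern: str, text: str, budget: float = 0.5):
    """Run a regex in a child process; return True/False, or None on timeout."""
    ctx = multiprocessing.get_context('fork')  # Unix-only start method
    q = ctx.Queue()
    proc = ctx.Process(target=_try_match, args=(pattern, text, q))
    proc.start()
    proc.join(budget)
    if proc.is_alive():  # still backtracking: kill it and flag the pattern
        proc.terminate()
        proc.join()
        return None
    return q.get()
```

A `None` result on an adversarial input (e.g. `'a' * 40 + 'x'` against a nested-quantifier pattern) is a strong signal the pattern should be rewritten before it reaches production.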
Related CWEs: input validation
| CWE ID | Name | Severity |
|---|---|---|
| CWE-20 | Improper Input Validation | HIGH |
| CWE-22 | Path Traversal | HIGH |
| CWE-74 | Injection | CRITICAL |
| CWE-77 | Command Injection | CRITICAL |
| CWE-78 | OS Command Injection | CRITICAL |
| CWE-79 | Cross-site Scripting (XSS) | HIGH |
| CWE-89 | SQL Injection | CRITICAL |
| CWE-90 | LDAP Injection | HIGH |
| CWE-91 | XML Injection | HIGH |
| CWE-94 | Code Injection | CRITICAL |