
Regex DoS in Flask with Basic Auth

Regex DoS in Flask with Basic Auth: how this specific combination creates or exposes the vulnerability

Regular expression denial of service (Regex DoS, or ReDoS) occurs when a regex pattern exhibits catastrophic backtracking on certain inputs, causing extreme CPU consumption. In Flask applications that rely on Basic Authentication, this risk is introduced when developers use complex or poorly anchored regular expressions to validate usernames, passwords, or the raw Authorization header. The common extraction pattern Basic\s+(\S+) is not the problem in itself: \s+ and \S+ match disjoint character sets, so there is only one way to split any input between them. The danger comes from looser validation patterns that nest quantifiers (for example (X+)+) or place adjacent quantifiers over overlapping character classes. Against such a pattern, an attacker can send a long base64-like string that ends in a character the pattern rejects, and the engine will try every possible way of dividing that string among the quantifiers before it reports a mismatch.

In Flask, the issue is exacerbated when the regex runs on every request in a synchronous path, such as an authentication check that executes before a protected view. For example, a pattern that tries to match optional whitespace, alphanumeric segments, and optional delimiters without atomic grouping or possessive quantifiers can hang on long, repetitive strings. Because the match runs inside the worker that serves the request, a single crafted request can occupy that worker for seconds or longer, and since CPython's re engine holds the GIL while matching, other threads in the same process stall as well. No valid credentials are needed to trigger this: the check runs before authentication succeeds, so it is an unauthenticated attack surface. A scanner like middleBrick tests it under its Regex DoS and Input Validation checks, flagging patterns that lack strict boundaries or rely on greedy quantifiers in security-sensitive contexts.
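Python's re module supports both constructs from Python 3.11 onward. The following minimal sketch assumes Python 3.11 or later; the pattern names NESTED and ATOMIC and the helper accepts are hypothetical. It contrasts a nested-quantifier header pattern with the same pattern wrapped in an atomic group (?>...): once the atomic group has matched a run of characters it never gives any of them back, so the engine cannot retry the run in different splits, and a failing header is rejected in linear time.

```python
import re
import sys

# Hypothetical header validators, for illustration only.
# NESTED: the outer "+" repeats a group whose inner "+" matches the same
# characters, so a failing input can be split in exponentially many ways.
NESTED = re.compile(r"^Basic\s+([A-Za-z0-9+/=]+\s*)+$")

# ATOMIC: same headers accepted, but "(?>...)" commits to whatever the group
# matched. Atomic groups (and possessive quantifiers like "++") need Python 3.11+.
ATOMIC = (
    re.compile(r"^Basic\s+(?>[A-Za-z0-9+/=]+\s*)+$")
    if sys.version_info >= (3, 11)
    else None
)


def accepts(pattern: re.Pattern, header: str) -> bool:
    """Return True if the header matches the (start-anchored) pattern."""
    return pattern.match(header) is not None


if __name__ == "__main__" and ATOMIC is not None:
    print(accepts(ATOMIC, "Basic dXNlcjpwYXNz"))  # True
    # Fails immediately; the same input against NESTED would effectively hang.
    print(accepts(ATOMIC, "Basic " + "A" * 100_000 + "!"))  # False
```

Both patterns accept the same headers; only the failure path changes. A possessive quantifier such as ++ has the same effect on a single quantifier, but removing the nesting altogether works on every Python version.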

In practice, a vulnerable Flask route might look like the following naive implementation, where a custom decorator uses a permissive regex to check the Authorization header before further processing:

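import functools
import re

from flask import Flask, request, Response

app = Flask(__name__)

# Vulnerable: the outer "+" repeats a group whose inner "+" matches the same
# characters (and "\s*" may match nothing), so a long run of base64 characters
# can be divided among the repetitions in exponentially many ways.
CREDENTIAL_PATTERN = re.compile(r'^Basic\s+([A-Za-z0-9+/=]+\s*)+$')

def validate_auth(view_func):
    @functools.wraps(view_func)  # preserve the view name so Flask endpoints stay unique
    def wrapper(*args, **kwargs):
        auth = request.headers.get('Authorization', '')
        # A header such as "Basic " + "A" * 30 + "!" makes this call backtrack
        # through roughly 2**29 ways of splitting the run before it fails.
        if not CREDENTIAL_PATTERN.match(auth):
            return Response('Invalid credentials', 401, {'WWW-Authenticate': 'Basic'})
        return view_func(*args, **kwargs)
    return wrapper

@app.route('/admin')
@validate_auth
def admin():
    return 'Admin Area'

The pattern nests quantifiers without atomic boundaries: the outer + repeats a group whose inner + matches the same characters, so when a long run of base64 characters is followed by a character the pattern rejects, the engine tries every way of dividing the run among the repetitions before it can report a mismatch, and each additional character roughly doubles the work. middleBrick's unauthenticated scan would flag this as a high-severity finding under its Input Validation and Regex DoS checks, providing remediation guidance to simplify the regex and enforce strict length and character class limits.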
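The exponential growth is easy to observe with a short timing script. This sketch uses only the standard library; the pattern is a hypothetical nested-quantifier header validator, attack_header and time_match are illustrative helper names, and absolute timings will vary by machine, though each extra character should roughly double the time.

```python
import re
import time

# Hypothetical nested-quantifier header pattern of the kind described above.
NESTED = re.compile(r"^Basic\s+([A-Za-z0-9+/=]+\s*)+$")


def attack_header(n: int) -> str:
    """n base64 characters followed by one character the pattern rejects."""
    return "Basic " + "A" * n + "!"


def time_match(pattern: re.Pattern, text: str) -> float:
    """Wall-clock seconds spent in a single pattern.match call."""
    start = time.perf_counter()
    pattern.match(text)
    return time.perf_counter() - start


if __name__ == "__main__":
    # The time should roughly double with each extra character.
    for n in range(14, 21):
        print(f"n={n}: {time_match(NESTED, attack_header(n)):.4f}s")
```

Stopping at n=20 keeps the demo quick; pushing n into the high twenties is enough to tie up a worker for a long time.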

Basic Auth-Specific Remediation in Flask: concrete code fixes

Mitigating Regex DoS in Flask with Basic Auth requires simplifying and anchoring regular expressions, avoiding nested quantifiers, and validating input length and character sets precisely. Prefer explicit parsing over complex regex when possible, and enforce strict constraints on the username and password components. Below are concrete, secure patterns you can adopt in your Flask applications.

First, use a tightly scoped regex that anchors the pattern and avoids repetition-induced backtracking. For Basic Auth, the credentials are base64-encoded and do not need character classes beyond the base64 alphabet, plus padding. A safer approach is to validate the header format with a simple, non-backtracking pattern and then decode and inspect the payload separately:

import base64
import functools
import re

from flask import Flask, Response, g, request

app = Flask(__name__)

# Secure: anchored, and each quantifier covers a character set disjoint from its
# neighbours (\s, the base64 alphabet, "="), so there is only one way to match.
BASIC_AUTH_PATTERN = re.compile(r'^Basic\s+([A-Za-z0-9+/]+={0,2})$')

def parse_basic_auth(auth_header):
    """Return (username, password) from a Basic Authorization header, or None."""
    match = BASIC_AUTH_PATTERN.match(auth_header)
    if not match:
        return None
    encoded = match.group(1)
    try:
        decoded = base64.b64decode(encoded, validate=True).decode('utf-8')
    except ValueError:  # binascii.Error and UnicodeDecodeError both subclass ValueError
        return None
    if ':' not in decoded:
        return None
    username, password = decoded.split(':', 1)
    # Enforce reasonable length limits to prevent abuse
    if len(username) > 128 or len(password) > 128:
        return None
    return username, password

def validate_auth(view_func):
    @functools.wraps(view_func)  # preserve the view name so Flask endpoints stay unique
    def wrapper(*args, **kwargs):
        auth = request.headers.get('Authorization', '')
        creds = parse_basic_auth(auth)
        if creds is None:
            return Response('Invalid credentials', 401, {'WWW-Authenticate': 'Basic'})
        # This only checks the header's format; verify the username and password
        # against your credential store before trusting them.
        g.parsed_user, g.parsed_pass = creds
        return view_func(*args, **kwargs)
    return wrapper

@app.route('/admin')
@validate_auth
def admin():
    return 'Admin Area'

This pattern anchors the regex, applies each quantifier to a character set that does not overlap its neighbours, and delegates semantic validation to base64 decoding and simple string checks, so the engine has no ambiguous splits to backtrack through. It also enforces length limits on the decoded username and password, which further reduces the risk of resource exhaustion. For applications that layer token-based enhancements on top of Basic Auth, combine this with strict allowlists for usernames and reject any characters outside the expected set.
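An allowlist check does not need a regex at all. In this minimal sketch the permitted characters and the 64-character limit are hypothetical policy choices, and username_is_allowed is an illustrative name; it would run on the username returned by parse_basic_auth, after decoding.

```python
import string

# Hypothetical policy: adjust the character set and limit to match your user store.
ALLOWED_USERNAME_CHARS = frozenset(string.ascii_letters + string.digits + "._-")
MAX_USERNAME_LEN = 64


def username_is_allowed(username: str) -> bool:
    """Accept only non-empty, bounded usernames made of allowlisted characters.

    A set-membership check runs in linear time and involves no regex engine,
    so there is nothing to backtrack.
    """
    return 0 < len(username) <= MAX_USERNAME_LEN and all(
        ch in ALLOWED_USERNAME_CHARS for ch in username
    )


if __name__ == "__main__":
    print(username_is_allowed("alice.smith"))  # True
    print(username_is_allowed("alice smith"))  # False: space is not allowlisted
    print(username_is_allowed("a" * 65))       # False: too long
```

An equivalent regex, re.fullmatch(r'[A-Za-z0-9._-]{1,64}', username), is also safe, because a single bounded character class limits backtracking to at most 64 positions; the set-based version simply makes that property obvious.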

Alternatively, avoid regex entirely for header parsing when possible. Use string operations to check the scheme and then decode the credential portion:

import base64
import functools

from flask import Response, g, request

def validate_auth_no_regex(view_func):
    @functools.wraps(view_func)  # preserve the view name so Flask endpoints stay unique
    def wrapper(*args, **kwargs):
        auth = request.headers.get('Authorization', '')
        # A plain prefix check replaces the regex for the scheme
        if not auth.startswith('Basic '):
            return Response('Invalid credentials', 401, {'WWW-Authenticate': 'Basic'})
        encoded = auth[len('Basic '):].strip()
        try:
            decoded = base64.b64decode(encoded, validate=True).decode('utf-8')
        except ValueError:  # binascii.Error and UnicodeDecodeError both subclass ValueError
            return Response('Invalid credentials', 401, {'WWW-Authenticate': 'Basic'})
        if ':' not in decoded:
            return Response('Invalid credentials', 401, {'WWW-Authenticate': 'Basic'})
        username, password = decoded.split(':', 1)
        if len(username) > 128 or len(password) > 128:
            return Response('Invalid credentials', 401, {'WWW-Authenticate': 'Basic'})
        # Format checks only; verify the credentials before trusting them
        g.parsed_user, g.parsed_pass = username, password
        return view_func(*args, **kwargs)
    return wrapper

By combining strict input validation, minimal regex usage, and length constraints, you reduce the likelihood of Regex DoS while maintaining compatibility with Basic Authentication flows. middleBrick's CLI can be used to verify that your patterns no longer trigger high-severity Input Validation findings, and its GitHub Action can enforce these checks in CI/CD pipelines to prevent regressions.
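Scanning can be complemented by an ordinary unit test that fails when a request-facing pattern slows down on adversarial input. The sketch below is framework-independent and uses only the standard library; the pattern registry, attack inputs, slow_matches helper and 0.1-second budget are hypothetical choices to adapt. A timing check like this can catch regressions, but it cannot prove a pattern is safe.

```python
import re
import time

# Hypothetical registry: every pattern the application applies to request data.
PATTERNS_UNDER_TEST = {
    "basic_auth_header": re.compile(r"^Basic\s+([A-Za-z0-9+/]+={0,2})$"),
}

# Runs of accepted characters followed by something that forces the match to
# fail, ordered short to long: exponential patterns already stall on the short
# input, while polynomial blow-ups only show on the long ones.
ATTACK_INPUTS = [
    "Basic " + "A" * 22 + "!",
    "Basic " + "A" * 20_000 + "!",
    "Basic " + "A " * 10_000 + "!",
]

TIME_BUDGET_SECONDS = 0.1  # hypothetical per-match budget


def slow_matches(patterns, inputs, budget):
    """Return (name, input length, seconds) for the first over-budget input per pattern."""
    slow = []
    for name, pattern in patterns.items():
        for text in inputs:
            start = time.perf_counter()
            pattern.match(text)
            elapsed = time.perf_counter() - start
            if elapsed > budget:
                slow.append((name, len(text), elapsed))
                break  # skip longer inputs: an exponential pattern would never finish them
    return slow


def test_request_patterns_fail_fast():
    # Collected by pytest (or called directly) as part of the CI test suite.
    assert slow_matches(PATTERNS_UNDER_TEST, ATTACK_INPUTS, TIME_BUDGET_SECONDS) == []
```

Because each pattern stops at its first over-budget input, a nested-quantifier pattern added to the registry is reported after the 22-character input instead of hanging the suite on the 20,000-character ones.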

Related CWEs (Input Validation category)

CWE ID   Name                          Severity
CWE-20   Improper Input Validation     HIGH
CWE-22   Path Traversal                HIGH
CWE-74   Injection                     CRITICAL
CWE-77   Command Injection             CRITICAL
CWE-78   OS Command Injection          CRITICAL
CWE-79   Cross-site Scripting (XSS)    HIGH
CWE-89   SQL Injection                 CRITICAL
CWE-90   LDAP Injection                HIGH
CWE-91   XML Injection                 HIGH
CWE-94   Code Injection                CRITICAL

Frequently Asked Questions

Can Regex DoS occur if the regex pattern is used only for logging or non-security purposes in Flask?
Yes. Any use of a vulnerable regex on attacker-controlled input can cause excessive CPU consumption, regardless of whether the regex guards a security boundary; logging code often runs on every request, before authentication. Apply the same limits on pattern complexity and input length everywhere a regex touches attacker-controlled input.
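Does using a compiled regex in Flask reduce the risk of Regex DoS compared to inline patterns?
No. Compiling a pattern only saves the cost of parsing it again (Python's re module also caches recently used patterns, so inline patterns are rarely recompiled). Catastrophic backtracking happens at match time and depends on the pattern's structure and the input. Focus on the pattern structure, use atomic groups or possessive quantifiers where supported (Python's re module has them from 3.11), and validate input length to reduce risk.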