
Rate Limiting Bypass with Bearer Tokens

How Rate Limiting Bypass Manifests in Bearer Tokens

Rate limiting is a defensive control that throttles the number of requests a client can make within a time window. When the limit is keyed only to a network‑level identifier such as the client’s IP address, per‑user quotas are never actually enforced: because a Bearer token is not bound to the address it is presented from, an attacker can rotate source IPs to reset the counter, or rotate tokens against a limiter that counts per token string, and in either case exceed the intended per‑user ceiling. This pattern appears in APIs that authenticate via the Authorization: Bearer <token> header but apply rate limits only on network‑level identifiers.

Consider an endpoint that checks a token’s signature and then proceeds to business logic, while a middleware limits requests per IP:

// Vulnerable Express example
const rateLimit = require('express-rate-limit');
const ipLimiter = rateLimit({ windowMs: 60000, max: 10, keyGenerator: (req) => req.ip });

app.get('/data', ipLimiter, (req, res) => {
  const auth = req.headers.authorization;
  if (!auth || !auth.startsWith('Bearer ')) return res.status(401).send('Missing token');
  const token = auth.split(' ')[1];
  // token verification omitted for brevity
  // …process request
  res.json({ data: 'sensitive' });
});

An attacker can evade this limiter in two ways. Because a Bearer token is not bound to a source address, a single valid token (harvested from logs, a token‑leakage vulnerability, or a self‑registered account) can be replayed through many proxies or cloud egress IPs, and the IP‑keyed limiter treats each address as a fresh client with its own ten‑request budget. Conversely, if the limiter were keyed on the raw token string instead, generating or stealing many valid tokens and rotating through them would reset the counter just as effectively. Either way, far more than the intended ten requests per minute reach the business logic.
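A toy in‑memory counter makes both failure modes concrete. This is an illustration only, not how a production limiter is built (real deployments use express-rate-limit, Redis counters, or an API gateway); the token and IP values are invented for the example.

```javascript
// Minimal counting limiter: returns true while the key is under its budget.
function makeLimiter(max, keyFor) {
  const counts = new Map();
  return (req) => {
    const key = keyFor(req);
    const n = (counts.get(key) || 0) + 1;
    counts.set(key, n);
    return n <= max; // true = allowed, false = throttled (HTTP 429)
  };
}

// Keyed on the raw token: every fresh token starts with a fresh quota.
const perToken = makeLimiter(10, (req) => req.token);
let allowed = 0;
for (let t = 0; t < 5; t++) {            // attacker rotates 5 tokens
  for (let i = 0; i < 10; i++) {         // 10 requests with each
    if (perToken({ ip: '203.0.113.7', token: `token-${t}` })) allowed++;
  }
}
console.log(allowed); // 50 requests allowed from a single IP, not 10

// Keyed on IP: rotating source addresses has the same effect.
const perIp = makeLimiter(10, (req) => req.ip);
let allowedByIp = 0;
for (let p = 0; p < 5; p++) {            // attacker rotates 5 source IPs
  for (let i = 0; i < 10; i++) {
    if (perIp({ ip: `198.51.100.${p}`, token: 'same-token' })) allowedByIp++;
  }
}
console.log(allowedByIp); // 50 again, all with one token
```

Neither key ties consumption back to the account doing the consuming, which is why the remediation below derives the key from an identity claim.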

This bypass maps to OWASP API4:2023 – Unrestricted Resource Consumption (formerly API4:2019 – Lack of Resources & Rate Limiting) and has been observed in real‑world API abuse cases where attackers enumerate resources by rotating JWTs or by distributing requests across proxy pools.

The risk is amplified when tokens are cheap to mint: if a user can obtain an unlimited number of valid tokens for the same account (each with its own jti), a limiter keyed on the raw token string grants every fresh token its own quota, while an IP‑centric limiter never consults the token at all. Only a key derived from a stable identity claim such as sub closes both gaps.
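Whether a JWT even carries such a claim can be checked by decoding its payload. The sketch below is for inspection only; it deliberately skips signature verification and must never feed an authentication decision. The example token is hand‑crafted and unsigned.

```javascript
// Decode a JWT payload for inspection only -- this does NOT verify the
// signature and must never be used to authenticate a request.
function decodeJwtPayload(token) {
  const parts = token.split('.');
  if (parts.length !== 3) return null; // not JWS compact serialization
  const json = Buffer.from(parts[1], 'base64url').toString('utf8');
  return JSON.parse(json);
}

// Hand-crafted, unsigned example token (illustration only).
const payload = { sub: 'user-42', jti: 'a1b2c3' };
const fake = [
  Buffer.from(JSON.stringify({ alg: 'none' })).toString('base64url'),
  Buffer.from(JSON.stringify(payload)).toString('base64url'),
  '',
].join('.');

const claims = decodeJwtPayload(fake);
console.log(claims.sub); // 'user-42' -- a stable per-account limiter key
```

If the decoded payload has no sub (or equivalent) claim, the API has nothing account‑level to key its limiter on, which is itself worth flagging.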

Bearer Token‑Specific Detection

Detecting this issue requires verifying whether the API enforces limits on a token‑specific basis rather than solely on IP or client‑id. A black‑box scanner like middleBrick can probe the endpoint with multiple valid Bearer tokens and measure whether the response rate exceeds the advertised limit.

Typical detection steps:

  • Collect a set of valid Bearer tokens (e.g., from public documentation, token‑leakage scans, or by registering test accounts).
  • Send rapid sequences of requests, each with a different token, to the target endpoint.
  • Monitor HTTP 429 (Too Many Requests) responses. If the scanner receives successful responses beyond the expected threshold, the limiter is not token‑aware.
  • Correlate findings with the Authorization header presence to confirm the bypass is token‑driven.
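The decision in the third step — whether the observed responses indicate a missing per‑user limit — can be sketched as a pure function. This helper is hypothetical, written for illustration; it is not part of the middleBrick CLI or any scanner API.

```javascript
// Given the advertised per-user limit and the HTTP statuses observed while
// rotating tokens within one window, decide whether the limit held.
// Hypothetical helper for illustration only.
function perUserLimitEnforced(advertisedLimit, statuses) {
  const successes = statuses.filter((s) => s >= 200 && s < 300).length;
  const throttled = statuses.filter((s) => s === 429).length;
  return {
    successes,
    throttled,
    enforced: successes <= advertisedLimit,
  };
}

// 30 rotating-token requests against a "10 per minute" endpoint, all 200s:
const verdict = perUserLimitEnforced(10, Array(30).fill(200));
console.log(verdict.enforced); // false -- per-user limiting is not enforced
```

A real scanner would also account for timing jitter and transient errors, but the core signal is the same: successes past the advertised ceiling inside one window.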

Using the middleBrick CLI, a quick test looks like:

# Scan an API endpoint with the CLI (free tier allows 3 scans/month)
middlebrick scan https://api.example.com/orders

The scanner runs its 12 parallel checks, including the Rate Limiting test. If the test fails, the report will show a finding under the “Rate Limiting” category with severity “High” and remediation guidance.

For continuous assurance, the GitHub Action can be added to a pipeline:

name: API Security Check
on: [push, pull_request]
jobs:
  scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Run middleBrick scan
        uses: middlebrick/action@v1
        with:
          api-url: https://staging.example.com
          fail-below: B   # fail the job if score drops below B

The MCP Server extension lets developers invoke the same scan from within AI‑powered IDEs (Claude, Cursor) by selecting an API endpoint and requesting a security check, providing immediate feedback without leaving the editor.

Bearer Token‑Specific Remediation

The fix is to scope the rate‑limiting key to a value that identifies the token holder, preferably the token’s sub (subject) claim. The jti (JWT ID) or a hash of the raw token are weaker fallbacks — each distinct token still gets its own bucket — but they are useful for opaque tokens and still defeat IP rotation. Keying per subject ensures each legitimate user (or service) is limited independently, regardless of how many tokens they possess or which addresses they send from.

Below is a corrected Express implementation using the express-rate-limit package with a custom key generator that extracts the sub claim from a verified JWT. The example assumes the use of jsonwebtoken for verification.

const express = require('express');
const jwt = require('jsonwebtoken');
const rateLimit = require('express-rate-limit');

const app = express();

// Verify token and attach decoded payload to request
function authenticateToken(req, res, next) {
  const auth = req.headers.authorization;
  if (!auth || !auth.startsWith('Bearer ')) return res.status(401).send('Missing token');
  const token = auth.split(' ')[1];
  try {
    const decoded = jwt.verify(token, process.env.JWT_SECRET);
    req.user = decoded; // contains sub, jti, etc.
    next();
  } catch (err) {
    return res.status(401).send('Invalid token');
  }
}

// Rate limit per authenticated user (using sub claim)
const userLimiter = rateLimit({
  windowMs: 60000, // 1 minute
  max: 20,         // max 20 requests per user per window
  keyGenerator: (req) => {
    // Fallback to IP if token missing or malformed
    return req.user && req.user.sub ? req.user.sub : req.ip;
  }
});

app.get('/data', authenticateToken, userLimiter, (req, res) => {
  // Business logic here
  res.json({ data: 'sensitive', user: req.user.sub });
});

app.listen(3000);

In a Python/Flask environment, the same principle applies using Flask-Limiter:

from flask import Flask, request
from flask_limiter import Limiter
from flask_limiter.util import get_remote_address
import jwt
import os

app = Flask(__name__)
app.config['JWT_SECRET'] = os.environ['JWT_SECRET']
limiter = Limiter(
    key_func=lambda: get_jwt_sub(),
    app=app,  # bind the limiter to the app so routes are covered
    default_limits=["20 per minute"]
)

def get_jwt_sub():
    auth = request.headers.get('Authorization', '')
    if not auth.startswith('Bearer '):
        return get_remote_address()  # fallback to IP
    token = auth.split(' ')[1]
    try:
        payload = jwt.decode(token, app.config['JWT_SECRET'], algorithms=['HS256'])
        return payload.get('sub', get_remote_address())
    except Exception:
        return get_remote_address()

@app.route('/data')
@limiter.limit("20 per minute")
def data():
    return {'data': 'sensitive'}

if __name__ == '__main__':
    app.run(port=5000)

After deploying the fix, rerun middleBrick (via CLI, GitHub Action, or MCP Server) to confirm that the Rate Limiting check now passes and the overall security score improves. Continuous monitoring (available on the Pro plan) will alert you if the configuration drifts or if new endpoints are added without token‑aware limits.

Related CWEs: Resource Consumption

CWE ID    Name                                                     Severity
CWE-400   Uncontrolled Resource Consumption                        HIGH
CWE-770   Allocation of Resources Without Limits or Throttling     MEDIUM
CWE-799   Improper Control of Interaction Frequency                MEDIUM
CWE-835   Loop with Unreachable Exit Condition ('Infinite Loop')   HIGH
CWE-1050  Excessive Platform Resource Consumption within a Loop    MEDIUM

Frequently Asked Questions

Does middleBrick block or fix rate‑limiting bypasses automatically?
No. middleBrick only detects the issue and provides a detailed finding with severity, evidence, and remediation guidance. It does not modify your API, apply blocks, or patch code.
Can I test multiple Bearer tokens in a single scan with the middleBrick CLI?
Yes. The CLI accepts a URL and internally performs unauthenticated probes; to test token‑specific limits you can supply a header file or use the GitHub Action/MCP Server to inject different Authorization headers per request. The scanner will report whether the limit is enforced per token.