Excessive Data Exposure on AWS

How Excessive Data Exposure Manifests in AWS

Excessive data exposure occurs when an API returns more information than the legitimate use case requires. In AWS environments, this often shows up as misconfigured service integrations that leak entire objects, attributes, or diagnostic details.

  • S3 object overload – A Lambda function behind API Gateway calls s3.getObject and returns the raw Body stream to the caller. If the bucket holds large files (logs, backups, database dumps), the entire payload is sent, even when the client needs only a single metadata field.
  • DynamoDB scan without projection – An API endpoint executes dynamoDB.scan({ TableName: 'Users' }) and returns every attribute of each item, including password hashes, API keys, or internal identifiers that should never be exposed.
  • API Gateway error leakage – A Lambda integration throws an exception; API Gateway forwards the full error message and stack trace in the HTTP response. Attackers can trigger conditions (e.g., invalid input) to obtain internal library versions, file paths, or AWS request IDs that aid further reconnaissance.
  • Step Functions output passthrough – A state machine returns the raw output of a Parallel state that includes temporary credentials or internal tokens, and a synchronous Express workflow integration (API Gateway calling StartSyncExecution) passes that output straight to the caller.
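For the error-leakage pattern, the gap between a leaky handler and a safe one is small. The Python sketch below contrasts the two; the function names, the Lambda proxy-style response dict, and the errorId field are illustrative assumptions, not an AWS API.

```python
import json
import logging
import traceback
import uuid

logger = logging.getLogger(__name__)


def leaky_error_response(exc: Exception) -> dict:
    """Anti-pattern: echoes the exception type, message, and full traceback to the caller."""
    details = "".join(traceback.format_exception(type(exc), exc, exc.__traceback__))
    return {"statusCode": 500, "body": details}


def safe_error_response(exc: Exception) -> dict:
    """Logs the details server-side and returns only a generic message plus an opaque ID."""
    error_id = str(uuid.uuid4())
    # The traceback goes to the function's logs (CloudWatch Logs on Lambda), not to the client.
    logger.error("Unhandled error %s", error_id, exc_info=exc)
    return {
        "statusCode": 500,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"message": "Internal server error", "errorId": error_id}),
    }


if __name__ == "__main__":
    try:
        {}["missing"]  # deliberately raise a KeyError
    except KeyError as exc:
        print(leaky_error_response(exc)["body"])  # file paths and line numbers leak
        print(safe_error_response(exc)["body"])   # only a message and an ID
```

Returning an opaque ID still lets support staff find the full traceback in the logs without handing it to an attacker.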

Real‑world incident reports describe, for example, a publicly accessible /admin/users endpoint that returned the full DynamoDB item set, revealing hashed passwords and email addresses.

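The OWASP API Security Top 10 listed this as API3:2019 Excessive Data Exposure; the 2023 edition merges it with mass assignment under API3:2023 Broken Object Property Level Authorization. Either way, the flaw is a design issue, not a missing patch.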

AWS‑Specific Detection

middleBrick performs a black‑box, unauthenticated scan of the target URL and looks for signals that indicate excessive data exposure.

  • It measures the size of the response body and compares it to the expected payload defined in the OpenAPI/Swagger spec (if present). A response that is significantly larger than the documented schema triggers a finding.
  • It parses JSON/XML responses and flags fields that are not declared in the spec but contain sensitive patterns (e.g., strings matching /\b[A-Fa-f0-9]{8}-[A-Fa-f0-9]{4}-[A-Fa-f0-9]{4}-[A-Fa-f0-9]{4}-[A-Fa-f0-9]{12}\b/ for UUIDs, or regexes for AWS access keys AKIA[0-9A-Z]{16}).
  • For endpoints that return binary data (e.g., S3‑served objects), middleBrick checks the Content‑Length header against a threshold derived from the operation’s purpose (e.g., a metadata lookup should return < 1 KB).
  • It also inspects HTTP error responses (4xx/5xx) for stack traces, AWS request IDs, or internal service names that leak implementation details.
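The content checks above can be approximated in a few lines of Python. This is a minimal sketch, not middleBrick's implementation: it assumes the caller has already parsed the JSON body and extracted the set of declared top-level field names from the OpenAPI schema, and the stack-trace regex is an illustrative assumption covering Python and Node.js traces.

```python
import re
from typing import Any, Iterator, List, Set, Tuple

# Patterns quoted in the checks above: UUIDs and AWS access key IDs.
SENSITIVE_PATTERNS = {
    "uuid": re.compile(
        r"\b[A-Fa-f0-9]{8}-[A-Fa-f0-9]{4}-[A-Fa-f0-9]{4}-[A-Fa-f0-9]{4}-[A-Fa-f0-9]{12}\b"
    ),
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
}
# Illustrative markers for Python and Node.js stack traces (an assumption; extend as needed).
STACK_TRACE_RE = re.compile(r"Traceback \(most recent call last\)|\n\s+at \S+ \(")


def iter_strings(value: Any, path: str = "$") -> Iterator[Tuple[str, str]]:
    """Yield (JSON path, text) for every string nested anywhere inside value."""
    if isinstance(value, str):
        yield path, value
    elif isinstance(value, dict):
        for key, child in value.items():
            yield from iter_strings(child, f"{path}.{key}")
    elif isinstance(value, list):
        for index, child in enumerate(value):
            yield from iter_strings(child, f"{path}[{index}]")


def exposure_signals(body: dict, declared_fields: Set[str], status: int = 200) -> List[str]:
    """Return human-readable signals for one parsed JSON response body."""
    signals = []
    # Undeclared top-level fields are only reported when they hold sensitive-looking data.
    for field in sorted(set(body) - declared_fields):
        for path, text in iter_strings(body[field], f"$.{field}"):
            for name, pattern in SENSITIVE_PATTERNS.items():
                if pattern.search(text):
                    signals.append(f"{name} in undeclared field {path}")
    # Error responses are checked for stack traces anywhere in the body.
    if status >= 400:
        for path, text in iter_strings(body):
            if STACK_TRACE_RE.search(text):
                signals.append(f"stack trace in error response at {path}")
    return signals
```

For example, exposure_signals({"id": "42", "debug": {"key": "AKIAABCDEFGHIJKLMNOP"}}, {"id"}) returns ["aws_access_key_id in undeclared field $.debug.key"], while the same key in a declared field is not reported, matching the rule that only undeclared fields are flagged.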

Example CLI usage:

# Scan a public API endpoint for excessive data exposure
middlebrick scan https://api.example.com/orders/123

The resulting report includes a finding such as EXP‑001, shown in the findings table below.

AWS‑Specific Remediation

Fixing excessive data exposure in AWS relies on applying the principle of least privilege at the data‑access layer and shaping the response before it leaves the service boundary.
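A minimal sketch of response shaping in Python, assuming items arrive as plain dicts (as boto3's resource API returns them). The shape helper and the USER_PUBLIC allowlist are hypothetical names used for illustration.

```python
from typing import Any, Mapping

# An allowlist maps each permitted key to True (keep the value as-is) or to a
# nested allowlist (keep only those sub-keys). Anything not listed is dropped.
Allowlist = Mapping[str, Any]


def shape(value: Any, allow: Any) -> Any:
    """Return a copy of value that contains only the fields named in allow."""
    if allow is True:
        return value
    if isinstance(value, list):
        # Apply the same allowlist to every element of a collection.
        return [shape(item, allow) for item in value]
    if isinstance(value, dict):
        return {key: shape(value[key], sub) for key, sub in allow.items() if key in value}
    # A nested allowlist was given but the value is not an object: drop it.
    return None


# Hypothetical public view of a user item; passwordHash, secretKey, and
# profile.internalNotes are simply never listed.
USER_PUBLIC: Allowlist = {"id": True, "email": True, "profile": {"displayName": True}}

if __name__ == "__main__":
    item = {
        "id": "u1",
        "email": "ana@example.com",
        "passwordHash": "x",
        "secretKey": "y",
        "profile": {"displayName": "Ana", "internalNotes": "z"},
    }
    print(shape(item, USER_PUBLIC))
```

The allowlist is a deliberate design choice: with a denylist, any attribute later added to the table would leak by default, whereas here a new attribute stays hidden until someone lists it.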

  • S3 – use S3 Select or Lambda filtering – Instead of returning the whole object, retrieve only the needed attributes with an S3 Select SQL expression (AWS closed S3 Select to new customers in 2024), or read the object in Lambda and strip unwanted fields before sending the response.
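  • Example – Node.js (v3 SDK) Lambda that returns only selected fields of a JSON object
    import { S3Client, GetObjectCommand } from '@aws-sdk/client-s3';
    const s3 = new S3Client({});
    // Fix the bucket server-side instead of trusting the caller.
    // BUCKET_NAME is an assumed environment variable for this example.
    const BUCKET = process.env.BUCKET_NAME;
    export const handler = async (event) => {
      const { Key } = JSON.parse(event.body ?? '{}');
      if (typeof Key !== 'string' || Key.length === 0) {
        return { statusCode: 400, body: JSON.stringify({ message: 'Key is required' }) };
      }
      // Key is still caller-supplied: scope the role to the allowed prefix in IAM.
      const { Body } = await s3.send(new GetObjectCommand({ Bucket: BUCKET, Key }));
      // Assume the object is JSON; parse it and keep only the fields the client needs
      const data = JSON.parse(await Body.transformToString());
      const limited = { id: data.id, timestamp: data.timestamp };
      return {
        statusCode: 200,
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify(limited)
      };
    };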
    
  • DynamoDB – apply ProjectionExpression – When scanning or querying, request only the attributes required by the API.
  • Example – Python (boto3) with projection
    import boto3
    from boto3.dynamodb.conditions import Key
    
    # Create the table resource once per execution environment, not on every call.
    table = boto3.resource('dynamodb').Table('Users')
    
    def get_user(user_id):
        resp = table.query(
            KeyConditionExpression=Key('pk').eq(f'USER#{user_id}'),
            # Return only public attributes; passwordHash and secretKey are never returned.
            ProjectionExpression='pk, sk, email, displayName'
        )
        return resp.get('Items', [])
    
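  • API Gateway – map or transform responses – For REST APIs with a non-proxy integration, use a Velocity (VTL) mapping template on the integration response to emit only the documented fields. With Lambda proxy integration the function builds the response itself, so filter in code. Customize gateway responses (for example DEFAULT_5XX) so that errors return a generic message instead of internal details. A Lambda authorizer cannot do this job: it only decides whether a request is allowed and never sees the response.
  • IAM – tighten permissions – Grant the Lambda or EC2 role only s3:GetObject on the specific prefix it needs, and only dynamodb:Query restricted with the dynamodb:Attributes and dynamodb:Select condition keys, rather than broad s3:* or dynamodb:Scan permissions.
  • Detect drift with AWS Config – Enable the managed rules s3-bucket-public-read-prohibited and dynamodb-table-encryption-enabled to continuously verify that buckets are not publicly readable and that tables are encrypted at rest. These rules catch storage-level misconfigurations; they do not inspect what your API returns.
  • Logging and monitoring – Turn on AWS CloudTrail data events for S3 GetObject and for the DynamoDB read operations your API uses (Query, Scan, ExecuteStatement), and set up CloudWatch metric filters that alert when the average response size exceeds a defined baseline, for example on API Gateway access logs that record $context.responseLength.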
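The IAM guidance above can be expressed as a policy document. The sketch below builds one in Python, modeled on the fine-grained access control pattern in the DynamoDB documentation (the dynamodb:Attributes and dynamodb:Select condition keys). The account ID, region, table, bucket, prefix, and attribute names are hypothetical; validate any real policy with the IAM policy simulator before relying on it.

```python
import json

# Hypothetical ARNs and attribute names; substitute your own account, table, and bucket.
TABLE_ARN = "arn:aws:dynamodb:us-east-1:123456789012:table/Users"
OBJECT_PREFIX_ARN = "arn:aws:s3:::example-bucket/public/*"
# The table's key attributes (pk, sk) are included alongside the public fields.
PUBLIC_ATTRIBUTES = ["pk", "sk", "email", "displayName"]


def least_privilege_policy(table_arn: str, object_prefix_arn: str, attributes: list) -> dict:
    """Build an identity policy allowing projected Queries and prefix-scoped GetObject only."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "QueryPublicAttributesOnly",
                "Effect": "Allow",
                "Action": ["dynamodb:Query"],  # no Scan, no PartiQL ExecuteStatement
                "Resource": table_arn,
                "Condition": {
                    # Every attribute the request asks for must be on the allowlist.
                    "ForAllValues:StringEquals": {"dynamodb:Attributes": attributes},
                    # If Select is sent, it must request specific attributes, not ALL_ATTRIBUTES.
                    "StringEqualsIfExists": {"dynamodb:Select": "SPECIFIC_ATTRIBUTES"},
                },
            },
            {
                "Sid": "ReadOnePrefixOnly",
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": object_prefix_arn,
            },
        ],
    }


if __name__ == "__main__":
    policy = least_privilege_policy(TABLE_ARN, OBJECT_PREFIX_ARN, PUBLIC_ATTRIBUTES)
    print(json.dumps(policy, indent=2))
```

Because the policy grants no Scan or ExecuteStatement action, a handler that falls back to scanning the whole table is denied instead of silently returning every attribute.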

By combining these native controls (selective data retrieval, strict IAM policies, response mapping, and continuous configuration checks), you close the main excessive data exposure vectors while preserving legitimate API functionality.

Sample finding:

Finding ID | Severity | Description | Remediation Guidance
EXP‑001 | High | Response body 2.3 MB, spec expects ≤ 15 KB; contains fields “passwordHash”, “secretKey”. | Limit the returned attributes to those required by the client; use projection or response filtering.

Related CWEs (category: propertyAuthorization):

CWE ID | Name | Severity
CWE-915 | Improperly Controlled Modification of Dynamically-Determined Object Attributes (Mass Assignment) | High

Frequently Asked Questions

Does middleBrick need any AWS credentials or agents to scan my APIs?
No. middleBrick works as a black‑box scanner; you only provide the public URL of the endpoint. It performs unauthenticated checks and does not require AWS IAM keys, agents, or any internal access.
How does middleBrick differentiate between a legitimately large payload (e.g., file download) and an excessive data exposure?
When an OpenAPI/Swagger specification is supplied, middleBrick compares the actual response size and structure to the documented schema. If the spec indicates a streaming download (e.g., content‑type application/octet‑stream with a defined size range) the finding is suppressed; otherwise oversized responses that contain undeclared fields are flagged as excessive data exposure.
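Read literally, the rule in the answer above can be sketched as a small decision function. This illustrates the described behavior and is not middleBrick's code; SpecResponse, its fields, and the exact-match comparison of content types (no charset parameters) are assumptions.

```python
from dataclasses import dataclass
from typing import Optional, Set, Tuple


@dataclass
class SpecResponse:
    """What a (hypothetical) parsed OpenAPI operation documents about its response."""
    content_type: str
    max_bytes: Optional[int] = None                  # documented limit for structured bodies
    stream_range: Optional[Tuple[int, int]] = None   # documented size range for downloads


def classify(content_type: str, size: int, undeclared_fields: Set[str], spec: SpecResponse) -> str:
    """Return 'suppressed', 'flagged', or 'ok' following the rule described above."""
    is_documented_stream = (
        spec.content_type == "application/octet-stream"
        and content_type == "application/octet-stream"
        and spec.stream_range is not None
        and spec.stream_range[0] <= size <= spec.stream_range[1]
    )
    if is_documented_stream:
        return "suppressed"
    oversized = spec.max_bytes is not None and size > spec.max_bytes
    # Only responses that are both oversized and carry undeclared fields are flagged.
    if oversized and undeclared_fields:
        return "flagged"
    return "ok"
```

With numbers like those in the sample finding EXP‑001, classify("application/json", 2_300_000, {"passwordHash", "secretKey"}, SpecResponse("application/json", max_bytes=15_000)) returns "flagged". Under this literal reading, an oversized response with no undeclared fields returns "ok"; the size comparison described under detection would have to catch that case separately.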