HIGH | hallucination attacks | django | dynamodb

Hallucination Attacks in Django with DynamoDB

Hallucination Attacks in Django with DynamoDB: how this specific combination creates or exposes the vulnerability

A hallucination attack in the context of an API security scan refers to the API returning information that did not originate from the underlying data store or is inconsistent with the expected schema. When a Django application uses Amazon DynamoDB as its persistence layer, the risk of hallucination arises from DynamoDB’s schema-less design and the way Django ORM–style abstractions can misinterpret or over-interpret query results.

DynamoDB stores items as attribute-value pairs with minimal schema enforcement. If a Django layer does not strictly validate or map responses to a defined model shape, an attacker can supply crafted inputs that cause the service to “invent” plausible but false data. For example, incomplete filter expressions or conditional checks may cause the backend to return partial items; the application might then fill missing fields with defaults or inferred values, effectively hallucinating data. This is particularly relevant when scanning with middleBrick, which tests input validation and data exposure across 12 parallel checks, including Property Authorization and Input Validation.
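
The difference between filling gaps and rejecting them can be sketched in a few lines. This is a minimal illustration, not code from any particular application: the `DEFAULTS` table, the `REQUIRED` tuple, and both function names are hypothetical, and the item uses DynamoDB's low-level attribute-value format (`{'S': ...}`, `{'N': ...}`).

```python
# Hypothetical defaults an application might apply to incomplete items.
DEFAULTS = {"name": "Unknown product", "price_cents": 0}
REQUIRED = ("sku", "name", "price_cents")


def lenient_to_dict(item: dict) -> dict:
    """Anti-pattern: silently fills absent attributes with defaults."""
    return {
        "sku": item.get("sku", {}).get("S"),
        "name": item.get("name", {}).get("S", DEFAULTS["name"]),
        "price_cents": int(
            item.get("price_cents", {}).get("N", DEFAULTS["price_cents"])
        ),
    }


def strict_to_dict(item: dict) -> dict:
    """Safer: refuse to build a response from an incomplete item."""
    missing = [k for k in REQUIRED if k not in item]
    if missing:
        raise ValueError(f"Incomplete item, missing: {missing}")
    return {
        "sku": item["sku"]["S"],
        "name": item["name"]["S"],
        "price_cents": int(item["price_cents"]["N"]),
    }


# A partial item, as might be returned under a narrow projection.
partial = {"sku": {"S": "ABC-123"}}
```

With `partial` as input, the lenient mapper returns a product with a name and a price that were never stored, while the strict mapper raises `ValueError`.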

In a black-box scan, middleBrick evaluates whether the API reflects, distorts, or fabricates data. With DynamoDB, a common pattern is to use scan or query operations with a projection expression that omits certain attributes. If the Django code assumes all expected fields are present and synthesizes values for absent ones, the API can leak internal defaults or configuration that do not correspond to stored records. This can expose sensitive metadata or enable an attacker to infer internal logic, contributing to findings in the Data Exposure and Input Validation categories.

Additionally, DynamoDB’s index projections can cause hallucination when a query targets a Global Secondary Index that does not project all item attributes (for example, an index created with a KEYS_ONLY or INCLUDE projection). If the Django layer merges results from multiple index queries without verifying source consistency, it may produce a composite object that never existed in the database. middleBrick’s OpenAPI/Swagger analysis, with full $ref resolution, cross-references spec definitions with runtime findings to highlight such inconsistencies.
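
One way to guard a merge is to refuse it whenever the partial items disagree. The sketch below assumes items in the low-level attribute-value format; `merge_index_results` and its parameters are hypothetical names chosen for illustration.

```python
def merge_index_results(primary_key: str, *items: dict) -> dict:
    """Merge partial items that claim to describe the same record.

    Raises ValueError if an item lacks the primary key or if two items
    carry different values for the same attribute (including the key).
    """
    if not items:
        raise ValueError("No items to merge")
    merged: dict = {}
    for item in items:
        if primary_key not in item:
            raise ValueError("Item is missing its primary key")
        for attr, value in item.items():
            # A conflict means the sources do not describe one record.
            if attr in merged and merged[attr] != value:
                raise ValueError(f"Conflicting values for {attr!r}")
            merged[attr] = value
    return merged
```

Because Global Secondary Indexes are eventually consistent, agreement between two index results still does not prove the composite matches the current item. Re-reading the record from the base table by primary key is likely the more robust choice when the full item is needed.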

DynamoDB-Specific Remediation in Django: concrete code fixes

To mitigate hallucination risks, enforce strict schema validation and avoid synthesizing data when DynamoDB responses are incomplete. Below are concrete, realistic code examples for a Django-based service that uses boto3 to interact with DynamoDB.

1. Define a strict model and validate responses

Create a dataclass or Pydantic model that represents the expected shape. Only populate fields that are explicitly present in the DynamoDB item.

from dataclasses import dataclass
import boto3
from typing import Optional

@dataclass
class Product:
    sku: str
    name: str
    price_cents: int
    description: Optional[str] = None

def load_product(sku: str) -> Product:
    client = boto3.client('dynamodb', region_name='us-east-1')
    response = client.get_item(
        TableName='Products',
        Key={'sku': {'S': sku}}
    )
    item = response.get('Item')
    if item is None:
        raise ValueError('Product not found')
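    # Fail loudly on incomplete items; do not invent values for missing fields
    missing = [k for k in ('sku', 'name', 'price_cents') if k not in item]
    if missing:
        raise ValueError(f'Product item is missing attributes: {missing}')
    return Product(
        sku=item['sku']['S'],
        name=item['name']['S'],
        price_cents=int(item['price_cents']['N']),
        # description is optional: None means "not stored", never a default
        description=item.get('description', {}).get('S')
    )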

2. Use explicit projection expressions and reject partial data

When querying, specify the exact attributes you need and reject responses that do not contain all required fields.

def list_active_products() -> list[Product]:
    client = boto3.client('dynamodb', region_name='us-east-1')
    # A single Scan call returns at most 1 MB of data, so page through results
    paginator = client.get_paginator('scan')
    pages = paginator.paginate(
        TableName='Products',
        FilterExpression='attribute_exists(#status) AND #status = :active',
        # "status" and "name" are DynamoDB reserved words and must be aliased
        ExpressionAttributeNames={'#status': 'status', '#name': 'name'},
        ExpressionAttributeValues={':active': {'S': 'active'}},
        ProjectionExpression='sku, #name, price_cents, description'
    )
    products = []
    for page in pages:
        for item in page.get('Items', []):
            # Ensure all required keys are present; skip malformed items
            if not all(k in item for k in ('sku', 'name', 'price_cents')):
                continue
            products.append(Product(
                sku=item['sku']['S'],
                name=item['name']['S'],
                price_cents=int(item['price_cents']['N']),
                description=item.get('description', {}).get('S')
            ))
    return products

3. Avoid merging across indexes unless necessary

If you must use a GSI, keep the source primary key and do not fabricate missing attributes. Return what DynamoDB provides and let the client handle missing fields.

def search_by_category(category: str) -> list[dict]:
    client = boto3.client('dynamodb', region_name='us-east-1')
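    resp = client.query(
        TableName='Products',
        IndexName='category-index',
        KeyConditionExpression='category = :cat',
        # "name" is a DynamoDB reserved word, so alias it in the projection
        ExpressionAttributeNames={'#name': 'name'},
        ExpressionAttributeValues={':cat': {'S': category}},
        ProjectionExpression='sku, #name'
    )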
    # Do not add default prices or descriptions not returned by DynamoDB
    return resp.get('Items', [])
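
The items returned above are still in DynamoDB's typed format (`{'S': ...}`, `{'N': ...}`). If the API should expose plain values, the conversion can be done without adding any attribute that was not returned. The following is a minimal sketch covering only a few scalar types; `plain_item` is a hypothetical helper, and boto3's own `TypeDeserializer` (in `boto3.dynamodb.types`) handles the full type set.

```python
from decimal import Decimal


def plain_item(item: dict) -> dict:
    """Convert a low-level DynamoDB item to plain Python values.

    Only attributes present in the input appear in the output;
    nothing is defaulted or inferred.
    """
    out = {}
    for attr, typed in item.items():
        if len(typed) != 1:
            raise ValueError(f"Malformed attribute value for {attr!r}")
        type_tag, value = next(iter(typed.items()))
        if type_tag == "S":
            out[attr] = value
        elif type_tag == "N":
            # DynamoDB sends numbers as strings; Decimal avoids float drift.
            out[attr] = Decimal(value)
        elif type_tag == "BOOL":
            out[attr] = value
        elif type_tag == "NULL":
            out[attr] = None
        else:
            raise ValueError(f"Unsupported type tag {type_tag!r} for {attr!r}")
    return out
```

An item projected as `sku, name` therefore yields a dictionary with exactly those two keys, and the client can distinguish "not returned" from "empty".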

4. Harden input validation to prevent injection of malformed requests

Validate and sanitize all incoming identifiers and expressions before constructing DynamoDB requests.

import re

def safe_sku(sku: str) -> str:
    # fullmatch already anchors both ends, so no ^ or $ is needed
    if not re.fullmatch(r'[A-Z0-9-]{3,30}', sku):
        raise ValueError('Invalid SKU format')
    return sku
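
The same idea extends to any user-supplied value that ends up in a request: validate it first, then pass it only through `ExpressionAttributeValues` rather than formatting it into the expression string. The sketch below builds the keyword arguments for a `query` call without sending it; `build_category_query` and the category format are assumptions made for illustration.

```python
import re

# Assumed category format: lowercase letters, digits, and hyphens.
_CATEGORY_RE = re.compile(r"[a-z0-9-]{1,40}")


def build_category_query(category: str) -> dict:
    """Return keyword arguments for client.query() after validating input."""
    if not _CATEGORY_RE.fullmatch(category):
        raise ValueError("Invalid category format")
    return {
        "TableName": "Products",
        "IndexName": "category-index",
        # The expression text is constant; user input never becomes part of it.
        "KeyConditionExpression": "#category = :cat",
        "ExpressionAttributeNames": {"#category": "category"},
        "ExpressionAttributeValues": {":cat": {"S": category}},
    }
```

A caller would then run `client.query(**build_category_query(user_input))`, so a malformed value is rejected before any request reaches DynamoDB.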

By combining strict schema definitions, explicit projections, and input validation, you reduce the attack surface for hallucination attacks. middleBrick’s checks for Property Authorization and Input Validation help verify that such controls are in place.

Related CWEs: llmSecurity

CWE ID	Name	Severity
CWE-754	Improper Check for Unusual or Exceptional Conditions	MEDIUM

Frequently Asked Questions

How does middleBrick detect hallucination risks in DynamoDB-backed APIs?

middleBrick runs unauthenticated checks for Input Validation and Property Authorization. It compares the API’s documented schema (OpenAPI/Swagger with full $ref resolution) against runtime responses to identify when data is reflected, missing, or synthesized.
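
A simplified version of this kind of comparison can also run in your own test suite. The sketch below is not a description of how middleBrick works internally; `diff_against_schema` and its field sets are hypothetical and stand in for fields read from an OpenAPI schema.

```python
def diff_against_schema(
    response: dict, required: set[str], optional: set[str]
) -> dict[str, set[str]]:
    """Report fields that are missing from, or undocumented in, a response."""
    present = set(response)
    return {
        # Required by the spec but absent at runtime.
        "missing": required - present,
        # Returned at runtime but not declared anywhere in the spec.
        "undocumented": present - required - optional,
    }
```

An empty `missing` set and an empty `undocumented` set indicate that the response shape matches the documented one; either set being non-empty is a prompt to check whether the API dropped or synthesized a field.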

Can middleBrick’s findings be mapped to compliance requirements for DynamoDB APIs?

Yes. middleBrick maps findings to frameworks such as OWASP API Top 10, PCI-DSS, SOC2, HIPAA, and GDPR, helping you assess the compliance implications of hallucination and data exposure risks.
