Hallucination Attacks in Django with DynamoDB
Hallucination Attacks in Django with DynamoDB: how this specific combination creates or exposes the vulnerability
A hallucination attack in the context of an API security scan refers to the API returning information that did not originate from the underlying data store or is inconsistent with the expected schema. When a Django application uses Amazon DynamoDB as its persistence layer, the risk of hallucination arises from DynamoDB’s schema-less design and the way Django ORM–style abstractions can misinterpret or over-interpret query results.
DynamoDB stores items as attribute-value pairs with minimal schema enforcement. If a Django layer does not strictly validate or map responses to a defined model shape, an attacker can supply crafted inputs that cause the service to “invent” plausible but false data. For example, incomplete filter expressions or conditional checks may cause the backend to return partial items; the application might then fill missing fields with defaults or inferred values, effectively hallucinating data. This is particularly relevant when scanning with middleBrick, which tests input validation and data exposure across 12 parallel checks, including Property Authorization and Input Validation.
In a black-box scan, middleBrick evaluates whether the API reflects, distorts, or fabricates data. With DynamoDB, a common pattern is to use scan or query operations with a projection expression that omits certain attributes. If the Django code assumes all expected fields are present and synthesizes values for absent ones, the API can leak internal defaults or configuration that do not correspond to stored records. This can expose sensitive metadata or enable an attacker to infer internal logic, contributing to findings in the Data Exposure and Input Validation categories.
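The following minimal, framework-agnostic sketch shows the pattern described above. The sample item, field names, and both helper functions are hypothetical. The point is the contrast between filling an absent `price_cents` with an invented default and refusing to build a response at all.

```python
from typing import Optional

# Hypothetical item as returned by a query whose ProjectionExpression omitted 'price_cents'
PARTIAL_ITEM = {'sku': {'S': 'ABC-123'}, 'name': {'S': 'Widget'}}


def to_response_unsafe(item: dict) -> dict:
    # Anti-pattern: a missing attribute is silently replaced with an invented default,
    # so the API reports a price that was never stored
    return {
        'sku': item['sku']['S'],
        'name': item['name']['S'],
        'price_cents': int(item.get('price_cents', {}).get('N', '0')),
    }


def to_response_strict(item: dict) -> Optional[dict]:
    # Safer: refuse to build a response when any required attribute is absent
    required = ('sku', 'name', 'price_cents')
    if not all(k in item for k in required):
        return None
    return {
        'sku': item['sku']['S'],
        'name': item['name']['S'],
        'price_cents': int(item['price_cents']['N']),
    }


print(to_response_unsafe(PARTIAL_ITEM))  # price_cents appears as 0
print(to_response_strict(PARTIAL_ITEM))  # None
```

In a Django view, the `None` branch would typically map to an error response rather than to a partially fabricated object.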
Additionally, DynamoDB’s index behavior can cause hallucination in two ways. A query may target a Global Secondary Index whose projection does not include every item attribute (a KEYS_ONLY or INCLUDE projection). A query may also target a sparse index, which omits items that lack the index key. If the Django layer merges results from multiple index queries without verifying that they describe the same item in a consistent state, it may produce a composite object that never existed in the database. middleBrick’s OpenAPI/Swagger analysis, with full $ref resolution, cross-references spec definitions with runtime findings to highlight such inconsistencies.
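One way to verify source consistency is to treat the base table as the source of truth and use index results only to pick keys. This sketch assumes that items have already been converted to plain dicts and that `sku` is the table's primary key. The helper name and the drop-on-mismatch policy are illustrative choices, not a DynamoDB feature. Because GSIs are updated asynchronously, a stale index copy is a realistic source of disagreement.

```python
from typing import Iterable

_MISSING = object()  # sentinel: distinguishes "absent" from a stored None


def join_on_primary_key(base_items: Iterable[dict], index_items: Iterable[dict],
                        key: str = 'sku') -> list[dict]:
    """Return base-table items selected by index results, never fabricating records."""
    base_by_key = {item[key]: item for item in base_items if key in item}
    merged = []
    for idx_item in index_items:
        base = base_by_key.get(idx_item.get(key))
        if base is None:
            # No stored record behind this index entry: skip it rather than invent one
            continue
        # An index attribute that differs from, or is absent in, the base item means
        # the index copy is stale, so the record is dropped instead of merged
        if any(base.get(attr, _MISSING) != value for attr, value in idx_item.items()):
            continue
        # Copy the base item so every returned attribute traces back to one stored record
        merged.append(dict(base))
    return merged
```

Dropping mismatched records is conservative. Re-reading the item with a `get_item` call that sets `ConsistentRead=True` is another reasonable policy.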
DynamoDB-Specific Remediation in Django: concrete code fixes
To mitigate hallucination risks, enforce strict schema validation and avoid synthesizing data when DynamoDB responses are incomplete. Below are concrete, realistic code examples for a Django-based service that uses boto3 to interact with DynamoDB.
1. Define a strict model and validate responses
Create a dataclass or Pydantic model that represents the expected shape. Only populate fields that are explicitly present in the DynamoDB item.
```python
from dataclasses import dataclass
from typing import Optional

import boto3


@dataclass
class Product:
    sku: str
    name: str
    price_cents: int
    description: Optional[str] = None


def load_product(sku: str) -> Product:
    client = boto3.client('dynamodb', region_name='us-east-1')
    response = client.get_item(
        TableName='Products',
        Key={'sku': {'S': sku}},
    )
    item = response.get('Item')
    if item is None:
        raise ValueError('Product not found')
    # Fail loudly if a required attribute is missing instead of inventing a value
    missing = [k for k in ('sku', 'name', 'price_cents') if k not in item]
    if missing:
        raise ValueError(f'Product item is missing required attributes: {missing}')
    # Only use fields that exist; the optional description stays None when absent
    return Product(
        sku=item['sku']['S'],
        name=item['name']['S'],
        price_cents=int(item['price_cents']['N']),
        description=item.get('description', {}).get('S'),
    )
```
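Presence checks alone still accept an item whose `price_cents` was stored as a string. A pair of hypothetical helpers can also check the DynamoDB type descriptor, so a wrongly typed attribute is rejected rather than coerced. This is a sketch, and these helper names are not part of boto3.

```python
from typing import Optional


def require_attr(item: dict, name: str, type_code: str) -> str:
    """Return the raw value of a required attribute after checking its DynamoDB type.

    type_code is a scalar descriptor such as 'S' (string) or 'N' (number); the
    low-level client returns both as strings.
    """
    attr = item.get(name)
    if not isinstance(attr, dict) or type_code not in attr:
        raise ValueError(f'attribute {name!r} missing or not of type {type_code}')
    return attr[type_code]


def optional_attr(item: dict, name: str, type_code: str) -> Optional[str]:
    """Like require_attr, but returns None when the attribute is absent."""
    if name not in item:
        return None
    return require_attr(item, name, type_code)
```

Inside `load_product`, a call such as `price_cents=int(require_attr(item, 'price_cents', 'N'))` would then fail on a string-typed price instead of raising a bare `KeyError`.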
2. Use explicit projection expressions and reject partial data
When querying, specify the exact attributes you need and reject responses that do not contain all required fields.
```python
def list_active_products() -> list[Product]:
    client = boto3.client('dynamodb', region_name='us-east-1')
    # 'name' and 'status' are DynamoDB reserved words, so they are aliased through
    # ExpressionAttributeNames; a Scan returns at most 1 MB per call, so use a paginator
    paginator = client.get_paginator('scan')
    pages = paginator.paginate(
        TableName='Products',
        FilterExpression='attribute_exists(#status) AND #status = :active',
        ProjectionExpression='sku, #name, price_cents, description',
        ExpressionAttributeNames={'#status': 'status', '#name': 'name'},
        ExpressionAttributeValues={':active': {'S': 'active'}},
    )
    products = []
    for page in pages:
        for item in page.get('Items', []):
            # Ensure all required keys are present; skip malformed items
            if not all(k in item for k in ('sku', 'name', 'price_cents')):
                continue
            products.append(Product(
                sku=item['sku']['S'],
                name=item['name']['S'],
                price_cents=int(item['price_cents']['N']),
                description=item.get('description', {}).get('S'),
            ))
    return products
```
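Skipping malformed items avoids fabricating data, but silently returning a shorter list can hide a data-quality problem. One way to keep the skip visible is a hypothetical helper that partitions items and logs the rejected count. The function name and log format are assumptions.

```python
import logging
from typing import Iterable

logger = logging.getLogger(__name__)


def partition_items(items: Iterable[dict],
                    required: tuple[str, ...]) -> tuple[list[dict], int]:
    """Split raw DynamoDB items into complete ones and a count of rejected ones."""
    complete, rejected = [], 0
    for item in items:
        if all(k in item for k in required):
            complete.append(item)
        else:
            rejected += 1
    if rejected:
        # Surface the gap instead of silently returning a shorter list
        logger.warning('Skipped %d item(s) missing one of %s', rejected, required)
    return complete, rejected
```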
3. Avoid merging across indexes unless necessary
If you must use a GSI, keep the source primary key and do not fabricate missing attributes. Return what DynamoDB provides and let the client handle missing fields.
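```python
def search_by_category(category: str) -> list[dict]:
    client = boto3.client('dynamodb', region_name='us-east-1')
    # 'name' is a reserved word and must be aliased; follow every page so results are complete
    paginator = client.get_paginator('query')
    pages = paginator.paginate(
        TableName='Products',
        IndexName='category-index',
        KeyConditionExpression='category = :cat',
        ExpressionAttributeValues={':cat': {'S': category}},
        ExpressionAttributeNames={'#name': 'name'},
        ProjectionExpression='sku, #name',
    )
    # Do not add default prices or descriptions not returned by DynamoDB
    return [item for page in pages for item in page.get('Items', [])]
```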
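`search_by_category` returns items in DynamoDB's typed wire format, for example `{'sku': {'S': 'ABC-123'}}`. boto3 provides `TypeDeserializer` in `boto3.dynamodb.types` to convert these values. The dependency-free sketch below handles only a few types, but it shows the property that matters here: the conversion emits exactly the attributes DynamoDB returned and adds nothing. The function name is hypothetical.

```python
from decimal import Decimal


def plain_item(item: dict) -> dict:
    """Convert a low-level DynamoDB item to plain Python values.

    Only S, N, BOOL and NULL are handled in this sketch; other types raise.
    """
    out = {}
    for name, attr in item.items():
        if len(attr) != 1:
            raise ValueError(f'malformed attribute {name!r}')
        type_code, value = next(iter(attr.items()))
        if type_code in ('S', 'BOOL'):
            out[name] = value
        elif type_code == 'N':
            out[name] = Decimal(value)  # exact value, no float rounding
        elif type_code == 'NULL':
            out[name] = None
        else:
            raise ValueError(f'unsupported type {type_code!r} for {name!r}')
    # Exactly the attributes DynamoDB returned: no defaults are filled in
    return out
```

Numbers become `Decimal`, as they do with boto3's deserializer. Django's `JsonResponse` serializes `Decimal` values through `DjangoJSONEncoder`.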
4. Harden input validation to prevent injection of malformed requests
Validate all incoming identifiers before constructing DynamoDB requests. Never build expression strings from user input; pass caller-supplied data only through Key or ExpressionAttributeValues.
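```python
import re


def safe_sku(sku: str) -> str:
    # re.fullmatch anchors both ends, so no ^/$ is needed; non-string input is rejected too
    if not isinstance(sku, str) or not re.fullmatch(r'[A-Z0-9-]{3,30}', sku):
        raise ValueError('Invalid SKU format')
    return sku
```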
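The sketch below shows how validated input can reach DynamoDB without ever touching expression text. The allow-list contents and the two helper names are hypothetical. The returned dicts are keyword arguments intended for boto3's `get_item` and `scan` calls.

```python
import re

# Hypothetical allow-list; adjust to the statuses the table actually uses
ALLOWED_STATUSES = frozenset({'active', 'archived'})
SKU_RE = re.compile(r'[A-Z0-9-]{3,30}')


def get_item_kwargs(sku: str) -> dict:
    """Keyword arguments for client.get_item with a validated key value."""
    if not isinstance(sku, str) or not SKU_RE.fullmatch(sku):
        raise ValueError('Invalid SKU format')
    return {'TableName': 'Products', 'Key': {'sku': {'S': sku}}}


def status_scan_kwargs(status: str) -> dict:
    """Keyword arguments for client.scan; user input only fills a value placeholder."""
    if not isinstance(status, str) or status not in ALLOWED_STATUSES:
        raise ValueError('Unsupported status filter')
    return {
        'TableName': 'Products',
        # The expression text is a constant; input is never concatenated into it
        'FilterExpression': '#status = :status',
        'ExpressionAttributeNames': {'#status': 'status'},
        'ExpressionAttributeValues': {':status': {'S': status}},
    }
```

Usage would look like `client.get_item(**get_item_kwargs(request_sku))`. An allow-list suits enum-like filters, and a regex suits free-form identifiers such as SKUs.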
By combining strict schema definitions, explicit projections, and input validation, you reduce the surface for hallucination attacks. middleBrick’s checks for Property Authorization and Input Validation help verify that such controls are in place.
Related CWEs (category: llmSecurity)
| CWE ID | Name | Severity |
|---|---|---|
| CWE-754 | Improper Check for Unusual or Exceptional Conditions | MEDIUM |