Log Injection in FastAPI with DynamoDB
Log Injection in FastAPI with DynamoDB: how this specific combination creates or exposes the vulnerability
Log injection occurs when untrusted input is written to application logs without sanitization, which lets an attacker forge log entries or inject content such as newlines (CR/LF) or structured-log delimiters. In a FastAPI application that uses DynamoDB as its data store, the risk arises when request data (user IDs, query parameters, request bodies) is logged verbatim and then stored or shipped to a centralized logging system. If the application also logs DynamoDB request metadata, such as table names, key conditions, or response summaries, and that data contains newline or log-formatting characters, an attacker can manipulate log lines to obscure real events or insert fake ones.
Consider a FastAPI endpoint that retrieves a user profile by user_id and logs the DynamoDB query parameters before executing the query. If user_id comes from unvalidated client input and is interpolated directly into a log message, an attacker can supply a value that begins with an ordinary-looking email address and continues with \nAWS4-HMAC-SHA256 Credential=.... The injected newline splits one logical log entry into two lines, and the second line can be crafted to resemble an unrelated record (here, a fragment of a signed AWS request header), misleading monitoring tools and security analysts. A DynamoDB-specific variant arises when logs include raw key condition expressions or scan filters: an attacker-supplied value can contain strings that resemble legitimate condition syntax, which complicates log parsing and incident response.
Log injection can also occur when the application logs items retrieved from a table. If a response attribute such as username or email contains newline or control characters and is logged without sanitization, downstream log aggregation tools may misinterpret record boundaries. This matters most when logs serve as audit trails or compliance evidence, because forged entries undermine trust in security investigations. FastAPI applications often emit structured (for example, JSON) logs and forward them to systems that correlate entries with DynamoDB request IDs or tracing IDs, so the impact can extend beyond a misleading log line to observability dashboards and alerting pipelines.
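To make the record-boundary problem concrete, the sketch below logs the same retrieved item twice: once through a plain-text format built with an f-string, and once as a JSON line produced by json.dumps. The item dict is a made-up stand-in for the Item returned by a boto3 get_item call, and capture is a hypothetical helper that returns the physical lines a single logging call produced; no AWS call is made. It also illustrates why JSON logs are only safe when a real serializer builds them rather than string concatenation.

```python
import io
import json
import logging


def capture(message: str) -> list[str]:
    """Log one message through a fresh handler and return the physical output lines."""
    stream = io.StringIO()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
    log = logging.getLogger("item_demo")
    log.handlers = [handler]  # replace any handler left over from a previous call
    log.propagate = False
    log.setLevel(logging.INFO)
    log.info(message)
    return stream.getvalue().splitlines()


# Hypothetical retrieved item; the username attribute carries an injected newline
item = {"user_id": "u123", "username": "alice\nINFO audit: role changed to admin for u123"}

plain_lines = capture(f"item_retrieved user_id={item['user_id']} username={item['username']}")
json_lines = capture(json.dumps({"event": "item_retrieved", **item}))

print(len(plain_lines))  # 2: the second line looks like a genuine audit entry
print(len(json_lines))   # 1: json.dumps escaped the newline inside the string value
```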
In a typical vulnerable FastAPI route that uses the AWS SDK (boto3), the handler logs the incoming identifier and the constructed key without escaping or validation. If the identifier contains characters that break the log format's assumptions, the integrity of the log stream is compromised. No DynamoDB operation has to fail for this to happen: the vulnerability lives in the formatting and emission layer, where log producers assume their input is safe. Mitigating log injection in this stack therefore requires canonicalizing and sanitizing all client-supplied data before it is recorded in logs, whether or not that data is ultimately persisted in DynamoDB.
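The vulnerable pattern can be sketched without FastAPI or AWS. Here the route body is a plain function, and FakeTable is a hypothetical in-memory stand-in for a boto3 Table whose get_item mimics only the Key argument and the Item key of the response. The sketch shows that the lookup behaves normally (it simply misses) while the log gains a forged line. Switching to logging's lazy %-style arguments would not help, because the value is still inserted verbatim.

```python
import io
import logging


class FakeTable:
    """Hypothetical stand-in for a boto3 Table; supports only get_item(Key=...)."""

    def __init__(self, items):
        self._items = {i["user_id"]: i for i in items}

    def get_item(self, Key):
        item = self._items.get(Key["user_id"])
        return {"Item": item} if item else {}


stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s %(name)s %(message)s"))
logger = logging.getLogger("dynamodb_app.vulnerable")
logger.addHandler(handler)
logger.propagate = False
logger.setLevel(logging.INFO)

table = FakeTable([{"user_id": "alice"}])


def get_user_vulnerable(user_id: str):
    key = {"user_id": user_id}
    # VULNERABLE: raw client input is interpolated into the message text
    logger.info(f"get_item table=profiles user_id={user_id}")
    return table.get_item(Key=key).get("Item")


# Attacker-controlled value; a percent-encoded %0A in the URL typically arrives decoded as a real newline
payload = "alice\nINFO dynamodb_app.vulnerable get_item table=profiles user_id=admin"
result = get_user_vulnerable(payload)
print(result)             # None: the lookup simply misses, nothing raises
print(stream.getvalue())  # two lines; the second is forged but matches the real format
```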
DynamoDB-Specific Remediation in FastAPI: concrete code fixes
Defensive handling of data that flows into logs, and potentially into DynamoDB operations, starts with strict input validation and structured logging. In FastAPI, treat all incoming data as untrusted, reject or normalize unexpected control characters, and record DynamoDB-related context in structured log fields rather than through string interpolation. The remediation steps below include code examples; the boto3 examples assume configured AWS credentials and an existing table.
1. Validate and sanitize user input before logging
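Normalize identifiers by trimming whitespace, then check them against an allowlist of characters in a Pydantic model so that control characters and other unexpected input are rejected with an explicit error before any value reaches logging code. When the model is constructed inside a route, convert its ValidationError into an HTTP 422 response; otherwise FastAPI reports it as a 500 server error.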
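import logging
import re

from fastapi import FastAPI, HTTPException
from pydantic import BaseModel, ValidationError, field_validator

app = FastAPI()
logger = logging.getLogger("dynamodb_app")

# Allowlist: letters, digits, underscore, dot, hyphen, and @ (for email-style IDs).
# The 128-character cap is an illustrative choice.
USER_ID_PATTERN = re.compile(r"[A-Za-z0-9_.@-]{1,128}")


class UserLookup(BaseModel):
    user_id: str

    @field_validator("user_id")
    @classmethod
    def validate_user_id(cls, v: str) -> str:
        v = v.strip()
        # fullmatch rejects CR, LF, NUL, and every other character outside the allowlist
        if not USER_ID_PATTERN.fullmatch(v):
            raise ValueError("user_id contains disallowed characters")
        return v


@app.get("/users/{user_id}")
def get_user(user_id: str):
    try:
        payload = UserLookup(user_id=user_id)
    except ValidationError:
        # A ValidationError raised inside the handler would otherwise surface as a 500
        raise HTTPException(status_code=422, detail="invalid user_id") from None
    clean_id = payload.user_id
    # Safe to log: clean_id matched the allowlist
    logger.info("lookup", extra={"table": "profiles", "user_id": clean_id})
    # Proceed to the DynamoDB call, e.g. table.get_item(Key={"user_id": clean_id})
    return {"user_id": clean_id}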
2. Use structured logging with explicit field separation
Emit logs as JSON with clearly defined fields. A JSON serializer escapes newlines inside string values (a line break becomes the two characters \ and n), so a hostile value in one field cannot split or merge records in systems that read line-delimited logs. When logging DynamoDB interactions, keep identifiers in fields separate from the query structure.
import json
import logging

structured_logger = logging.getLogger("structured")
handler = logging.StreamHandler()
# No formatter is set, so the handler prints only the message: one JSON object per line
structured_logger.addHandler(handler)
structured_logger.setLevel(logging.INFO)
structured_logger.propagate = False  # avoid duplicate copies via the root logger


def log_dynamodb_query(table: str, key_condition: str, user_supplied: str) -> None:
    log_entry = {
        "event": "dynamodb_query",
        "table": table,
        "key_condition": key_condition,  # placeholder form, e.g. "user_id = :uid"
        "user_supplied": user_supplied,  # own field; never concatenated into message text
    }
    # json.dumps escapes \n, \r, and other control characters inside string values
    structured_logger.info(json.dumps(log_entry))


# Example usage
log_dynamodb_query(
    table="orders",
    key_condition="user_id = :uid",
    user_supplied="u123",
)
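Steps 1, 3, and 4 pass context through logging's extra= argument, but the standard library's default Formatter prints only the message and silently ignores those attributes. One way to make them appear as separate, escaped JSON fields is a small custom Formatter. The sketch below is illustrative only: JsonFormatter and its FIELDS allowlist are hypothetical names rather than a library API, and the sketch omits exception and stack information.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line, copying allowlisted extra= fields."""

    FIELDS = ("table", "user_id", "key_condition", "bio_snippet")

    def format(self, record: logging.LogRecord) -> str:
        entry = {"level": record.levelname, "logger": record.name, "event": record.getMessage()}
        for field in self.FIELDS:
            if hasattr(record, field):  # extra= keys become attributes of the LogRecord
                entry[field] = getattr(record, field)
        # json.dumps escapes newlines and other control characters inside string values
        return json.dumps(entry, default=str)


stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(JsonFormatter())
log = logging.getLogger("dynamodb_app.json")
log.addHandler(handler)
log.propagate = False
log.setLevel(logging.INFO)

log.info("lookup", extra={"table": "profiles", "user_id": "u123\nforged line"})
print(stream.getvalue())  # a single line; the newline is escaped inside "user_id"
```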
3. Parameterize DynamoDB operations and avoid string interpolation
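When constructing DynamoDB requests, pass user-supplied values as parameters (a Key dictionary, or ExpressionAttributeValues referenced by placeholders such as :uid) instead of interpolating them into expression strings. The values then never become part of the expression text, so they cannot be mistaken for query syntax in the request or in any log that records it. If you must log an expression, log its placeholder form rather than the substituted values.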
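import logging
import boto3
from botocore.exceptions import ClientError

logger = logging.getLogger("dynamodb_app")
dynamodb = boto3.resource("dynamodb", region_name="us-east-1")
table = dynamodb.Table("profiles")

def get_user_safe(user_id: str):
    # Assumes user_id already passed the allowlist check from step 1.
    # The value travels as a typed Key attribute, never inside an expression string.
    try:
        response = table.get_item(Key={"user_id": user_id})
    except ClientError as err:
        # Log the AWS error code only, not the raw message or request parameters
        code = err.response["Error"]["Code"]
        logger.warning("dynamodb_get_failed", extra={"table": "profiles", "error_code": code})
        raise
    item = response.get("Item")
    # Log only validated metadata, not raw response attributes that may contain newlines
    logger.info("dynamodb_get", extra={"table": "profiles", "user_id": user_id, "found": item is not None})
    return item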
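For queries rather than single-key lookups, the same idea applies to KeyConditionExpression. The sketch below builds keyword arguments for a low-level DynamoDB Query call (TableName, KeyConditionExpression, and ExpressionAttributeValues are real Query API parameter names) and derives a log summary that keeps the placeholder expression but drops the bound values. build_query and loggable_summary are hypothetical helper names; the request is only built, not sent, so the sketch runs without AWS access.

```python
def build_query(table: str, user_id: str) -> dict:
    # The user value lives only in ExpressionAttributeValues; the expression text is constant
    return {
        "TableName": table,
        "KeyConditionExpression": "user_id = :uid",
        "ExpressionAttributeValues": {":uid": {"S": user_id}},
    }


def loggable_summary(request: dict) -> dict:
    # Keep the table and placeholder expression; record which placeholders were bound, not their values
    return {
        "table": request["TableName"],
        "key_condition": request["KeyConditionExpression"],
        "bound": sorted(request.get("ExpressionAttributeValues", {})),
    }


request = build_query("orders", "u123\nINFO forged")
summary = loggable_summary(request)
print(summary)  # {'table': 'orders', 'key_condition': 'user_id = :uid', 'bound': [':uid']}
# With a boto3 low-level client, the request would be sent as: boto3.client("dynamodb").query(**request)
```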
4. Filter or encode output fields that may be logged
If you log items retrieved from DynamoDB, sanitize fields known to contain user-controlled content (e.g., bio text, comments) by stripping non-printable characters or applying a safe encoding before emission.
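import logging
import re

logger = logging.getLogger("dynamodb_app")

def sanitize_for_log(value: str) -> str:
    # Remove ASCII control characters (CR, LF, TAB, NUL, ...) and DEL; all other characters are kept
    return re.sub(r"[\x00-\x1f\x7f]", "", value)

def log_retrieved_item(item):
    # item: the dict from get_item's response.get("Item"), or None when nothing matched
    if item:
        safe_bio = sanitize_for_log(str(item.get("bio", "")))
        logger.info("item_retrieved", extra={"user_id": sanitize_for_log(str(item["user_id"])), "bio_snippet": safe_bio[:200]})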
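The regular expression above covers only ASCII control characters. Some log viewers and parsers also treat Unicode separators such as NEL (U+0085), LINE SEPARATOR (U+2028), and PARAGRAPH SEPARATOR (U+2029) as line breaks, and Python's own str.splitlines() splits on all three. A stricter alternative is sketched below: it canonicalizes to NFC and then escapes, rather than deletes, unsafe characters so investigators can still see what was sent. escape_for_log is a hypothetical helper, and treating the Unicode categories Cc, Zl, and Zp as unsafe is a design assumption.

```python
import unicodedata

# Categories treated as unsafe in log text (assumption): Cc = control characters
# (C0, DEL, C1 including NEL), Zl = line separator, Zp = paragraph separator
_UNSAFE_CATEGORIES = {"Cc", "Zl", "Zp"}


def escape_for_log(value: str, limit: int = 200) -> str:
    """Canonicalize to NFC, truncate to `limit` input characters, and escape unsafe characters."""
    text = unicodedata.normalize("NFC", value)[:limit]
    out = []
    for ch in text:
        if unicodedata.category(ch) in _UNSAFE_CATEGORIES:
            out.append(f"\\u{ord(ch):04x}")  # a newline becomes the six characters \u000a
        elif ch == "\\":
            out.append("\\\\")  # double backslashes so input cannot fake an escape sequence
        else:
            out.append(ch)
    return "".join(out)


print(escape_for_log("bio line 1\nline 2\u2028line 3"))  # bio line 1\u000aline 2\u2028line 3
```

Escaping keeps the original content recoverable for forensics, whereas deletion (as in sanitize_for_log) silently changes what was recorded; which trade-off fits depends on your audit requirements.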