HIGH · llm data leakage · actix · dynamodb

LLM Data Leakage in Actix with DynamoDB

LLM Data Leakage in Actix with DynamoDB: how this specific combination creates or exposes the vulnerability

When an Actix web service exposes an unauthenticated endpoint that queries DynamoDB and returns data used to prompt an LLM, the integration can inadvertently leak sensitive information through model outputs. This occurs because the API surface allows an attacker to influence which DynamoDB items are retrieved and how they are formatted for LLM consumption, and because the system may not enforce strict output filtering or schema validation.

In a typical Actix implementation, an HTTP handler deserializes query parameters, builds a DynamoDB condition or key expression, and fetches items. If the handler embeds retrieved fields (user-supplied identifiers, record metadata, or configuration values) directly into prompts, an attacker can supply crafted input that makes the service retrieve and leak unintended data. For example, a missing authorization check on the record ID may allow an attacker to iterate over other users' items, and those items can then appear in LLM responses, either in completion text or in tool calls and function outputs.
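
The missing ownership check described above can be sketched in plain Rust. This is a minimal illustration, not code from any particular service: the `Record` shape, its `owner_id` and `notes` fields, and the `prompt_text_for` helper are all hypothetical names, and it assumes each stored item records its owner.

```rust
/// Hypothetical record shape; `owner_id` is assumed to be stored on each item.
#[derive(Debug)]
pub struct Record {
    pub owner_id: String,
    pub notes: String,
}

#[derive(Debug, PartialEq)]
pub enum AccessError {
    Forbidden,
}

/// Returns the prompt-eligible text only when the authenticated caller owns the record.
pub fn prompt_text_for<'a>(caller_id: &str, record: &'a Record) -> Result<&'a str, AccessError> {
    if record.owner_id == caller_id {
        Ok(record.notes.as_str())
    } else {
        // Deny before any field reaches prompt construction.
        Err(AccessError::Forbidden)
    }
}
```

Placing the check between retrieval and prompt construction means an attacker who iterates record IDs receives an error rather than another user's data echoed back through the model.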

The LLM/AI Security checks in middleBrick specifically target this class of risk. System prompt leakage detection uses regex patterns matching ChatML, Llama 2, Mistral, and Alpaca formats to identify prompts that may expose backend logic or data-handling details. Active prompt injection testing probes for system prompt extraction and instruction override, which can reveal cases where user-controlled DynamoDB content affects prompt construction. Output scanning inspects LLM responses for PII, API keys, and executable code; when DynamoDB fields contain secrets or personal data, unchecked exposure in completions becomes a finding. Excessive agency detection flags patterns such as tool_calls or function_call usage that could allow an LLM to trigger additional DynamoDB queries beyond the intended scope. Finally, unauthenticated LLM endpoint detection identifies endpoints that serve completions without requiring authentication, which is especially risky when those endpoints rely on DynamoDB as a data source.
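
To make the idea of output scanning concrete, here is a deliberately simple sketch that a service could run on its own completions before returning them. It is not middleBrick's implementation. It only looks for tokens shaped like an AWS access key ID (the prefix `AKIA` followed by 16 uppercase letters or digits), and the function names are illustrative assumptions.

```rust
/// Returns true if `text` contains a token shaped like an AWS access key ID:
/// the prefix "AKIA" followed by 16 uppercase letters or digits.
pub fn contains_access_key_id(text: &str) -> bool {
    let bytes = text.as_bytes();
    // Slide a 20-byte window over the text; shorter inputs yield no windows.
    bytes.windows(20).any(|w| {
        w.starts_with(b"AKIA")
            && w[4..]
                .iter()
                .all(|b| b.is_ascii_uppercase() || b.is_ascii_digit())
    })
}

/// Replaces the completion with a fixed notice when a key-shaped token is found.
pub fn filter_completion(completion: &str) -> String {
    if contains_access_key_id(completion) {
        "[response withheld: possible credential detected]".to_string()
    } else {
        completion.to_string()
    }
}
```

A real filter would cover more secret formats and PII patterns. The point here is only that the check runs on the model's output, independently of whatever was placed in the prompt.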

Concrete risk scenarios include: (1) an Actix route that accepts a record_id path parameter, fetches the item from DynamoDB, and inserts it into a prompt without verifying ownership, enabling horizontal privilege escalation and data exfiltration; (2) an endpoint that exposes DynamoDB attribute names or query structures in error messages or LLM outputs, assisting reconnaissance for further attacks; (3) tool-using agents that invoke additional DynamoDB queries based on LLM-generated decisions, expanding the blast radius of a single vulnerable endpoint.
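
Scenario (3) can be narrowed by validating every model-proposed tool call before it is executed. The sketch below assumes a hypothetical `ToolCall` shape with a single `user_id_arg` argument and one allowlisted tool named `get_profile`. Neither comes from a specific agent framework.

```rust
/// Hypothetical shape of a tool call proposed by the model.
pub struct ToolCall {
    pub name: String,
    pub user_id_arg: String,
}

/// Tools the agent may invoke; anything else is rejected.
const ALLOWED_TOOLS: [&str; 1] = ["get_profile"];

/// Accepts a tool call only if the tool is allowlisted and the key it
/// targets belongs to the authenticated session user.
pub fn is_tool_call_permitted(call: &ToolCall, session_user_id: &str) -> bool {
    let tool_allowed = ALLOWED_TOOLS.iter().any(|t| *t == call.name);
    tool_allowed && call.user_id_arg == session_user_id
}
```

Because the decision depends on the server-side session identity rather than on anything the model generated, a prompt-injected request for another user's key or for an unlisted tool is refused before it reaches DynamoDB.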

DynamoDB-Specific Remediation in Actix: concrete code fixes

Remediation focuses on strict input validation, least-privilege data retrieval, and separation of data from prompts. In Actix, enforce authentication and scoped authorization before any DynamoDB access, and avoid inserting raw database content into prompts or function arguments. Use parameterized queries and schema-bound models to limit which fields are retrieved and how they are used.

Below are concrete Rust examples using the official AWS SDK for Rust with Actix Web. The first example shows a handler, assumed to sit behind authentication, that retrieves a single item by ID with a strict field projection and does not pass any user-controlled data into a prompt.

use actix_web::{error, web, HttpResponse, Result};
use aws_sdk_dynamodb::types::AttributeValue;
use aws_sdk_dynamodb::Client;

async fn get_user_profile(
    client: web::Data<Client>,
    path: web::Path<String>, // user_id from path, already authenticated upstream
) -> Result<HttpResponse> {
    let user_id = path.into_inner();
    // Enforce authorization at application layer before DB call
    // ensure_requesting_user_owns_profile(&user_id).await?;

    let resp = client
        .get_item()
        .table_name("users")
        .key("user_id", AttributeValue::S(user_id.clone()))
        .projection_expression("user_id, display_name, avatar_url")
        .send()
        .await
        // Map SDK errors to a generic 500 so DynamoDB details never reach the client
        .map_err(|_| error::ErrorInternalServerError("internal error"))?;

    let item = resp.item().ok_or_else(|| error::ErrorNotFound("not found"))?;
    // as_s() returns a Result, so convert it to an Option before applying a default
    let display_name = item
        .get("display_name")
        .and_then(|v| v.as_s().ok())
        .map(String::as_str)
        .unwrap_or("unknown");
    let avatar_url = item
        .get("avatar_url")
        .and_then(|v| v.as_s().ok())
        .map(String::as_str)
        .unwrap_or("");

    // Return data for UI; do NOT embed raw fields into LLM prompts
    Ok(HttpResponse::Ok().json(serde_json::json!({
        "user_id": user_id,
        "display_name": display_name,
        "avatar_url": avatar_url
    })))
}

The second example demonstrates safe usage when an LLM integration is required: retrieve the data, then pass only sanitized, purpose-limited fields to the prompt template. Never concatenate raw DynamoDB attributes into system or user messages.

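fn build_prompt(user_name: &str, avatar_url: &str) -> String {
    // Use strictly controlled template variables; avoid including raw DB schema
    format!(
        "You are assisting user {}. Their avatar is at {}. Provide concise help.",
        sanitize(user_name),
        sanitize_url(avatar_url)
    )
}

// Replace every character outside [A-Za-z0-9 ] so stored text cannot carry prompt markup
fn sanitize(input: &str) -> String {
    input.replace(|c: char| !c.is_ascii_alphanumeric() && c != ' ', "_")
}

// Keep only https URLs built from a conservative character set; otherwise omit the value
fn sanitize_url(input: &str) -> String {
    let allowed = |c: char| c.is_ascii_alphanumeric() || "/:.-_~%".contains(c);
    if input.starts_with("https://") && input.chars().all(allowed) {
        input.to_string()
    } else {
        "[omitted]".to_string()
    }
}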
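
The schema-bound model mentioned at the start of this section can be sketched as a dedicated type that is the only thing a prompt template is allowed to read. This is an illustration under assumptions: the item is simplified to a `HashMap<String, String>` rather than the SDK's attribute map, and the `plan` field and `PromptContext` name are hypothetical.

```rust
use std::collections::HashMap;

/// The only fields permitted to reach a prompt template (hypothetical names).
#[derive(Debug, PartialEq)]
pub struct PromptContext {
    pub display_name: String,
    pub plan: String,
}

/// Copies allowlisted fields out of a raw item; every other attribute is dropped.
/// Returns None when a required field is missing, so the caller can refuse to prompt.
pub fn to_prompt_context(item: &HashMap<String, String>) -> Option<PromptContext> {
    Some(PromptContext {
        display_name: item.get("display_name")?.clone(),
        plan: item.get("plan")?.clone(),
    })
}
```

An attribute such as an API key that happens to be stored on the same item never appears in `PromptContext`, so it cannot be interpolated into a prompt by accident, even if the projection expression is later widened.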

Additional DynamoDB-specific measures include: enabling encryption at rest via KMS (not modeled in code but required for compliance), using IAM conditions to restrict access by requester identity, and avoiding scan operations that can over-fetch data. Validate and limit result sets server-side, and log access attempts for audit without exposing DynamoDB response content in logs. With these controls, the Actix service can interact with DynamoDB while minimizing data leakage into LLM outputs.
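
Two of the measures above, limiting result sets server-side and logging access without response content, can be sketched as small helpers. The cap of 25 items, the log format, and both function names are assumptions chosen for illustration. In a real handler the clamped value would feed the query's limit setting.

```rust
/// Upper bound on items returned per request (assumed value).
pub const MAX_PAGE_SIZE: u32 = 25;

/// Clamps a client-supplied page size into 1..=MAX_PAGE_SIZE.
/// A missing value falls back to the maximum.
pub fn clamp_page_size(requested: Option<u32>) -> u32 {
    requested.unwrap_or(MAX_PAGE_SIZE).clamp(1, MAX_PAGE_SIZE)
}

/// Builds an audit line that records who asked and how much came back,
/// but none of the returned attribute values.
pub fn audit_line(caller_id: &str, table: &str, item_count: usize) -> String {
    format!("caller={} table={} items={}", caller_id, table, item_count)
}
```

Keeping attribute values out of the log line means that log access does not become a second path to the same data the prompt controls are protecting.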

Related CWEs: llmSecurity

CWE ID	Name	Severity
CWE-754	Improper Check for Unusual or Exceptional Conditions	MEDIUM

Frequently Asked Questions

Can an authenticated Actix endpoint still leak data through the LLM if DynamoDB fields contain secrets?

Yes. Even with authentication, if retrieved DynamoDB fields are embedded into prompts or LLM tool outputs without strict schema control and output filtering, secrets or PII can appear in completions. Use field projection, prompt templates with controlled variables, and output scanning to mitigate.

Does enabling encryption at rest in DynamoDB prevent LLM data leakage?

No. Encryption at rest protects data storage, not data in use or exposure through application logic. Secure handling requires authorization checks, input validation, and careful prompt construction to prevent DynamoDB content from reaching LLM outputs.
