HIGH llm data leakagefiberdynamodb

Llm Data Leakage in Fiber with Dynamodb

Llm Data Leakage in Fiber with Dynamodb — how this specific combination creates or exposes the vulnerability

When an API built with Fiber exposes an endpoint that queries DynamoDB and returns data used to prompt an LLM, several distinct leakage channels can emerge. In this context, Llm Data Leakage refers to unintended exposure of sensitive information through LLM responses, and the DynamoDB layer can be a source of that sensitive data.

Consider a typical pattern: a Fiber route accepts a user identifier, fetches a record from DynamoDB, and includes fields such as email, internal notes, or personal tags in a prompt sent to an LLM. If the LLM response is returned to the client, any sensitive data ingested by the LLM may appear in the output, either through memorization or via crafted extraction attacks.

DynamoDB-specific risks arise from the structure and richness of stored items. A DynamoDB table might contain fine-grained attributes (e.g., userEmail, ssn, medicalNotes) that are not intended for LLM consumption but are included in the prompt construction due to overly broad attribute selection. In a NoSQL design, it is common to store denormalized or verbose data in a single item; if the application copies entire item attributes into prompts, the LLM can inadvertently surface PII or secrets.

Another vector involves error handling and logging. If a Fiber application logs raw DynamoDB responses (including attribute values) and those logs are later ingested or exposed—accidentally surfaced in an LLM response through training data contamination or via an output-scanning oversight—the sensitive DynamoDB content can leak. The LLM security check for PII, API keys, and executable code in LLM responses is designed to catch these exfiltrated artifacts, but prevention starts with limiting what reaches the LLM.

Middleware or route handlers in Fiber that construct prompt strings by concatenating user-supplied input with DynamoDB fields are especially prone to creating prompt-injection-style leakage, where an attacker manipulates input to coax the LLM into revealing stored data. For example, an attacker might submit crafted text intended to trigger the LLM to regurgitate other attribute values retrieved from DynamoDB. Because the LLM operates over the prompt it receives, any sensitive DynamoDB data included there becomes part of the model context and may appear in completions.

middleBrick’s LLM/AI Security checks address this specific combination by scanning for System Prompt Leakage using regex patterns tailored to ChatML, Llama 2, Mistral, and Alpaca formats, testing for prompt injection through sequential probes (system prompt extraction, instruction override, DAN jailbreak, data exfiltration, and cost exploitation), and scanning LLM outputs for PII, API keys, and executable code. When integrated via the CLI (middlebrick scan <url>) or through the GitHub Action for CI/CD gates, these checks help identify whether DynamoDB-sourced data is reaching the LLM surface.

In practice, to mitigate leakage you should minimize the data flow from DynamoDB into LLM prompts, apply strict output scanning, and enforce schema-based access controls so that only necessary, non-sensitive fields are considered for prompt construction. The scanner’s findings map to OWASP API Top 10 and can be integrated into your dashboard to track API security scores over time, ensuring that risky data pathways involving DynamoDB and LLMs are surfaced early.

Dynamodb-Specific Remediation in Fiber — concrete code fixes

Remediation focuses on controlling which DynamoDB attributes are used in LLM interactions, validating and sanitizing inputs, and ensuring that sensitive fields never reach the prompt or the LLM response. Below are concrete, Fiber-oriented patterns with valid DynamoDB SDK usage in Go.

1. Select only safe, non-sensitive attributes when building prompts

Instead of copying the entire DynamoDB item, explicitly pick fields that are safe for LLM consumption.

// Example: Fiber handler that selects specific attributes
package main

import (
    "github.com/gofiber/fiber/v2"
    "github.com/aws/aws-sdk-go-v2/service/dynamodb"
    "github.com/aws/aws-sdk-go-v2/service/dynamodb/types"
    "context"
)

type UserProfile struct {
    UserID   string
    Username string
    Role     string
    // Do NOT include Email, SSN, or internal notes here
}

func getUserProfile(client *dynamodb.Client, tableName string, userID string) (UserProfile, error) {
    out, err := client.GetItem(context.TODO(), &dynamodb.GetItemInput{
        TableName: &tableName,
        Key: map[string]types.AttributeValue{
            "userID": &types.AttributeValueMemberS{Value: userID},
        },
    })
    if err != nil {
        return UserProfile{}, err
    }
    item := out.Item
    if item == nil {
        return UserProfile{}, nil
    }
    return UserProfile{
        UserID:   getString(item, "userID"),
        Username: getString(item, "username"),
        Role:     getString(item, "role"),
    }, nil
}

func getString(item map[string]types.AttributeValue, key string) string {
    if v, ok := item[key]; ok {
        return *v.(*types.AttributeValueMemberS).Value
    }
    return ""
}

func main() {
    app := fiber.New()
    client := &dynamodb.Client{} // configured client

    app.Get("/profile/:id", func(c *fiber.Ctx) error {
        profile, err := getUserProfile(client, "Users", c.Params("id"))
        if err != nil {
            return c.Status(500).SendString("server error")
        }
        // Build prompt using only safe fields
        prompt := "User " + profile.Username + " (role: " + profile.Role + ") needs assistance."
        // send prompt to LLM here
        c.SendString(prompt)
        return nil
    })
}

2. Validate and sanitize inputs before querying DynamoDB

Ensure IDs and query parameters conform to expected formats to prevent injection or over-fetching.

// Validate userID format before using in DynamoDB request
import "regexp"

var userIDRegex = regexp.MustCompile(`^[a-zA-Z0-9_-]{1,64}$`)

func validateUserID(userID string) bool {
    return userIDRegex.MatchString(userID)
}

// In handler:
if !validateUserID(c.Params("id")) {
    return c.Status(400).SendString("invalid user ID")
}

3. Redact sensitive fields before any logging or LLM consumption

If you must log for debugging, explicitly exclude sensitive attributes.

// Example: filter out sensitive keys before logging
func safeLogItem(item map[string]types.AttributeValue) map[string]interface{} {
    safe := make(map[string]interface{})
    for k, v := range item {
        switch k {
        case "email", "ssn", "medicalNotes", "apiKey":
            safe[k] = "[REDACTED]"
        default:
            if sv, ok := v.(*types.AttributeValueMemberS); ok {
                safe[k] = *sv.Value
            } else if bv, ok := v.(*types.AttributeValueMemberBOOL); ok {
                safe[k] = *bv.Value
            } else {
                safe[k] = "[UNSUPPORTED]"
            }
        }
    }
    return safe
}

4. Use IAM and fine-grained permissions to limit exposure

Configure DynamoDB resource policies and IAM roles so the Fiber service principal can only read non-sensitive attributes via conditions where possible. While this is an infrastructure control, it reduces the attack surface visible to the application code.

5. Apply output scanning and guardrails before LLM invocation

Treat any data retrieved from DynamoDB as potentially sensitive. If you must include dynamic content in prompts, run it through a PII/API-key scanner or enforce a strict allowlist of field names. middleBrick’s LLM/AI Security checks can be integrated into your pipeline to scan prompts and responses for PII and secrets, ensuring that leaked DynamoDB data is detected before it causes harm.

By combining selective attribute retrieval, strict input validation, redaction in logs, and output scanning, you can significantly reduce the risk of Llm Data Leakage when Fiber services interact with DynamoDB.

Related CWEs: llmSecurity

CWE IDNameSeverity
CWE-754Improper Check for Unusual or Exceptional Conditions MEDIUM

Frequently Asked Questions

Can LLM output contain sensitive DynamoDB data even if the prompt was sanitized?
Yes. If the LLM has been trained on or exposed to similar data patterns, it may generate or regurgitate sensitive information. Always scan LLM outputs for PII, API keys, and code, and avoid feeding raw DynamoDB records into prompts.
Does middleBrick automatically fix DynamoDB-related LLM leakage issues?
No. middleBrick detects and reports findings with remediation guidance, but it does not fix, patch, block, or remediate. Use its findings to adjust prompt construction, access patterns, and output validation.