HIGH llm data leakagehapidynamodb

Llm Data Leakage in Hapi with Dynamodb

Llm Data Leakage in Hapi with Dynamodb — how this specific combination creates or exposes the vulnerability

When an Hapi server uses Amazon DynamoDB as a persistence layer and exposes an endpoint that returns or caches raw database records to an LLM or includes DynamoDB payloads in prompts, sensitive data can leak into model inputs or outputs. This typically occurs when application code passes item attributes, error details, or metadata directly into LLM system prompts, user messages, or tool definitions without redaction or minimization.

DynamoDB items often contain internal fields such as awsRegion, requestId, or partition/sort keys that may reveal infrastructure topology or user identifiers. If an Hapi handler constructs a prompt like System: Use the user profile {JSON.stringify(item)}, any attribute—PII, session tokens, or internal flags—can be exposed to the LLM. In addition, misconfigured responses that include the full DynamoDB Item (including binary attributes or location data) can lead to system prompt leakage or output scanning failures when an LLM processes or echoes the data.

The LLM/AI Security checks in middleBrick specifically look for system prompt leakage patterns (27 regexes covering ChatML, Llama 2, Mistral, Alpaca formats) and active prompt injection probes. When DynamoDB records are reflected into prompts without sanitization, these checks can detect exposed API keys, PII, or credentials in system messages. Output scanning further ensures that LLM responses do not themselves contain DynamoDB–style raw records, API keys, or executable code. Unauthenticated LLM endpoint detection also flags scenarios where an Hapi route serving LLM completions does not enforce authorization, allowing anyone to submit prompts that include leaked DynamoDB content.

Because Hapi does not enforce schema validation by default, developers may inadvertently forward entire DynamoDB items to LLM tooling. For example, using an item as a tool parameter without stripping sensitive metadata can lead to excessive agency detection flags (tool_calls with unexpected fields). Real-world concerns include privilege escalation via manipulated item attributes and data exfiltration through crafted prompts that coax the model to return stored DynamoDB values.

middleBrick’s scans test the unauthenticated attack surface and can surface these risks through findings mapped to OWASP API Top 10 and compliance frameworks. The scanner does not fix the code; it provides prioritized findings with severity and remediation guidance so teams can harden the integration.

Dynamodb-Specific Remediation in Hapi — concrete code fixes

To prevent LLM data leakage with DynamoDB in Hapi, ensure that only necessary, sanitized data reaches prompts and tool definitions. Apply strict field selection, redaction, and input validation before any DynamoDB item is used in LLM interactions.

1. Field selection and redaction

Never pass raw DynamoDB items to LLMs. Instead, project only required fields and remove internal attributes. DynamoDB’s Item uses DynamoDB-specific type wrappers (e.g., S, N, BOOL). Convert these to plain JavaScript values first, then filter sensitive keys.

// utils/sanitize.js
const allowed = new Set(['userId', 'email', 'displayName']);

function sanitizeForLlm(item) {
  // item is a plain object after conversion from DynamoDB format
  const result = {};
  for (const key of Object.keys(item)) {
    if (allowed.has(key)) {
      result[key] = item[key];
    }
  }
  return result;
}

module.exports = { sanitizeForLlm };

2. Safe DynamoDB get/query in a Hapi handler

Use the AWS SDK to fetch items, convert DynamoDB format to plain JS, sanitize, and then use the cleaned data in prompts or tool calls.

// handlers/userProfile.js
const { DynamoDBClient, GetItemCommand } = require('@aws-sdk/client-dynamodb');
const { marshall, unmarshall } = require('@aws-sdk/util-dynamodb');
const { sanitizeForLlm } = require('../utils/sanitize');

const client = new DynamoDBClient({ region: 'us-east-1' });

async function getUserProfile(request, h) {
  const { userId } = request.params;
  const cmd = new GetItemCommand({
    TableName: process.env.PROFILE_TABLE,
    Key: marshall({ userId: userId })
  });

  const { Item } = await client.send(cmd);
  if (!Item) {
    return h.response({ error: 'Not found' }).code(404);
  }

  const profile = unmarshall(Item); // Convert DynamoDB types to plain JS
  return sanitizeForLlm(profile);   // Keep only safe fields
}

module.exports = { getUserProfile };

3. Use sanitized data in prompts and tools

When constructing prompts or tool definitions, reference only the sanitized object. Avoid including the raw DynamoDB response or item metadata.

// handlers/chat.js
const { getUserProfile } = require('./handlers/userProfile');

async function chatHandler(request, h) {
  const profile = await getUserProfile(request, h);
  const prompt = `System: You are assisting user ${profile.email}. Help them with their account.`;
  // Send prompt to LLM; profile contains only email and displayName
  return { prompt };
}

module.exports = { chatHandler };

4. Validate and reject unexpected fields in tool calls

When DynamoDB items are used as tool arguments, validate structure to prevent excessive agency. Reject or ignore properties not explicitly allowed.

// validation/toolValidator.js
function validateToolInput(data, allowedKeys) {
  const keys = Object.keys(data);
  for (const key of keys) {
    if (!allowedKeys.has(key)) {
      throw new Error(`Unexpected field in tool input: ${key}`);
    }
  }
  return true;
}

module.exports = { validateToolInput };

Apply this validation in route setup or within a custom pre-handler in Hapi to ensure only expected fields reach downstream tooling and LLM endpoints.

Related CWEs: llmSecurity

CWE IDNameSeverity
CWE-754Improper Check for Unusual or Exceptional Conditions MEDIUM

Frequently Asked Questions

Why does DynamoDB metadata increase LLM data leakage risk in Hapi?
DynamoDB items can contain internal fields such as requestId, awsRegion, or binary attributes. If these are included in prompts or tool calls without sanitization, they may expose infrastructure details or PII to the LLM, leading to system prompt leakage or unintended output.
Does middleBrick fix DynamoDB leakage issues in Hapi?
middleBrick detects and reports these risks, providing severity and remediation guidance. It does not fix code; you must apply field selection, redaction, and validation in your Hapi handlers to prevent data leakage.