HIGH llm data leakagehapicockroachdb

Llm Data Leakage in Hapi with Cockroachdb

Llm Data Leakage in Hapi with Cockroachdb — how this specific combination creates or exposes the vulnerability

When an Hapi server uses CockroachDB as its backend and exposes endpoints that return LLM-related data, there is a risk of unintended leakage of sensitive or system information through LLM responses. This can occur when application logic passes database records or query results directly into prompts or tool outputs without adequate filtering or redaction. In a typical Hapi route that interacts with CockroachDB, a developer might construct a response or a prompt using user-supplied identifiers or database fields, inadvertently including sensitive data such as API keys, personal identifiers, or internal tool usage patterns.

Consider an Hapi route that retrieves a user’s configuration from CockroachDB and uses it to customize an LLM prompt. If the configuration includes fields like system_prompt, tools, or metadata about previous interactions, and these are concatenated into a prompt without validation, an attacker could exploit prompt injection techniques to extract those values. For example, a malicious user might supply an input designed to trigger verbose error messages or influence the LLM to repeat sensitive context from the database. Because CockroachDB stores structured data, including JSONB columns that may contain model settings or historical tool calls, improper handling can expose patterns that reveal how the LLM is being steered or constrained.

The LLM/AI Security checks run by middleBrick specifically target this class of risk by detecting system prompt leakage and testing for prompt injection in scenarios where backend data influences LLM inputs. In a Hapi + CockroachDB setup, if database fields are used to assemble prompts, tool definitions, or system instructions without sanitization, the scanner can identify whether crafted inputs cause the model to reveal those internal instructions or data. This is especially relevant when the API exposes endpoints that return raw configuration or logs that include tool usage, function call definitions, or agent state patterns that should remain internal.

Additionally, output scanning for PII, API keys, and executable code is critical when LLM responses are generated from data stored in CockroachDB. If the database contains sensitive values that are echoed or referenced in LLM outputs—such as user identifiers, session tokens, or generated code snippets—without proper masking, the API may leak credentials or private information. The scanner checks for these patterns in LLM responses and flags instances where structured data from CockroachDB appears in model output without redaction.

Excessive agency detection is another relevant concern. If Hapi routes dynamically construct tool calls or function_call instructions based on CockroachDB records—such as enabling or disabling tools depending on feature flags stored in the database—an attacker might manipulate those flags to induce the LLM to use unauthorized tools or escalate privileges. By analyzing how database state influences LLM behavior, middleBrick can identify whether the API exposes pathways for unintended tool usage or agent-like behavior.

Cockroachdb-Specific Remediation in Hapi — concrete code fixes

To mitigate LLM data leakage in a Hapi application using CockroachDB, apply strict separation between data retrieval, prompt construction, and LLM interaction. Always treat database content as potentially sensitive and never directly concatenate raw rows or configuration fields into prompts or tool definitions. Use explicit allowlists and output encoding to ensure only intended data flows into the LLM context.

Below are concrete, syntactically correct examples for Hapi routes that safely interact with CockroachDB.

1. Safe data retrieval and prompt construction

// server.js
const Hapi = require('@hapi/hapi');
const { Pool } = require('pg');

const pool = new Pool({
  connectionString: 'postgresql://user:password@host:26257/db?sslmode=require',
});

const init = async () => {
  const server = Hapi.server({ port: 3000, host: 'localhost' });

  server.route({
    method: 'GET',
    path: '/api/recommendation',
    handler: async (request, h) => {
      const userId = request.query.userId;
      // Retrieve only necessary, non-sensitive fields
      const result = await pool.query(
        'SELECT display_name, preferences FROM users WHERE id = $1',
        [userId]
      );

      if (result.rows.length === 0) {
        return h.response({ error: 'User not found' }).code(404);
      }

      const user = result.rows[0];
      // Build prompt using safe, curated data only
      const prompt = `Provide a recommendation for ${user.display_name} based on their preferences: ${user.preferences}. Do not include any internal system details.`;

      // Call LLM with safe prompt
      const llmResponse = await callLLM(prompt);
      return { recommendation: llmResponse };
    },
  });

  await server.start();
  console.log('Server running on %s', server.info.uri);
};

const callLLM = async (prompt) => {
  // Placeholder for actual LLM call
  return 'Generated response';
};

init();

2. Avoiding leakage of internal fields

// BAD: Includes internal fields that could leak system prompts or tool definitions
const unsafeRoute = {
  method: 'GET',
  path: '/api/config/{id}',
  handler: async (request, h) => {
    const result = await pool.query('SELECT * FROM configs WHERE id = $1', [request.params.id]);
    // DO NOT DO THIS: exposes internal fields like system_prompt, tools, flags
    return result.rows[0];
  },
};

// GOOD: Explicitly select safe fields and redact sensitive content
const safeRoute = {
  method: 'GET',
  path: '/api/config/{id}',
  handler: async (request, h) => {
    const result = await pool.query(
      'SELECT name, is_active FROM public_configs WHERE id = $1',
      [request.params.id]
    );
    return result.rows[0] || { error: 'Not found' };
  },
};

3. Mitigating prompt injection via input validation

const sanitizeInput = (input) => {
  // Remove or escape sequences commonly used in prompt injection
  return input.replace(/(\b(system|assistant|user|role)\b|:)/g, '[REDACTED]');
};

server.route({
  method: 'POST',
  path: '/api/chat',
  handler: async (request, h) => {
    const safeUserInput = sanitizeInput(request.payload.message);
    const prompt = `User: ${safeUserInput}. Respond concisely.`;
    return { reply: await callLLM(prompt) };
  },
});

4. Securing LLM outputs that include database-derived content

const filterOutput = (text) => {
  // Basic pattern to remove potential API keys, emails, or internal IDs
  return text
    .replace(/\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Z|a-z]{2,}\b/g, '[EMAIL]')
    .replace(/\bsk_live_[a-zA-Z0-9]{20,}\b/g, '[API_KEY]')
    .replace(/\brole":"[^"]*"/g, '[REDACTED]');
};

const postLLMHandler = async (request, h) => {
  const raw = await callLLM(request.payload.prompt);
  const safe = filterOutput(raw);
  return { response: safe };
};

5. Controlling tool and agent behavior sourced from DB flags

// Ensure tool availability flags are server-side validated, not driven by user input
const getEnabledTools = async () => {
  const result = await pool.query(
    'SELECT tool_name FROM enabled_tools WHERE source = $1',
    ['server_config']
  );
  return result.rows.map((r) => r.tool_name);
};

// Do not allow the client to dictate which tools are enabled
server.route({
  method: 'POST',
  path: '/api/agent',
  handler: async (request, h) => {
    const allowed = await getEnabledTools();
    // Use allowed tools only; ignore request.payload.tools
    return { tools: allowed };
  },
});

By combining these patterns—selective field retrieval, input sanitization, output filtering, and server-side control of tool definitions—you reduce the surface for LLM data leakage when Hapi serves data from CockroachDB. These practices align with the detection focus of middleBrick’s LLM/AI Security checks, which scan for system prompt exposure, prompt injection success, and sensitive content in model responses.

Related CWEs: llmSecurity

CWE IDNameSeverity
CWE-754Improper Check for Unusual or Exceptional Conditions MEDIUM

Frequently Asked Questions

How does middleBrick detect LLM data leakage in Hapi APIs using CockroachDB?
middleBrick runs LLM/AI Security checks that include system prompt leakage detection, prompt injection testing, and output scanning for PII, API keys, and code. When database-driven prompts or configurations are improperly exposed, these checks identify whether crafted inputs can cause the model to reveal internal instructions or sensitive data stored in CockroachDB.
Can I rely on middleBrick to fix LLM data leakage issues in my Hapi + CockroachDB API?
middleBrick detects and reports findings with remediation guidance, but it does not fix, patch, block, or remediate. You should apply secure coding practices—such as selecting only safe fields, sanitizing inputs, and filtering LLM outputs—to address the issues identified by the scanner.