Severity: HIGH

LLM Data Leakage in Hapi with Basic Auth

LLM Data Leakage in Hapi with Basic Auth — how this specific combination creates or exposes the vulnerability

When a Hapi API uses HTTP Basic Authentication and exposes an endpoint that returns or processes unstructured text—such as user-generated content, support transcripts, or debug responses—it can unintentionally leak sensitive information to an LLM-based client or logging layer. In Hapi, route handlers often assemble response payloads from multiple sources (database rows, configuration, request state), and if any of those sources contain credentials, PII, or secrets, that data may be included verbatim in replies processed by an LLM integration.

With Basic Auth, credentials are sent in an Authorization header that is base64-encoded but not encrypted. If the server logs the authorization header, echoes it in error messages, or includes it in an LLM tool call or response, the credentials are effectively exposed in clear text to anyone who can intercept or access those logs. middleBrick’s LLM/AI Security checks detect System Prompt Leakage patterns that commonly appear in Hapi responses when debug strings include tokens or user identifiers alongside model context.
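
As a quick illustration of why that matters, the sketch below (the header value is a made-up example) shows that recovering Basic Auth credentials from a logged Authorization header takes nothing more than a base64 decode:

// Hypothetical Authorization header as it might appear in a captured log line
const header = 'Basic YWxpY2U6czNjcmV0';

// Strip the scheme and base64-decode the remainder
const encoded = header.replace(/^Basic\s+/i, '');
const [username, password] = Buffer.from(encoded, 'base64').toString('utf8').split(':');

console.log(username, password); // alice s3cret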

Consider a Hapi route that builds a completion request for an LLM using user input and appends internal notes or session identifiers to the prompt. If those notes contain Basic Auth–derived user IDs or roles, the next LLM call might surface them in model output, especially when the model is asked to summarize or rephrase. middleBrick’s Active Prompt Injection testing probes verify whether an attacker can coax the model into repeating credentials, and Output Scanning checks whether replies contain API keys or PII. Because Hapi often serves as a backend for richer client applications, the framework’s permissive plugin ecosystem can inadvertently forward sensitive headers or route parameters into contexts where an LLM can observe them.
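
The following sketch illustrates that risky pattern; llmClient stands in for whatever downstream AI client the service uses and is not a real library:

// ANTI-PATTERN: auth-derived data is concatenated into the LLM prompt
server.route({
  method: 'POST',
  path: '/summarize',
  options: {
    auth: 'simple',
    handler: async (request, h) => {
      // Session cookie and raw Authorization header end up in the prompt text
      const notes = `session=${request.state.sid}; auth=${request.headers.authorization}`;

      const prompt = `Summarize the following ticket.\nInternal notes: ${notes}\nTicket: ${request.payload.text}`;

      // A summarization request can echo the notes, credentials included
      const completion = await llmClient.complete(prompt);
      return { completion };
    }
  }
});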

Another vector arises from introspection or inventory endpoints that return structured metadata about the API itself. If these endpoints include authorization headers or user roles in their JSON representation and that JSON is passed to an LLM for analysis, the LLM can extract or repeat those values. middleBrick’s Excessive Agency detection looks for patterns such as tool_calls or function_call that may propagate sensitive context into model interactions. By correlating OpenAPI/Swagger definitions with runtime behavior, the scanner identifies endpoints where auth-derived data flows into responses that could be consumed by AI models.
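
One mitigation for this vector is to redact authorization-related keys from any metadata object before it is handed to a model for analysis; the helper below is an illustrative sketch, not part of Hapi or middleBrick:

// Keys that should never reach an LLM context; extend the list for your schema
const SENSITIVE_KEYS = new Set(['authorization', 'cookie', 'password', 'token', 'secret']);

// Recursively drop sensitive keys from arbitrary JSON metadata
const redactForLLM = (value) => {
  if (Array.isArray(value)) return value.map(redactForLLM);
  if (value && typeof value === 'object') {
    return Object.fromEntries(
      Object.entries(value)
        .filter(([key]) => !SENSITIVE_KEYS.has(key.toLowerCase()))
        .map(([key, nested]) => [key, redactForLLM(nested)])
    );
  }
  return value;
};

// Example: endpoint metadata that would otherwise expose the Authorization header
const metadata = {
  path: '/users/{id}',
  headers: { authorization: 'Basic YWxpY2U6czNjcmV0', accept: 'application/json' },
  roles: ['user']
};

console.log(redactForLLM(metadata)); // headers.authorization is gone; path and roles remain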

In practice, this means a Hapi service with Basic Auth can leak credentials or PII not through the auth mechanism itself, but through how route handlers compose and forward data to downstream AI services. The risk is elevated when developers assume that because credentials are base64-encoded in transit, they are safe to embed in logs, error objects, or LLM prompts. middleBrick flags these scenarios as LLM Data Leakage findings and provides remediation guidance that focuses on data minimization, strict header handling, and output validation before any content reaches an LLM.

Basic Auth-Specific Remediation in Hapi — concrete code fixes

To prevent LLM Data Leakage in Hapi when using Basic Authentication, you must ensure that sensitive authorization data never reaches response bodies, logs, or LLM prompts. This involves validating and sanitizing headers, avoiding the inclusion of auth-derived fields in user-facing payloads, and configuring route options to minimize accidental exposure.

Example 1: Strict validation and selective response assembly

Instead of forwarding the entire request.auth object, explicitly pick only the fields you need and omit credentials:

const Hapi = require('@hapi/hapi');
const Basic = require('@hapi/basic');

const init = async () => {
  const server = Hapi.server({ port: 4000, host: 'localhost' });

  // The basic auth scheme is provided by the @hapi/basic plugin
  await server.register(Basic);

  server.auth.strategy('simple', 'basic', {
    // @hapi/basic expects { isValid, credentials } from the validate function
    validate: async (request, username, password, h) => {
      if (username === 'alice' && password === 's3cret') {
        return { isValid: true, credentials: { id: 1, role: 'user' } };
      }
      return { isValid: false, credentials: null };
    }
  });

  server.route({
    method: 'POST',
    path: '/complete',
    options: {
      auth: 'simple',
      handler: (request, h) => {
        const { payload } = request;
        const user = request.auth.credentials;

        // Build a sanitized context for the LLM, excluding credentials and headers
        const context = {
          userId: user.id,
          role: user.role,
          input: payload.text
        };

        // Do NOT include request.auth or request.headers.authorization in the
        // response or in any prompt sent downstream; pass only `context`
        return { completion: `Processed as ${context.role}` };
      }
    }
  });

  await server.start();
};

init().catch(err => console.error(err));

Example 2: Centralized response sanitization plugin

Use a server extension on the onPreResponse lifecycle point to strip sensitive fields before any response payload is returned or logged:

// Remove auth-derived keys from an outbound payload object (mutates in place)
const sanitize = (payload) => {
  if (payload && typeof payload === 'object') {
    delete payload.authToken;
    delete payload.sessionId;
    delete payload.authorization;
  }
  return payload;
};

server.ext('onPreResponse', (request, h) => {
  const response = request.response;

  // Boom errors keep their payload under output.payload; sanitize that too
  if (response.isBoom) {
    sanitize(response.output.payload);
    return h.continue;
  }

  // Plain object responses expose their payload as response.source
  if (response.variety === 'plain' && response.source && typeof response.source === 'object') {
    sanitize(response.source);
  }

  return h.continue;
});

General practices

  • Never log request.auth or any object that contains credentials.
  • Configure CORS to avoid leaking headers that include authorization context to browser-based LLM tools.
  • Use schema validation (e.g., Joi) to ensure outbound payloads do not contain unexpected auth-derived properties (see the sketch after this list).
  • Apply the same sanitization logic to error handlers to prevent credentials from appearing in stack traces returned to clients.
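
As a sketch of the Joi point above (route path and fields are illustrative), Hapi's response validation can reject any payload that carries properties outside an explicit allow-list:

const Joi = require('joi');

server.route({
  method: 'GET',
  path: '/profile',
  options: {
    auth: 'simple',
    response: {
      // Only these fields may leave the server; unknown keys fail validation
      schema: Joi.object({
        userId: Joi.number().required(),
        role: Joi.string().valid('user', 'admin').required()
      }).unknown(false),
      failAction: 'error' // reject the response instead of leaking extra fields
    },
    handler: (request, h) => {
      const { id, role } = request.auth.credentials;
      return { userId: id, role };
    }
  }
});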

By following these patterns, you reduce the surface area for LLM Data Leakage while still allowing legitimate user context to inform AI interactions. middleBrick’s scans can verify that remediation has been applied by checking that authentication-derived fields are absent from documented response schemas and that runtime outputs pass Output Scanning for PII and secrets.

Related CWEs

CWE ID     Name                                                     Severity
CWE-754    Improper Check for Unusual or Exceptional Conditions     MEDIUM

Frequently Asked Questions

Can an LLM reconstruct credentials from a Basic Auth–protected endpoint that returns user metadata?
Yes, if the endpoint includes authorization-derived fields (such as user ID, role, or custom claims) in its response and that response is supplied to an LLM, the model can repeat or infer credentials. Mitigate by excluding auth-derived fields from responses and by validating outputs before any LLM consumption.
Does enabling TLS fully protect Basic Auth headers from being exposed to LLMs?
TLS protects credentials in transit between client and server, but it does not prevent the server from logging or forwarding those credentials to downstream services or LLMs. You must still sanitize headers and avoid embedding auth context in data sent to LLM endpoints.