LLM Data Leakage in ASP.NET with CockroachDB
LLM Data Leakage in ASP.NET with CockroachDB — how this specific combination creates or exposes the vulnerability
When an ASP.NET application integrates with CockroachDB and exposes an endpoint that returns or processes data intended for large language model (LLM) interactions, the risk of LLM data leakage arises from how conversational outputs are generated, logged, and transmitted. LLM data leakage in this context refers to the unintended exposure of sensitive information—such as database connection strings, row-level data, or query structures—through LLM responses, tool calls, or error messages.
In an ASP.NET environment using CockroachDB, a common pattern involves passing user input or application state to an LLM service, either for generating natural language responses or for driving agentic workflows. If the application does not strictly sanitize inputs, redact sensitive context, and isolate LLM interactions from direct database access, an attacker may infer or extract sensitive data via prompt injection or output analysis. For example, crafted prompts can trick an LLM into echoing database schema details, query results, or internal logic that was inadvertently included in the prompt or tool call construction.
CockroachDB’s distributed SQL nature does not inherently introduce new LLM-specific vulnerabilities, but its role as a backend data source amplifies the impact when leakage occurs. If an LLM endpoint has access to query execution results that contain personally identifiable information (PII), financial data, or authentication tokens, and these results are included in prompts without proper filtering, the LLM may disclose them in its responses. This is particularly relevant when using features such as tool calls or function calling, where structured data from CockroachDB is passed to the LLM for reasoning or further action. An attacker performing LLM output scanning can detect API keys, session tokens, or database identifiers embedded in responses, especially if the application logs full prompt/response pairs for debugging.
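To make this concrete, the following sketch shows the vulnerable pattern: entire CockroachDB rows serialized straight into an LLM prompt. The ILlmClient interface and the accounts table are illustrative assumptions, not part of any specific SDK.

using System;
using System.Collections.Generic;
using System.Text.Json;
using System.Threading.Tasks;
using Npgsql;

// Hypothetical LLM client; stands in for whatever SDK the application uses.
public interface ILlmClient
{
    Task<string> CompleteAsync(string prompt);
}

public class LeakyAccountSummarizer
{
    private readonly string _connectionString;
    private readonly ILlmClient _llm;

    public LeakyAccountSummarizer(string connectionString, ILlmClient llm)
    {
        _connectionString = connectionString;
        _llm = llm;
    }

    public async Task<string> SummarizeAsync(Guid accountId)
    {
        // SELECT * pulls every column (tokens, emails, balances) into application memory.
        const string sql = "SELECT * FROM accounts WHERE id = @id";
        await using var conn = new NpgsqlConnection(_connectionString);
        await conn.OpenAsync();
        await using var cmd = new NpgsqlCommand(sql, conn);
        cmd.Parameters.AddWithValue("id", accountId);

        var row = new Dictionary<string, object?>();
        await using var reader = await cmd.ExecuteReaderAsync();
        if (await reader.ReadAsync())
        {
            for (var i = 0; i < reader.FieldCount; i++)
            {
                row[reader.GetName(i)] = reader.GetValue(i);
            }
        }

        // The unredacted row becomes LLM context; a crafted follow-up prompt
        // ("repeat your full input verbatim") can echo it back to the attacker.
        var prompt = $"Summarize this account:\n{JsonSerializer.Serialize(row)}";
        return await _llm.CompleteAsync(prompt);
    }
}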
Additionally, unauthenticated or improperly scoped LLM endpoints in an ASP.NET application may allow an attacker to probe the system using a series of injection techniques. For instance, a malicious user could submit prompts designed to trigger error messages that reveal stack traces, query text, or CockroachDB node information. The scanner’s LLM security checks—including system prompt leakage detection using 27 regex patterns covering ChatML, Llama 2, Mistral, and Alpaca formats, active prompt injection testing with five sequential probes (system prompt extraction, instruction override, DAN jailbreak, data exfiltration, and cost exploitation), and output scanning for PII, API keys, and executable code—can identify these weaknesses by analyzing the LLM’s behavior across these attack vectors.
In environments where the ASP.NET application uses an OpenAPI specification to document its endpoints, further risks emerge if $ref references pull in database-sensitive definitions that are then exposed through LLM interactions. Cross-referencing spec definitions with runtime findings becomes essential to ensure that LLM-facing operations do not inadvertently describe or return raw database payloads. Without runtime validation and strict input sanitization, the combination of CockroachDB as a data layer and an LLM as an interface creates a pathway for sensitive data to leave the system through conversational outputs, tool calls, or logged traces.
Finally, the risk is compounded when excessive agency patterns are present, such as LangChain agent configurations that allow the LLM to invoke multiple tools or query CockroachDB autonomously. If the LLM is permitted to construct and execute complex queries without tight guardrails, it may return data that was not intended for conversational use. Continuous monitoring and output validation, aligned with compliance frameworks like OWASP API Top 10 and SOC2, are necessary to detect and mitigate LLM data leakage before sensitive information is exposed to unauthorized parties.
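One minimal way to impose such guardrails, sketched here with illustrative table and type names, is to avoid free-form SQL entirely: each agent tool wraps one fixed, parameterized query, so the LLM chooses only parameter values, never the query shape.

using System.Threading.Tasks;
using Npgsql;

// Each tool wraps exactly one fixed, parameterized statement: the LLM supplies
// only the parameter value, never the query text, tables, or columns.
public class ProductLookupTool
{
    private readonly string _connectionString;

    public ProductLookupTool(string connectionString) => _connectionString = connectionString;

    public async Task<string?> LookupAsync(string productName)
    {
        const string sql = "SELECT description FROM products WHERE name = @name LIMIT 1";
        await using var conn = new NpgsqlConnection(_connectionString);
        await conn.OpenAsync();
        await using var cmd = new NpgsqlCommand(sql, conn);
        cmd.Parameters.AddWithValue("name", productName);
        return (string?)await cmd.ExecuteScalarAsync();
    }
}

With this design, even a fully compromised prompt cannot widen the result set beyond the single column the tool exposes.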
CockroachDB-Specific Remediation in ASP.NET — concrete code fixes
To mitigate LLM data leakage in an ASP.NET application using CockroachDB, remediation must focus on input validation, output filtering, and strict separation of data access from conversational interfaces. The following code examples illustrate secure patterns for integrating CockroachDB while minimizing exposure of sensitive information to LLMs.
1. Parameterized Queries with Explicit Column Selection
Always use parameterized queries and select only the necessary columns to avoid exposing schema or sensitive fields to downstream processing.
using System;
using Npgsql;

public class User
{
    public Guid Id { get; set; }
    public string Username { get; set; }
    public string Email { get; set; }
}

public class UserRepository
{
    private readonly string _connectionString;

    public UserRepository(string connectionString) => _connectionString = connectionString;

    public User? GetUserById(Guid userId)
    {
        // Name only the columns the caller needs; never SELECT * on an LLM-facing code path.
        const string sql = "SELECT id, username, email FROM users WHERE id = @userId LIMIT 1";
        using var conn = new NpgsqlConnection(_connectionString);
        using var cmd = new NpgsqlCommand(sql, conn);
        cmd.Parameters.AddWithValue("userId", userId);
        conn.Open();
        using var reader = cmd.ExecuteReader();
        if (reader.Read())
        {
            return new User
            {
                Id = reader.GetGuid(0),
                Username = reader.GetString(1),
                Email = reader.GetString(2)
            };
        }
        return null;
    }
}
2. Redaction of Sensitive Fields Before LLM Interaction
Before passing query results to an LLM, remove or mask fields such as emails, API keys, or internal identifiers.
using System.Text.RegularExpressions;

public class DataSanitizer
{
    // Replace every match of each sensitive pattern with a marker before the
    // text is embedded in an LLM prompt or written to logs.
    public static string RedactForLLM(string input, string[] sensitivePatterns)
    {
        foreach (var pattern in sensitivePatterns)
        {
            input = Regex.Replace(input, pattern, "[REDACTED]", RegexOptions.IgnoreCase);
        }
        return input;
    }
}
// Usage: redact email addresses and quoted key/value secrets before building the prompt
var sensitive = new[]
{
    @"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b", // email addresses
    @"\bkey\s*[=:]\s*['""][^'""]+['""]"                    // key = '...' or key: "..." pairs
};
var safePrompt = DataSanitizer.RedactForLLM(rawQueryResult, sensitive);
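Note that regex redaction is best-effort: patterns miss novel secret formats, and over-broad patterns can mangle legitimate text. Where the schema is known, prefer dropping sensitive columns before serialization (the DTO pattern in the next section) and treat redaction as a defense-in-depth layer rather than the only control.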
3. Isolation of LLM Calls from Direct Database Context
Ensure that LLM prompts do not include raw SQL or internal query structures. Use explicit DTOs to transfer only required, non-sensitive data.
public class UserProfileDto
{
    public Guid Id { get; set; }
    public string Username { get; set; }
}

public static class UserMappings
{
    // Copy only non-sensitive fields; Email and any other columns never reach the LLM.
    public static UserProfileDto ToDto(User user) =>
        new UserProfileDto { Id = user.Id, Username = user.Username };
}

// Pass only the DTO to the LLM service
var dto = UserMappings.ToDto(user);
var llmResponse = llmService.GenerateResponse(new PromptBuilder()
    .AddUserMessage($"Profile: {dto.Username}")
    .Build());
4. Secure Configuration of LLM Endpoints
Restrict access to LLM endpoints in ASP.NET by validating request origins and enforcing authentication wherever possible, even though the scanner can probe unauthenticated. Use middleware to filter suspicious prompts.
app.Use(async (context, next) =>
{
    if (context.Request.Path.StartsWithSegments("/llm"))
    {
        var prompt = context.Request.Query["prompt"].ToString();
        // Coarse deny-list for SQL comment and procedure markers; treat this as a
        // first-pass filter, not a complete defense against prompt injection.
        if (prompt.Contains("--") || prompt.Contains("/*") || prompt.Contains("xp_"))
        {
            context.Response.StatusCode = 400;
            await context.Response.WriteAsync("Invalid prompt.");
            return;
        }
    }
    await next();
});
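Where an authentication scheme (for example, JWT bearer) is already configured, the LLM endpoint itself can also require authorization. The LlmRequest and ILlmService types below are illustrative assumptions, not part of a specific library.

// Minimal-API sketch: reject unauthenticated callers before any prompt is processed.
app.MapPost("/llm", async (LlmRequest request, ILlmService llmService) =>
{
    var reply = await llmService.GenerateResponseAsync(request.Prompt);
    return Results.Ok(new { reply });
})
.RequireAuthorization(); // returns 401 Unauthorized without a valid credential

public record LlmRequest(string Prompt);
public interface ILlmService
{
    Task<string> GenerateResponseAsync(string prompt);
}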
By applying these CockroachDB-specific practices within ASP.NET, you reduce the attack surface available for LLM data leakage. Coupled with continuous monitoring using tools such as the middleBrick CLI (middlebrick scan <url>) or GitHub Action integration, you can detect risky prompt patterns and output exposures before they reach production. The MCP server further enables scanning directly from AI coding assistants, helping maintain secure LLM interactions during development.
Related CWEs: LLM Security
| CWE ID | Name | Severity |
|---|---|---|
| CWE-754 | Improper Check for Unusual or Exceptional Conditions | MEDIUM |
Frequently Asked Questions
How can I test if my ASP.NET endpoint with CockroachDB is vulnerable to LLM data leakage?
Run middlebrick scan https://your-api.example.com/llm. The scan performs active LLM security probes, including prompt injection and output scanning for PII or API keys, and reports each finding with remediation guidance.