# Excessive Data Exposure on Azure
## How Excessive Data Exposure Manifests in Azure
Excessive Data Exposure in Azure environments typically occurs when APIs inadvertently return more data than necessary, often exposing sensitive information through Azure's extensive service ecosystem. This vulnerability manifests in several Azure-specific patterns that developers should recognize.
One common scenario involves Azure Functions returning complete database entities when only partial data is needed. Consider an Azure Function that queries Cosmos DB:
```javascript
const { CosmosClient } = require('@azure/cosmos');

// endpoint and key are loaded from application settings
const client = new CosmosClient({ endpoint, key });

module.exports = async function (context, req) {
    // SELECT * returns every property stored on the document
    const query = `SELECT * FROM c WHERE c.userId = '${req.query.userId}'`;
    const { resources } = await client.database('appdb').container('users').items.query(query).fetchAll();
    context.res = { body: resources };
    return context.res;
};
```

This pattern returns all user properties, including potentially sensitive fields such as `ssn`, `creditCard`, or `internalNotes` that should never be exposed via the API.
Azure API Management presents another vector for data exposure. When developers use the `send-request` policy without proper filtering, the entire backend response can be passed through to the client:

```xml
<outbound>
    <send-request mode="new" response-variable-name="user-data" timeout="10" ignore-error="false">
        <set-url>https://backend.azurewebsites.net/api/users/{user-id}</set-url>
        <set-method>GET</set-method>
    </send-request>
    <return-response>
        <!-- Returns the raw backend body with no field filtering -->
        <set-body>@(((IResponse)context.Variables["user-data"]).Body.As<string>())</set-body>
    </return-response>
</outbound>
```

Azure App Service and Azure Functions often suffer from improper serialization. When using Entity Framework or similar ORMs with Azure SQL Database, developers might inadvertently serialize navigation properties:
```csharp
public class Order
{
    public int Id { get; set; }
    public string CustomerName { get; set; }
    public decimal Total { get; set; }
    public ICollection<OrderDetail> Details { get; set; } // Navigation property
    public Customer Customer { get; set; }                // Navigation property
}
```

Without explicit projection or DTO mapping, serializing this entity returns the entire object graph, including related entities that may contain sensitive information.
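For example, an endpoint that returns the `Order` entity directly could serialize a response along these lines (the nested property names and values here are purely illustrative, not from the class above):

```json
{
  "id": 1042,
  "customerName": "Contoso Ltd.",
  "total": 312.50,
  "details": [
    { "id": 1, "productId": 77, "quantity": 3, "internalCost": 84.10 }
  ],
  "customer": {
    "id": 8831,
    "name": "Contoso Ltd.",
    "email": "billing@contoso.com",
    "creditLimit": 50000
  }
}
```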
Azure Key Vault integration can also lead to data exposure when secrets are improperly handled in API responses. A common anti-pattern:
```javascript
const { DefaultAzureCredential } = require('@azure/identity');
const { SecretClient } = require('@azure/keyvault-secrets');

// vaultUrl comes from application settings
const keyVault = new SecretClient(vaultUrl, new DefaultAzureCredential());

module.exports = async function (context, req) {
    const secret = await keyVault.getSecret('database-connection-string');
    // Anti-pattern: the raw secret value is returned to the caller
    context.res = { body: { connectionString: secret.value } };
    return context.res;
};
```

This exposes the actual connection string in the response, potentially allowing attackers to connect directly to backend databases.
## Azure-Specific Detection
Detecting Excessive Data Exposure in Azure requires understanding Azure's unique service architecture and scanning methodologies. middleBrick's Azure-specific detection capabilities include several specialized checks.
For Azure Functions, middleBrick analyzes the deployment package to identify potential data exposure patterns. It examines the function's code structure, looking for Cosmos DB queries, SQL operations, and serialization patterns that might return excessive data. The scanner also tests the deployed function endpoint, examining actual responses to identify sensitive fields that shouldn't be exposed.
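middleBrick's exact checks aren't shown here, but a simplified sketch of this kind of runtime response check might look like the following; the endpoint URL and the list of sensitive field names are assumptions for the example, not middleBrick's rules:

```javascript
// Minimal sketch: flag well-known sensitive field names in a JSON API response.
const SENSITIVE_FIELDS = ['ssn', 'creditCard', 'password', 'connectionString', 'internalNotes'];

function findSensitiveFields(value, path = '', hits = []) {
    if (Array.isArray(value)) {
        value.forEach((item, i) => findSensitiveFields(item, `${path}[${i}]`, hits));
    } else if (value && typeof value === 'object') {
        for (const [key, child] of Object.entries(value)) {
            if (SENSITIVE_FIELDS.includes(key)) {
                hits.push(`${path}.${key}`);
            }
            findSensitiveFields(child, `${path}.${key}`, hits);
        }
    }
    return hits;
}

async function checkEndpoint(url) {
    const response = await fetch(url); // global fetch, Node 18+
    return findSensitiveFields(await response.json());
}

// Example (hypothetical endpoint):
// checkEndpoint('https://myapp.azurewebsites.net/api/users?userId=123').then(console.log);
```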
Azure API Management policies are scanned for improper data handling. middleBrick parses the XML policy definitions, identifying `send-request`, `return-response`, and other policies that might expose backend data without proper filtering. It also tests the API endpoints through the management layer to verify what data actually flows through.
middleBrick's OpenAPI/Swagger analysis is particularly valuable for Azure services. When Azure services publish their OpenAPI specifications, middleBrick cross-references the documented responses with actual runtime behavior. This helps identify discrepancies where the spec documents certain fields that shouldn't be exposed, or where undocumented sensitive fields appear in responses.
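A simplified sketch of this kind of spec-versus-runtime comparison (not middleBrick's implementation) might look like the following; the spec path, status code, and endpoint URL are assumptions for the example:

```javascript
// Minimal sketch: compare fields documented in an OpenAPI 3 spec with
// fields actually returned by the live endpoint.
async function findUndocumentedFields(spec, path, url) {
    // Properties documented for the 200 JSON response in the spec
    const schema = spec.paths[path]?.get?.responses?.['200']
        ?.content?.['application/json']?.schema;
    const documented = new Set(Object.keys(schema?.properties ?? {}));

    // Fields actually present in the runtime response
    const actual = await (await fetch(url)).json();
    return Object.keys(actual).filter((field) => !documented.has(field));
}

// Example (hypothetical spec object and endpoint):
// findUndocumentedFields(spec, '/users/{id}', 'https://myapp.azurewebsites.net/api/users/123')
//     .then((extra) => console.log('Undocumented fields in response:', extra));
```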
The scanner specifically looks for Azure service patterns including:
- Cosmos DB query patterns that use SELECT * instead of field projection
- Entity Framework serialization of navigation properties
- Azure Key Vault secret handling in API responses
- Azure Blob Storage metadata exposure
- Azure Table Storage entity serialization
- Azure Service Bus message content exposure
For LLM/AI security in Azure, middleBrick tests Azure OpenAI endpoints for excessive data exposure through system prompt leakage and prompt injection vulnerabilities. It uses 27 regex patterns to detect ChatML, Llama 2, and other format markers that might indicate sensitive system instructions being exposed.
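The full 27-pattern set isn't reproduced here, but a few illustrative markers of this kind are sketched below; the pattern list is an assumption for the example, not middleBrick's actual rule set:

```javascript
// Minimal sketch: a few illustrative prompt-format markers that can indicate
// system-prompt leakage in a model response (not middleBrick's 27 patterns).
const PROMPT_FORMAT_MARKERS = [
    /<\|im_start\|>\s*system/i,   // ChatML system-message delimiter
    /<\|im_end\|>/i,              // ChatML end-of-message delimiter
    /\[INST\]|\[\/INST\]/,        // Llama 2 instruction markers
    /<<SYS>>|<<\/SYS>>/           // Llama 2 system-prompt markers
];

function looksLikePromptLeak(responseText) {
    return PROMPT_FORMAT_MARKERS.some((pattern) => pattern.test(responseText));
}

// Example:
// looksLikePromptLeak('<|im_start|>system\nYou are an internal support bot...'); // => true
```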
middleBrick's scanning process for Azure services typically completes in 5-15 seconds, testing the unauthenticated attack surface without requiring credentials or configuration. This black-box approach is particularly effective for Azure services where the attack surface is what's exposed to the internet, not internal implementation details.
## Azure-Specific Remediation
Remediating Excessive Data Exposure in Azure environments requires leveraging Azure's native features and following Azure-specific best practices. Here are concrete remediation strategies for common Azure patterns.
For Azure Functions with Cosmos DB, implement field projection to return only necessary data:
```javascript
module.exports = async function (context, req) {
    // Project only the fields the client needs, and pass userId as a parameter
    const querySpec = {
        query: 'SELECT c.id, c.name, c.email FROM c WHERE c.userId = @userId',
        parameters: [{ name: '@userId', value: req.query.userId }]
    };
    const { resources } = await client.database('appdb').container('users').items.query(querySpec).fetchAll();
    context.res = {
        body: resources.map(user => ({
            id: user.id,
            name: user.name,
            email: user.email
        }))
    };
    return context.res;
};
```

This approach explicitly selects only the required fields (and passes `userId` as a query parameter rather than interpolating it into the query string), preventing exposure of sensitive properties like `ssn` or `internalNotes`.
For Azure API Management, implement response filtering policies:
```xml
<outbound>
    <send-request mode="new" response-variable-name="user-data" timeout="10" ignore-error="false">
        <set-url>https://backend.azurewebsites.net/api/users/{user-id}</set-url>
        <set-method>GET</set-method>
    </send-request>
    <choose>
        <when condition="@(((IResponse)context.Variables["user-data"]).StatusCode == 200)">
            <set-body>@{
                // Copy only the fields the client needs from the backend response
                var body = ((IResponse)context.Variables["user-data"]).Body.As<JObject>();
                return new JObject
                {
                    {"id", body["id"]},
                    {"name", body["name"]},
                    {"email", body["email"]}
                }.ToString();
            }</set-body>
        </when>
    </choose>
</outbound>
```

This policy extracts only specific fields from the backend response before returning it to the client.
For Azure Functions and App Service using Entity Framework, implement DTOs and projection:
```csharp
public class UserDto
{
    public int Id { get; set; }
    public string Name { get; set; }
    public string Email { get; set; }
}

public class GetUserFunction
{
    // AppDbContext is the application's EF Core context, injected via the
    // Functions dependency-injection container.
    private readonly AppDbContext _dbContext;

    public GetUserFunction(AppDbContext dbContext) => _dbContext = dbContext;

    [FunctionName("GetUser")]
    public async Task<IActionResult> Run(
        [HttpTrigger(AuthorizationLevel.Function, "get")] HttpRequest req)
    {
        var userId = int.Parse(req.Query["userId"]);

        // Project directly to the DTO so only these columns are queried and returned
        var user = await _dbContext.Users
            .Where(u => u.Id == userId)
            .Select(u => new UserDto
            {
                Id = u.Id,
                Name = u.Name,
                Email = u.Email
            })
            .FirstOrDefaultAsync();

        return new OkObjectResult(user);
    }
}
```

This pattern prevents Entity Framework from serializing the entire entity graph, including navigation properties.
For Azure Key Vault integration, never expose secrets in API responses. Instead, authenticate with a managed identity and use the secret only on the server side:
```javascript
const { DefaultAzureCredential } = require('@azure/identity');
const { SecretClient } = require('@azure/keyvault-secrets');

const credential = new DefaultAzureCredential();
const keyVault = new SecretClient(vaultUrl, credential);

module.exports = async function (context, req) {
    // Use the secret internally; never include it in the response
    const connectionString = await keyVault.getSecret('database-connection-string');

    // Perform operations using the connection string
    const result = await performDatabaseOperation(connectionString.value);

    context.res = { body: result };
    return context.res;
};
```

This approach uses Azure Managed Identity for authentication, eliminating the need to store credentials in code or expose them in responses.
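Because the function above fetches the secret on every invocation, a common refinement is to cache the value at module scope so warm instances reuse it. The sketch below assumes a `KEY_VAULT_URL` application setting; the setting name is illustrative:

```javascript
const { DefaultAzureCredential } = require('@azure/identity');
const { SecretClient } = require('@azure/keyvault-secrets');

// KEY_VAULT_URL is an assumed application setting, e.g. https://myvault.vault.azure.net
const keyVault = new SecretClient(process.env.KEY_VAULT_URL, new DefaultAzureCredential());

let cachedConnectionString; // reused across invocations on a warm instance

async function getConnectionString() {
    if (!cachedConnectionString) {
        const secret = await keyVault.getSecret('database-connection-string');
        cachedConnectionString = secret.value;
    }
    return cachedConnectionString;
}
```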
For Azure Blob Storage, implement SAS tokens with proper permissions and expiration:
```javascript
const { BlobServiceClient, BlobSASPermissions } = require('@azure/storage-blob');

module.exports = async function (context, req) {
    // connectionString (including the account key) comes from application settings
    const blobServiceClient = BlobServiceClient.fromConnectionString(connectionString);
    const containerClient = blobServiceClient.getContainerClient('documents');

    // Generate a SAS URL with read-only permissions and a short expiration
    const blobClient = containerClient.getBlobClient(req.query.blobName);
    const sasUrl = await blobClient.generateSasUrl({
        expiresOn: new Date(Date.now() + 3600000), // 1 hour
        permissions: BlobSASPermissions.parse("r")
    });

    context.res = { body: { sasUrl } };
    return context.res;
};
```

This pattern ensures clients can only read specific blobs for a limited time, preventing unauthorized data access.
## Related CWEs
| CWE ID | Name | Severity |
|---|---|---|
| CWE-915 | Mass Assignment | HIGH |