# PII Leakage on Azure

## How PII Leakage Manifests in Azure
Personally identifiable information (PII) can leave an Azure‑hosted API through several common misconfigurations and coding patterns. When developers rely on Azure’s platform services without applying the principle of least privilege, the resulting endpoints may return more data than intended, enabling attackers to harvest names, email addresses, phone numbers, social‑security numbers, or health‑related data.
- Over-permissive storage containers: An Azure Blob Storage account configured with public read access (or a SAS token granting `sp=r` on the container level) allows anyone who guesses the container URL to enumerate blobs. If the blobs contain JSON files, CSV exports, or diagnostic logs that include PII, the data is immediately exposed.
- Entity-returning Azure Functions or App Services: A function that reads an Azure Table Storage entity or a Cosmos DB document and returns the raw entity to the caller often includes every property, even those marked as internal. For example, a function that retrieves a user profile may inadvertently expose the `PasswordHash` or `SSN` field because the entity model is serialized directly.
- Diagnostic logging and tracing: Azure App Service diagnostics, Application Insights, or Azure Monitor can capture request bodies, headers, or exception details. If the logging level is set to `Verbose` and the code logs the entire request object, PII submitted in a POST payload (e.g., a credit-card number) becomes searchable in logs.
- Improper use of Azure Key Vault references: When an ARM template or Bicep file reads a Key Vault secret with the `reference` function without masking, the secret value can appear in the deployment output or in the template's debug log, leaking credentials that later grant access to data stores containing PII.
- API Management payload pass-through: If an API Management policy forwards the backend response unchanged (`<return-response>` without a `<set-body>` filter), any over-exposed backend data (including PII) is sent straight to the consumer.
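The entity-serialization pitfall above can be illustrated in a few lines. This is a hedged sketch; the field names and the `toPublicProfile` helper are made up for illustration:

```javascript
// Raw serialization exposes every property of the stored entity...
const entity = {
  id: "u1",
  name: "Alice",
  email: "a@example.com",
  ssn: "123-45-6789",        // sensitive: must never reach the caller
  passwordHash: "hash-value", // sensitive: must never reach the caller
};
JSON.stringify(entity); // includes ssn and passwordHash

// ...while an explicit projection returns only what the endpoint should expose.
function toPublicProfile({ id, name, email }) {
  return { id, name, email };
}
JSON.stringify(toPublicProfile(entity)); // {"id":"u1","name":"Alice","email":"a@example.com"}
```

The difference between "serialize whatever the data layer returned" and "serialize an explicit allow-list of fields" is exactly the gap that Excessive Data Exposure findings point at.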
These patterns map directly to OWASP API Security Top 10 2019 API3: Excessive Data Exposure. Real-world incidents in which misconfigured Azure Storage accounts allowed public read access to blobs containing user data illustrate how a simple configuration drift can lead to large-scale PII leakage.
## Azure-Specific Detection
Detecting PII leakage in Azure-hosted APIs requires examining both runtime responses and configuration drift. middleBrick performs unauthenticated black-box scans that can surface these issues without needing credentials or agents.
### What middleBrick looks for
- Response bodies that match common PII patterns (email addresses, phone numbers, US‑style SSNs, credit‑card numbers via Luhn check, or GDPR‑style personal identifiers).
- Headers that indicate overly permissive caching (`Cache-Control: public`) on endpoints that return PII.
- Publicly accessible Azure Storage blob or container URLs discovered through URL guessing or via the `x-ms-blob-type` header in responses.
- Error messages or stack traces that reveal internal identifiers (e.g., GUIDs that map to user IDs) when the API returns `500` errors.
- Presence of diagnostic endpoints such as `/diagnostics`, `/trace`, or `/logs` that leak request payloads.
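The response-body checks above can be sketched as a simple pattern scanner. This is a hedged illustration of the general technique, not middleBrick's actual implementation; the regexes and the `scanForPii`/`luhnValid` helpers are assumptions:

```javascript
// Minimal PII pattern scanner: flags emails, US-style SSNs, and Luhn-valid card numbers.
const PII_PATTERNS = {
  email: /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g,
  ssn: /\b\d{3}-\d{2}-\d{4}\b/g,
  card: /\b(?:\d[ -]?){13,16}\b/g, // candidate card numbers, confirmed via Luhn below
};

// Luhn checksum: double every second digit from the right, sum, check mod 10.
function luhnValid(digits) {
  let sum = 0;
  let double = false;
  for (let i = digits.length - 1; i >= 0; i--) {
    let d = Number(digits[i]);
    if (double) { d *= 2; if (d > 9) d -= 9; }
    sum += d;
    double = !double;
  }
  return sum % 10 === 0;
}

function scanForPii(body) {
  const findings = [];
  for (const m of body.match(PII_PATTERNS.email) || []) findings.push({ type: "email", match: m });
  for (const m of body.match(PII_PATTERNS.ssn) || []) findings.push({ type: "ssn", match: m });
  for (const m of body.match(PII_PATTERNS.card) || []) {
    const digits = m.replace(/[ -]/g, "");
    if (luhnValid(digits)) findings.push({ type: "card", match: m }); // drop false positives
  }
  return findings;
}
```

The Luhn step matters: a bare 16-digit regex would flag order numbers and timestamps, so checksum validation keeps the card-number findings actionable.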
### Scanning example with the middleBrick CLI
```bash
# Install the CLI (npm)
npm i -g middlebrick

# Scan an Azure Function endpoint
middlebrick scan https://myfuncapp.azurewebsites.net/api/GetUserProfile

# Output JSON for CI integration
middlebrick scan https://myapi.azurewebsites.net/items --format json > scan-result.json
```
The scan returns a risk score (A–F) and a per‑category breakdown. If the Data Exposure category shows findings such as "PII detected in response body" or "Public storage container detected", the report includes the exact URL, the matched pattern, and a severity rating (usually high).
Because middleBrick does not require credentials, it can also be run against staging or production URLs in a GitHub Action:
```yaml
name: API Security Scan
on: [push]
jobs:
  scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - run: npm i -g middlebrick
      - run: middlebrick scan https://staging-myapp.azurewebsites.net --fail-below B
```
The action will fail the build if the score drops below the threshold you set (e.g., B), giving you an early warning before code reaches production.
## Azure-Specific Remediation
Fixing PII leakage in Azure relies on applying the platform’s native security controls and adjusting application code to return only the data that is strictly necessary.
### Storage hardening
- Set the storage account's `allowBlobPublicAccess` property to `false`, or ensure each container's access level is Private. Use the built-in Azure Policy "Storage account public access should be disallowed" to enforce this.
- When a SAS token is required, generate it with the most limited permissions (`sp=r` on a specific blob, not the container) and a short expiry (the `se` parameter).
- Enable Microsoft Defender for Storage to receive alerts when a container becomes publicly accessible.
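A narrowly scoped SAS can be checked mechanically before it ships. Here is a hedged sketch using Node's built-in URL parsing; the `checkSasUrl` helper and the one-hour threshold are assumptions, not an Azure SDK API:

```javascript
// Sanity-check a SAS URL: read-only permission (sp=r) and a short expiry (se).
function checkSasUrl(sasUrl, maxValidityHours = 1) {
  const params = new URL(sasUrl).searchParams;
  const issues = [];

  const permissions = params.get("sp") || "";
  if (permissions !== "r") {
    issues.push(`permissions "${permissions}" exceed read-only (sp=r)`);
  }

  const expiry = params.get("se");
  if (!expiry) {
    issues.push("no expiry (se) parameter");
  } else {
    const hoursValid = (new Date(expiry) - Date.now()) / 3600000;
    if (hoursValid > maxValidityHours) {
      issues.push(`expiry more than ${maxValidityHours}h away`);
    }
  }
  return issues;
}
```

A check like this fits naturally into a pre-deployment lint step so long-lived, writable SAS tokens never reach clients.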
### Function/App Service data filtering
- Never return raw entity models. Instead, map to a DTO that excludes sensitive fields.
- In .NET 6 Azure Functions:
```csharp
using System.Threading.Tasks;
using Microsoft.AspNetCore.Http;
using Microsoft.AspNetCore.Mvc;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.Http;
using Microsoft.Extensions.Logging;

public static class GetUserProfile
{
    public record UserDto(string Id, string Name, string Email); // SSN omitted

    [FunctionName("GetUserProfile")]
    public static async Task<IActionResult> Run(
        [HttpTrigger(AuthorizationLevel.Anonymous, "get", Route = "profile/{id}")] HttpRequest req,
        string id,
        ILogger log)
    {
        // Fetch the entity from Table Storage (TableService is an app-specific helper)
        var entity = await TableService.GetEntityAsync<UserEntity>("Users", id);
        if (entity == null) return new NotFoundResult();

        // Map to a DTO so sensitive fields never reach the caller
        var dto = new UserDto(entity.PartitionKey, entity.Name, entity.Email);
        return new OkObjectResult(dto);
    }
}

// Internal entity – never exposed
public class UserEntity
{
    public string PartitionKey { get; set; }
    public string RowKey { get; set; }
    public string Name { get; set; }
    public string Email { get; set; }
    public string SSN { get; set; } // sensitive, not in DTO
}
```
The function now returns only `Id`, `Name`, and `Email`.
- In Node.js with the Azure SDK for Cosmos DB, request only the needed fields via a `SELECT` projection:
```javascript
const { CosmosClient } = require("@azure/cosmos");

const client = new CosmosClient(process.env.COSMOS_CONNECTION_STRING);
const container = client.database("UsersDb").container("Profiles");

async function getUser(id) {
  // Parameterized query spec: projects only the fields the caller needs
  const querySpec = {
    query: "SELECT c.id, c.name, c.email FROM c WHERE c.id = @id",
    parameters: [{ name: "@id", value: id }],
  };
  const { resources } = await container.items.query(querySpec).fetchAll();
  return resources[0]; // contains only id, name, email
}
```
### Logging and diagnostics
- Configure Azure App Service diagnostic settings to send logs to a Log Analytics workspace with `RetentionInDays` set and categories limited to `AppServiceConsoleLogs` and `AppServiceHTTPLogs`. Disable failed-request tracing if it captures request bodies.
- In Application Insights, set the sampling percentage to a low value and use a telemetry processor to drop telemetry that matches PII patterns (e.g., a regex for SSNs).
- Review ARM/Bicep templates: replace any `reference(resourceId('Microsoft.KeyVault/vaults/secrets', vaultName, secretName), '2019-09-01').value` with a parameter marked `secureString` that is never echoed in deployment output.
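The telemetry-filtering idea can be sketched as a redaction step applied before any message reaches a log sink. This is a hedged illustration of the general technique; the patterns and the `redactPii`/`safeLog` names are assumptions, not an Application Insights API:

```javascript
// Redact common PII patterns before a message is written to any log sink.
const REDACTIONS = [
  { name: "ssn", pattern: /\b\d{3}-\d{2}-\d{4}\b/g },
  { name: "email", pattern: /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g },
  { name: "card", pattern: /\b(?:\d[ -]?){13,16}\b/g },
];

function redactPii(message) {
  let out = message;
  for (const { name, pattern } of REDACTIONS) {
    out = out.replace(pattern, `[REDACTED:${name}]`);
  }
  return out;
}

// Wrap the logger so every line passes through the redactor.
function safeLog(message) {
  console.log(redactPii(message));
}
```

Redacting at the logging boundary is more robust than hoping each call site remembers to omit sensitive fields: one choke point covers every log statement.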
### API Management policies
- Use a `<set-body>` policy to project a safe response:
```xml
<outbound>
    <base />
    <set-body>@{
        var body = context.Response.Body.As<JObject>();
        return new JObject(
            new JProperty("id", body["id"]),
            new JProperty("name", body["name"]),
            new JProperty("email", body["email"])
        ).ToString();
    }</set-body>
</outbound>
```
By combining these controls—private storage, data‑transfer‑object patterns, restrained logging, and API Management filtering—you eliminate the most common pathways through which PII leaks from Azure‑hosted APIs.
## Related CWEs: dataExposure
| CWE ID | Name | Severity |
|---|---|---|
| CWE-200 | Exposure of Sensitive Information | HIGH |
| CWE-209 | Error Information Disclosure | MEDIUM |
| CWE-213 | Exposure of Sensitive Information Due to Incompatible Policies | HIGH |
| CWE-215 | Insertion of Sensitive Information Into Debugging Code | MEDIUM |
| CWE-312 | Cleartext Storage of Sensitive Information | HIGH |
| CWE-359 | Exposure of Private Personal Information (PII) | HIGH |
| CWE-522 | Insufficiently Protected Credentials | CRITICAL |
| CWE-532 | Insertion of Sensitive Information into Log File | MEDIUM |
| CWE-538 | Insertion of Sensitive Information into Externally-Accessible File | HIGH |
| CWE-540 | Inclusion of Sensitive Information in Source Code | HIGH |