# API Scraping in Axum
## How API Scraping Manifests in Axum
API scraping is the systematic extraction of data from an API by iterating through predictable identifiers (like sequential integers) or by exploiting unrestricted list endpoints. In Axum applications, this vulnerability commonly stems from two patterns: (1) endpoints that accept user-controlled identifiers (e.g., path parameters) without verifying that the requester is authorized to access the specific resource, and (2) list endpoints that allow unrestricted pagination or lack rate limiting, enabling enumeration of all resources.
Axum's ergonomic extractors make it easy to capture path and query parameters, but developers sometimes forget to couple these with proper authorization. Consider this typical Axum handler for fetching a user by ID:
```rust
// Vulnerable: no authentication extractor and no ownership check.
async fn get_user(
    Path(id): Path<u32>,
    State(pool): State<PgPool>,
) -> Result<impl IntoResponse, (StatusCode, &'static str)> {
    let user = sqlx::query_as!(User, "SELECT * FROM users WHERE id = $1", id)
        .fetch_one(&pool)
        .await
        .map_err(|_| (StatusCode::NOT_FOUND, "User not found"))?;
    Ok((StatusCode::OK, Json(user)))
}
```

This endpoint is publicly accessible (no authentication extractor) and uses a sequential integer ID. An attacker can script requests to `/api/users/1`, `/api/users/2`, etc., harvesting every user record. Even if the endpoint requires authentication, a logged-in attacker can still scrape other users' data by changing the ID parameter unless the handler verifies that the authenticated user owns the requested ID or has admin privileges.
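The enumeration pattern is trivial to script. As a minimal sketch (the base URL is a hypothetical example), this is the sequence of target URLs an ID-enumeration scraper would generate before issuing one GET request per URL:

```rust
// Sketch: generate the sequential URLs an ID-enumeration scraper would request.
// The base URL below is a hypothetical example, not a real endpoint.
fn enumeration_targets(base: &str, start: u32, end: u32) -> Vec<String> {
    (start..=end).map(|id| format!("{base}/{id}")).collect()
}

fn main() {
    let targets = enumeration_targets("https://example.com/api/users", 1, 3);
    // An attacker would issue one GET request per generated URL
    // and collect every response body that comes back with HTTP 200.
    for url in &targets {
        println!("{url}");
    }
}
```

This is why "the IDs are not listed anywhere" is not a defense: with sequential integers, the full keyspace is the list.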
Similarly, list endpoints often expose too much data:
```rust
// Vulnerable: no maximum page size and no rate limiting.
async fn list_orders(
    Query(params): Query<ListParams>,
    State(pool): State<PgPool>,
) -> Result<impl IntoResponse, (StatusCode, &'static str)> {
    let limit = params.limit.unwrap_or(100);
    let offset = params.page.unwrap_or(1).saturating_sub(1) * limit;
    let orders = sqlx::query_as!(Order, "SELECT * FROM orders LIMIT $1 OFFSET $2", limit, offset)
        .fetch_all(&pool)
        .await
        .map_err(|_| (StatusCode::INTERNAL_SERVER_ERROR, "Database error"))?;
    Ok((StatusCode::OK, Json(orders)))
}
```

With no maximum page size and no rate limiting, an attacker can request `/api/orders?limit=100&page=1`, then `page=2`, and so on, eventually downloading every order. If the response includes sensitive fields (e.g., customer PII, payment tokens), this becomes a critical data exposure.
These patterns are exacerbated by Axum's default behavior of passing extracted parameters directly to business logic without automatic authorization checks. The framework provides the building blocks (extractors, middleware) but leaves security entirely to the developer.
## Axum-Specific Detection
Detecting API scraping vulnerabilities requires both static and dynamic analysis. In Axum codebases, look for:
- Handlers that accept `Path<T>` or `Query` parameters and use them directly in database queries without an authorization check (e.g., comparing `user.id` to the path `id`).
- List endpoints that accept `limit` and `page` (or `offset`) without enforcing a maximum `limit` or without rate limiting middleware.
- Routes that return entire database records (via `Json(user)`) without filtering out sensitive fields.
Dynamic scanning with middleBrick automates detection of these issues. When you submit an Axum API endpoint to middleBrick, it performs a black-box scan that includes:
- **BOLA/IDOR testing:** For each endpoint that accepts an identifier (e.g., `/api/users/:id`), middleBrick sends requests with a sequence of IDs (1, 2, 3, ...) and analyzes the responses. If unauthenticated requests return valid data (HTTP 200 with a non-empty body) for multiple IDs, it flags an unauthenticated IDOR vulnerability.
- **Rate limiting assessment:** middleBrick issues a burst of requests (e.g., 100 requests in 10 seconds) to list and detail endpoints. If the server does not respond with HTTP 429 (Too Many Requests) or other throttling signals, the lack of rate limiting is reported.
- **Data exposure scanning:** Response bodies are scanned for patterns indicating sensitive data (e.g., email addresses, credit card numbers, API keys). If found, middleBrick reports potential data exposure.
For example, scanning an Axum app with a vulnerable /api/users/:id endpoint might yield a finding like:
| Check | Severity | Endpoint | Evidence |
|---|---|---|---|
| BOLA/IDOR | High | GET /api/users/{id} | Unauthenticated access to user records for IDs 1-10 |
middleBrick's CLI makes it easy to integrate into your development workflow:
```bash
middlebrick scan https://your-axum-app.com/api
```

The scan completes in 5–15 seconds and returns a risk score (A–F) with actionable remediation steps tailored to your Axum stack.
## Axum-Specific Remediation
Fixing API scraping in Axum involves implementing proper authorization, rate limiting, and data filtering. Here are concrete steps with code examples.
### 1. Enforce Authorization on Resource Endpoints
For endpoints that return a single resource by ID, ensure that the requester is authorized to access that specific resource. In Axum, you can create an extractor that loads the authenticated user (e.g., from a session or JWT) and then compare the user's ID or roles to the requested resource ID.
```rust
use axum::{
    extract::{Path, State},
    http::StatusCode,
    response::IntoResponse,
    Json,
};
use serde::Serialize;

#[derive(Serialize)]
struct User {
    id: u32,
    username: String,
    email: String, // might be sensitive
}

// Assume we have an extractor that gives us the current user
async fn get_user(
    user: AuthenticatedUser, // custom extractor that returns the logged-in user
    Path(id): Path<u32>,
    State(pool): State<DbPool>,
) -> Result<impl IntoResponse, (StatusCode, &'static str)> {
    // Authorization check: a user can only access their own data unless they are an admin
    if user.id != id && !user.is_admin {
        return Err((StatusCode::FORBIDDEN, "Not authorized"));
    }
    let user_record = sqlx::query_as!(
        User,
        "SELECT id, username, email FROM users WHERE id = $1",
        id
    )
    .fetch_optional(&pool)
    .await
    .map_err(|_| (StatusCode::INTERNAL_SERVER_ERROR, "Database error"))?;
    match user_record {
        Some(user) => Ok((StatusCode::OK, Json(user))),
        None => Err((StatusCode::NOT_FOUND, "User not found")),
    }
}
```

### 2. Implement Rate Limiting
Use middleware to limit the request rate. Axum composes with tower middleware, so you can add a rate limiting layer. Note that tower's built-in `RateLimitLayer` enforces a single global limit shared by all clients (not a per-IP limit), and because the underlying service is not `Clone` and is fallible, it is typically combined with `BufferLayer` and axum's `HandleErrorLayer`:

```rust
use std::time::Duration;
use axum::{error_handling::HandleErrorLayer, BoxError};
use tower::{buffer::BufferLayer, limit::RateLimitLayer, ServiceBuilder};

let app = Router::new()
    .route("/api/users/:id", get(get_user))
    .route("/api/users", get(list_users))
    .layer(
        ServiceBuilder::new()
            .layer(HandleErrorLayer::new(|_: BoxError| async { StatusCode::TOO_MANY_REQUESTS }))
            .layer(BufferLayer::new(1024)) // makes the non-Clone rate limit service shareable
            .layer(RateLimitLayer::new(100, Duration::from_secs(60))), // 100 requests/minute, global
    );
```

For per-IP or per-user limiting (e.g., after authentication), you need a keyed limiter: custom middleware that extracts the client identity and applies a limit per key, or a dedicated crate such as `tower_governor`.
### 3. Restrict List Endpoints
Prevent enumeration by capping the page size and using cursor-based pagination instead of offset-based. Also, filter out sensitive fields from the response.
```rust
use serde_json::json;

#[derive(Deserialize)]
struct ListParams {
    cursor: Option<String>, // opaque cursor encoding the last seen ID
    limit: Option<usize>,
}

#[derive(Serialize, sqlx::FromRow)]
struct UserSummary {
    id: i64,
    username: String, // only non-sensitive fields are exposed
}

async fn list_users(
    Query(params): Query<ListParams>,
    State(pool): State<DbPool>,
) -> Result<impl IntoResponse, (StatusCode, &'static str)> {
    let limit = params.limit.unwrap_or(20).min(50); // enforce max 50 per page
    let last_id = params.cursor.as_deref().and_then(decode_cursor).unwrap_or(0);
    // Keyset (cursor) pagination: fetch one extra row to detect whether a next page exists
    let users: Vec<UserSummary> =
        sqlx::query_as("SELECT id, username FROM users WHERE id > $1 ORDER BY id LIMIT $2")
            .bind(last_id)
            .bind(limit as i64 + 1)
            .fetch_all(&pool)
            .await
            .map_err(|_| (StatusCode::INTERNAL_SERVER_ERROR, "Database error"))?;
    let has_next = users.len() > limit;
    let page = &users[..users.len().min(limit)];
    // The next cursor is the last ID of the returned page, not of the extra lookahead row
    let next_cursor = if has_next {
        page.last().map(|u| encode_cursor(u.id))
    } else {
        None
    };
    Ok((StatusCode::OK, Json(json!({ "users": page, "next_cursor": next_cursor }))))
}
```

Note: the cursor implementation above is simplified. In production, you would want a more robust cursor (e.g., based on a timestamp plus ID) and should ensure it is opaque and tamper-resistant.
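The `encode_cursor` / `decode_cursor` helpers can be as simple as an opaque string wrapping the last seen row ID. A minimal sketch using only the standard library (hex rather than the base64 mentioned above, to avoid an extra dependency; a production cursor would typically be base64-encoded and signed to prevent tampering):

```rust
// Sketch: opaque cursor helpers wrapping the last seen row ID.
// Hex keeps this dependency-free; a real cursor would be base64 and signed.
fn encode_cursor(last_id: i64) -> String {
    format!("{last_id:x}")
}

fn decode_cursor(cursor: &str) -> Option<i64> {
    // Malformed or tampered input yields None rather than a panic or an SQL error.
    i64::from_str_radix(cursor, 16).ok()
}

fn main() {
    let cursor = encode_cursor(42);
    assert_eq!(cursor, "2a");
    assert_eq!(decode_cursor(&cursor), Some(42));
    assert_eq!(decode_cursor("not-a-cursor"), None);
    println!("cursor for id 42: {cursor}");
}
```

Returning `Option` from the decoder matters: a scraper probing with garbage cursors should get an empty first page or a 400, never a 500 that leaks implementation details.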
### 4. Use UUIDs for Public Identifiers
If an endpoint must be publicly accessible (e.g., a public profile page), use UUIDs instead of sequential integers. This makes enumeration infeasible because the keyspace is too large to brute-force. In your database, store a UUID column and use it in the route:
```rust
async fn get_public_profile(
    Path(uuid): Path<Uuid>,
    State(pool): State<DbPool>,
) -> Result<impl IntoResponse, (StatusCode, &'static str)> {
    let profile = sqlx::query_as!(
        Profile,
        "SELECT id, username, bio FROM profiles WHERE uuid = $1",
        uuid
    )
    .fetch_optional(&pool)
    .await
    .map_err(|_| (StatusCode::INTERNAL_SERVER_ERROR, "Database error"))?;
    match profile {
        Some(p) => Ok((StatusCode::OK, Json(p))),
        None => Err((StatusCode::NOT_FOUND, "Profile not found")),
    }
}
```

Combine this with rate limiting to further reduce the risk of targeted guessing.
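The keyspace argument can be quantified. A sequential `u32` ID space holds about 4.3 billion values, all sweepable in order, while a random UUIDv4 carries 122 random bits (6 of its 128 bits are fixed by the version and variant fields), roughly 5.3 × 10^36 possibilities. A quick back-of-the-envelope check:

```rust
fn main() {
    // Sequential u32 IDs: the entire keyspace can be swept by a scraper.
    let u32_keyspace = 2f64.powi(32); // about 4.3e9
    // UUIDv4: 122 random bits (version/variant fields fix the other 6 bits).
    let uuid_keyspace = 2f64.powi(122); // about 5.3e36
    let ratio = uuid_keyspace / u32_keyspace;
    assert!(u32_keyspace < 5e9);
    assert!(uuid_keyspace > 5e36);
    println!("a UUIDv4 keyspace is ~{ratio:e} times larger than u32");
}
```

Even at millions of requests per second, enumerating a meaningful fraction of that space is computationally infeasible, which is why random identifiers defeat scraping-by-guessing while sequential ones invite it.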
By applying these Axum-specific fixes, you can prevent attackers from scraping your API's data. Regularly scan your Axum APIs with middleBrick to catch any regressions — especially after adding new endpoints or changing existing ones.