Training Data Extraction in Actix

How Training Data Extraction Manifests in Actix

Training data extraction attacks target endpoints that inadvertently expose sensitive information from machine learning model training datasets. In Actix web applications, this commonly occurs through misconfigured debug endpoints, overly permissive file serving routes, or improper handling of environment variables that leak paths to training data artifacts.

One specific pattern involves Actix's static file serving combined with development-mode configuration. For example, serving the target/debug or target/release directories via actix-files can expose .pt, .pth, .npy, or checkpoint files containing model weights or training data shards if such files were written there. Another vector is application state shared through Actix's web::Data extractor: when that state holds training data paths or database connections used during model training, an introspection endpoint that dumps the state exposes them.

Consider an Actix service that mounts a debug router in development:

use actix_web::{web, App, HttpServer};
use actix_files::Files;

async fn debug_info() -> impl actix_web::Responder {
    // Accidentally exposes training data path
    format!("Training data located at: {}", std::env::var("TRAINING_DATA_PATH").unwrap_or_default())
}

#[actix_web::main]
async fn main() -> std::io::Result<()> {
    HttpServer::new(|| {
        App::new()
            .service(web::scope("/debug").route("/info", web::get().to(debug_info)))
            .service(Files::new("/assets", "./target/debug").show_files_listing()) // Risky in dev
    })
    .bind(("127.0.0.1", 8080))?
    .run()
    .await
}

Here, the /debug/info endpoint leaks the value of the TRAINING_DATA_PATH environment variable, and serving ./target/debug with show_files_listing() enabled risks exposing files such as model.safetensors or dataset.csv if training artifacts were written to the build directory. Attackers can browse the listing and download training data, which may contain PII, proprietary labels, or sensitive source material used in model development.
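One way to keep a debug endpoint useful without leaking the path is to report only whether the variable is set. The following is a minimal sketch using only the standard library; the function name redacted_debug_info and the injected lookup parameter are hypothetical and are not part of Actix or of the example above.

```rust
/// Builds a debug summary that reports only whether TRAINING_DATA_PATH is
/// set, never its value. `lookup` is a hypothetical injection point so the
/// function can be exercised without touching the real process environment.
fn redacted_debug_info<F>(lookup: F) -> String
where
    F: Fn(&str) -> Option<String>,
{
    let status = match lookup("TRAINING_DATA_PATH") {
        // A non-empty value counts as configured; the value itself is dropped.
        Some(value) if !value.is_empty() => "set",
        _ => "unset",
    };
    format!("TRAINING_DATA_PATH: {}", status)
}
```

Inside a handler such as debug_info, the lookup could be the closure |key| std::env::var(key).ok(), so the response body becomes "TRAINING_DATA_PATH: set" or "TRAINING_DATA_PATH: unset" rather than the path itself.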

Actix-Specific Detection

Detecting training data exposure in Actix requires scanning for both information disclosure vectors and file system access patterns. middleBrick identifies these risks through unauthenticated black-box checks that probe for common leakage points without needing source code or configuration.

The scanner tests for:

Environment variable exposure via debug endpoints (e.g., /debug/env, /config, /info)
Static file serving of sensitive directories (./target, ./data, ./models)
Directory listing enabled on routes serving build artifacts
File download endpoints lacking proper path validation (potential path traversal to ../../training_data)
Responses containing file paths, checksums, or metadata indicative of ML artifacts (e.g., strings like .ckpt, epoch_, optimizer)

For instance, if an Actix app serves ./target/debug via actix-files with show_files_listing() enabled, middleBrick will detect a 200 OK response listing files like model-epoch-10.pt or training_log.json and flag it as a data exposure finding. Similarly, if a GET /debug/config endpoint returns JSON with "training_data_bucket": "s3://my-company/ml-datasets", it triggers a finding under the Data Exposure check.
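The idea of matching response content against ML artifact indicators can be illustrated with a crude substring check. This is only a sketch of the general technique, not middleBrick's implementation; the marker list and the function name find_artifact_markers are assumptions chosen for illustration.

```rust
/// Substrings that often indicate ML artifacts in a response body.
/// The list is illustrative, not exhaustive.
const ARTIFACT_MARKERS: &[&str] = &[
    ".ckpt", ".pt", ".pth", ".npy", ".safetensors", "epoch_", "optimizer",
];

/// Returns every marker that occurs in `body`, compared case-insensitively,
/// in the order the markers are declared above.
fn find_artifact_markers(body: &str) -> Vec<&'static str> {
    let lowered = body.to_lowercase();
    ARTIFACT_MARKERS
        .iter()
        .copied()
        .filter(|marker| lowered.contains(*marker))
        .collect()
}
```

With this sketch, a listing containing model-epoch-10.pt matches on ".pt" but not on "epoch_", because the file name uses a hyphen. Plain substring matching like this also produces false positives (".pt" matches any ".pth" file as well), so a real check would need stricter patterns.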

These findings are presented in the middleBrick dashboard with severity, location, and remediation guidance — such as disabling file listings in production, moving static assets outside build directories, and auditing debug routes for environment variable leaks.
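For the download endpoints without path validation mentioned among the checks, a hand-written handler can reject any requested path that is not a plain relative path before opening a file. The following is a minimal standard-library sketch; resolve_under is a hypothetical helper, and it assumes the base directory itself is trusted.

```rust
use std::path::{Component, Path, PathBuf};

/// Joins `requested` onto `base`, accepting only plain relative paths.
/// Parent-directory segments, absolute paths, and Windows drive prefixes
/// are rejected because each of them could escape `base`.
fn resolve_under(base: &Path, requested: &str) -> Option<PathBuf> {
    let mut resolved = base.to_path_buf();
    for component in Path::new(requested).components() {
        match component {
            // Ordinary file or directory name: safe to append.
            Component::Normal(part) => resolved.push(part),
            // "." refers to the current directory and changes nothing.
            Component::CurDir => {}
            // ParentDir, RootDir, and Prefix are all refused.
            _ => return None,
        }
    }
    Some(resolved)
}
```

This check is purely lexical: it does not follow symlinks, so a symlink inside the base directory that points elsewhere would still be served. Canonicalizing the result and confirming it still starts with the base directory is a common additional step.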

Teams can integrate this detection into CI using the middleBrick GitHub Action.
