HIGH CWE-502 Input Validation

CWE-502 in APIs

What is CWE-502?

CWE-502: Deserialization of Untrusted Data describes a vulnerability where an application deserializes data from an untrusted source without proper validation or integrity checks. This weakness allows attackers to craft malicious serialized objects that, when deserialized, can execute arbitrary code, cause denial of service, or bypass authentication mechanisms.

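The core issue is that deserialization does more than copy values into fields. Depending on the format, the serialized data can choose which classes are instantiated, and the runtime may invoke methods on them automatically while rebuilding them (for example, Java's readObject(), PHP's __wakeup() and __destruct(), or the callables a Python pickle stream asks to invoke). If an attacker can control the serialized data, they can potentially instantiate classes with dangerous side effects, chain these automatically invoked methods (a "gadget chain") into unintended operations, or, in parsers written in memory-unsafe languages, cause memory corruption.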

Common attack vectors include:

Remote procedure calls accepting serialized objects
Session storage using serialized objects
API endpoints accepting binary or JSON payloads for object reconstruction
Database fields containing serialized data

The severity of CWE-502 varies by language and framework. In Java, Python, and PHP, native deserialization vulnerabilities can lead to remote code execution. In JavaScript/TypeScript environments, direct code execution is less common (it requires a library that evaluates code while deserializing), but attackers can still trigger application crashes, data corruption, or logic bypasses such as prototype pollution.

CWE-502 in API Contexts

APIs face unique deserialization risks because they inherently accept data from untrusted sources. Several API-specific scenarios commonly lead to CWE-502 vulnerabilities:

Binary Protocol Endpoints - Some APIs use binary protocols like Protocol Buffers, MessagePack, or custom binary formats. While these formats offer performance benefits, they can introduce deserialization risks if the parser contains vulnerabilities or if the API accepts arbitrary binary data without validation.

Object Reconstruction Endpoints - APIs that accept serialized objects for state reconstruction are particularly vulnerable. For example, an endpoint that accepts a "UserSession" object might deserialize it directly, allowing an attacker to craft a malicious session object that executes code during construction.

GraphQL APIs - GraphQL implementations sometimes serialize query execution context to JSON and deserialize it later. If a client can supply or modify that serialized context, an attacker could potentially manipulate it to alter query behavior or trigger unintended operations.

Event-Driven Architectures - APIs processing serialized events from message queues or event streams may deserialize untrusted data. If event producers are compromised or if the serialization format allows arbitrary object types, this creates a deserialization attack surface.

Configuration-as-Code APIs - APIs that accept serialized configuration objects (YAML, JSON, or custom formats) can be vulnerable if the deserialization process instantiates classes named in the configuration data, as YAML loaders that honor language-specific type tags can do. This pattern appears in infrastructure-as-code tools and configuration management APIs.
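To make the object-reconstruction risk concrete, the sketch below (Java, standard library only) shows that a class's private readObject() hook runs inside ObjectInputStream.readObject(), before the caller can inspect or cast the result. ReadObjectSideEffectDemo, its nested UserSession, restoreSession(), and the sideEffects counter are hypothetical stand-ins; in a real attack the hook would belong to a gadget class already on the classpath, not one the application wrote.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Base64;

public final class ReadObjectSideEffectDemo {
    // Counts how many times deserialization ran the hook below
    public static int sideEffects = 0;

    // Hypothetical session class; any Serializable class can define such a hook
    public static final class UserSession implements Serializable {
        private static final long serialVersionUID = 1L;
        public String userId = "alice";

        // Invoked by ObjectInputStream during readObject(), not by application code
        private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
            in.defaultReadObject();
            sideEffects++; // a gadget class could write files or start processes here
        }
    }

    // The vulnerable pattern: client-supplied Base64 goes straight into readObject()
    public static Object restoreSession(String base64FromClient)
            throws IOException, ClassNotFoundException {
        byte[] raw = Base64.getDecoder().decode(base64FromClient);
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(raw))) {
            return in.readObject();
        }
    }

    // Produces the kind of token an attacker could craft offline
    public static String encode(Serializable obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        return Base64.getEncoder().encodeToString(bytes.toByteArray());
    }
}
```

Because the side effect fires during readObject(), checking the returned object's type afterwards is too late; type restrictions have to be enforced while the stream is being read.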

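Real-world examples include CVE-2017-5941 (the unserialize() function of the node-serialize npm package, which allowed remote code execution in Node.js through a serialized immediately invoked function expression) and various deserialization vulnerabilities in Java and Spring-based applications that allow attackers to execute arbitrary code by sending crafted serialized objects.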

Detection

Detecting CWE-502 requires both static analysis and dynamic testing approaches. Here's how to identify deserialization vulnerabilities in your APIs:

Static Code Analysis - Review your codebase for deserialization functions and their input sources. Look for patterns like:

ObjectInputStream.readObject() // Java
pickle.loads() // Python
unserialize() // PHP
JSON.parse() with reviver functions // JavaScript (risky when the reviver rebuilds typed objects or evaluates code)

Identify whether these functions receive data from external sources (HTTP requests, message queues, files) without validation.
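A grep-style first pass over a codebase can be scripted. The following is a minimal sketch, assuming line-oriented regular expressions are good enough for triage; the SinkScanner class and its patterns are illustrative, will miss aliased or indirect calls, and are no substitute for data-flow analysis that traces where each argument comes from.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Grep-style triage: flags source lines that call common deserialization sinks
public final class SinkScanner {
    private static final List<Pattern> SINKS = List.of(
        Pattern.compile("\\.readObject\\s*\\("),        // Java ObjectInputStream.readObject()
        Pattern.compile("\\bpickle\\.loads?\\s*\\("),    // Python pickle.load() / pickle.loads()
        Pattern.compile("\\bunserialize\\s*\\("),        // PHP unserialize()
        Pattern.compile("\\bJSON\\.parse\\s*\\([^)]*,")  // JavaScript JSON.parse() with a second (reviver) argument
    );

    private SinkScanner() {}

    /** Returns the 1-based numbers of lines that match any sink pattern. */
    public static List<Integer> findSinks(String source) {
        List<Integer> hits = new ArrayList<>();
        String[] lines = source.split("\\R");
        for (int i = 0; i < lines.length; i++) {
            for (Pattern sink : SINKS) {
                if (sink.matcher(lines[i]).find()) {
                    hits.add(i + 1);
                    break; // one hit per line is enough for triage
                }
            }
        }
        return hits;
    }
}
```

Each hit is only a candidate: the question above (does the argument come from an HTTP request, queue, or file?) still has to be answered for every reported line.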

Dynamic Scanning - Use automated tools to probe your API endpoints for deserialization vulnerabilities. middleBrick's black-box scanning approach tests for deserialization weaknesses by:

Sending serialized payloads to endpoints that accept JSON, XML, or binary data
Checking for deserialization-related error messages that reveal framework details
Testing for unsafe deserialization by sending crafted payloads that trigger different code paths
Analyzing API responses for signs of successful deserialization attacks
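One concrete signal for spotting endpoints that handle native serialization is the Java serialization stream header: ObjectOutputStream output begins with the bytes 0xAC 0xED 0x00 0x05, which appear as the prefix rO0AB once Base64-encoded (a common form in cookies and JSON fields). The helper below is an illustrative heuristic, not middleBrick's detection logic; the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Heuristic check for Java native serialization in a captured request or response body
public final class JavaSerializationSniffer {
    // STREAM_MAGIC (0xACED) followed by STREAM_VERSION (0x0005)
    private static final byte[] MAGIC = {(byte) 0xAC, (byte) 0xED, 0x00, 0x05};

    private JavaSerializationSniffer() {}

    private static boolean startsWithMagic(byte[] data) {
        if (data.length < MAGIC.length) {
            return false;
        }
        for (int i = 0; i < MAGIC.length; i++) {
            if (data[i] != MAGIC[i]) {
                return false;
            }
        }
        return true;
    }

    /** True if the body is a raw or Base64-encoded Java serialization stream. */
    public static boolean looksLikeJavaSerialized(byte[] body) {
        if (startsWithMagic(body)) {
            return true;
        }
        String text = new String(body, StandardCharsets.ISO_8859_1).trim();
        if (!text.startsWith("rO0AB")) {
            return false;
        }
        try {
            return startsWithMagic(Base64.getDecoder().decode(text));
        } catch (IllegalArgumentException e) {
            return false; // prefix matched, but the rest is not valid Base64
        }
    }
}
```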

middleBrick Scanning - middleBrick's 12-point security scan includes deserialization testing as part of its Input Validation category. The scanner:

Automatically identifies endpoints that might perform deserialization
Tests for common deserialization frameworks and their known vulnerabilities
Checks for unsafe object reconstruction patterns
Provides a security score (0-100) with specific findings about deserialization risks
Offers remediation guidance with severity levels (Critical/High/Medium/Low)

Manual Testing Techniques - Security researchers use these techniques to test for deserialization vulnerabilities:

1. Modify JSON payloads to include unexpected object types
2. Send serialized objects with tampered field values or unexpected nested types
3. Test with known deserialization gadget chains (if framework is identified)
4. Check for differences in error messages between valid and invalid input
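For technique 1, probe payloads can be generated mechanically from a known-good request body. A minimal sketch, assuming the body is a JSON object: it prepends keys that some deserializers treat as type or prototype hints ("@class" for Jackson when class-name polymorphic typing is configured as a property, "$type" for Json.NET with type-name handling enabled, "__proto__" for JavaScript prototype pollution). TypeProbeGenerator and the probe values are illustrative choices, not a complete or authoritative list.

```java
import java.util.ArrayList;
import java.util.List;

// Builds probe variants of a JSON object by prepending type-hint keys
public final class TypeProbeGenerator {
    // Illustrative probes (not exhaustive):
    //   "@class"    - Jackson, when class-name polymorphic typing uses a property
    //   "$type"     - Json.NET, when TypeNameHandling is enabled
    //   "__proto__" - JavaScript prototype pollution through unsafe object merges
    private static final List<String> PROBES = List.of(
        "\"@class\":\"java.lang.Object\"",
        "\"$type\":\"System.Object\"",
        "\"__proto__\":{\"isAdmin\":true}"
    );

    private TypeProbeGenerator() {}

    /** Returns one variant per probe; expects a JSON object such as {"id":1}. */
    public static List<String> variants(String jsonObject) {
        String body = jsonObject.trim();
        if (!body.startsWith("{") || !body.endsWith("}")) {
            throw new IllegalArgumentException("Expected a JSON object");
        }
        String inner = body.substring(1, body.length() - 1).trim();
        List<String> out = new ArrayList<>();
        for (String probe : PROBES) {
            out.add(inner.isEmpty() ? "{" + probe + "}" : "{" + probe + "," + inner + "}");
        }
        return out;
    }
}
```

Send each variant alongside the original and compare status codes and error text (technique 4); an error that names a class, or a change in behavior, suggests the key was interpreted rather than ignored.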

Framework-Specific Testing - Different frameworks have different deserialization behaviors:

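Java Spring Boot: Test for gadget chains available on the classpath (Spring, Apache Commons Collections, and similar libraries) that are reachable through ObjectInputStream or polymorphic JSON binding
Node.js: Test for libraries that evaluate code during deserialization (for example, the eval() call in node-serialize) or use the vm module to do so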
PHP: Test for unserialize() with user-controlled data
Python: Test for pickle.loads() with external data

Remediation

Fixing CWE-502 requires a defense-in-depth approach. Here are proven remediation strategies with code examples:

1. Input Validation and Whitelisting - Never deserialize arbitrary objects. Validate input type and structure before deserialization:

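import java.util.regex.Pattern;

// Insecure - accepts any serialized object
Object deserialized = serializer.deserialize(input);

// Secure - validates input format first
// (sessionSerializer and containsUnsafeCharacters are application-specific, not shown)
private static final Pattern SESSION_FORMAT = Pattern.compile(
    "\\{\"userId\":\\d+,\"timestamp\":\\d+,\"data\":\\{.*\\}\\}");

public UserSession deserializeSession(String input) {
    if (input == null || !isValidSessionFormat(input)) {
        throw new IllegalArgumentException("Invalid session format");
    }
    return sessionSerializer.deserialize(input);
}

private boolean isValidSessionFormat(String input) {
    // A regex only pre-screens the shape (fixed field order, no whitespace);
    // the deserializer must still bind to the UserSession type only
    return SESSION_FORMAT.matcher(input).matches()
        && !containsUnsafeCharacters(input);
}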

2. Use Safe Serialization Formats - Prefer formats that don't allow arbitrary object instantiation:

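// Instead of Java serialization, use JSON with strict typing (Gson shown)
import com.google.gson.JsonElement;
import com.google.gson.JsonObject;
import com.google.gson.JsonParser;
public UserSession fromJson(String json) {
    JsonElement root = JsonParser.parseString(json); // JsonSyntaxException if malformed
    if (!root.isJsonObject()) throw new IllegalArgumentException("Expected a JSON object");
    JsonObject obj = root.getAsJsonObject();
    if (!isNumber(obj, "userId") || !isNumber(obj, "timestamp")
            || !obj.has("data") || !obj.get("data").isJsonObject()) {
        throw new IllegalArgumentException("Missing or mistyped session fields");
    }
    // The payload never chooses a class: only UserSession is constructed
    return new UserSession(
        obj.get("userId").getAsInt(),
        obj.get("timestamp").getAsLong(),
        obj.get("data").getAsJsonObject()
    );
}

private static boolean isNumber(JsonObject obj, String field) {
    return obj.has(field) && obj.get(field).isJsonPrimitive()
        && obj.get(field).getAsJsonPrimitive().isNumber();
}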

3. Implement Deserialization Filters - In Java 9+, use deserialization filtering to restrict allowed classes:

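import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.ObjectInputFilter;
import java.io.ObjectInputStream;

public class SafeDeserialization {
    // Allowlist: only classes in com.example (and its subpackages) may be deserialized
    private static final ObjectInputFilter FILTER = info -> {
        Class<?> clazz = info.serialClass();
        if (clazz == null) {
            return ObjectInputFilter.Status.UNDECIDED; // depth/reference check, not a class
        }
        while (clazz.isArray()) {
            clazz = clazz.getComponentType(); // judge arrays by their element type
        }
        // Match the package prefix; contains() would also accept "evil.com.example.X"
        if (clazz.isPrimitive() || clazz.getName().startsWith("com.example.")) {
            return ObjectInputFilter.Status.ALLOWED;
        }
        return ObjectInputFilter.Status.REJECTED;
    };

    public static Object safeDeserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(data))) {
            ois.setObjectInputFilter(FILTER); // must be set before the first readObject()
            return ois.readObject();          // throws InvalidClassException when rejected
        }
    }
}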
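Besides a hand-written filter like the one above, the JDK offers a declarative option: ObjectInputFilter.Config.createFilter() builds a filter from a pattern string, the same syntax accepted by the process-wide jdk.serialFilter system property. The sketch below assumes Java 9+; PatternFilterDemo, the POLICY string, and its limits are hypothetical values to tune, and com.example stands in for your own packages.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputFilter;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public final class PatternFilterDemo {
    // Hypothetical policy: resource limits, then allow com.example and its
    // subpackages ("**"), then reject everything else ("!*")
    public static final String POLICY =
        "maxdepth=20;maxrefs=10000;maxbytes=1048576;maxarray=100000;com.example.**;!*";

    private PatternFilterDemo() {}

    public static Object deserialize(byte[] data, String pattern)
            throws IOException, ClassNotFoundException {
        ObjectInputFilter filter = ObjectInputFilter.Config.createFilter(pattern);
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            in.setObjectInputFilter(filter); // set before the first readObject()
            return in.readObject();           // InvalidClassException if rejected
        }
    }

    public static byte[] serialize(Serializable obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        return bytes.toByteArray();
    }
}
```

Class patterns are checked left to right and the first match wins, so the trailing !* turns the list into an allowlist. To apply a policy process-wide without code changes, pass it as -Djdk.serialFilter=<pattern>.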

4. Use Language-Specific Safe Alternatives - Avoid dangerous functions:

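# Python - avoid pickle, use safer alternatives
import json
from dataclasses import dataclass

@dataclass
class UserSession:
    user_id: int
    timestamp: int
    data: dict

EXPECTED_FIELDS = {"user_id", "timestamp", "data"}

def safe_deserialize(json_str: str) -> UserSession:
    try:
        data = json.loads(json_str)
    except json.JSONDecodeError as e:
        raise ValueError("Invalid session data") from e
    # Require exactly the expected keys before unpacking with **
    if not isinstance(data, dict) or set(data) != EXPECTED_FIELDS:
        raise ValueError("Invalid session data")
    # Dataclasses do not enforce type annotations, so check types explicitly
    # (type(...) is int also rejects bools, which subclass int)
    if not (type(data["user_id"]) is int and type(data["timestamp"]) is int
            and isinstance(data["data"], dict)):
        raise ValueError("Invalid session data")
    return UserSession(**data)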

5. Implement Integrity Checks - Sign serialized data with a keyed MAC (HMAC) and verify the signature before deserializing:

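import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.SecretKey;

public abstract class SignedSerializer {
    private final SecretKey key;

    public SignedSerializer(SecretKey key) {
        this.key = key;
    }

    // Format-specific hooks (for example, JSON encoding), supplied by subclasses
    protected abstract byte[] serialize(Object obj) throws Exception;
    protected abstract Object deserialize(byte[] data) throws Exception;

    // Mac instances are not thread-safe, so create one per operation
    private byte[] sign(byte[] data) throws GeneralSecurityException {
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(key);
        return mac.doFinal(data);
    }

    public String serializeWithSignature(Object obj) throws Exception {
        byte[] data = serialize(obj);
        Base64.Encoder enc = Base64.getEncoder();
        return enc.encodeToString(data) + ":" + enc.encodeToString(sign(data));
    }

    public Object deserializeWithVerification(String signedData) throws Exception {
        String[] parts = signedData.split(":");
        if (parts.length != 2) {
            throw new IllegalArgumentException("Malformed signed payload");
        }
        byte[] data = Base64.getDecoder().decode(parts[0]);
        byte[] signature = Base64.getDecoder().decode(parts[1]);

        // Constant-time comparison; verify BEFORE any deserialization happens
        if (!MessageDigest.isEqual(sign(data), signature)) {
            throw new SecurityException("Invalid signature");
        }
        return deserialize(data);
    }
}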
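Because SignedSerializer depends on serialize() and deserialize() helpers whose implementation is not shown, here is a self-contained sketch of the same verify-before-parse idea for a JSON payload, using only the JDK. The HmacEnvelope name and the payload.tag token layout are assumptions for illustration; key storage and rotation are out of scope, and anyone holding the key can mint valid payloads.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Seals JSON as "<base64url(payload)>.<base64url(hmac)>" and checks the tag
// before the payload is handed to any parser.
public final class HmacEnvelope {
    private static final String ALG = "HmacSHA256";
    private final SecretKeySpec key;

    public HmacEnvelope(byte[] secret) {
        this.key = new SecretKeySpec(secret, ALG);
    }

    private byte[] tag(byte[] data) throws GeneralSecurityException {
        Mac mac = Mac.getInstance(ALG); // Mac is not thread-safe; one per call
        mac.init(key);
        return mac.doFinal(data);
    }

    public String seal(String json) throws GeneralSecurityException {
        byte[] data = json.getBytes(StandardCharsets.UTF_8);
        Base64.Encoder enc = Base64.getUrlEncoder().withoutPadding();
        return enc.encodeToString(data) + "." + enc.encodeToString(tag(data));
    }

    /** Returns the original JSON text, or throws SecurityException if tampered with. */
    public String open(String envelope) throws GeneralSecurityException {
        int dot = envelope.indexOf('.');
        if (dot < 0 || dot != envelope.lastIndexOf('.')) {
            throw new SecurityException("Malformed envelope");
        }
        Base64.Decoder dec = Base64.getUrlDecoder();
        byte[] data;
        byte[] given;
        try {
            data = dec.decode(envelope.substring(0, dot));
            given = dec.decode(envelope.substring(dot + 1));
        } catch (IllegalArgumentException e) {
            throw new SecurityException("Malformed envelope", e);
        }
        if (!MessageDigest.isEqual(tag(data), given)) {
            throw new SecurityException("Signature mismatch");
        }
        return new String(data, StandardCharsets.UTF_8);
    }
}
```

Only after open() returns is the JSON text passed to a parser such as the strict fromJson() mapping, so a forged payload never reaches deserialization code.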

6. API Gateway Protection - Screen request bodies at the API gateway or in request middleware, before they reach deserialization code:

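// Request middleware for deserialization protection (Servlet 4.0+ Filter API)
import java.io.*;
import java.nio.charset.StandardCharsets;
import javax.servlet.*;
import javax.servlet.http.*;
import com.google.gson.JsonParseException;
import com.google.gson.JsonParser;
public class DeserializationProtectionFilter implements Filter {
    private static final int MAX_BODY_SIZE = 64 * 1024; // example limit; tune per API

    @Override
    public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
            throws IOException, ServletException {
        HttpServletRequest request = (HttpServletRequest) req;
        HttpServletResponse response = (HttpServletResponse) res; // sendError() is defined here
        String type = request.getContentType() == null ? "" : request.getContentType();
        // Check binary size limits; an unknown length (-1, e.g. chunked) is rejected too
        long length = request.getContentLengthLong();
        if (type.contains("application/octet-stream") && (length < 0 || length > MAX_BODY_SIZE)) {
            response.sendError(413, "Payload too large");
            return;
        }
        // Validate JSON syntax before processing (Gson parses leniently, so this is coarse)
        if (type.contains("application/json")) {
            byte[] body = request.getInputStream().readNBytes(MAX_BODY_SIZE + 1); // Java 11+
            if (body.length > MAX_BODY_SIZE) { response.sendError(413, "Payload too large"); return; }
            try {
                JsonParser.parseString(new String(body, StandardCharsets.UTF_8));
            } catch (JsonParseException e) {
                response.sendError(400, "Invalid JSON format");
                return;
            }
            request = new CachedBodyRequest(request, body); // the body was consumed; replay it
        }
        chain.doFilter(request, response);
    }

    // Replays the cached body so downstream handlers can read it again
    private static final class CachedBodyRequest extends HttpServletRequestWrapper {
        private final byte[] body;
        CachedBodyRequest(HttpServletRequest r, byte[] body) { super(r); this.body = body; }
        @Override public ServletInputStream getInputStream() {
            ByteArrayInputStream in = new ByteArrayInputStream(body);
            return new ServletInputStream() {
                @Override public int read() { return in.read(); }
                @Override public boolean isFinished() { return in.available() == 0; }
                @Override public boolean isReady() { return true; }
                @Override public void setReadListener(ReadListener l) { throw new UnsupportedOperationException(); }
            };
        }
        @Override public BufferedReader getReader() {
            return new BufferedReader(new InputStreamReader(getInputStream(), StandardCharsets.UTF_8));
        }
    }
}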

7. Runtime Monitoring - Implement monitoring for deserialization-related anomalies:

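import java.util.Set;
import java.util.logging.Logger;

// Monitor for unusual deserialization patterns
public class DeserializationMonitor {
    private static final Logger log = Logger.getLogger(DeserializationMonitor.class.getName());
    private static final Set<String> ALLOWED_CLASSES = Set.of(
        "com.example.UserSession",
        "com.example.ApiRequest"
    );

    public void monitorDeserialization(String className) {
        if (!ALLOWED_CLASSES.contains(className)) {
            log.warning("Deserialization of unexpected class: " + className);
            alertSecurityTeam(className);
        }
    }

    // Hook for your alerting system (pager, SIEM, chat); a no-op here
    protected void alertSecurityTeam(String className) {
    }
}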
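DeserializationMonitor only helps if something calls it with each class name as the stream is read, and ObjectInputFilter is the JDK hook that provides exactly that. A minimal sketch, assuming Java 9+: MonitoringFilter (a hypothetical name) reports every class outside an allowlist through a callback, such as one that logs and alerts, and can run in monitor-only mode (report but allow) or in enforcing mode (report and reject).

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.ObjectInputFilter;
import java.io.ObjectInputStream;
import java.util.Set;
import java.util.function.Consumer;

// ObjectInputFilter that reports every class outside an allowlist
public final class MonitoringFilter implements ObjectInputFilter {
    private final Set<String> allowed;
    private final boolean enforce;             // false = monitor-only (report, then allow)
    private final Consumer<String> onUnexpected;

    public MonitoringFilter(Set<String> allowed, boolean enforce, Consumer<String> onUnexpected) {
        this.allowed = allowed;
        this.enforce = enforce;
        this.onUnexpected = onUnexpected;
    }

    @Override
    public Status checkInput(FilterInfo info) {
        Class<?> clazz = info.serialClass();
        if (clazz == null) {
            return Status.UNDECIDED; // depth/reference/size callback, not a class
        }
        while (clazz.isArray()) {
            clazz = clazz.getComponentType(); // judge arrays by element type
        }
        if (clazz.isPrimitive() || allowed.contains(clazz.getName())) {
            return Status.ALLOWED;
        }
        onUnexpected.accept(clazz.getName());
        // ObjectInputStream only throws on REJECTED; UNDECIDED lets the class through
        return enforce ? Status.REJECTED : Status.UNDECIDED;
    }

    public static Object read(byte[] data, ObjectInputFilter filter)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            in.setObjectInputFilter(filter);
            return in.readObject();
        }
    }
}
```

Running in monitor-only mode first shows which classes legitimate traffic actually uses, which makes switching to enforcing mode less likely to break production.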

Frequently Asked Questions

How does middleBrick detect deserialization vulnerabilities?

middleBrick uses black-box scanning to test API endpoints for deserialization weaknesses. The scanner sends crafted payloads to endpoints that accept serialized data, checks for deserialization-related error messages, and analyzes responses for signs of successful deserialization attacks. It tests across 12 security categories including input validation, and provides a security score with specific findings about deserialization risks.

Can deserialization vulnerabilities be completely eliminated?

While you can significantly reduce risk, complete elimination requires careful design. The safest approach is to avoid native object deserialization of untrusted data entirely (Java serialization, Python pickle, PHP unserialize()) and use data-only formats like JSON with strict validation instead. Where native deserialization remains, add integrity checks with cryptographic signatures and deserialization filters that restrict allowed classes. Legacy systems and certain use cases may still require deserialization, making defense-in-depth (validation + filtering + monitoring) essential.