Timing Attacks in Cassandra
How Timing Attacks Manifest in Cassandra
Timing attacks exploit variations in response time to infer sensitive information, such as valid usernames, token values, or data existence. In Cassandra, these vulnerabilities arise from non-constant-time operations at both the application and database layers. Because Cassandra is a distributed, eventually consistent NoSQL database often used for high-throughput applications, developers may inadvertently introduce timing side-channels when constructing queries or processing results.
1. String Concatenation in CQL Queries: The most common pattern is building CQL queries via string concatenation with user input. Cassandra's query engine processes queries differently based on the existence of rows or the validity of partition keys. An attacker can measure response time differences between queries that match a row versus those that do not. For example:
String userId = request.getParameter("id");
String cql = "SELECT * FROM users WHERE user_id = '" + userId + "';";
ResultSet rs = session.execute(cql);
If userId matches an existing row, Cassandra must read the partition from memtables and SSTables; if it does not, bloom filters typically short-circuit the read and an empty result is returned along a different code path. The two paths produce measurable timing differences (even milliseconds) that leak existence information.
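The attack side needs nothing more than a loop that times repeated requests per candidate value and compares medians. A minimal measurement harness, as a sketch (the attacker would supply a Runnable that issues the actual HTTP request for each candidate user_id; the class name is hypothetical):

```java
import java.util.Arrays;

public class TimingProbe {
    // Time n invocations of an action (e.g., one HTTP request) in nanoseconds.
    static long[] sample(Runnable action, int n) {
        long[] nanos = new long[n];
        for (int i = 0; i < n; i++) {
            long start = System.nanoTime();
            action.run();
            nanos[i] = System.nanoTime() - start;
        }
        return nanos;
    }

    // The median is more robust than the mean against GC pauses
    // and network jitter, which otherwise drown out small deltas.
    static long median(long[] samples) {
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        return sorted[sorted.length / 2];
    }
}
```

With a few hundred samples per candidate, a consistent median gap between known-good IDs and random IDs confirms the leak.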
2. Token-Aware Routing and Partition Key Validation: Cassandra drivers use token-aware routing to send queries directly to a replica that owns the partition key. The token is computed by hashing the key, so routing itself is uniform, but the replica performs different amounts of work depending on whether the partition actually holds data (bloom-filter rejection versus real SSTable reads), introducing a measurable delay. Attackers can probe for valid partition keys by timing responses to crafted queries.
3. Secondary Index Queries: Queries using secondary indexes (CREATE INDEX) are executed as scatter-gather operations across nodes. A query matching an indexed value may return faster than one that doesn't, especially if the index is not uniformly distributed. This can reveal information about data distribution and existence.
4. Authentication and AuthenticationHandler Implementations: Custom authentication handlers (e.g., implementing IAuthenticator) that perform non-constant-time string comparisons on passwords or tokens can leak information via timing. Cassandra's built-in PasswordAuthenticator uses a constant-time comparison, but custom implementations may not.
5. Materialized Views and Lightweight Transactions (LWT): LWTs (IF NOT EXISTS, IF conditions) involve a Paxos consensus round that is inherently slower than a regular write. An attacker can detect the presence of a row through timing differences between conditional and unconditional inserts/updates. Similarly, querying a materialized view that must synchronize with its base table may have variable latency depending on data existence.
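As an illustration of the LWT probe, the only difference between the two statements an attacker would time is the IF clause; the conditional form is what triggers the Paxos round trip (table and column names here are hypothetical):

```java
public class LwtProbe {
    // Unconditional write: goes through the regular write path only.
    static final String PLAIN =
        "INSERT INTO users (user_id, name) VALUES (?, ?)";

    // Conditional write: adds a Paxos prepare/propose/commit exchange,
    // whose latency also varies with whether the row already exists.
    static final String CONDITIONAL =
        "INSERT INTO users (user_id, name) VALUES (?, ?) IF NOT EXISTS";
}
```

Timing the CONDITIONAL form against candidate keys, and comparing against the PLAIN baseline, isolates the existence-dependent Paxos cost.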
Cassandra-Specific Detection
Detecting timing vulnerabilities in Cassandra requires analyzing both the application code that constructs CQL queries and the runtime behavior of the Cassandra cluster. middleBrick performs black-box timing analysis by submitting carefully crafted requests to the API endpoint and measuring response times across multiple runs, correlating timing deltas with potential data existence or validity.
Detection Patterns:
- Parameter Blinding: The scanner sends the same request with valid and invalid parameter values (e.g., existing vs. non-existing user IDs) and statistically analyzes the response-time distributions. A significant difference (e.g., a delta above 10 ms with p < 0.05) indicates a potential timing leak.
- Query Structure Analysis: middleBrick parses the API's observed traffic (via its black-box approach) to identify query patterns. If it detects that user-supplied values are directly embedded in CQL strings (rather than using bound parameters), it flags this as a high-risk pattern for timing attacks.
- Endpoint Behavior Profiling: For endpoints that return different HTTP status codes or response sizes based on data existence (e.g., 200 with data vs. 404), middleBrick correlates these with timing to see if the status code itself leaks information via timing side-channels.
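The statistical comparison behind parameter blinding can be as simple as Welch's t-statistic over the two latency samples. A sketch (the threshold and conversion to a p-value are left out; this is not middleBrick's actual implementation):

```java
public class TimingStats {
    static double mean(double[] xs) {
        double sum = 0;
        for (double x : xs) sum += x;
        return sum / xs.length;
    }

    // Unbiased sample variance.
    static double variance(double[] xs) {
        double m = mean(xs), sum = 0;
        for (double x : xs) sum += (x - m) * (x - m);
        return sum / (xs.length - 1);
    }

    // Welch's t-statistic: a large |t| means the two latency
    // distributions (valid vs. invalid input) very likely differ.
    static double welchT(double[] a, double[] b) {
        double se = Math.sqrt(variance(a) / a.length + variance(b) / b.length);
        return (mean(a) - mean(b)) / se;
    }
}
```

In practice a scanner collects hundreds of samples per class and either converts t to a p-value or flags |t| above a fixed threshold.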
Example Scan with middleBrick CLI:
middlebrick scan https://api.example.com/users/{id} --timing-analysis
The --timing-analysis flag (hypothetical for illustration; actual flags may vary) enables extended timing probes. The resulting report includes a "Timing Attack" finding under the Input Validation or Authentication category, with details like:
| Finding | Severity | Details |
|---|---|---|
| Timing difference detected on user lookup endpoint | High | Valid user IDs return ~45ms faster than invalid ones (p=0.001). Likely due to non-constant-time query execution. |
Note: middleBrick does not require credentials or agents; it tests the unauthenticated attack surface. For authenticated endpoints, you would need to provide session tokens (if the scanner supports authenticated scans in your plan). The detection is purely external, measuring network-observable timing differences.
Cassandra-Specific Remediation
Remediation focuses on eliminating timing differences at the application and database interaction layer. The primary defense is constant-time operations and prepared statements.
1. Use Prepared Statements Exclusively: Never concatenate user input into CQL strings. Use bound parameters in the Java driver (or other language drivers). This ensures the statement is parsed once regardless of parameter values, and bound values are sent out of band in the native protocol, so no string escaping is involved at all.
// UNSAFE (vulnerable)
String cql = "SELECT * FROM users WHERE user_id = '" + userId + "';";
session.execute(cql);
// SAFE (parameterized)
PreparedStatement ps = session.prepare("SELECT * FROM users WHERE user_id = ?");
BoundStatement bs = ps.bind(userId);
session.execute(bs);
Prepared statements cache the parsed statement on the server, so execution time depends only on data retrieval, not on parsing or planning differences caused by varying user_id values. Note that this removes the query-construction side channel but does not make data retrieval itself constant-time.
2. Constant-Time Comparisons in Application Logic: When comparing tokens, hashes, or other secrets (e.g., in custom authentication), use constant-time functions. In Java, use MessageDigest.isEqual(), which examines every byte; avoid String.equals(), which returns as soon as it finds a differing character. If secrets arrive hex-encoded, decode them first and compare the raw bytes in constant time.
// UNSAFE (vulnerable)
if (storedToken.equals(userSuppliedToken)) { /* ... */ }
// SAFE (constant-time)
if (MessageDigest.isEqual(storedToken.getBytes(), userSuppliedToken.getBytes())) { /* ... */ }
3. Normalize Response Times: Introduce artificial, random delays for all requests to an endpoint to mask inherent timing differences. This is a mitigation of last resort due to its performance impact, and it is less reliable than fixing the root cause.
// Example: add a random delay (50-149 ms) after processing
Random rand = new Random();
int delay = 50 + rand.nextInt(100);
Thread.sleep(delay);
4. Avoid Data-Dependent Branches in Application Code: Ensure that code paths taken after a database query do not vary based on the existence of data. For example, do not log different messages or perform additional expensive operations only when a row is found.
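A caveat on the random-delay approach in point 3: uniform random jitter averages out over enough samples, so an attacker with patience can still recover the signal. A more robust variant (an extension beyond the snippet above; the helper name and the choice of floor are assumptions) pads every response up to a fixed deadline chosen above the slowest legitimate path:

```java
import java.time.Duration;
import java.util.function.Supplier;

public class DeadlinePad {
    // Run the handler, then sleep until a fixed floor has elapsed, so every
    // response takes at least `floor` regardless of which data path executed.
    static <T> T withFloor(Supplier<T> handler, Duration floor) {
        long start = System.nanoTime();
        T result = handler.get();
        long remainingMs = floor.toMillis() - (System.nanoTime() - start) / 1_000_000;
        if (remainingMs > 0) {
            try {
                Thread.sleep(remainingMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return result;
    }
}
```

The trade-off is that every request now costs the full floor; requests slower than the floor still leak, so the floor must cover the worst legitimate case.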
5. Cassandra Configuration: Cassandra itself does not provide constant-time query execution, but ensure that read_request_timeout_in_ms and range_request_timeout_in_ms are configured consistently so that timeout behavior does not leak additional information. Also consider setting slow_query_log_timeout_in_ms to log unusually slow queries, which can reveal probing activity.
6. Schema Design: Avoid using secondary indexes for sensitive lookups where possible. Denormalize data or use materialized views with caution, as they can introduce timing variability. Partition keys should be designed to distribute queries evenly.
Testing Remediation: After applying fixes, re-scan with middleBrick. The timing-based finding should disappear. Additionally, conduct manual timing tests using tools like time or custom scripts to verify no statistically significant difference exists between valid and invalid inputs over many iterations.