Distributed Denial of Service in CockroachDB
How Distributed Denial of Service Manifests in CockroachDB
A distributed denial‑of‑service (DDoS) attack against CockroachDB typically aims to exhaust the cluster’s resources — connections, CPU, memory, or disk I/O — so that legitimate traffic is slowed or blocked. Because CockroachDB is a distributed SQL database, the attack surface includes the SQL gateway, the KV layer, and the replication subsystem.
- Connection exhaustion. An attacker opens thousands of short-lived TCP connections to the SQL port (default 26257). Each connection consumes a goroutine and a memory buffer; when the per-node limit `server.max_connections_per_gateway` is reached, new connections are rejected with `sorry, too many clients already`, locking out legitimate clients along with the attacker.
- Query-CPU flood. The attacker sends expensive, unindexed scans or complex joins that force each node to spend a large fraction of its CPU on execution. CockroachDB distributes the query, but if the scan touches many ranges, aggregate CPU usage spikes across the cluster, increasing latency for all users.
- Hot‑range write storm. By repeatedly inserting or updating rows that map to the same range (e.g., a monotonically increasing primary key), the attacker creates a write hotspot. Raft replicas for that range become a bottleneck, causing increased latency and possible transaction aborts due to contention.
- Disk-I/O saturation. Issuing large `COPY` or `IMPORT` statements, or repeatedly reading large blobs, can saturate SSD bandwidth and cause background compaction queues to grow, slowing both reads and writes.
These patterns are distinct from generic network‑layer DDoS because they exploit CockroachDB’s internal resource quotas and data distribution mechanisms.
CockroachDB‑Specific Detection
Detecting a DDoS condition in CockroachDB relies on observing metrics that exceed baseline thresholds. Key signals include:
- A rapid rise in the `sql.conns` gauge (visible in the DB Console or via the Prometheus endpoint at `/_status/vars` on the HTTP port, default 8080).
- A sudden increase in `sql.query.count` or the `sql.service.latency` histogram, especially for `SELECT` statements that read many rows.
- Elevated CPU (`sys.cpu.combined.percent-normalized`) on multiple nodes, often accompanied by high KV read and write byte rates.
- Growth in `rebalancing.lease.transfers`, indicating the allocator is moving leases away from hot ranges.
- A rising transaction abort rate (`txn.aborts`) due to serialization conflicts on hot ranges.
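These signals can be pulled over SQL as well as from the DB Console. A minimal sketch, assuming the `crdb_internal.node_metrics` virtual table available in modern CockroachDB versions (exact metric names vary by release):

```sql
-- Read DDoS-relevant gauges and counters for the node you are connected to.
SELECT name, value
FROM crdb_internal.node_metrics
WHERE name IN ('sql.conns', 'sql.query.count', 'txn.aborts')
ORDER BY name;
```

Sampling this periodically and diffing the counters gives a crude per-interval rate when a full Prometheus pipeline is not available.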
middleBrick includes a rate‑limiting check as one of its 12 parallel security scans. When you submit a CockroachDB endpoint URL, middleBrick probes the unauthenticated surface and reports if the endpoint lacks effective connection‑throttling or request‑rate limits. It does not block traffic; it simply flags the missing protection and provides remediation guidance.
To correlate middleBrick’s findings with internal metrics, you can run a quick health check (these queries use `crdb_internal` virtual tables, whose schemas vary somewhat by version):
-- SQL: current open sessions per node
SELECT node_id, count(*) AS session_count
FROM crdb_internal.cluster_sessions
GROUP BY node_id;
-- SQL: statements with the highest average service latency on this node
SELECT key AS statement, service_lat_avg
FROM crdb_internal.node_statement_statistics
ORDER BY service_lat_avg DESC
LIMIT 10;
If middleBrick reports a missing rate‑limit and the above queries show session counts approaching `server.max_connections_per_gateway` or latency spikes, you have strong evidence of a DDoS‑type condition.
CockroachDB‑Specific Remediation
Mitigation focuses on configuring CockroachDB’s built‑in limits and shaping the workload to avoid hotspots. All changes are made via SQL statements; no external agents are required.
1. Connection throttling
Set a conservative maximum number of client connections per node. This prevents connection‑exhaustion attacks.
SET CLUSTER SETTING server.max_connections_per_gateway = 200;
Adjust the value to your node’s RAM and expected concurrent workload. Connections by admin users are exempt from this limit, so operators can still reach the cluster during an attack.
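During an active flood, it also helps to see who is holding the connections. A quick triage query, assuming the `crdb_internal.cluster_sessions` virtual table:

```sql
-- Client addresses holding the most sessions. A single address holding
-- hundreds of sessions is a strong sign of connection flooding.
SELECT client_address, count(*) AS sessions
FROM crdb_internal.cluster_sessions
GROUP BY client_address
ORDER BY sessions DESC
LIMIT 10;
```

Offending sessions can then be evicted with `CANCEL SESSIONS` and the source address blocked upstream.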
2. Statement timeouts
Limit how long any single statement can run, curbing CPU‑intensive flood queries.
SET CLUSTER SETTING sql.defaults.statement_timeout = '30s';
Queries exceeding the timeout are cancelled with a query execution canceled due to statement timeout error (SQLSTATE 57014).
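The same limit can be scoped per session or per role instead of cluster-wide, which avoids breaking legitimately long-running jobs. A sketch (the `ALTER ROLE ALL SET` form requires a recent CockroachDB release):

```sql
-- Per-session: applies only to the current connection
SET statement_timeout = '10s';

-- Default for future sessions of all users
ALTER ROLE ALL SET statement_timeout = '30s';
```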
3. Memory limits for SQL execution
Prevent SQL execution from consuming excessive RAM, which could trigger OOM kills. The node-wide SQL memory budget is a startup flag, not a cluster setting:
cockroach start --max-sql-memory=.25 ...
Within that budget, the `sql.distsql.temp_storage.workmem` cluster setting (default 64MiB) caps how much memory a single operator may use before spilling to disk.
4. Workload shaping to avoid hot ranges
Avoid monotonically increasing primary keys (sequences, timestamps): they funnel every write into the same range. Use UUID primary keys to distribute writes evenly, and if you must index a sequential value, use a hash-sharded index to spread it across ranges. (Interleaved tables have been removed from modern CockroachDB versions, so hash sharding is the supported approach.)
-- Example: hash-sharded secondary index on a sequential ID
CREATE TABLE orders (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  seq_id BIGINT NOT NULL,
  customer_id UUID,
  total DECIMAL,
  created_at TIMESTAMPTZ DEFAULT now()
);
CREATE INDEX orders_seq_idx ON orders (seq_id) USING HASH STORING (customer_id, total);
-- USING HASH prepends a computed shard column to the index key, so
-- sequential seq_id values are spread across many ranges instead of one.
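To confirm the index is actually sharded and that inserts fan out across ranges, inspect the schema and the range split points (assuming the index is named `orders_seq_idx`; output columns vary by version):

```sql
-- The hidden computed shard column (crdb_internal_..._shard_...) should
-- appear in the index definition
SHOW CREATE TABLE orders;

-- With hash sharding, sequential seq_id inserts land in several ranges
-- instead of one hot range
SHOW RANGES FROM INDEX orders@orders_seq_idx;
```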
5. Zone configurations for load spreading
If you have multiple nodes across availability zones, ensure data is evenly distributed.
ALTER TABLE orders CONFIGURE ZONE USING
  num_replicas = 3,
  constraints = '{"+region=us-east1": 1, "+region=us-west2": 1}';
The per-replica constraint form pins one replica to each listed region and lets the allocator place the third wherever load is lowest.
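To verify what a zone change actually did, CockroachDB can show both the effective configuration and where replicas landed:

```sql
-- Effective zone configuration for the table
SHOW ZONE CONFIGURATION FROM TABLE orders;

-- Replica and leaseholder placement per range
SHOW RANGES FROM TABLE orders;
```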
6. Client‑side back‑off and pooling
Even with server limits, clients should reuse connections and implement exponential back‑off on retryable errors.
// Go example using pgxpool (pgx v5)
import (
	"context"
	"time"

	"github.com/jackc/pgx/v5"
	"github.com/jackc/pgx/v5/pgxpool"
)

func NewPool(ctx context.Context) (*pgxpool.Pool, error) {
	cfg, err := pgxpool.ParseConfig("postgres://user@host:26257/db?sslmode=verify-full")
	if err != nil {
		return nil, err
	}
	cfg.MaxConns = 20 // stay below the server-side connection cap
	cfg.MinConns = 5
	cfg.HealthCheckPeriod = time.Minute
	cfg.AfterConnect = func(ctx context.Context, conn *pgx.Conn) error {
		// enforce a statement timeout on every pooled connection
		_, err := conn.Exec(ctx, "SET statement_timeout = '30s'")
		return err
	}
	return pgxpool.NewWithConfig(ctx, cfg)
}
After applying these settings, monitor the same metrics mentioned in the Detection section. middleBrick will continue to report on missing rate‑limit protections, but the cluster will now resist connection‑exhaustion and CPU‑flood attempts.
Frequently Asked Questions
Does middleBrick stop a DDoS attack against my CockroachDB instance?
No. middleBrick detects and reports the condition: it flags endpoints that lack connection throttling or request-rate limits and supplies remediation guidance, but it does not sit in the traffic path and cannot absorb or block an attack. Mitigation is done through the CockroachDB settings and client-side practices described above.
Which CockroachDB setting should I tune first to defend against connection‑exhaustion DDoS?
`server.max_connections_per_gateway`. Capping connections at a value your nodes can actually serve (e.g., 200 per node) stops an attacker from exhausting memory and goroutines by opening thousands of sessions; admin users are exempt from the limit, so operators keep access during an incident.