Simple Queries, Hidden Dangers
In database security assessments, the most dangerous queries are often the simplest. A SELECT with a WHERE clause on an indexed column should be fast. When it is not, the cause is frequently a mismatch between the query planner's statistical model of the data and the actual data distribution. On tables with hundreds of millions or billions of rows, this mismatch can transform a sub-millisecond index lookup into a full sequential scan that saturates I/O capacity for minutes. From a security perspective, any query that an authenticated user can trigger and that causes a full table scan on a large table is a potential denial-of-service vector.
The PostgreSQL query planner makes cost-based decisions using statistics stored in pg_statistic, which are updated by ANALYZE (or autovacuum). These statistics include the most common values and their frequencies, a histogram of value distribution, and the estimated number of distinct values. When these statistics are stale or when the data distribution is highly skewed, the planner can make catastrophically wrong estimates. A common scenario: a column that is 99.9% null, with an index on non-null values. If the planner estimates that the non-null fraction is higher than it actually is, it may choose a sequential scan over an index scan, reasoning that the index would return too many rows to be worthwhile. The resulting full table scan on a billion-row table is effectively a denial-of-service condition triggered by a legitimate query.
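The statistics the planner relies on can be inspected directly. A sketch of that check, using a hypothetical `events` table with a mostly-null `deleted_at` column (the table, column, and target value here are illustrative, not from any specific assessment):

```sql
-- Inspect the planner's model of a skewed column. null_frac, n_distinct,
-- and most_common_freqs reflect the last ANALYZE, not the live data.
SELECT null_frac, n_distinct, most_common_vals, most_common_freqs
FROM pg_stats
WHERE schemaname = 'public'
  AND tablename  = 'events'
  AND attname    = 'deleted_at';

-- If null_frac is badly underestimated, a larger per-column statistics
-- target gives ANALYZE a finer-grained sample to work from:
ALTER TABLE events ALTER COLUMN deleted_at SET STATISTICS 1000;
ANALYZE events (deleted_at);
```

Comparing `null_frac` against the true null fraction of the column is often enough to explain a planner mis-estimate on its own.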
Diagnosing Without EXPLAIN ANALYZE
Running EXPLAIN ANALYZE on a suspect query in production is itself a risk: it executes the query, which means if the query is the problem, the diagnostic tool reproduces the problem. For security assessments on production databases, we use an alternative approach based on the statistics that PostgreSQL already maintains. The pg_stat_user_tables view provides seq_scan and seq_tup_read counters for each table. A table with a high seq_tup_read count relative to its row count is being sequentially scanned frequently. Combining this with pg_stat_statements (if enabled) allows identification of the specific queries responsible, including their mean and maximum execution times, without executing them again.
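The approach above can be sketched as two read-only queries against the statistics views (thresholds and the `LIMIT` values are illustrative; the `pg_stat_statements` column names shown are those used in PostgreSQL 13 and later):

```sql
-- Tables being scanned sequentially: when avg_rows_per_seq_scan is close
-- to n_live_tup, each scan is reading essentially the whole table.
SELECT relname, seq_scan, seq_tup_read, n_live_tup,
       seq_tup_read / NULLIF(seq_scan, 0) AS avg_rows_per_seq_scan
FROM pg_stat_user_tables
WHERE seq_scan > 0
ORDER BY seq_tup_read DESC
LIMIT 20;

-- With pg_stat_statements enabled, the statements responsible can be
-- ranked by observed cost without executing anything again.
SELECT query, calls, mean_exec_time, max_exec_time
FROM pg_stat_statements
ORDER BY max_exec_time DESC
LIMIT 20;
```

Both queries touch only in-memory statistics, so they are safe to run against a production instance during an assessment.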
The pg_stat_user_tables.n_live_tup and pg_stat_user_tables.last_analyze columns reveal whether statistics are likely to be stale. A table where n_live_tup has changed significantly since last_analyze is a candidate for planner mis-estimation. We routinely flag tables where autovacuum's analyze threshold (default: 10% of rows changed) is too coarse for the data distribution. On a billion-row table, 10% represents 100 million row changes before autovacuum triggers a statistics refresh — a window during which the planner may be operating on severely outdated information.
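A staleness check along these lines might look as follows; the 5% drift threshold and the `events` table in the override are illustrative choices, not fixed recommendations:

```sql
-- Flag tables whose contents have drifted since the last analyze.
-- n_mod_since_analyze counts rows modified since the last ANALYZE.
SELECT relname, n_live_tup, n_mod_since_analyze,
       last_analyze, last_autoanalyze
FROM pg_stat_user_tables
WHERE n_mod_since_analyze > 0.05 * n_live_tup
ORDER BY n_mod_since_analyze DESC;

-- Per-table override: trigger an automatic analyze after ~0.1% of rows
-- change instead of the default 10%, appropriate for very large tables.
ALTER TABLE events SET (autovacuum_analyze_scale_factor = 0.001);
```

Lowering the scale factor per table, rather than globally, keeps autovacuum's analyze workload proportionate on smaller tables.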
Random Sampling and Its Pitfalls
A related finding from database assessments concerns sampling strategies. When analysts or application code need to draw a random sample from a large table, the choice of method has significant performance and correctness implications. PostgreSQL offers three common approaches, each with distinct tradeoffs. TABLESAMPLE SYSTEM (p), where p is a percentage, samples at the page level: it selects random 8KB pages and returns every row from those pages. This is fast but biased: rows on the same page are correlated, and the returned sample size is only approximate. TABLESAMPLE BERNOULLI (p) evaluates each row independently with probability p, producing a statistically uniform sample, but it must scan the entire table to do so. The naive approach, ORDER BY random() LIMIT n, is the worst of the three: it reads every row, generates a random value for each, sorts the entire result set, and returns the top n. On a billion-row table, this is indistinguishable from a denial-of-service attack.
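The three approaches side by side, on a hypothetical `events` table (the percentages and row limit are illustrative and should be tuned to the table's size):

```sql
-- Page-level sampling: fast, approximate row count, page-correlated rows.
-- The argument is a percentage of the table, not a row count.
SELECT * FROM events TABLESAMPLE SYSTEM (0.01);

-- Row-level Bernoulli sampling: statistically uniform, but requires a
-- full scan of the table to evaluate every row.
SELECT * FROM events TABLESAMPLE BERNOULLI (0.01);

-- Full scan plus full sort of every row: the pattern to avoid.
SELECT * FROM events ORDER BY random() LIMIT 1000;
```

Both TABLESAMPLE variants also accept a REPEATABLE (seed) clause when a reproducible sample is needed across runs.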
For security assessments, the sampling method matters because it determines whether a legitimate analytical query can be weaponised. An application that exposes a "random sample" API endpoint backed by ORDER BY random() LIMIT gives any authenticated user the ability to trigger a full table sort on arbitrarily large tables. We recommend TABLESAMPLE SYSTEM for cases where speed matters and approximate uniformity is acceptable, and TABLESAMPLE BERNOULLI with a small percentage for cases requiring statistical rigour. In either case, the sampling method should be chosen deliberately by the database team, not left as an implementation detail in application code where it may default to the most expensive option.
The common thread across these findings is that PostgreSQL's query behavior is not always predictable from the SQL text alone. The same query can be fast or catastrophically slow depending on statistics freshness, data distribution, and planner heuristics. Security assessments of database-backed applications must evaluate not just what queries are possible, but how those queries will be executed under adversarial conditions.