July 08, 2025
PostgreSQL’s planner is built around a set of data structures that help it safely and efficiently create query plans. The planner decides when filters can be safely moved, identifies expressions that must be equal, avoids unnecessary sorting or joins, and handles other correctness constraints like outer joins and null semantics.
Query planning is often thought of purely as choosing the cheapest execution path—selecting the fastest scans or joins based solely on cost estimates. But PostgreSQL’s planner must also ensure correctness, maintain SQL semantics, and guarantee safe transformations. This involves deciding:
These considerations require the planner to rely on structured metadata beyond simple numeric costs.
Rather than modifying expressions directly, PostgreSQL wraps WHERE and JOIN clauses in RestrictInfo
structs. These wrappers store metadata such as:
This makes clause movement safe and simplifies optimization.
Optimizations often depend on knowing two expressions will produce identical results at runtime. PostgreSQL manages this with EquivalenceClass
structs, representing sets of expressions known to be equal.
Built from mergejoinable equality clauses (e.g., t1.a = t2.a
), EquivalenceClasses allow:
t1.a = t2.a
and t2.a = 5
, derive t1.a = 5
)Each table or join has a corresponding RelOptInfo
struct, representing the planner’s internal view of a relation (distinct from parser-level constructs like RangeTblEntry
). It contains:
This structure enables flexible reasoning about scan and join strategies during planning.
Once a relation is represented by a RelOptInfo, the planner begins populating it with possible execution strategies.
Each possible plan for a relation is represented by a Path
. Examples include sequential scans, index scans, bitmap scans, and parallel scans. Each Path has:
Paths are accumulated in RelOptInfo.pathlist
. The planner discards suboptimal paths using add_path()
.
PathKeys represent the sort order of tuples produced by a Path. Built from EquivalenceClass members and sort operators, PathKeys enable:
By using symbolic representation, PathKeys let planners efficiently compare sort orders across candidate plans.
JoinInfo (joininfo_list
): Tracks join clauses that involve this base relation but refer to other rels as well. This allows the planner to account for join-related clauses even before the join itself is formed, and to consider whether they can be used in parameterized paths.
ParamPathInfo: Represents a scan path that depends on parameters (values) from an outer relation. Stores the estimated row count and clause selectivity assuming those parameters are supplied. Enables index nestloop and other dependent joins.
PlaceHolderVar: Represents an expression that must be evaluated at a specific join level, even if it’s referenced earlier or later in the plan. Used for things like lateral references or computed expressions that rely on outer joins.
All the structures discussed above support the planner’s bottom-up strategy for assembling a query plan.
PostgreSQL’s planner works bottom-up: it plans scan paths for base relations first, then builds join paths by combining those. This staged approach involves:
RestrictInfo
, EquivalenceClass
)At each stage, the planner generates multiple alternative paths and discards those that are clearly worse — for example, paths that are more expensive and provide no useful physical property like sort order or parallel safety. This happens through the add_path()
mechanism, which enforces a dominance check.
Aggressive pruning is essential to keep planning tractable. Without it, the number of paths and join combinations would grow too quickly to handle, especially in queries with many joins.
Some optimizers (like those based on the Cascades framework) represent each subquery or join input as a logical expression tree — a normalized operator-level AST. These optimizers memoize each such expression and apply transformation rules to generate equivalent alternatives (e.g., commutativity, associativity, filter pushdown). Each new form is stored and considered separately, and the optimizer defers choosing a physical implementation until all rewrites have been explored. In contrast, PostgreSQL keeps the logical query structure mostly fixed and explores only physical alternatives — represented as Path
nodes — for each part of the plan. It memoizes the cheapest paths for each relation or join combination, but aggressively prunes weaker ones early and avoids tracking multiple logically equivalent forms.
The planner’s internal structures evolved to balance correctness, extensibility, and planning efficiency. While not perfect or exhaustive, the combination of clause wrappers, symbolic equality tracking, and staged path generation supports a wide range of query patterns without relying on heavyweight memoization or rule systems. It’s a pragmatic design, and still improving.