Postgres: planner internals (data structures and flow)

July 08, 2025

PostgreSQL’s planner is built around a set of data structures that help it safely and efficiently create query plans. The planner decides when filters can be safely moved, identifies expressions that must be equal, avoids unnecessary sorting or joins, and handles other correctness constraints like outer joins and null semantics.

Planning is more than costing

Query planning is often thought of purely as choosing the cheapest execution path—selecting the fastest scans or joins based solely on cost estimates. But PostgreSQL’s planner must also ensure correctness, maintain SQL semantics, and guarantee safe transformations. This involves deciding:

When and where clauses can be safely evaluated
Which expressions must be equal at runtime
What sort orders are possible
How to handle outer joins, null semantics, and lateral dependencies correctly

These considerations require the planner to rely on structured metadata beyond simple numeric costs.

RestrictInfo: metadata for qualifiers

Rather than modifying expressions directly, PostgreSQL wraps WHERE and JOIN clauses in RestrictInfo structs. These wrappers store metadata such as:

Whether a clause can be pushed down
Whether evaluation should be delayed by an outer join
Whether it’s a join clause (references multiple relations)
Whether it’s a pseudoconstant (not dependent on any table data)
Which relations it references (using bitsets called relids)

This makes clause movement safe and simplifies optimization.

EquivalenceClass: managing equalities

Optimizations often depend on knowing two expressions will produce identical results at runtime. PostgreSQL manages this with EquivalenceClass structs, representing sets of expressions known to be equal.

Built from mergejoinable equality clauses (e.g., t1.a = t2.a), EquivalenceClasses allow:

Inference of implied equalities (e.g., from t1.a = t2.a and t2.a = 5, derive t1.a = 5)
Use of join conditions to produce index scans
Determination of sort orders suitable for merge joins

RelOptInfo: planner’s relation representation

Each table or join has a corresponding RelOptInfo struct, representing the planner’s internal view of a relation (distinct from parser-level constructs like RangeTblEntry). It contains:

Set of relids the relation covers
Estimated row count and row width
Applicable RestrictInfos
Candidate Paths (possible plans)
Join conditions (via joininfo_list)

This structure enables flexible reasoning about scan and join strategies during planning.

Once a relation is represented by a RelOptInfo, the planner begins populating it with possible execution strategies.

Path: representing candidate plans

Each possible plan for a relation is represented by a Path. Examples include sequential scans, index scans, bitmap scans, and parallel scans. Each Path has:

Cost estimates (startup and total)
Relids indicating the relations it covers
PathKeys for sorting information
Flags for physical properties (e.g., parallel safety)

Paths are accumulated in RelOptInfo.pathlist. The planner discards suboptimal paths using add_path().

PathKeys: symbolic sort orders

PathKeys represent the sort order of tuples produced by a Path. Built from EquivalenceClass members and sort operators, PathKeys enable:

Elimination of redundant sorts
Merge joins without explicit sorting
Efficient satisfaction of ORDER BY and DISTINCT clauses

By using symbolic representation, PathKeys let planners efficiently compare sort orders across candidate plans.

Additional structures: JoinInfo, ParamPathInfo, and PlaceHolderVar

JoinInfo (joininfo_list): Tracks join clauses that involve this base relation but refer to other rels as well. This allows the planner to account for join-related clauses even before the join itself is formed, and to consider whether they can be used in parameterized paths.
ParamPathInfo: Represents a scan path that depends on parameters (values) from an outer relation. Stores the estimated row count and clause selectivity assuming those parameters are supplied. Enables index nestloop and other dependent joins.
PlaceHolderVar: Represents an expression that must be evaluated at a specific join level, even if it’s referenced earlier or later in the plan. Used for things like lateral references or computed expressions that rely on outer joins.

All the structures discussed above support the planner’s bottom-up strategy for assembling a query plan.

Planning in stages

PostgreSQL’s planner works bottom-up: it plans scan paths for base relations first, then builds join paths by combining those. This staged approach involves:

Propagating clause metadata upward (RestrictInfo, EquivalenceClass)
Allowing parameterized paths for joins
Choosing join strategies and sort orders based on available paths

At each stage, the planner generates multiple alternative paths and discards those that are clearly worse — for example, paths that are more expensive and provide no useful physical property like sort order or parallel safety. This happens through the add_path() mechanism, which enforces a dominance check.

Aggressive pruning is essential to keep planning tractable. Without it, the number of paths and join combinations would grow too quickly to handle, especially in queries with many joins.

Some optimizers (like those based on the Cascades framework) represent each subquery or join input as a logical expression tree — a normalized operator-level AST. These optimizers memoize each such expression and apply transformation rules to generate equivalent alternatives (e.g., commutativity, associativity, filter pushdown). Each new form is stored and considered separately, and the optimizer defers choosing a physical implementation until all rewrites have been explored. In contrast, PostgreSQL keeps the logical query structure mostly fixed and explores only physical alternatives — represented as Path nodes — for each part of the plan. It memoizes the cheapest paths for each relation or join combination, but aggressively prunes weaker ones early and avoids tracking multiple logically equivalent forms.

Closing thoughts

The planner’s internal structures evolved to balance correctness, extensibility, and planning efficiency. While not perfect or exhaustive, the combination of clause wrappers, symbolic equality tracking, and staged path generation supports a wide range of query patterns without relying on heavyweight memoization or rule systems. It’s a pragmatic design, and still improving.