Postgres: pruning-aware partition locking, rethought

April 5, 2026

In my last post I described a redesign of the pruning-aware partition locking patch that had been reverted from Postgres 18. The core idea was ExecutorPrep(), a new function factored out of InitPlan() that runs range table setup, permission checks, and initial pruning. GetCachedPlan() would call it during plan validation, lock only the surviving partitions, and pass the resulting EState back to the caller through a new struct called CachedPlanPrepData.

That design worked. It passed tests. But within a week of posting the patches, I scrapped it and started over. This post is about why, and what replaced it.

What felt wrong

The problem was CachedPlanPrepData itself. It was an out-parameter on GetCachedPlan() that carried executor state (an EState pointer, a ResourceOwner, a MemoryContext) from deep inside plan cache validation out to the caller, who then had to thread it through portals, SPI, or EXPLAIN to reach ExecutorStart(). If a caller forgot to deliver it, the executor would silently redo initial pruning, potentially arriving at a different set of surviving partitions and touching relations that were never locked. I added Asserts to catch this in debug builds, but in production the failure mode was silent.

The deeper issue was that GetCachedPlan() was doing too much. It was fetching a plan, choosing between generic and custom, validating it, acquiring execution locks, and now also running executor logic to decide which locks to skip. Each of those responsibilities had its own failure modes and cleanup paths, and cramming them into one function made the interactions hard to reason about.

The multi-statement case crystallized this for me. Rule rewriting can expand a single statement into multiple PlannedStmts in a CachedPlan. PortalRunMulti() executes them sequentially with CommandCounterIncrement() between them, so later statements’ pruning expressions can see the effects of earlier ones. Running ExecutorPrep() for all statements in one pass during GetCachedPlan() meant pruning for later statements happened before earlier ones had executed. I also hit a concrete crash: PortalRunMulti() calls MemoryContextDeleteChildren(portalContext) between statements, which destroyed the EStates prepared for later statements. The fix was to exclude multi-statement plans from the optimization entirely, which was correct, but it made me realize I was patching around a fundamental coupling that shouldn’t exist.

What Tom said in 2023

Tom Lane had actually suggested a different direction back in January 2023, in a message on the hackers list. He proposed getting rid of AcquireExecutorLocks() inside GetCachedPlan() entirely and pushing lock acquisition out to callers. He noted: “we’d be pushing the responsibility for looping back and re-planning out to fairly high-level calling code” and “we’d definitely be changing some fundamental APIs.”

The reverted commit tried to follow that spirit but moved locking into ExecutorStart(), which forced it to handle plan invalidation from inside the executor by mutating the CachedPlan in-place. That was the wrong place. The right answer, I eventually realized, was to move locking out to the callers instead, so the executor and plan cache never reach into each other.

The new design

In the current patch, GetCachedPlan() returns a valid plan without acquiring any execution locks. The caller is responsible for locking before execution.

A new exported function, AcquireExecutorLocks(), provides the conservative default. It locks all relations in the plan, checks whether the plan is still valid, and returns false if it was invalidated so the caller can release and retry with a fresh plan. This preserves the old behavior for callers that don’t need pruning-aware locking.

For portal-backed callers, PortalLockCachedPlan() in pquery.c wraps the lock-check-retry loop and handles the case where replanning changes the portal strategy. The CachedPlanSource is now stored in PortalData so the retry can call GetCachedPlan() without the caller having to thread the plan source through.

With lock acquisition now in the caller’s hands, the pruning-aware path becomes a natural extension rather than a special mode tunneled through an out-parameter. For portal-backed callers handling a single-statement reused generic plan, PortalLockCachedPlan() takes a different route: it creates a QueryDesc, calls ExecutorPrepAndLock(), and if the plan survives validation, passes the prepped QueryDesc to ExecutorStart().

ExecutorPrepAndLock() encapsulates the three-step sequence that was previously hidden inside AcquireExecutorLocksUnpruned() in plancache.c. First, lock unprunable relations from PlannedStmt.unprunableRelids. Then call ExecutorPrep() to run initial pruning and determine which partitions survive. Then lock only the survivors. Plan validity is checked after each step. If the plan is invalidated at any point, all acquired locks are released, ExecutorPrepCleanup() frees the orphaned EState, and the caller retries.

EXPLAIN EXECUTE also uses the prep path, so EXPLAIN (ANALYZE) on a prepared statement with partitions only locks the relevant partitions.

Non-portal call sites (_SPI_execute_plan, SQL functions) remain on the conservative path for now. _SPI_execute_plan requires care around snapshot setup, which happens after plan fetch rather than before. SQL functions have a structural issue: init_execution_state() fetches the plan while postquel_start() handles execution, with execution_state containers in between. The portal path and EXPLAIN EXECUTE cover the most common prepared-statement-with-partitions workloads; the remaining sites can be converted incrementally.

What changed in the patch structure

The series is now four patches. The first is the architectural shift: move execution lock acquisition out of GetCachedPlan(), add AcquireExecutorLocks() as the caller-facing function, add PortalLockCachedPlan() for portals, and convert all callers. No behavioral change.

The second refactors the executor’s initial pruning setup, simplifying unpruned relid tracking. The third introduces ExecutorPrep() and refactors ExecutorStart() to reuse its EState when one is provided.

The fourth is the actual optimization: ExecutorPrepAndLock(), the pruning-aware path in PortalLockCachedPlan(), the firstResultRels tracking for writable CTEs, and the regression tests. Multi-statement CachedPlans always use conservative locking, for the same reasons as before.

Where things stand

The patch is targeting Postgres 20. It has not had a formal review in any of its iterations, which is the main blocker. The core architectural question is whether separating lock acquisition from GetCachedPlan() is the right direction. I think it is. GetCachedPlan() was combining plan retrieval with execution setup in a way that made it impossible to do anything smarter with the locking step. The new design makes the two responsibilities explicit and gives callers the flexibility to handle each one appropriately.


© 2025 Amit Langote