Database sandbox / May 21, 2026

Imladri added cold branch pools for governed Postgres sandboxes.

After the GCP Local SSD branchd pass cut the 5B DDL tail, the next question was refill strategy. This run added a second pool, a cold reserve, underneath the hot branch pool, then tested whether cold Btrfs snapshots could refill governed branches of the 5B source without touching that source.

What changed

The earlier branchd pool kept hot branch roots ready for immediate lease. That removed source copying from the request path, but a full 100-branch refill could still show up as an 8-second wait in the harder DDL benchmark. The new cold pool adds another layer: branchd prepares cold snapshots first, promotes them into hot branches, then starts Postgres and preconnects only when hot capacity needs to be restored.
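The two-tier flow described above can be sketched as a small in-memory model. This is an illustration only, not branchd's code: the class and method names (`TwoTierPool`, `prepare_cold`, `refill_hot`, `lease`) are hypothetical, and taking a Btrfs snapshot or starting Postgres is reduced to setting a flag.

```python
from collections import deque
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Branch:
    branch_id: int
    state: str = "cold"            # "cold" -> "hot" -> "leased"
    postgres_started: bool = False
    preconnected: bool = False


@dataclass
class TwoTierPool:
    """Hypothetical model of a hot pool backed by a cold reserve."""
    hot_target: int
    cold_target: int
    hot: deque = field(default_factory=deque)
    cold: deque = field(default_factory=deque)
    _ids: count = field(default_factory=lambda: count(1))

    def prepare_cold(self) -> None:
        """Top up the cold reserve (stands in for taking a cold snapshot)."""
        while len(self.cold) < self.cold_target:
            self.cold.append(Branch(next(self._ids)))

    def refill_hot(self) -> None:
        """Promote cold snapshots until hot capacity is restored."""
        while len(self.hot) < self.hot_target and self.cold:
            branch = self.cold.popleft()
            branch.state = "hot"
            branch.postgres_started = True   # Postgres starts only on promotion
            branch.preconnected = True
            self.hot.append(branch)

    def lease(self) -> Branch:
        """Hand out a ready branch; nothing is copied on the request path."""
        if not self.hot:
            raise RuntimeError("hot pool empty: caller must wait for refill")
        branch = self.hot.popleft()
        branch.state = "leased"
        return branch


pool = TwoTierPool(hot_target=2, cold_target=2)
pool.prepare_cold()   # cold work happens ahead of demand
pool.refill_hot()     # promotion fills the hot pool
pool.prepare_cold()   # the reserve is topped up again in the background
leased = pool.lease() # a user request takes a hot branch
pool.refill_hot()     # hot capacity is restored from the cold reserve
```

In this model a cold branch never has a running Postgres, which is the point of the extra layer: the per-branch process cost is paid only when hot capacity actually needs to be restored.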

The intent is simple: keep the expensive source snapshot work ahead of demand, keep the user-facing branch lease governed, and keep proof checks attached to create, write, cleanup, and source-isolation events.
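One way to picture "proof checks attached to events" is a ledger in which a branch counts as governed only when every required check was recorded and passed. The sketch below assumes that reading. The `ProofEvent` type, the check names, and `governed_branches` are hypothetical and are not taken from Imladri's implementation.

```python
from dataclasses import dataclass

# Assumed check names, mirroring the four event kinds named in the text.
REQUIRED_CHECKS = ("create", "write", "cleanup", "source_isolation")


@dataclass(frozen=True)
class ProofEvent:
    branch_id: str
    check: str      # one of REQUIRED_CHECKS
    passed: bool


def governed_branches(events: list[ProofEvent]) -> set[str]:
    """Return branch ids for which every required check was recorded and passed."""
    seen: dict[str, dict[str, bool]] = {}
    for event in events:
        checks = seen.setdefault(event.branch_id, {})
        # A failure is sticky: a later pass must not hide an earlier failure.
        checks[event.check] = checks.get(event.check, True) and event.passed
    return {
        branch_id
        for branch_id, checks in seen.items()
        if all(checks.get(name, False) for name in REQUIRED_CHECKS)
    }


events = [ProofEvent("b1", name, True) for name in REQUIRED_CHECKS]
events += [
    ProofEvent("b2", "create", True),
    ProofEvent("b2", "write", True),
    ProofEvent("b2", "cleanup", True),
    # b2 has no source_isolation event, so it is not counted as governed.
]
print(governed_branches(events))
```

The design choice worth noting is that a missing check is treated the same as a failed one, so a branch cannot be counted as governed by omission.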

Result

The cold reserve worked, then exposed the real host ceiling.

The direct proof run used 20 hot branches and 20 cold reserve branches against the verified 5B source. It completed 40/40 governed branches with 0 deadlocks and 0 source mutations. Create latency measured 81.86ms p50 / 82.39ms p95. First query measured 49.56ms p50 / 57.69ms p95, and branch-local writes measured 15.05ms p50 / 57.41ms p95.
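For readers who want to reproduce p50 / p95 summaries from raw latency samples, here is a minimal sketch. The note does not say which percentile method the benchmark used, so the nearest-rank definition below is an assumption, and the sample values are synthetic rather than measurements from this run.

```python
import math


def percentile(samples_ms: list[float], pct: float) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% at or below it."""
    if not samples_ms:
        raise ValueError("no samples")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples_ms)
    rank = math.ceil(pct / 100 * len(ordered))   # 1-based rank
    return ordered[rank - 1]


# Synthetic latencies for illustration only; these are not measurements from the run.
samples_ms = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0, 90.0, 100.0]
p50 = percentile(samples_ms, 50)
p95 = percentile(samples_ms, 95)
print(f"{p50}ms p50 / {p95}ms p95")
```

With small sample counts, p95 under nearest-rank is simply one of the slowest samples, which is worth keeping in mind when reading p95 values from a 40-branch run.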

The next two runs intentionally pushed harder: 100 concurrent DDL branches over a hot pool with either 100 or 400 cold reserve branches behind it. Both completed 500/500 branch operations with 0 deadlocks and 0 source mutations. The p95 create and write times stayed in the hundreds of milliseconds, but ready-pool refill wait was about 10.8s to 12.8s at p50 and about 12.9s to 13.0s at p95.

That is the honest limitation. Cold snapshots remove a class of source-copy work, but this budget host still has to juggle hundreds of branch Postgres processes, DDL writes, background refill, cold refill, and cleanup on 8 vCPUs.

| Run | Success | Ready wait p50 / p95 | Create p50 / p95 | First query p50 / p95 | Write p50 / p95 | Source mutations |
|---|---|---|---|---|---|---|
| 20 hot / 20 cold event-insert proof | 40/40 | 75.3ms / 4243.38ms | 81.86ms / 82.39ms | 49.56ms / 57.69ms | 15.05ms / 57.41ms | 0 |
| 100 hot / 100 cold DDL reserve | 500/500 | 12831.45ms / 12863.87ms | 240.11ms / 282.23ms | 192.07ms / 433.39ms | 194.32ms / 450.31ms | 0 |
| 100 hot / 400 cold delayed DDL reserve | 500/500 | 10764.19ms / 12962.28ms | 305.51ms / 437.05ms | 176.56ms / 585.29ms | 210.54ms / 622.89ms | 0 |

Interpretation

This shifted the bottleneck to scheduling.

| Dimension | Evidence | Conclusion |
|---|---|---|
| Cold reserve path | Branchd can promote a cold Btrfs snapshot into the hot pool without re-copying the 5B source. The proof run kept cloneMs=0 and snapshotMs=0 during cold promotion. | The next pool can be prepared before a user request arrives. |
| Governed proof path | The 20 hot / 20 cold event-insert run completed 40/40 branches with source mutation checks, deadlock checks, branch-local write checks, and cleanup checks. | The cold-pool path still preserves Imladri proof semantics. |
| DDL stress path | The 100-way DDL runs completed 500/500 branches with 0 deadlocks and 0 source mutations, but ready wait was about 10.8s to 12.8s at p50 and about 12.9s to 13.0s at p95. | The bottleneck moved from source copying to host scheduling and process pressure. |
| Budget host ceiling | The current machine is an 8-vCPU, 32 GiB GCP N2D host with striped Local SSD. It is good enough to prove correctness, but it does not establish the final performance envelope. | A 16 to 32 vCPU Local SSD host is the correct next measurement when budget allows. |

Limitation

This is not the final storage-engine number.

The cold pool proves the design direction, not the final infrastructure limit. The current host was chosen because it fit the budget: 8 vCPUs, 32 GiB RAM, and striped Local SSD. It is enough to test correctness and isolate the next bottleneck, but a serious vendor-grade comparison needs the same code on a higher-core storage host.

Not a broad infrastructure claim: This result proves the cold reserve design on one budget GCP host. It does not claim universal sub-six-second cloning across arbitrary production databases.

Active branch count still matters: Hundreds of resident branch Postgres processes compete with hot refill, cold refill, cleanup, and DDL writes on an 8-vCPU machine.

DDL is intentionally harsh: The DDL benchmark mutates branch-local schema under 100-way pressure. Normal event inserts are faster, but schema-heavy agent work needs separate policy and scheduling.

Higher-core host needed later: The honest next pass is the same branchd configuration on a higher-core storage host. We kept this note because current budget blocks that upgrade today.

Evidence

The public artifacts include all three cold-pool runs.

The JSON artifacts include success counts, p50/p95 timings, pool readiness, cold-pool readiness, deadlock counts, source mutation checks, cleanup state, and branchd probe state after drain.
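A reader checking an artifact against the claims in this note might do something like the following. The field names (`attempted`, `succeeded`, `deadlocks`, `sourceMutations`) and the inline example document are assumptions made for illustration; the published JSON may use a different shape.

```python
import json

# Hypothetical artifact shape; the real field names in the published JSON may differ.
artifact_text = """
{
  "run": "example",
  "attempted": 40,
  "succeeded": 40,
  "deadlocks": 0,
  "sourceMutations": 0
}
"""


def check_artifact(artifact: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means the run passed."""
    problems = []
    if artifact["succeeded"] != artifact["attempted"]:
        problems.append(
            f"{artifact['succeeded']}/{artifact['attempted']} branches succeeded"
        )
    for key in ("deadlocks", "sourceMutations"):
        if artifact[key] != 0:
            problems.append(f"{key} = {artifact[key]}, expected 0")
    return problems


problems = check_artifact(json.loads(artifact_text))
print(problems or "ok")
```

Returning a list of problems rather than raising on the first one makes it easier to see every failed invariant in a single pass over an artifact.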