Setup

The run used a verified 1B-row Postgres source fixture on a GCP Compute Engine N2D host with 8 vCPUs, 32 GiB RAM, and 750 GiB of striped Local SSD mounted as Btrfs. Branchd kept a resident prepared pool and served 100 concurrent branch requests through the same governed create, query, destroy, and source-mutation proof path.

Earlier GCP repeats showed 100-way branch creation was already stable, but the first query stayed in the 200ms+ range. This pass changed one variable at a time: no branch write, a warm second query on the same client, a psql runner comparison, and then a branchd-side resident session pool.

Result

Session pooling moved the tail down.

With the Node pg runner, prepared branch lease/create stayed below 200ms p95. Removing the write probe did not remove the first-query tail. But a second query on the same client dropped to 12.67ms p95, which isolates the remaining tail to first connection/query startup under burst, not the branch copy itself.
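The warm-second-query isolation can be sketched as follows. This is a minimal illustration, not the benchmark harness: `FakeClient` is a hypothetical stand-in for a database client whose first query pays a connection cost, and the nearest-rank percentile method is an assumption, since the report does not say how its p50 and p95 values were computed.

```python
import math
import time
from typing import Callable, Sequence, Tuple


def percentile(samples: Sequence[float], pct: int) -> float:
    """Nearest-rank percentile (an assumed method, not confirmed by the report)."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def time_ms(fn: Callable[[], object]) -> float:
    """Wall-clock duration of one call, in milliseconds."""
    start = time.perf_counter()
    fn()
    return (time.perf_counter() - start) * 1000.0


class FakeClient:
    """Hypothetical client: only the first query pays the connection cost."""

    def __init__(self, connect_cost_s: float = 0.02) -> None:
        self._connected = False
        self._connect_cost_s = connect_cost_s

    def query(self, sql: str) -> int:
        if not self._connected:
            time.sleep(self._connect_cost_s)  # simulated connect + startup
            self._connected = True
        return 1


def probe(client: FakeClient) -> Tuple[float, float]:
    """Time a cold first query, then a warm second query on the same client."""
    first = time_ms(lambda: client.query("select 1"))
    second = time_ms(lambda: client.query("select 1"))
    return first, second
```

Collecting `probe` results across 100 concurrent clients and passing each column to `percentile` gives separate first-query and second-query distributions. That separation is what lets a run attribute the tail to connection startup rather than to the branch itself.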

The next pass moved query execution into branchd-side resident sessions. In the clean read-only run, 100 concurrent branches completed in 216.44ms wall time with 82.3ms create p95 and 77.16ms first-query p95. A later sustained burst pass used a cheaper select 1 first-query probe, targeted the source-mutation check to the event table being written, and deferred Btrfs tombstone deletion out of the hot destroy path. That 5x100 run completed 500/500 branches with 95.41ms create p95, 33.36ms first-query p95, and 0 source mutations. Instrumenting the refill path then showed that Btrfs snapshots, Postgres starts, and session preconnects were competing inside the same warm loop. Phased warming fixed the contention and produced a new sustained run with 91.13ms create p95, 28.67ms first-query p95, 95.6ms write p95, and 3.93s refill p95.
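The phased warming idea can be sketched as below, assuming each refill step is an async callable that takes a branch id. The function names, the shape of the steps, and the shared concurrency limit are illustrative assumptions. The report states only that snapshots are scheduled before Postgres starts and preconnects, and that a later run used a concurrency of 32.

```python
import asyncio
from typing import Awaitable, Callable, Sequence

# A warm step takes a branch id and does one unit of work (hypothetical shape).
Step = Callable[[str], Awaitable[None]]


async def run_phase(step: Step, branch_ids: Sequence[str], limit: int) -> None:
    """Run one warm step for every branch, at most `limit` at a time."""
    sem = asyncio.Semaphore(limit)

    async def guarded(branch_id: str) -> None:
        async with sem:
            await step(branch_id)

    await asyncio.gather(*(guarded(b) for b in branch_ids))


async def phased_warm(
    branch_ids: Sequence[str],
    snapshot: Step,
    start_postgres: Step,
    preconnect: Step,
    limit: int = 32,
) -> None:
    """Finish each phase for the whole pool before starting the next one.

    A pipelined loop would let snapshots, Postgres starts, and preconnects
    overlap and compete; here the three kinds of work never run together.
    """
    for step in (snapshot, start_postgres, preconnect):
        await run_phase(step, branch_ids, limit)
```

The trade-off in this sketch is that no branch becomes ready until the final phase begins, in exchange for each phase having the host to itself.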

The same host later ran the harder 5B DDL probe that had exposed the multi-second p95 tail. Normal event inserts stayed below 105ms p95, but branch-local DDL pushed write and cleanup into the one-to-two second range. The fix was not to hide the result. Imladri made per-branch Postgres memory/WAL/checkpoint settings configurable, restarted branchd with 64MB shared_buffers, and fixed the pool-refill timer so a restarted service always warms the prepared branch pool. The 5x100 repeat completed 500/500 DDL-heavy branches with 0 deadlocks, 0 source mutations, 361.87ms write p95, and 134.52ms destroy p95. A follow-up patch made pool accounting ignore stale entries from prior pool roots before refill. With refill/start/preconnect concurrency at 32, the write p95 moved again to 297.17ms, while the full 100-branch refill wait stayed around 8.3s.
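One way to make per-branch Postgres settings configurable is to render them as `-c name=value` arguments when each branch server is started. The sketch below assumes that approach. Only the 64MB `shared_buffers` value comes from the run; the dataclass, the function name, and the WAL and checkpoint values are hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class BranchPgSettings:
    shared_buffers: str = "64MB"      # value reported for the tuned run
    wal_buffers: str = "4MB"          # hypothetical placeholder
    max_wal_size: str = "256MB"       # hypothetical placeholder
    checkpoint_timeout: str = "5min"  # hypothetical placeholder (Postgres default)


def postgres_args(data_dir: str, port: int, settings: BranchPgSettings) -> List[str]:
    """Build a `postgres` command line with per-branch overrides as -c flags."""
    args = ["postgres", "-D", data_dir, "-p", str(port)]
    for name in ("shared_buffers", "wal_buffers", "max_wal_size", "checkpoint_timeout"):
        args += ["-c", f"{name}={getattr(settings, name)}"]
    return args
```

Passing the overrides on the command line keeps the snapshot's `postgresql.conf` untouched, so every branch can share the same prepared data directory layout.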

Run	Success	Wall	Create p50 / p95	First query p50 / p95	Second/write p50 / p95
5B DDL tail + 64MB buffers + refill 32	500/500	621.51ms / 665.06ms	80.04ms / 96.64ms	162.45ms / 345.09ms	201.71ms / 297.17ms
5B DDL tail + 64MB branch buffers	500/500	620.33ms / 708.66ms	79ms / 88.27ms	174.32ms / 378.99ms	197.14ms / 361.87ms
phased pool warm + deferred delete	500/500	281.02ms / 326.71ms	83.28ms / 91.13ms	26.06ms / 28.67ms	81.96ms / 95.6ms
sustained branchd event insert + deferred delete	500/500	296.92ms / 308.41ms	86.45ms / 95.41ms	25.08ms / 33.36ms	70.38ms / 105.4ms
sustained branchd event insert + refill 32	500/500	312.77ms / 379.12ms	97.39ms / 171.36ms	26.35ms / 35.15ms	81.96ms / 108.71ms
branchd sessions + read only	100/100	216.44ms	77.47ms / 82.3ms	68.32ms / 77.16ms	0ms / 0ms
branchd sessions + event probe	100/100	423.54ms	81.85ms / 99.92ms	104.25ms / 117.43ms	111.41ms / 170.86ms
pg after ready sessions	100/100	288.86ms	141.52ms / 149.36ms	75.45ms / 98.58ms	0ms / 0ms
pg + event probe	100/100	526.11ms	142.61ms / 149.14ms	175.99ms / 242.55ms	115.75ms / 151.45ms
pg + no write	100/100	432.27ms	145.85ms / 175.77ms	195.5ms / 236.94ms	0ms / 0ms
pg + second select 1	100/100	420.35ms	117.95ms / 128.67ms	210.45ms / 247.88ms	4ms / 12.67ms
psql + no write	100/100	1038.84ms	396.95ms / 775.77ms	542.08ms / 827.58ms	0ms / 0ms

5B DDL tail fix

The bad p95 was branch-local DDL pressure, not source-copying.

This pass compared the ugly 5B burst against normal inserts, a DDL diagnostic, and the tuned DDL repeat. It is the cleanest explanation so far for why a 5B branch can create quickly but still show a slow p95 when the benchmark does 100 concurrent schema mutations inside the branches.

Run	Success	Create p50 / p95	First query p50 / p95	Write end to end p50 / p95	Write server p50 / p95	Destroy p50 / p95
Original 5B DDL-heavy burst set	500/500	93.93ms / 123.54ms	188.82ms / 341.36ms	216.94ms / 2544.74ms	154.43ms / 2536.99ms	116.12ms / 2437.85ms
Normal event insert diagnostic	100/100	80.46ms p95	242.71ms p95	104.52ms p95	75.26ms p95	90.87ms p95
DDL diagnostic before branch-buffer tuning	100/100	245.03ms p95	362.59ms p95	1038.97ms p95	968.96ms p95	918.12ms p95
DDL after 64MB branch buffers + refilled pool timer fix	500/500	79ms / 88.27ms	174.32ms / 378.99ms	197.14ms / 361.87ms	169.75ms / 310.48ms	84.76ms / 134.52ms
DDL after active-root pool isolation + refill concurrency 32	500/500	80.04ms / 96.64ms	162.45ms / 345.09ms	201.71ms / 297.17ms	136.26ms / 213.57ms	87.94ms / 177.19ms
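The active-root pool isolation in the last row can be sketched as a filter applied before the refill deficit is computed. This assumes pool entries record a filesystem path under a pool root; the `PoolEntry` shape and both function names are hypothetical.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import Iterable, List


@dataclass(frozen=True)
class PoolEntry:
    branch_id: str
    path: str    # location of the prepared branch (hypothetical field)
    ready: bool


def active_entries(entries: Iterable[PoolEntry], active_root: str) -> List[PoolEntry]:
    """Keep only entries that live under the currently active pool root."""
    root = PurePosixPath(active_root)
    return [e for e in entries if root in PurePosixPath(e.path).parents]


def refill_deficit(entries: Iterable[PoolEntry], active_root: str, target: int) -> int:
    """How many branches refill must prepare, ignoring stale prior-root entries."""
    ready = sum(1 for e in active_entries(entries, active_root) if e.ready)
    return max(0, target - ready)
```

Without the filter, ready entries left over from an earlier pool root would count toward the target, and refill would prepare too few branches.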

What changed

The benchmark now separates five different costs.

Dimension	Evidence	Conclusion
Branch creation	The best sustained 5x100 branch run measured 83.28ms p50 / 91.13ms p95 create latency with 500/500 success.	Prepared branch lease is no longer the dominant tail in the hot path.
First query	After changing the latency run to use select 1, targeted event-table source probes, and branchd-side resident sessions, first-query latency measured 26.06ms p50 / 28.67ms p95.	The old 150ms+ tail was benchmark overhead, not branch query startup.
Branch writes	The phased sustained branch-local event insert run stayed clean and measured 81.96ms p50 / 95.6ms p95 for the write step.	Write scheduling is no longer above the old event-probe tail in this benchmark shape.
Cleanup and refill	Deferred Btrfs tombstones moved subvolume delete out of the destroy hot path. Instrumentation showed pipelined snapshots, Postgres starts, and preconnects were competing during refill; phased pool warming restored 100 ready branches in 3.82s p50 / 3.93s p95.	The sustained refill target is now below 6s p95 on this GCP Local SSD VM.
DDL-heavy tail	The 5B DDL probe originally showed multi-second p95 write and destroy spikes. Raising per-branch Postgres shared buffers to 64MB cut the five-burst DDL write p95 to 361.87ms; isolating active pool roots and testing refill concurrency 32 moved the best write p95 to 297.17ms.	The ugly tail was branch-local catalog/DDL pressure and pool scheduling, not source-row copying.
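The deferred-tombstone pattern from the cleanup row can be sketched as a rename in the destroy path plus a separate reaper. This is an illustration under assumptions: plain directories stand in for Btrfs subvolumes, `shutil.rmtree` stands in for the real subvolume delete, and the class and method names are hypothetical.

```python
import os
import queue
import shutil
import uuid


class DeferredDeleter:
    """Move destroyed branches aside now and delete them later."""

    def __init__(self, tombstone_dir: str) -> None:
        self.tombstone_dir = tombstone_dir
        os.makedirs(tombstone_dir, exist_ok=True)
        self._pending: "queue.Queue[str]" = queue.Queue()

    def destroy(self, branch_path: str) -> str:
        """Hot path: a single rename on the same filesystem, no recursive delete."""
        name = f"{os.path.basename(branch_path)}.{uuid.uuid4().hex}"
        tombstone = os.path.join(self.tombstone_dir, name)
        os.rename(branch_path, tombstone)
        self._pending.put(tombstone)
        return tombstone

    def reap(self) -> int:
        """Background path: delete queued tombstones and return how many were removed."""
        removed = 0
        while True:
            try:
                tombstone = self._pending.get_nowait()
            except queue.Empty:
                return removed
            shutil.rmtree(tombstone)  # stand-in for a Btrfs subvolume delete
            removed += 1
```

In this sketch a caller would invoke `reap` from a timer or worker thread, so the cost of deletion is paid outside the latency measured for destroy.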

Limitations

This is better evidence, not the finish line.

The result is useful because it shows the refill bottleneck was a scheduling problem, not a hard storage ceiling. It does not claim that Imladri has matched full storage-branch vendors across every production shape. The honest next target is to repeat this phased refill policy on larger source fixtures and higher-core machines. The DDL-heavy path still exposes Btrfs snapshot creation as the next refill limiter.

Still one host shape	The below-6s refill result is verified on one 8-vCPU GCP N2D host with striped Local SSD. It should be repeated on larger source fixtures and a dedicated higher-core machine before making a broad infrastructure claim.
Phased refill is a branchd policy	The win came from scheduling snapshots before Postgres starts and preconnects. That policy needs to remain enabled in production branchd deployments, not just in the benchmark harness.
5B DDL refill remains visible	The normal sustained event-write path restored a ready pool in 3.93s p95, but the 5B DDL repeat still waited about 8.3s for full 100-branch refill. Btrfs snapshot creation is now the next measured refill limiter.
Targeted source probe	For event-insert mode, the benchmark now checks the source event table instead of scanning unrelated legacy probe tables. That is the correct proof surface for this workload, but each write mode still needs its matching source-mutation check.
DDL is not the normal write path	The DDL probe intentionally creates branch-local schema objects under 100-way pressure. Normal branch-local event inserts are faster; production policy should separate data writes from schema-mutation workloads.
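A targeted source-mutation check can be sketched as a before-and-after fingerprint of only the table the workload writes to. The example below uses an in-memory SQLite database as a stand-in for the Postgres source, and a full-scan hash that would likely be too slow for a 1B-row fixture. The report does not describe how its check is implemented, so the function name and the method are assumptions.

```python
import hashlib
import sqlite3
from typing import Tuple


def table_fingerprint(conn: sqlite3.Connection, table: str) -> Tuple[int, str]:
    """Return (row count, content hash) for one table.

    The table name is interpolated into the SQL, so it must come from
    trusted configuration and never from user input.
    """
    digest = hashlib.sha256()
    count = 0
    for row in conn.execute(f'SELECT * FROM "{table}" ORDER BY 1'):
        digest.update(repr(row).encode("utf-8"))
        count += 1
    return count, digest.hexdigest()


def source_mutations(before: Tuple[int, str], after: Tuple[int, str]) -> int:
    """Report 0 when the source table is unchanged, 1 otherwise."""
    return 0 if before == after else 1
```

Each write mode would pass its own table name, so the check covers the surface that mode could actually mutate rather than unrelated probe tables.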

Evidence

The artifact bundles the clean run and all three isolation runs.

The public JSON includes the original event-probe run, no-write pg run, warm-second-query pg run, psql diagnostic run, the branchd resident-session read-only run, the branchd resident-session event-write run, deferred-delete sustained burst runs, cleanup/refill status, source mutation checks, and deadlock counts.

Artifacts: GCP Local SSD JSON, 5B DDL-tail JSON, 5B refill32 JSON.

Imladri isolated DB branch latency on GCP Local SSD, then cut the 5B DDL tail.
