The run used a verified 1B-row Postgres source fixture on a GCP Compute Engine N2D host with 8 vCPUs, 32 GiB RAM, and 750 GiB of striped Local SSD mounted as Btrfs. Branchd kept a resident prepared pool and served 100 concurrent branch requests through the same governed create, query, destroy, and source-mutation proof path.
Earlier GCP repeats showed 100-way branch creation was already stable, but the first query stayed in the 200ms+ range. This pass changed one variable at a time: no branch write, a warm second query on the same client, a psql runner comparison, and then a branchd-side resident session pool.
Session pooling moved the tail down.
With the Node pg runner, prepared branch lease/create stayed below 200ms p95. Removing the write probe did not remove the first-query tail. But a second query on the same client dropped to 12.67ms p95, which isolates the remaining tail to first connection/query startup under burst, not the branch copy itself.
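The first-versus-second query comparison above can be sketched as follows. This is a minimal illustration, not the benchmark harness: `connect` and `query` are hypothetical callables standing in for the Node pg client, and the nearest-rank percentile is an assumption, since the report does not state which percentile method it uses.

```python
import time


def percentile(samples, pct):
    """Nearest-rank percentile for an integer pct (assumed method)."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    # Integer ceiling of pct * n / 100, so no float rounding is involved.
    rank = max(1, -(-pct * len(ordered) // 100))
    return ordered[rank - 1]


def time_first_and_second(connect, query):
    """Return (first_ms, second_ms) for two queries on one client.

    The first figure includes connection setup; the second reuses the
    already-open client, which separates startup cost from query cost.
    """
    t0 = time.perf_counter()
    client = connect()  # hypothetical: opens a connection to the branch
    query(client)       # hypothetical: runs the probe query
    t1 = time.perf_counter()
    query(client)
    t2 = time.perf_counter()
    return (t1 - t0) * 1000.0, (t2 - t1) * 1000.0
```

Collecting both figures for every branch in a burst and passing each list to `percentile(..., 95)` would give a first-query p95 and a second-query p95 that can be compared directly.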
The next pass moved query execution into branchd-side resident sessions. In the clean read-only run, 100 concurrent branches completed in 216.44ms wall time with 82.3ms create p95 and 77.16ms first-query p95. A later sustained burst pass used a cheaper select 1 first-query probe, targeted the source-mutation check to the event table being written, and deferred Btrfs tombstone deletion out of the hot destroy path. That 5x100 run completed 500/500 branches with 95.41ms create p95, 33.36ms first-query p95, and 0 source mutations. Instrumenting the refill path then showed that Btrfs snapshots, Postgres starts, and session preconnects were competing inside the same warm loop. Phased warming fixed the contention and produced a new sustained run with 91.13ms create p95, 28.67ms first-query p95, 95.6ms write p95, and 3.93s refill p95.
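The phased warming idea can be illustrated with a short asyncio sketch. It assumes three hypothetical async step functions (`snapshot`, `start_postgres`, `preconnect`) and a per-phase concurrency limit; branchd's actual scheduler is not shown in this report and may differ.

```python
import asyncio


async def run_phase(step, branch_ids, limit):
    """Run one warm step for every branch, at most `limit` at a time."""
    gate = asyncio.Semaphore(limit)

    async def guarded(branch_id):
        async with gate:
            await step(branch_id)

    await asyncio.gather(*(guarded(b) for b in branch_ids))


async def phased_warm(branch_ids, snapshot, start_postgres, preconnect, limit=32):
    """Finish each phase for the whole batch before starting the next.

    A pipelined loop would let snapshots, Postgres starts, and preconnects
    overlap; here they never run at the same time. The default limit of 32
    mirrors the concurrency tested in one run and is only illustrative.
    """
    for step in (snapshot, start_postgres, preconnect):
        await run_phase(step, branch_ids, limit)
```

The trade-off in this shape is that the first branch becomes ready later than it would in a pipelined loop, in exchange for the three kinds of work not competing with each other.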
The same host later ran the harder 5B DDL probe that had exposed the multi-second p95 tail. Normal event inserts stayed below 105ms p95, but branch-local DDL pushed write and cleanup p95 to roughly one second in the single-burst diagnostic and to about 2.5s in the original five-burst set. The fix was not to hide the result. Imladri made per-branch Postgres memory/WAL/checkpoint settings configurable, restarted branchd with 64MB shared_buffers, and fixed the pool-refill timer so a restarted service always warms the prepared branch pool. The 5x100 repeat completed 500/500 DDL-heavy branches with 0 deadlocks, 0 source mutations, 361.87ms write p95, and 134.52ms destroy p95. A follow-up patch made pool accounting ignore stale entries from prior pool roots before refill. With refill/start/preconnect concurrency at 32, the write p95 dropped further to 297.17ms, while the full 100-branch refill wait stayed around 8.3s.
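One way to make per-branch settings configurable is to pass them to each branch's Postgres process as `-c name=value` options, which the `postgres` server binary accepts. The sketch below assumes that approach; how branchd actually applies its settings is not stated here, and the data directory and port are placeholders.

```python
def postgres_start_args(data_dir, port, settings):
    """Build an argv list that passes each setting as `-c name=value`."""
    args = ["postgres", "-D", data_dir, "-p", str(port)]
    for name in sorted(settings):
        args += ["-c", f"{name}={settings[name]}"]
    return args


# Only shared_buffers has a value in the report. WAL and checkpoint
# settings would be added here once their values are known.
BRANCH_SETTINGS = {"shared_buffers": "64MB"}

# Placeholder path and port, for illustration only.
argv = postgres_start_args("/path/to/branch/pgdata", 55432, BRANCH_SETTINGS)
```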
| Run | Success | Wall | Create p50 / p95 | First query p50 / p95 | Second/write p50 / p95 | Source mutations |
|---|---|---|---|---|---|---|
| 5B DDL tail + 64MB buffers + refill 32 | 500/500 | 621.51ms / 665.06ms | 80.04ms / 96.64ms | 162.45ms / 345.09ms | 201.71ms / 297.17ms | 0 |
| 5B DDL tail + 64MB branch buffers | 500/500 | 620.33ms / 708.66ms | 79ms / 88.27ms | 174.32ms / 378.99ms | 197.14ms / 361.87ms | 0 |
| phased pool warm + deferred delete | 500/500 | 281.02ms / 326.71ms | 83.28ms / 91.13ms | 26.06ms / 28.67ms | 81.96ms / 95.6ms | 0 |
| sustained branchd event insert + deferred delete | 500/500 | 296.92ms / 308.41ms | 86.45ms / 95.41ms | 25.08ms / 33.36ms | 70.38ms / 105.4ms | 0 |
| sustained branchd event insert + refill 32 | 500/500 | 312.77ms / 379.12ms | 97.39ms / 171.36ms | 26.35ms / 35.15ms | 81.96ms / 108.71ms | 0 |
| branchd sessions + read only | 100/100 | 216.44ms | 77.47ms / 82.3ms | 68.32ms / 77.16ms | 0ms / 0ms | 0 |
| branchd sessions + event probe | 100/100 | 423.54ms | 81.85ms / 99.92ms | 104.25ms / 117.43ms | 111.41ms / 170.86ms | 0 |
| pg after ready sessions | 100/100 | 288.86ms | 141.52ms / 149.36ms | 75.45ms / 98.58ms | 0ms / 0ms | 0 |
| pg + event probe | 100/100 | 526.11ms | 142.61ms / 149.14ms | 175.99ms / 242.55ms | 115.75ms / 151.45ms | 0 |
| pg + no write | 100/100 | 432.27ms | 145.85ms / 175.77ms | 195.5ms / 236.94ms | 0ms / 0ms | 0 |
| pg + second select 1 | 100/100 | 420.35ms | 117.95ms / 128.67ms | 210.45ms / 247.88ms | 4ms / 12.67ms | 0 |
| psql + no write | 100/100 | 1038.84ms | 396.95ms / 775.77ms | 542.08ms / 827.58ms | 0ms / 0ms | 0 |
The bad p95 was branch-local DDL pressure, not source-copying.
This pass compared the ugly 5B burst against normal inserts, a DDL diagnostic, and the tuned DDL repeat. It is the cleanest explanation so far for why a 5B branch can create quickly but still show a slow p95 when the benchmark does 100 concurrent schema mutations inside the branches.
| Run | Success | Create p50 / p95 | First query p50 / p95 | Write end-to-end p50 / p95 | Write server p50 / p95 | Destroy p50 / p95 |
|---|---|---|---|---|---|---|
| Original 5B DDL-heavy burst set | 500/500 | 93.93ms / 123.54ms | 188.82ms / 341.36ms | 216.94ms / 2544.74ms | 154.43ms / 2536.99ms | 116.12ms / 2437.85ms |
| Normal event insert diagnostic | 100/100 | 80.46ms p95 | 242.71ms p95 | 104.52ms p95 | 75.26ms p95 | 90.87ms p95 |
| DDL diagnostic before branch-buffer tuning | 100/100 | 245.03ms p95 | 362.59ms p95 | 1038.97ms p95 | 968.96ms p95 | 918.12ms p95 |
| DDL after 64MB branch buffers + pool-refill timer fix | 500/500 | 79ms / 88.27ms | 174.32ms / 378.99ms | 197.14ms / 361.87ms | 169.75ms / 310.48ms | 84.76ms / 134.52ms |
| DDL after active-root pool isolation + refill concurrency 32 | 500/500 | 80.04ms / 96.64ms | 162.45ms / 345.09ms | 201.71ms / 297.17ms | 136.26ms / 213.57ms | 87.94ms / 177.19ms |
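The active-root pool isolation in the last row can be pictured as a filter applied before the refill count is computed. This is a hedged sketch: `PoolEntry` and its fields are hypothetical names, not branchd's data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PoolEntry:
    branch_id: str
    pool_root: str  # root directory the branch was prepared under
    ready: bool


def refill_deficit(entries, active_root, target):
    """Return how many branches to prepare, counting only the active root.

    Entries left over from a prior pool root are ignored, so they cannot
    make the pool look fuller than it really is.
    """
    ready_now = sum(
        1 for e in entries if e.ready and e.pool_root == active_root
    )
    return max(0, target - ready_now)
```

Without the `pool_root` comparison, stale ready entries would reduce the deficit and the service would prepare too few branches after a restart.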
The benchmark now separates five different costs.
| Dimension | Evidence | Conclusion |
|---|---|---|
| Branch creation | The best sustained 5x100 branch run measured 83.28ms p50 / 91.13ms p95 create latency with 500/500 success. | Prepared branch lease is no longer the dominant tail in the hot path. |
| First query | After changing the latency run to use select 1, targeted event-table source probes, and branchd-side resident sessions, first-query latency measured 26.06ms p50 / 28.67ms p95. | The old 150ms+ tail was benchmark overhead, not branch query startup. |
| Branch writes | The phased sustained branch-local event insert run stayed clean and measured 81.96ms p50 / 95.6ms p95 for the write step. | Write p95 is now below the earlier event-probe runs (151.45ms and 170.86ms p95) in this benchmark shape. |
| Cleanup and refill | Deferred Btrfs tombstones moved subvolume delete out of the destroy hot path. Instrumentation showed pipelined snapshots, Postgres starts, and preconnects were competing during refill; phased pool warming restored 100 ready branches in 3.82s p50 / 3.93s p95. | The sustained refill target is now below 6s p95 on this GCP Local SSD VM. |
| DDL-heavy tail | The 5B DDL probe originally showed multi-second p95 write and destroy spikes. Raising per-branch Postgres shared buffers to 64MB cut the five-burst DDL write p95 to 361.87ms; isolating active pool roots and testing refill concurrency 32 moved the best write p95 to 297.17ms. | The ugly tail was branch-local catalog/DDL pressure and pool scheduling, not source-row copying. |
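The deferred tombstone pattern from the cleanup row can be sketched as a cheap rename in the destroy path plus a separate reaper. This is an assumption about the general shape, not branchd's code: the prefix and function names are hypothetical, and plain directories with `shutil.rmtree` stand in for Btrfs subvolumes, where the `delete` callback would instead wrap `btrfs subvolume delete`.

```python
import os
import shutil
import uuid

TOMBSTONE_PREFIX = ".tombstone-"  # hypothetical naming convention


def destroy_branch(pool_root, branch_name):
    """Hot path: rename only, so the caller does not wait for the delete."""
    src = os.path.join(pool_root, branch_name)
    dst = os.path.join(
        pool_root, f"{TOMBSTONE_PREFIX}{branch_name}-{uuid.uuid4().hex}"
    )
    os.rename(src, dst)
    return dst


def reap_tombstones(pool_root, delete=shutil.rmtree):
    """Background path: remove everything that was tombstoned earlier."""
    reaped = 0
    for entry in os.listdir(pool_root):
        if entry.startswith(TOMBSTONE_PREFIX):
            delete(os.path.join(pool_root, entry))
            reaped += 1
    return reaped
```

Running `reap_tombstones` from a timer or an idle worker keeps the expensive delete off the latency that the destroy p95 measures.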
This is better evidence, not the finish line.
The result is useful because it shows the refill bottleneck was a scheduling problem, not a hard storage ceiling. It does not claim that Imladri has matched full storage-branch vendors across every production shape. The honest next target is to repeat this phased refill policy on larger source fixtures and higher-core machines. The DDL-heavy path still exposes Btrfs snapshot creation as the next refill limiter.
| Caveat | Detail |
|---|---|
| Still one host shape | The below-6s refill result is verified on one 8-vCPU GCP N2D host with striped Local SSD. It should be repeated on larger source fixtures and a dedicated higher-core machine before making a broad infrastructure claim. |
| Phased refill is a branchd policy | The win came from scheduling snapshots before Postgres starts and preconnects. That policy needs to remain enabled in production branchd deployments, not just in the benchmark harness. |
| 5B DDL refill remains visible | The normal sustained event-write path restored a ready pool in 3.93s p95, but the 5B DDL repeat still waited about 8.3s for full 100-branch refill. Btrfs snapshot creation is now the next measured refill limiter. |
| Targeted source probe | For event-insert mode, the benchmark now checks the source event table instead of scanning unrelated legacy probe tables. That is the correct proof surface for this workload, but each write mode still needs its matching source-mutation check. |
| DDL is not the normal write path | The DDL probe intentionally creates branch-local schema objects under 100-way pressure. Normal branch-local event inserts are faster; production policy should separate data writes from schema-mutation workloads. |
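The targeted source probe can be sketched as a per-mode table list plus a before/after comparison. Everything named here is hypothetical: the table name `events`, the `fingerprint` callable (for example a row count or checksum query against the source), and the `run_burst` callable are illustrations, not the benchmark's actual checks.

```python
# Hypothetical mapping from write mode to the source tables it could touch.
PROBE_TABLES = {
    "event_insert": ["events"],
}


def count_source_mutations(mode, fingerprint, run_burst):
    """Count source tables whose fingerprint changed across a burst.

    Raises KeyError for a write mode with no registered tables, so a new
    mode cannot run without its matching source-mutation check.
    """
    tables = PROBE_TABLES[mode]
    before = {t: fingerprint(t) for t in tables}
    run_burst()
    return sum(1 for t in tables if fingerprint(t) != before[t])
```

Failing loudly on an unregistered mode is a deliberate choice in this sketch: a result of 0 source mutations is only meaningful if the check covered the tables that mode writes to.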
The artifact bundles the clean run and all three isolation runs.
The public JSON includes the original event-probe run, no-write pg run, warm-second-query pg run, psql diagnostic run, the branchd resident-session read-only run, the branchd resident-session event-write run, deferred-delete sustained burst runs, cleanup/refill status, source mutation checks, and deadlock counts.
