Database sandbox / May 19, 2026

Imladri isolated DB branch latency on GCP Local SSD, then cut the 5B DDL tail.

The prepared physical branch path moved from a small droplet to a GCP N2D host with striped Local SSD. The goal was not a bigger marketing number. It was to separate branch creation from first connection startup, steady query latency, psql process overhead, branch-write pressure, cleanup/refill behavior, and branchd-side session pooling.

Setup

The run used a verified 1B-row Postgres source fixture on a GCP Compute Engine N2D host with 8 vCPUs, 32 GiB RAM, and 750 GiB of striped Local SSD mounted as Btrfs. Branchd kept a resident prepared pool and served 100 concurrent branch requests through the same governed create, query, destroy, and source-mutation proof path.

Earlier GCP repeats showed 100-way branch creation was already stable, but the first query stayed in the 200ms+ range. This pass changed one variable at a time: no branch write, a warm second query on the same client, a psql runner comparison, and then a branchd-side resident session pool.
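The burst shape described in this setup can be sketched as a small harness that times each lifecycle step separately, so create, first-query, and destroy latency never blur into one number. This is only an illustration: `create`, `first_query`, and `destroy` are hypothetical async callables standing in for the governed branchd calls, which are not shown in this write-up.

```python
import asyncio
import time
from typing import Any, Awaitable, Callable, Dict, Tuple


async def timed(step: Callable[..., Awaitable[Any]], *args: Any) -> Tuple[Any, float]:
    """Await one step and return (result, elapsed milliseconds)."""
    start = time.perf_counter()
    result = await step(*args)
    return result, (time.perf_counter() - start) * 1000.0


async def one_branch(create, first_query, destroy) -> Dict[str, float]:
    """Run one branch lifecycle and keep a separate timing for every step."""
    branch, create_ms = await timed(create)
    try:
        _, first_query_ms = await timed(first_query, branch)
    finally:
        # Always destroy the branch, even if the query step raised.
        _, destroy_ms = await timed(destroy, branch)
    return {
        "create_ms": create_ms,
        "first_query_ms": first_query_ms,
        "destroy_ms": destroy_ms,
    }


async def burst(n: int, create, first_query, destroy) -> Dict[str, Any]:
    """Launch n lifecycles at once; wall time covers the whole burst."""
    wall_start = time.perf_counter()
    rows = await asyncio.gather(
        *(one_branch(create, first_query, destroy) for _ in range(n))
    )
    wall_ms = (time.perf_counter() - wall_start) * 1000.0
    return {"wall_ms": wall_ms, "rows": list(rows)}
```

Keeping per-step rows rather than a single end-to-end figure is what allows a later pass to change one variable, such as dropping the write probe, and see which column moves.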

Result

Session pooling moved the tail down.

With the Node pg runner, prepared branch lease/create stayed below 200ms p95. Removing the write probe did not remove the first-query tail. But a second query on the same client dropped to 12.67ms p95, which isolates the remaining tail to first connection/query startup under burst, not the branch copy itself.
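A minimal sketch of that isolation step, assuming a hypothetical `connect` callable that returns a client with a `query(sql)` method: the first timing includes connection startup, the second reuses the warm client. The percentile helper uses the nearest-rank definition, which is an assumption here; the write-up does not say which percentile method the benchmark uses.

```python
import math
import time
from typing import Callable, Sequence, Tuple


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with pct% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def time_two_queries(
    connect: Callable[[], object],
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[float, float]:
    """Return (first_ms, second_ms) for one client.

    first_ms covers connect plus the first query; second_ms covers only a
    repeat query on the already-open client.
    """
    t0 = clock()
    client = connect()          # hypothetical: opens the branch connection
    client.query("select 1")
    t1 = clock()
    client.query("select 1")    # same client, no new connection
    t2 = clock()
    return (t1 - t0) * 1000.0, (t2 - t1) * 1000.0
```

If the second number is small while the first stays large across a burst, the cost sits in connection and first-query startup rather than in the branch itself.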

The next pass moved query execution into branchd-side resident sessions. In the clean read-only run, 100 concurrent branches completed in 216.44ms wall time with 82.3ms create p95 and 77.16ms first-query p95. A later sustained burst pass used a cheaper select 1 first-query probe, targeted the source-mutation check to the event table being written, and deferred Btrfs tombstone deletion out of the hot destroy path. That 5x100 run completed 500/500 branches with 95.41ms create p95, 33.36ms first-query p95, and 0 source mutations. Instrumenting the refill path then showed that Btrfs snapshots, Postgres starts, and session preconnects were competing inside the same warm loop. Phased warming fixed the contention and produced a new sustained run with 91.13ms create p95, 28.67ms first-query p95, 95.6ms write p95, and 3.93s refill p95.
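The phased warming idea can be sketched as follows. This is an assumption-laden illustration, not branchd's code: `snapshot`, `start_postgres`, and `preconnect` are hypothetical async callables, and the default limit of 32 only echoes the refill concurrency mentioned for the later DDL run. The point is that each kind of work finishes for every branch before the next kind begins, so the three never compete inside one loop.

```python
import asyncio
from typing import Awaitable, Callable, List

Step = Callable[[str], Awaitable[None]]


async def run_phase(step: Step, branch_ids: List[str], limit: int) -> None:
    """Run one kind of work for every branch, at most `limit` at a time."""
    gate = asyncio.Semaphore(limit)

    async def guarded(branch_id: str) -> None:
        async with gate:
            await step(branch_id)

    await asyncio.gather(*(guarded(b) for b in branch_ids))


async def phased_warm(
    branch_ids: List[str],
    snapshot: Step,
    start_postgres: Step,
    preconnect: Step,
    limit: int = 32,
) -> None:
    """Warm the pool in phases: all snapshots, then all starts, then all preconnects."""
    for step in (snapshot, start_postgres, preconnect):
        await run_phase(step, branch_ids, limit)
```

A pipelined loop would instead run all three steps per branch back to back, which lets snapshot I/O, process startup, and connection setup overlap across branches.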

The same host later ran the harder 5B DDL probe that had exposed the multi-second p95 tail. Normal event inserts stayed below 105ms p95, but branch-local DDL pushed write and cleanup into the one-to-two second range. The fix was not to hide the result. Imladri made per-branch Postgres memory/WAL/checkpoint settings configurable, restarted branchd with 64MB shared_buffers, and fixed the pool-refill timer so a restarted service always warms the prepared branch pool. The 5x100 repeat completed 500/500 DDL-heavy branches with 0 deadlocks, 0 source mutations, 361.87ms write p95, and 134.52ms destroy p95. A follow-up patch made pool accounting ignore stale entries from prior pool roots before refill. With refill/start/preconnect concurrency at 32, the write p95 moved again to 297.17ms, while the full 100-branch refill wait stayed around 8.3s.
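As an illustration of what configurable per-branch settings might look like, the sketch below renders a `postgresql.conf` fragment from a small settings object. Only the 64MB `shared_buffers` value comes from this run; the other values are placeholders chosen for the example, and the class and function names are hypothetical. The parameter names themselves are standard Postgres settings.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class BranchPgSettings:
    """Per-branch Postgres overrides (hypothetical shape)."""
    shared_buffers: str = "64MB"       # value reported for the tuned DDL run
    wal_buffers: str = "4MB"           # placeholder, not from the benchmark
    max_wal_size: str = "256MB"        # placeholder, not from the benchmark
    checkpoint_timeout: str = "5min"   # placeholder, not from the benchmark


def render_conf(settings: BranchPgSettings) -> str:
    """Render the overrides as postgresql.conf lines, one setting per line."""
    lines = [f"{name} = '{value}'" for name, value in asdict(settings).items()]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_conf(BranchPgSettings()), end="")
```

Keeping the overrides in one typed object makes it easier to vary a single setting between benchmark passes and record exactly which values a run used.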

| Run | Success | Wall | Create p50 / p95 | First query p50 / p95 | Second/write p50 / p95 | Source mutations |
| --- | --- | --- | --- | --- | --- | --- |
| 5B DDL tail + 64MB buffers + refill 32 | 500/500 | 621.51ms / 665.06ms | 80.04ms / 96.64ms | 162.45ms / 345.09ms | 201.71ms / 297.17ms | 0 |
| 5B DDL tail + 64MB branch buffers | 500/500 | 620.33ms / 708.66ms | 79ms / 88.27ms | 174.32ms / 378.99ms | 197.14ms / 361.87ms | 0 |
| phased pool warm + deferred delete | 500/500 | 281.02ms / 326.71ms | 83.28ms / 91.13ms | 26.06ms / 28.67ms | 81.96ms / 95.6ms | 0 |
| sustained branchd event insert + deferred delete | 500/500 | 296.92ms / 308.41ms | 86.45ms / 95.41ms | 25.08ms / 33.36ms | 70.38ms / 105.4ms | 0 |
| sustained branchd event insert + refill 32 | 500/500 | 312.77ms / 379.12ms | 97.39ms / 171.36ms | 26.35ms / 35.15ms | 81.96ms / 108.71ms | 0 |
| branchd sessions + read only | 100/100 | 216.44ms | 77.47ms / 82.3ms | 68.32ms / 77.16ms | 0ms / 0ms | 0 |
| branchd sessions + event probe | 100/100 | 423.54ms | 81.85ms / 99.92ms | 104.25ms / 117.43ms | 111.41ms / 170.86ms | 0 |
| pg after ready sessions | 100/100 | 288.86ms | 141.52ms / 149.36ms | 75.45ms / 98.58ms | 0ms / 0ms | 0 |
| pg + event probe | 100/100 | 526.11ms | 142.61ms / 149.14ms | 175.99ms / 242.55ms | 115.75ms / 151.45ms | 0 |
| pg + no write | 100/100 | 432.27ms | 145.85ms / 175.77ms | 195.5ms / 236.94ms | 0ms / 0ms | 0 |
| pg + second select 1 | 100/100 | 420.35ms | 117.95ms / 128.67ms | 210.45ms / 247.88ms | 4ms / 12.67ms | 0 |
| psql + no write | 100/100 | 1038.84ms | 396.95ms / 775.77ms | 542.08ms / 827.58ms | 0ms / 0ms | 0 |

5B DDL tail fix

The bad p95 was branch-local DDL pressure, not source-copying.

This pass compared the ugly 5B burst against normal inserts, a DDL diagnostic, and the tuned DDL repeat. It is the cleanest explanation so far for why a 5B branch can create quickly but still show a slow p95 when the benchmark does 100 concurrent schema mutations inside the branches.
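One way to read the comparison is the ratio of p95 to p50 for the end-to-end write step: a healthy distribution keeps that ratio small, while a tail problem inflates it. The short calculation below uses the write end-to-end figures reported for three of the runs; the `tail_ratio` helper and the shortened run labels are illustrative, not part of the benchmark.

```python
def tail_ratio(p50_ms: float, p95_ms: float) -> float:
    """How many times larger the p95 is than the median."""
    if p50_ms <= 0:
        raise ValueError("p50 must be positive")
    return p95_ms / p50_ms


# (p50, p95) write end-to-end latency in milliseconds, from the results table.
WRITE_END_TO_END = {
    "original 5B DDL-heavy burst set": (216.94, 2544.74),
    "DDL after 64MB branch buffers": (197.14, 361.87),
    "DDL after refill concurrency 32": (201.71, 297.17),
}

if __name__ == "__main__":
    for run, (p50, p95) in WRITE_END_TO_END.items():
        print(f"{run}: p95 is {tail_ratio(p50, p95):.1f}x p50")
```

The median barely moves across the three runs (about 200ms to 217ms), while the ratio falls from roughly 11.7x to about 1.5x, which is consistent with a tail problem rather than a uniformly slow write path.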

| Run | Success | Create p50 / p95 | First query p50 / p95 | Write end to end p50 / p95 | Write server p50 / p95 | Destroy p50 / p95 |
| --- | --- | --- | --- | --- | --- | --- |
| Original 5B DDL-heavy burst set | 500/500 | 93.93ms / 123.54ms | 188.82ms / 341.36ms | 216.94ms / 2544.74ms | 154.43ms / 2536.99ms | 116.12ms / 2437.85ms |
| Normal event insert diagnostic | 100/100 | 80.46ms p95 | 242.71ms p95 | 104.52ms p95 | 75.26ms p95 | 90.87ms p95 |
| DDL diagnostic before branch-buffer tuning | 100/100 | 245.03ms p95 | 362.59ms p95 | 1038.97ms p95 | 968.96ms p95 | 918.12ms p95 |
| DDL after 64MB branch buffers + refilled pool timer fix | 500/500 | 79ms / 88.27ms | 174.32ms / 378.99ms | 197.14ms / 361.87ms | 169.75ms / 310.48ms | 84.76ms / 134.52ms |
| DDL after active-root pool isolation + refill concurrency 32 | 500/500 | 80.04ms / 96.64ms | 162.45ms / 345.09ms | 201.71ms / 297.17ms | 136.26ms / 213.57ms | 87.94ms / 177.19ms |

What changed

The benchmark now separates five different costs.

| Dimension | Evidence | Conclusion |
| --- | --- | --- |
| Branch creation | The best sustained 5x100 branch run measured 83.28ms p50 / 91.13ms p95 create latency with 500/500 success. | Prepared branch lease is no longer the dominant tail in the hot path. |
| First query | After changing the latency run to use select 1, targeted event-table source probes, and branchd-side resident sessions, first-query latency measured 26.06ms p50 / 28.67ms p95. | The old 150ms+ tail was benchmark overhead, not branch query startup. |
| Branch writes | The phased sustained branch-local event insert run stayed clean and measured 81.96ms p50 / 95.6ms p95 for the write step. | Write scheduling is no longer above the old event-probe tail in this benchmark shape. |
| Cleanup and refill | Deferred Btrfs tombstones moved subvolume delete out of the destroy hot path. Instrumentation showed pipelined snapshots, Postgres starts, and preconnects were competing during refill; phased pool warming restored 100 ready branches in 3.82s p50 / 3.93s p95. | The sustained refill target is now below 6s p95 on this GCP Local SSD VM. |
| DDL-heavy tail | The 5B DDL probe originally showed multi-second p95 write and destroy spikes. Raising per-branch Postgres shared buffers to 64MB cut the five-burst DDL write p95 to 361.87ms; isolating active pool roots and testing refill concurrency 32 moved the best write p95 to 297.17ms. | The ugly tail was branch-local catalog/DDL pressure and pool scheduling, not source-row copying. |

Limitations

This is better evidence, not the finish line.

The result is useful because it shows the refill bottleneck was a scheduling problem, not a hard storage ceiling. It does not claim that Imladri has matched full storage-branch vendors across every production shape. The honest next target is to repeat this phased refill policy on larger source fixtures and higher-core machines. The DDL-heavy path still exposes Btrfs snapshot creation as the next refill limiter.

Still one host shape: The below-6s refill result is verified on one 8-vCPU GCP N2D host with striped Local SSD. It should be repeated on larger source fixtures and a dedicated higher-core machine before making a broad infrastructure claim.
Phased refill is a branchd policy: The win came from scheduling snapshots before Postgres starts and preconnects. That policy needs to remain enabled in production branchd deployments, not just in the benchmark harness.
5B DDL refill remains visible: The normal sustained event-write path restored a ready pool in 3.93s p95, but the 5B DDL repeat still waited about 8.3s for full 100-branch refill. Btrfs snapshot creation is now the next measured refill limiter.
Targeted source probe: For event-insert mode, the benchmark now checks the source event table instead of scanning unrelated legacy probe tables. That is the correct proof surface for this workload, but each write mode still needs its matching source-mutation check.
DDL is not the normal write path: The DDL probe intentionally creates branch-local schema objects under 100-way pressure. Normal branch-local event inserts are faster; production policy should separate data writes from schema-mutation workloads.

Evidence

The artifact bundles the clean run and all three isolation runs.

The public JSON includes the original event-probe run, no-write pg run, warm-second-query pg run, psql diagnostic run, the branchd resident-session read-only run, the branchd resident-session event-write run, deferred-delete sustained burst runs, cleanup/refill status, source mutation checks, and deadlock counts.