The previous article closed the FK behavior gap. This one is about production ergonomics: if 100 agents need governed database branches at once, the system should not build every branch schema from scratch on the critical path. The latest run added source schema fingerprints, disabled background refill during benchmark runs so they no longer leave prepared schemas behind, and kept Postgres catalog checks off the hot path.
The prepared pool does not weaken the proof model. It still leases a governed COW branch, records the source metadata, executes branch-local writes, cleans up, and verifies that the source relation stayed unchanged.
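One way to picture the last step, verifying that the source relation stayed unchanged, is to compare a digest of the source rows taken before the lease with one taken after cleanup. The sketch below is a minimal Python illustration under that assumption. `relation_digest` and `verify_source_unchanged` are hypothetical names, not Imladri's API, and hashing every row is only practical for small fixtures, not for a 5B-row source.

```python
import hashlib
from typing import Iterable, Tuple

Row = Tuple[object, ...]

def relation_digest(rows: Iterable[Row]) -> str:
    """Order-independent SHA-256 digest of a relation's rows.

    Rows are encoded with repr() and sorted, so scan order does not matter.
    Each encoded row is length-prefixed so two adjacent rows can never hash
    the same as one longer row.
    """
    h = hashlib.sha256()
    for encoded in sorted(repr(row).encode("utf-8") for row in rows):
        h.update(len(encoded).to_bytes(8, "big"))
        h.update(encoded)
    return h.hexdigest()

def verify_source_unchanged(before: Iterable[Row], after: Iterable[Row]) -> bool:
    """Hypothetical proof check: True when both snapshots hold the same multiset of rows."""
    return relation_digest(before) == relation_digest(after)
```

Sorting makes the check independent of physical row order, which can legitimately change (for example after a vacuum) without any logical mutation.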
The 100-way FK create p95 moved to 53.34ms.
The same ten-table FK fixture was tested across three implementation shapes. Eager materialization was correct but expensive. Targeted lazy materialization avoided unnecessary table work. The prepared pool moved most remaining clone work before the request. The final row is the latest clean run: refill disabled for benchmark hygiene, schema fingerprints cached, zero deadlocks, and zero source mutations.
| Mode | Success | FK create p50 / p95 | Valid write p50 / p95 | Source mutations |
|---|---|---|---|---|
| Eager FK materialization | 100 / 100 | 9769.45ms / 29179.01ms | 3921.18ms / 11530.44ms | 0 |
| Targeted lazy materialization | 100 / 100 | 1261.2ms / 3447.57ms | 5002.81ms / 7566.71ms | 0 |
| Prepared COW pool | 100 / 100 | 403.91ms / 710.04ms | 725.35ms / 922.46ms | 0 |
| Prepared pool + schema cache | 100 / 100 | 49.4ms / 53.34ms | 578.37ms / 1048.83ms | 0 |
The expensive clone moved out of the hot path.
The final run hit the prepared pool for every concurrent FK branch. The remaining branch-create path binds the leased branch, confirms the source schema fingerprint, and records the proof packet.
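The hot path described above (lease a prebuilt branch, confirm the source schema fingerprint, record the proof packet) can be sketched as follows. This is a minimal Python illustration that assumes a fingerprint is a SHA-256 hash over a canonical, sorted list of column descriptors. `PreparedPool`, `PreparedBranch`, `proof_packet`, and the packet fields are hypothetical names chosen for the example, not Imladri's actual types.

```python
import hashlib
import json
import threading
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple

Column = Tuple[str, str, str, bool]  # (table, column, type, nullable)

def schema_fingerprint(columns: Iterable[Column]) -> str:
    """Hash a canonical (sorted) description of the approved source tables."""
    canonical = json.dumps(sorted(columns), separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

@dataclass
class PreparedBranch:
    schema_name: str
    fingerprint: str  # source fingerprint the branch schema was built against

class PreparedPool:
    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._warm: List[PreparedBranch] = []
        self.stale: List[PreparedBranch] = []  # destroyed later, off the hot path

    def add(self, branch: PreparedBranch) -> None:
        with self._lock:
            self._warm.append(branch)

    def lease(self, source_fingerprint: str) -> Optional[PreparedBranch]:
        """Return a warm branch built for the current source schema, or None on a miss.

        No catalog query happens here: validity is decided by comparing the
        cached source fingerprint with the one recorded at build time.
        """
        with self._lock:
            while self._warm:
                branch = self._warm.pop()
                if branch.fingerprint == source_fingerprint:
                    return branch
                self.stale.append(branch)  # source schema changed since prebuild
        return None

def proof_packet(branch: PreparedBranch, source_fingerprint: str) -> dict:
    """Hypothetical proof record written when the leased branch is bound."""
    return {"branch_schema": branch.schema_name, "source_fingerprint": source_fingerprint}
```

The design choice is to invalidate by fingerprint instead of re-reading information_schema on every lease: a schema change produces a new fingerprint, and every warm entry built against the old one falls out as stale.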
| Phase | p50 / p95 | Max | Meaning |
|---|---|---|---|
| Prepared pool hit | 1 / 1 | 1 / 1 | All 100 FK branches leased prebuilt schemas. |
| Clone schema build | 1 / 1 | 1 / 1 | Clone build stayed off the request path. |
| Branch DDL attach | 101.56ms / 178.88ms | 547.51ms | Underlying attach work, measured separately; leased branch create still stayed at 49.4ms p50. |
| Source metadata | 1.32ms / 1.65ms | 2.66ms | Source schema fingerprints were served from the shared cache. |
| Storage branch total | 48.17ms / 51.78ms | 52.31ms | End-to-end branch creation under 100-way FK contention. |
The speedup did not remove the FK guardrails.
The prepared branch still enforced the same branch-local FK behavior as in the correctness article. The benchmark completed 100 simultaneous FK branches, allowed valid branch writes, rejected orphan writes, applied update/delete cascades, and left the source unchanged. Cascade p95 moved from 3235.59ms to 493.36ms, and cleanup p95 moved from 3052.3ms to 249.34ms.
| Check | Result | Source state |
|---|---|---|
| Concurrent FK branches | 100 / 100 succeeded | 0 deadlocks |
| Orphan write | rejected in branch | source untouched |
| Restrict update/delete | rejected in branch | source untouched |
| Cascade update/delete | applied branch-locally | source untouched |
| Set null/default | applied branch-locally | source untouched |
| Deferred / MATCH FULL / self-ref | passed | source untouched |
The 5B-source write bottleneck had two layers.
The first bulk-write probe did not reveal any source mutation. It revealed a performance bug instead: source expression indexes existed, but the branch overlay had no matching lookup indexes, so as the overlay grew, branch-local uniqueness checks scanned more and more branch rows. After overlay lookup indexes were added for source unique and expression indexes, the next bottleneck became clear: large INSERT ... SELECT workloads were still passing through the row-level COW view trigger, one inserted row at a time. The current fix adds a set-based lazy-COW bulk insert path, and the 1M- and 10M-row branch-local writes were rerun against the same 5B-source fixture.
| Metric | Result |
|---|---|
| 1M write | 12,877.31ms |
| 1M throughput | 77,656 rows/sec |
| 10M write | 138,403.08ms |
| 10M throughput | 72,253 rows/sec |
| 10M overlay rows | 10,000,000 |
| 10M cleanup | 39.61ms |
| Proof | 7/7 |
| Source mutations | 0 |
Fast branching and heavy branch writes are different claims.
The sub-100ms branch numbers in this article measure governed branch creation over approved tables. The 12.88s and 138.4s numbers measure heavy branch-local writes after the branch exists. The write numbers do not contradict the fast clone results, but neither set, by itself, makes Imladri an Ardent- or Neon-class full-database clone engine. The new backend seam for that next layer is external_snapshot_command: Glasshouse can now delegate branch create/destroy to a storage provider while keeping Imladri policy, transaction, cleanup, and proof around the branch lifecycle. A follow-up 100-sample repeat with a self-hosted Btrfs provider reached 1106.76ms p50 / 1666.56ms p95 branch create on the existing shared runtime droplet. Moving the same 10M-row loopback proof to a dedicated 4 vCPU / 8GB droplet improved that to 699.54ms p50 / 1150.54ms p95, with 0 source mutations.
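A create/destroy seam of this kind can be sketched as a context manager around two provider commands. The contract here is an assumption made for illustration: the create command receives the branch name as its last argument and prints the branch connection string as the last line of stdout, and the destroy command receives the same name. `external_branch` is a hypothetical helper, not the actual external_snapshot_command interface.

```python
import subprocess
from contextlib import contextmanager
from typing import Iterator, Sequence

@contextmanager
def external_branch(create_cmd: Sequence[str], destroy_cmd: Sequence[str],
                    branch_name: str, timeout: float = 120.0) -> Iterator[str]:
    """Delegate create/destroy to a storage provider and yield the branch DSN.

    Policy, transactions, and proof stay with the caller; this helper only
    guarantees that destroy runs once create has succeeded, even if the
    governed work inside the `with` block raises.
    """
    created = subprocess.run([*create_cmd, branch_name], capture_output=True,
                             text=True, timeout=timeout, check=True)
    lines = created.stdout.strip().splitlines()
    if not lines:
        # The provider reported success but returned no connection string: clean up.
        subprocess.run([*destroy_cmd, branch_name], capture_output=True,
                       text=True, timeout=timeout, check=False)
        raise RuntimeError(f"provider returned no connection string for {branch_name!r}")
    try:
        yield lines[-1]
    finally:
        subprocess.run([*destroy_cmd, branch_name], capture_output=True,
                       text=True, timeout=timeout, check=True)
```

Keeping the provider behind two commands means the storage layer (Btrfs today, a page-branching engine later) can change without touching the policy and proof code that wraps the lifecycle.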
| Layer | What it does | Claim status |
|---|---|---|
| Prepared COW schema | Governed Postgres branch over approved tables with FK, trigger, RLS, write-control, cleanup, and proof checks. | Measured here: 100-way branch creation and branch-local write behavior. |
| External snapshot command | New backend seam that delegates create/destroy to a storage or page-branch provisioner and captures the returned branch connection string. | Follow-up proofs: dedicated loopback measured 699.54ms p50 / 1150.54ms p95; the real attached-volume run measured 799.86ms p50 / 997ms p95. Both used 10M-row fixtures and had 0 source mutations. |
| Bulk branch writes | The 1M/10M measurements are branch-local INSERT ... SELECT workloads after a branch exists. | They are not the same metric as full database clone time. |
The provider coordination bug is fixed; prepared physical branches cut the tail.
The first 25-way physical branch run exposed a real provider race: concurrent creates could reserve the same state window, fail, and leave branch subvolumes behind. The provider now reserves ports and state under an atomic lock, writes state atomically, and, if state is missing, destroys by deterministic branch path. After that patch, both the shared runtime droplet and the dedicated DB-sandbox droplet completed 25/25 concurrent branches with 0 deadlocks and 0 source mutations. The dedicated host reduced the 25-way create p95 from 60488.22ms to 10748.29ms. This is still loopback Btrfs, not a final attached-volume benchmark, but it indicates that the high tail came mostly from host contention rather than from a mutation-safety failure.
| Host | Run | Success | Create p50 / p95 | Integrity |
|---|---|---|---|---|
| Shared runtime loopback | 5-way | 5/5 | 4659.37ms / 5186.03ms | 0 deadlocks / 0 source mutations |
| Shared runtime loopback | 10-way | 10/10 | 8725.63ms / 10340.99ms | 0 deadlocks / 0 source mutations |
| Shared runtime loopback | 25-way | 25/25 | 43903.14ms / 60488.22ms | 0 deadlocks / 0 source mutations |
| Dedicated droplet loopback | 5-way | 5/5 | 1123.65ms / 1270.88ms | 0 deadlocks / 0 source mutations |
| Dedicated droplet loopback | 10-way | 10/10 | 2211.99ms / 2600.4ms | 0 deadlocks / 0 source mutations |
| Dedicated droplet loopback | 25-way | 25/25 | 7195.72ms / 10748.29ms | 0 deadlocks / 0 source mutations |
The next pass warmed physical branches before the request path and leased them under the same create/destroy contract. That removes checkpoint, snapshot, and Postgres start from the hot path. On the dedicated loopback host, 20 concurrent warmed physical branches stayed under six seconds p95 while keeping 0 deadlocks and 0 source mutations. At 25-way, the host still crossed six seconds p95, so 25-way is a correctness result rather than the speed envelope.
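The create/destroy contract stays the same whether a branch comes from the pool or is built cold, which is what lets warming move checkpoint, snapshot, and Postgres start off the request path. A minimal Python sketch of that leasing contract, including a refill switch like the one the benchmark turns off, follows. `WarmPool` and its parameters are hypothetical, and the refill here runs synchronously on release rather than on the delayed schedule a long-running service would use.

```python
import threading
from collections import deque
from typing import Callable, Deque, Generic, Tuple, TypeVar

B = TypeVar("B")

class WarmPool(Generic[B]):
    def __init__(self, create: Callable[[], B], destroy: Callable[[B], None],
                 target: int, refill: bool = True) -> None:
        self._create = create    # slow path: checkpoint, snapshot, start Postgres
        self._destroy = destroy
        self.target = target
        self.refill = refill     # benchmarks set False so no warm debt is left behind
        self._warm: Deque[B] = deque()
        self._lock = threading.Lock()

    def size(self) -> int:
        with self._lock:
            return len(self._warm)

    def prewarm(self) -> None:
        """Build branches until `target` are warm (assumes a single refiller)."""
        while self.size() < self.target:
            branch = self._create()
            with self._lock:
                self._warm.append(branch)

    def lease(self) -> Tuple[B, bool]:
        """Return (branch, hit). A miss falls back to a cold create."""
        with self._lock:
            if self._warm:
                return self._warm.popleft(), True
        return self._create(), False

    def release(self, branch: B) -> None:
        """Destroy through the same contract as a cold branch, then optionally refill."""
        self._destroy(branch)
        if self.refill:
            self.prewarm()
```

A pool sized below the burst (for example 20 warm branches under a 25-way burst) degrades to cold creates for the excess requests, which is one reason the 25-way tail can cross the speed envelope even when the pool is correct.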
| Prepared pool run | Success | Create p50 / p95 | Checkpoint / start on request | Integrity |
|---|---|---|---|---|
| 5-way pooled | 5/5 | 468.87ms / 621.14ms | 0ms / 0ms | 0 deadlocks / 0 source mutations |
| 10-way pooled | 10/10 | 1021.21ms / 1476.83ms | 0ms / 0ms | 0 deadlocks / 0 source mutations |
| 20-way pooled | 20/20 | 3037.04ms / 5283.85ms | 0ms / 0ms | 0 deadlocks / 0 source mutations |
| 25-way pooled | 25/25 | 3764.22ms / 6401.23ms | 0ms / 0ms | 0 deadlocks / 0 source mutations |
The attached-volume run then moved the same provider from a loopback Btrfs file to a real DigitalOcean block volume. That pass caught two host-realism bugs before publication: long benchmark branch names exceeded PostgreSQL's Unix socket path limit, and /tmp socket directories needed to be owned by the branch Postgres user. After both fixes, the attached volume completed 100 serial branches, plus 20-way and 25-way concurrent warmed-branch runs, with 0 source mutations. On this 4 vCPU / 8GB droplet, 20-way is the honest sub-six-second envelope; 25-way is a correctness result that needs a larger storage host or smarter request scheduling.
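Both host bugs can be checked before a branch starts. The sketch below assumes Linux, where sockaddr_un.sun_path is 108 bytes including the terminating NUL, so a socket path can be at most 107 bytes; PostgreSQL names its socket file .s.PGSQL.<port> inside the socket directory. The fallback mode 0o1777 (world-writable with the sticky bit, like /tmp) is an assumption, and the helper names are hypothetical.

```python
import os
import shutil
from typing import Optional

SUN_PATH_MAX = 107  # assumption: Linux sun_path is 108 bytes, minus the NUL terminator

def socket_file(socket_dir: str, port: int) -> str:
    # PostgreSQL creates its Unix socket as ".s.PGSQL.<port>" in the socket directory.
    return os.path.join(socket_dir, f".s.PGSQL.{port}")

def socket_path_fits(socket_dir: str, port: int) -> bool:
    return len(socket_file(socket_dir, port).encode("utf-8")) <= SUN_PATH_MAX

def short_socket_dir(port: int, base: str = "/tmp") -> str:
    # Depends only on the port, so long generated branch names cannot push it over the limit.
    return os.path.join(base, f"imladri-pg-{port}")

def prepare_socket_dir(port: int, run_user: Optional[str] = None, base: str = "/tmp") -> str:
    path = short_socket_dir(port, base)
    if not socket_path_fits(path, port):
        raise ValueError(f"socket path too long: {socket_file(path, port)}")
    os.makedirs(path, exist_ok=True)
    if run_user is not None:
        # The branch Postgres process, not root, must own the directory it listens in.
        shutil.chown(path, user=run_user)
    else:
        os.chmod(path, 0o1777)  # fallback: writable by any user, sticky like /tmp
    return path
```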
| Real volume run | Success | Create p50 / p95 | Snapshot or hot-path clone | Integrity |
|---|---|---|---|---|
| Attached volume serial | 100/100 | 799.86ms / 997ms | 53.19ms / 121.44ms | 0 source mutations |
| Attached volume pooled 20-way | 20/20 | 3153.63ms / 4768.74ms | 0ms / 0ms | 0 deadlocks / 0 source mutations |
| Attached volume pooled 25-way | 25/25 | 4545.72ms / 7130.54ms | 0ms / 0ms | 0 deadlocks / 0 source mutations |
The pool caught production-class invalidation bugs before publication.
This is why the prepared-pool note is useful beyond the speedup: the benchmark also found production-class failure modes in branch naming, schema validation, benchmark refill behavior, numeric fixture compatibility, overlay indexing, the bulk-write path, and host socket setup.
| Bug | What happened | Fix |
|---|---|---|
| Pool-name collision | The first warm-pool naming pass let the storage-name normalizer truncate random suffixes, making prepared branch schema names collide under burst load. | The random suffix now lands before the truncation boundary, so warmed branch schemas stay unique. |
| Hot-path catalog validation | A defensive per-lease information_schema check pushed simple prepared branches back into hundreds of milliseconds under 100-way load. | The hot path now trusts in-process warm entries and invalidates against source schema fingerprints instead. |
| Background refill debt | The service refill loop correctly replenished warm branches, but benchmark runs were left with prepared schemas after completion. | The benchmark can disable refill while the production path keeps delayed refill for long-running services. |
| Numeric benchmark IDs | The 5B coverage verifier added a numeric expression index, and the heavy-write benchmark generated non-numeric branch IDs. | The benchmark now generates IDs that stay inside the numeric coverage contract. |
| Overlay unique lookup | A 5B-source bulk-write probe became slow because branch-local expression uniqueness was scanning the growing overlay row by row. | The branch overlay now gets lookup indexes that mirror source unique/expression indexes, removing the growing-overlay scan before the set-based bulk path. |
| Trigger-per-row bulk path | After the lookup fix, large INSERT ... SELECT workloads were still slow because the COW view trigger fired once per inserted row. | Lazy COW branches now use a set-based bulk insert path: resolve source metadata, validate uniqueness in sets, then insert directly into the overlay. |
| Postgres socket path length | The first attached-volume benchmark failed because long generated branch names pushed the Unix socket path over the PostgreSQL limit. | Branch Postgres sockets now live under a short /tmp/imladri-pg-<port> path instead of inside the branch data directory. |
| Socket ownership on real hosts | Moving sockets to /tmp exposed a real host permission issue: the provider created the socket directory as root while branch Postgres ran as the imladri user. | The provider now gives the branch Postgres user ownership of the socket directory, or falls back to a writable mode when no run user is configured. |
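The pool-name collision comes down to the order of suffixing and truncation. PostgreSQL identifiers are limited to NAMEDATALEN - 1 = 63 bytes in a default build, so a long prefix plus a random suffix has to be cut somewhere. A minimal sketch of the buggy and fixed orderings, assuming ASCII prefixes (so characters equal bytes); both function names are hypothetical.

```python
import secrets

PG_IDENTIFIER_MAX = 63  # NAMEDATALEN - 1 in a default PostgreSQL build

def schema_name_truncating_suffix(prefix: str) -> str:
    """Buggy ordering: append the suffix, then truncate. Long prefixes lose the suffix."""
    return (prefix + "_" + secrets.token_hex(6))[:PG_IDENTIFIER_MAX]

def schema_name_preserving_suffix(prefix: str) -> str:
    """Fixed ordering: reserve room for the suffix and truncate only the prefix."""
    suffix = "_" + secrets.token_hex(6)  # 12 hex chars = 48 random bits
    return prefix[: PG_IDENTIFIER_MAX - len(suffix)] + suffix
```

With the buggy ordering, every name built from a prefix longer than 63 characters is identical, so a burst of warm-pool builds collides; with the fixed ordering, uniqueness depends only on the random suffix.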
The honest next target is partner-workload write throughput.
| Area | Status |
|---|---|
| Prewarm cost is real | The pool shifts COW materialization out of the hot path. Production still needs sizing and refill policy per customer workload. |
| Bulk writes are now bounded | The follow-up set-based path completed 1M branch-local rows in 12.88s and 10M rows in 138.4s with 0 source mutations. Partner workloads that need higher sustained write throughput should get COPY-style ingestion and workload-specific overlay indexes. |
| Sub-six-second full clone needs a storage backend | Prepared COW branches prove governed table-level branching. The physical branch pool reached 20 concurrent warmed physical branches at 4768.74ms p95 on a real attached Btrfs volume; 25-way stayed correct but crossed the speed envelope on the current 4 vCPU / 8GB droplet. |
| Correctness stayed intact | The prepared-pool run still passed FK behavior, branch-local integrity, 0 deadlocks, 0 source mutations, and 0 scratch-schema leaks. |
The benchmark artifact is public.
The JSON artifact records the prepared pool hits, phase timings, FK behavior booleans, 100-way concurrency result, deadlock count, source mutation count, and the 1M/10M-row bulk-write follow-up. The physical branch artifact records the 100-sample self-hosted Btrfs droplet repeat and the checkpoint/snapshot/Postgres-start phase split. The dedicated droplet artifacts record the same provider first on an isolated loopback Btrfs mount, then on a real attached DigitalOcean Btrfs block volume.
