Earlier protected-vs-raw runs compared different provider allocations. That answered whether Glasshouse could complete the lifecycle, but it mixed protection overhead with normal GPU marketplace variance. This test keeps the provider allocation fixed and changes only the execution path.
The result is narrower and stronger: a RunPod Secure H100 30-minute raw-first sequence measured 18.05% overhead, and a RunPod Secure RTX PRO 6000 Blackwell 30-minute raw-first sequence measured 14.63% overhead. Both completed verified attestation and zeroized cleanup.
Same-instance rows now cover three GPU/provider surfaces.
Measured delta is the protected path's throughput loss relative to raw, so a positive value means the protected run was slower. Negative rows are not a claim that protection makes training faster. They mean order and warm-state effects exceeded the measured protection cost in that short or noisy row. The 30-minute raw-first H100 row is the most conservative current number because raw establishes the baseline before the protected lifecycle runs. The Vast.ai sustained rows are still important: they show the same lifecycle survives a second provider at 15 and 30 minutes, even when the measured delta is dominated by warm-state effects.
| Provider / GPU | Duration | Order | Protected eps/s | Raw eps/s | Measured delta | Status |
|---|---|---|---|---|---|---|
| RunPod H100 | 60s | protected first | 839.133 | 937.134 | 10.46% | valid noise check |
| RunPod H100 | 60s | raw first | 916.788 | 707.015 | -29.67% | valid noise check |
| RunPod H100 | 15m | protected first | 1,002.066 | 972.938 | -2.99% | valid sustained check |
| RunPod H100 | 30m | raw first | 580.205 | 707.995 | 18.05% | headline sustained check |
| RunPod PRO 6000 | 15m | raw first | 2,639.341 | 2,711.934 | 2.68% | Blackwell sustained check |
| RunPod PRO 6000 | 30m | raw first | 2,981.734 | 3,492.846 | 14.63% | Blackwell sustained check |
| Vast.ai A4000 | 60s | raw first | 980.756 | 976.741 | -0.41% | second-provider check |
| Vast.ai A4000 | 60s | protected first | 998.173 | 989.483 | -0.88% | second-provider check |
| Vast.ai A4000 | 15m | raw first | 928.339 | 864.427 | -7.39% | second-provider sustained check |
| Vast.ai A4000 | 30m | raw first | 980.781 | 972.199 | -0.88% | second-provider sustained check |
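The measured delta column can be reproduced from the two throughput columns. The sketch below assumes the delta is the raw-relative throughput loss, `(raw - protected) / raw`, which is inferred from the table values rather than taken from the harness; the function name is illustrative.

```python
def measured_delta_pct(protected_eps: float, raw_eps: float) -> float:
    """Throughput loss of the protected path relative to raw, in percent.

    Positive: protected was slower than raw. Negative: protected was faster,
    which in these rows reflects order and warm-state effects.
    """
    if raw_eps <= 0:
        raise ValueError("raw throughput must be positive")
    return (raw_eps - protected_eps) / raw_eps * 100.0


# (label, protected eps/s, raw eps/s) copied from the table above.
rows = [
    ("RunPod H100, 30m, raw first", 580.205, 707.995),
    ("RunPod PRO 6000, 30m, raw first", 2981.734, 3492.846),
    ("Vast.ai A4000, 15m, raw first", 928.339, 864.427),
]

for label, protected, raw in rows:
    print(f"{label}: {measured_delta_pct(protected, raw):.2f}%")
```

Rounded to two decimals, these three rows give 18.05%, 14.63%, and -7.39%, matching the table.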
One attempted row did not meet the evidence bar.
The protected-first 30-minute attempt is intentionally not counted. The protected segment completed and zeroized, but the post-enclave raw callback did not report completion after the provider pod exited. The harness now enforces non-RunPod provider boot timeouts too, so provider stalls fail earlier and cleanly instead of burning the outer timeout.
| Duration | Order | Observed behavior | Decision |
|---|---|---|---|
| 30m | protected first | Protected completed and zeroized; the post-enclave raw callback never posted completion after the pod exited. | Excluded from the public overhead number. |
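A boot timeout of the kind described above can be sketched as a bounded polling loop. This is a minimal illustration, not the harness's code: `wait_for_boot`, `ProviderBootTimeout`, and the `is_ready` callback are hypothetical names, and the clock and sleep functions are injectable only to make the sketch testable.

```python
import time
from typing import Callable


class ProviderBootTimeout(RuntimeError):
    """Raised when a provider instance does not become ready in time."""


def wait_for_boot(
    is_ready: Callable[[], bool],
    boot_timeout_s: float,
    poll_interval_s: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Poll is_ready() until it returns True or the boot deadline passes."""
    deadline = clock() + boot_timeout_s
    while True:
        if is_ready():
            return
        if clock() >= deadline:
            # Fail early and explicitly instead of waiting for an outer timeout.
            raise ProviderBootTimeout(
                f"provider not ready after {boot_timeout_s:.0f}s"
            )
        sleep(poll_interval_s)
```

Keeping the boot deadline well below the outer run timeout is what lets a stalled provider surface as a distinct, early failure.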
This removes provider allocation variance, not every source of variance.
Same-instance comparison is the right next step because it avoids comparing one rented GPU against another. It still does not remove all order effects. That is why every row reports its run order and the caveats stay visible.
| Aspect | Detail |
|---|---|
| Allocation shape | raw and protected segments ran sequentially on the same provider allocation |
| Provider variance removed | yes; within each row, raw and protected ran on the same GPU allocation rather than on separately rented ones |
| Remaining variance | order effects, runtime warm state, provider node behavior during the same allocation |
| Protected path | Glasshouse package, attestation, gated key release, evidence, zeroization |
| Raw path | same MLP training loop and initial weights, no Glasshouse lifecycle |
| Validation rule | raw completion, attestation verified, zeroized runtime, cleanup observed |
| Production-model scope | 7B smoke and short Qwen 0.5B same-instance overhead now shown; sustained 7B fine-tune is not claimed here |
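The validation rule can be read as a conjunction: a row counts only if all four conditions hold. A minimal sketch, assuming a per-row record shaped like the public artifact; `attestation` and `runtimeState` appear in that artifact, while `rawCompleted` and `cleanupObserved` are hypothetical field names introduced here for illustration.

```python
def row_is_valid(record: dict) -> bool:
    """Apply the four-part validation rule to one measurement record.

    Missing fields count as failures, so an incomplete record is never valid.
    """
    return (
        record.get("rawCompleted") is True          # hypothetical field
        and record.get("attestation") == "verified"
        and record.get("runtimeState") == "zeroized"
        and record.get("cleanupObserved") is True   # hypothetical field
    )


counted = {
    "rawCompleted": True,
    "attestation": "verified",
    "runtimeState": "zeroized",
    "cleanupObserved": True,
}
# Same as above, but the raw segment never reported completion.
excluded = dict(counted, rawCompleted=False)

print(row_is_valid(counted), row_is_valid(excluded))
```

Under this rule a run whose protected segment completed and zeroized still fails if raw completion is missing.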
The artifact keeps the non-secret measurement fields.
The public JSON excludes API keys, tunnel URLs, encrypted payloads, and provider credentials. It retains the measurement method, throughput values, attestation status, and cleanup result.
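One way to enforce this kind of exclusion is an allow-list: copy only named measurement fields into the public record, so anything not listed is dropped by default. The sketch below is an assumption about approach, not the harness's actual export code. The allowed field names are taken from the published artifact; `to_public_artifact` and the secret-bearing keys in the example (`apiKey`, `tunnelUrl`) are hypothetical.

```python
# Fields taken from the published artifact; everything else is dropped.
PUBLIC_FIELDS = frozenset({
    "provider",
    "gpuModel",
    "sameInstance",
    "durationSec",
    "order",
    "protectedEpochsPerSec",
    "rawEpochsPerSec",
    "overheadPct",
    "attestation",
    "runtimeState",
})


def to_public_artifact(internal: dict) -> dict:
    """Return a copy of the record containing only allow-listed fields."""
    return {key: value for key, value in internal.items() if key in PUBLIC_FIELDS}


internal_record = {
    "provider": "runpod",
    "durationSec": 1800,
    "overheadPct": 14.63,
    "apiKey": "not-a-real-key",                # hypothetical secret field
    "tunnelUrl": "https://example.invalid/t",  # hypothetical secret field
}

print(to_public_artifact(internal_record))
```

An allow-list fails safe: a secret field added to the internal record later stays out of the public JSON unless someone deliberately lists it. The published record for the PRO 6000 30-minute row carries only fields of this kind.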
```json
{
  "provider": "runpod",
  "gpuModel": "NVIDIA RTX PRO 6000 Blackwell Server Edition",
  "sameInstance": true,
  "durationSec": 1800,
  "order": "raw-first",
  "protectedEpochsPerSec": 2981.734,
  "rawEpochsPerSec": 3492.846,
  "overheadPct": 14.63,
  "attestation": "verified",
  "runtimeState": "zeroized"
}
```
Qwen LoRA workloads now run through the same protected lifecycle.
The synthetic MLP remains the overhead benchmark because it is repeatable and cheap. The first real-model step is separate: Glasshouse ran a protected Qwen/Qwen2.5-0.5B-Instruct 32-step LoRA workload, then same-allocation raw-vs-protected short Qwen 0.5B rows on RunPod A4000 and Vast.ai A4000. It also ran a protected Qwen/Qwen2.5-7B-Instruct BF16 LoRA smoke step on RunPod. Both the 0.5B and 7B workloads emitted training progress, exported adapter digests, verified attestation, and zeroized the runtime. This is functional production-model evidence, not yet a sustained 7B fine-tune or production throughput claim.
| Item | Detail |
|---|---|
| Small model | Qwen/Qwen2.5-0.5B-Instruct, 32 LoRA steps, final loss 4.1228 |
| Small model overhead | Qwen 0.5B, RunPod A4000, same allocation, raw first, 8 LoRA steps, 9.86% measured train-step overhead |
| Second-provider Qwen | Qwen 0.5B, Vast.ai A4000, 16 LoRA steps, both raw-first and protected-first order checks passed |
| 7B smoke | Qwen/Qwen2.5-7B-Instruct, BF16, 1 LoRA step, final loss 8.3173 |
| Provider / GPU | RunPod RTX 4090 for 7B smoke, RunPod A4000 and Vast.ai A4000 for short same-instance Qwen checks |
| Adapters | 270,336 trainable params on 0.5B; 1,261,568 trainable params on 7B |
| Evidence | attestation verified, progress events emitted, adapter SHA-256 exported, zeroized cleanup |
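The adapter sizes in the table can be sanity-checked with the usual LoRA count of `rank * (d_in + d_out)` parameters per adapted linear layer. The source does not state the LoRA rank or target modules. The sketch below assumes rank 4 on the `q_proj` and `v_proj` attention projections, and uses model dimensions recalled from the public Qwen2.5 configurations; both are assumptions, chosen because they reproduce the reported counts exactly.

```python
def lora_qv_trainable_params(
    hidden: int, n_layers: int, n_heads: int, n_kv_heads: int, rank: int
) -> int:
    """Trainable LoRA parameters when only q_proj and v_proj are adapted.

    Each adapted linear layer adds rank * (d_in + d_out) parameters
    (an A matrix of shape rank x d_in and a B matrix of shape d_out x rank).
    """
    head_dim = hidden // n_heads
    q_out = n_heads * head_dim      # query projection output width
    v_out = n_kv_heads * head_dim   # value projection output width (grouped-query attention)
    per_layer = rank * (hidden + q_out) + rank * (hidden + v_out)
    return n_layers * per_layer


# Assumed dimensions (not stated in the source).
qwen_0_5b = lora_qv_trainable_params(
    hidden=896, n_layers=24, n_heads=14, n_kv_heads=2, rank=4
)
qwen_7b = lora_qv_trainable_params(
    hidden=3584, n_layers=28, n_heads=28, n_kv_heads=4, rank=4
)

print(f"{qwen_0_5b:,} {qwen_7b:,}")
```

Under these assumptions the counts come out to 270,336 and 1,261,568, the same as the reported adapter sizes. That agreement is consistent with a rank-4 query/value adapter but does not prove that was the configuration used.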
The local attestation server failed closed on nine tamper scenarios.
The live GPU proofs show provider execution, training, and zeroization. The local tamper suite isolates the attestation gate itself: composite hash mismatch, manifest hash mismatch, container ID mismatch, stale timestamp, disabled anti-debug, spoofed runtime state, downgraded anti-debug profile, replayed nonce, and JWT claim mismatch were all rejected before key release. The only key release in the artifact is the valid measured control attestation.
| Tamper scenario | Result |
|---|---|
| Composite hash mismatch | rejected |
| Manifest hash mismatch | rejected |
| Container ID mismatch | rejected |
| Stale timestamp | rejected |
| Anti-debug disabled | rejected |
| Runtime state spoofed | rejected |
| Anti-debug profile downgrade | rejected |
| Nonce replay | rejected |
| JWT claim mismatch | rejected |
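A fail-closed gate of this kind releases the key only when every check passes, and treats a missing or malformed field as a rejection. The sketch below illustrates the pattern for the nine scenarios in the table. It is not the Glasshouse attestation server: the function name, report fields, expected-value structure, and the 300-second freshness window are all hypothetical.

```python
import hmac


def _same(a, b) -> bool:
    """Constant-time string comparison; non-string inputs never match."""
    return isinstance(a, str) and isinstance(b, str) and hmac.compare_digest(a, b)


def check_attestation(report: dict, expected: dict, seen_nonces: set,
                      now: float, max_age_s: float = 300.0):
    """Return (release_key, reason). Any failed or unevaluable check rejects."""
    try:
        age = now - report["timestamp"]
        checks = [
            ("composite hash mismatch", _same(report["compositeHash"], expected["compositeHash"])),
            ("manifest hash mismatch", _same(report["manifestHash"], expected["manifestHash"])),
            ("container ID mismatch", _same(report["containerId"], expected["containerId"])),
            ("stale timestamp", 0 <= age <= max_age_s),
            ("anti-debug disabled", report["antiDebugEnabled"] is True),
            ("runtime state spoofed", report["runtimeState"] == expected["runtimeState"]),
            ("anti-debug profile downgrade", report["antiDebugProfile"] == expected["antiDebugProfile"]),
            ("nonce replay", report["nonce"] not in seen_nonces),
            ("JWT claim mismatch", report["jwtClaims"] == expected["jwtClaims"]),
        ]
    except (KeyError, TypeError):
        return False, "malformed report"  # fail closed on missing or bad fields
    for reason, passed in checks:
        if not passed:
            return False, reason
    seen_nonces.add(report["nonce"])  # a nonce is consumed only on success
    return True, "ok"


expected = {
    "compositeHash": "aa11", "manifestHash": "bb22", "containerId": "c-1",
    "runtimeState": "measured", "antiDebugProfile": "strict",
    "jwtClaims": {"aud": "key-release"},
}
valid_report = dict(expected, timestamp=1000.0, antiDebugEnabled=True, nonce="n-1")
nonces: set = set()

print(check_attestation(valid_report, expected, nonces, now=1010.0))
print(check_attestation(valid_report, expected, nonces, now=1011.0))
```

In this sketch the first call releases the key and records the nonce, so the second call with the same report is rejected as a replay.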
The next jump is a real model workload.
This page is deliberately scoped to matched MLP training. The next production-relevance step is turning the 7B smoke path into a sustained fine-tune and then measuring protected-vs-raw overhead for the real-model workload.
| Next step | Status or plan |
|---|---|
| H100 repeat | Completed in the May 16 follow-up: 2.58% overhead on a 30-minute protected-first H100 same-instance row. |
| Real model duration | Extend Qwen LoRA from short smoke rows into sustained train-step runs before publishing overhead. |
| 7B follow-up | Turn the 7B smoke into a sustained fine-tune, then measure protected-vs-raw overhead. |
| Replicate headline rows | Repeat H100 and PRO 6000 30-minute rows again to get a distribution, not isolated samples. |
