The earlier overhead article moved from parallel provider allocations to same-instance measurements, but the headline row still needed a repeat. This run gives the first clean follow-up on a production-class H100: same provider allocation, same pod, same workload, sequential protected and raw legs.
Protected throughput was 1,032.108 epochs/s. Raw throughput was 1,059.493 epochs/s. That places Glasshouse overhead at 2.58% for this 30-minute H100 row.
The raw baseline ran after zeroized protected execution on the same pod.
Both legs ran for the same target duration. The protected leg emitted the expected lifecycle events and ended in zeroized state before the raw baseline started.
| Leg | Elapsed | Epochs | Epochs/s | Attestation | Exit |
|---|---|---|---|---|---|
| Protected | 1,800.001s | 1,857,796 | 1,032.108 | verified | zeroized |
| Raw baseline | 1,800.001s | 1,907,088 | 1,059.493 | not applicable | same pod |
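
The headline figures can be re-derived from the table. The sketch below assumes overhead is defined as the throughput lost relative to the raw baseline, `(raw - protected) / raw`; the article does not state the formula, but this definition reproduces the published 2.58%. The function names are illustrative only.

```python
def epochs_per_sec(epochs: int, elapsed_sec: float) -> float:
    """Throughput for one measured leg."""
    return epochs / elapsed_sec


def overhead_pct(protected_rate: float, raw_rate: float) -> float:
    """Throughput lost to protection, as a percentage of the raw baseline.

    Assumed definition: (raw - protected) / raw * 100.
    """
    return (raw_rate - protected_rate) / raw_rate * 100.0


# Epoch counts and elapsed times taken from the table above.
protected_rate = epochs_per_sec(1_857_796, 1800.001)
raw_rate = epochs_per_sec(1_907_088, 1800.001)
overhead = overhead_pct(protected_rate, raw_rate)

print(round(protected_rate, 3), round(raw_rate, 3), round(overhead, 2))
# prints: 1032.108 1059.493 2.58
```

Dividing by the protected rate instead would give roughly 2.65%, so the choice of denominator is worth stating whenever the number is quoted.
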
This is the cleanest overhead row so far.
Same-instance testing removes the biggest problem in the earlier provider-overhead measurements: allocation variance. The protected and raw paths still run in sequence, so order effects can exist, but the comparison is no longer between two different rented GPUs.
| Parameter | Value |
|---|---|
| Provider | RunPod Secure Cloud |
| GPU | NVIDIA H100 80GB HBM3 |
| Comparison shape | protected and raw ran sequentially on the same provider allocation |
| Order | protected first, raw second |
| Measured unit | epochs per second on the matched CUDA/PyTorch MLP workload |
| Warm-up | 5 seconds before each measured segment |
| Protected lifecycle | attestation, gated key release, encrypted package load, execution evidence, zeroized cleanup |
| Cleanup check | RunPod reported zero active pods after completion |
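
The measurement shape in the table (warm-up, then a timed segment, protected leg first and raw leg second) can be sketched as a small harness. This is an illustration under stated assumptions, not the Glasshouse benchmark code: `measure_leg`, `run_same_instance`, and the step callables are hypothetical names, and the demo uses no-op steps with very short durations so it finishes quickly.

```python
import time
from typing import Callable, Dict


def measure_leg(step: Callable[[], None], warmup_sec: float, duration_sec: float) -> Dict[str, float]:
    """Run `step` repeatedly: discard a warm-up window, then count epochs for the target duration."""
    warm_end = time.perf_counter() + warmup_sec
    while time.perf_counter() < warm_end:
        step()  # warm-up iterations are not counted

    epochs = 0
    start = time.perf_counter()
    while time.perf_counter() - start < duration_sec:
        # For a real GPU workload, `step` should block until the device work
        # is finished (for example by synchronizing), otherwise the timer
        # measures kernel launches rather than completed epochs.
        step()
        epochs += 1
    elapsed = time.perf_counter() - start
    return {"elapsed_sec": elapsed, "epochs": epochs, "epochs_per_sec": epochs / elapsed}


def run_same_instance(protected_step: Callable[[], None], raw_step: Callable[[], None],
                      warmup_sec: float = 5.0, duration_sec: float = 1800.0) -> Dict[str, object]:
    """Protected leg first, raw leg second, both in the same process on the same machine."""
    protected = measure_leg(protected_step, warmup_sec, duration_sec)
    raw = measure_leg(raw_step, warmup_sec, duration_sec)
    overhead = (raw["epochs_per_sec"] - protected["epochs_per_sec"]) / raw["epochs_per_sec"] * 100.0
    return {"protected": protected, "raw": raw, "overhead_pct": overhead}


# Demo with no-op steps and tiny durations (hypothetical stand-ins for the real workload).
result = run_same_instance(lambda: None, lambda: None, warmup_sec=0.01, duration_sec=0.05)
print(result["protected"]["epochs"] > 0, result["raw"]["epochs"] > 0)
```

Because the timed loop only exits once the target duration has passed, the elapsed time always lands slightly above the target, which is the same pattern as the 1,800.001s readings. A harness like this does nothing about order effects; swapping the leg order on a repeat run is one way to probe them.
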
The result is strong, but still scoped.
This is the right number to use for the current Glasshouse overhead story because both legs were measured on the same allocation of production GPU hardware. It should still be described with the boundaries below.
| Boundary | Detail |
|---|---|
| Still one sample | This is a cleaner result, not yet a distribution. The next step is repeating the same row. |
| Still synthetic MLP | The benchmark is not yet a sustained Qwen or Llama fine-tune. That remains the next production-workload test. |
| Not hardware TEE | This remains software-enforced protected execution on third-party GPU infrastructure. |
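
To make the "one sample" boundary concrete, here is a hedged sketch of how repeated rows might be summarized once they exist. The helper name `summarize_overheads` is hypothetical. It is fed only the single measured value from this run, so the spread is deliberately reported as unavailable rather than estimated.

```python
import statistics
from typing import Dict, List, Optional


def summarize_overheads(overheads_pct: List[float]) -> Dict[str, Optional[float]]:
    """Summarize overhead percentages from repeated same-instance runs."""
    n = len(overheads_pct)
    if n == 0:
        raise ValueError("need at least one overhead measurement")
    return {
        "n": n,
        "mean_pct": statistics.fmean(overheads_pct),
        # Sample standard deviation is undefined for a single run.
        "stdev_pct": statistics.stdev(overheads_pct) if n >= 2 else None,
        "min_pct": min(overheads_pct),
        "max_pct": max(overheads_pct),
    }


# Only one 30-minute H100 row exists so far.
summary = summarize_overheads([2.58])
print(summary["n"], summary["mean_pct"], summary["stdev_pct"])
# prints: 1 2.58 None
```

Once further rows are collected, the same call returns a standard deviation and a min/max range alongside the mean.
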
The artifact keeps measurement fields and excludes secrets.
The public JSON keeps provider, GPU, duration, lifecycle result, throughput, and cleanup status. It excludes API keys, local tunnels, encrypted payload paths, and provider credentials.
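
One way such a keep/exclude rule could be enforced is an allow-list filter, sketched below. This is an assumption about approach, not the actual Glasshouse export code: `to_public_artifact`, `PUBLIC_FIELDS`, and the secret-bearing keys in the example record are hypothetical. The allowed keys mirror the fields of the published artifact.

```python
from typing import Any, Dict

# Allow-list of keys that may appear in the public artifact.
PUBLIC_FIELDS = frozenset({
    "provider", "gpuModel", "mode", "sameInstanceOrder", "trainDurationSec",
    "protectedEpochsPerSec", "rawEpochsPerSec", "overheadPct",
    "attestation", "runtimeState", "cleanup",
})


def to_public_artifact(record: Dict[str, Any]) -> Dict[str, Any]:
    """Keep only allow-listed measurement fields; everything else is dropped."""
    return {key: value for key, value in record.items() if key in PUBLIC_FIELDS}


# Hypothetical internal record: the secret-bearing keys are placeholders.
internal_record = {
    "provider": "runpod",
    "overheadPct": 2.58,
    "apiKey": "<placeholder>",
    "encryptedPayloadPath": "<placeholder>",
}
public_record = to_public_artifact(internal_record)
print(sorted(public_record))
# prints: ['overheadPct', 'provider']
```

An allow-list fails closed: a new internal field stays private until someone adds it deliberately, whereas a deny-list would leak any secret it had not been told about.
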
{
  "provider": "runpod",
  "gpuModel": "NVIDIA H100 80GB HBM3",
  "mode": "same-instance",
  "sameInstanceOrder": "protected-first",
  "trainDurationSec": 1800,
  "protectedEpochsPerSec": 1032.108,
  "rawEpochsPerSec": 1059.493,
  "overheadPct": 2.58,
  "attestation": "verified",
  "runtimeState": "zeroized",
  "cleanup": "0 active pods"
}

The next valuable step is production-model duration.
The H100 overhead number is now credible enough to publish. The next round of work should move from the matched MLP workload to a longer Qwen or Llama LoRA run, then repeat it enough times to produce a small distribution.
| Next step | Detail |
|---|---|
| Repeat the row | Run another 30-minute same-instance H100 or PRO 6000 sequence to turn the headline into a small distribution. |
| Move to Qwen duration | Extend Qwen LoRA from smoke/short rows to a longer train-step benchmark. |
| Scale model size | Use the stable Qwen path as the base for a longer 7B-class workload. |
