What changed

The earlier overhead article moved from parallel provider allocations to same-instance measurements, but the headline row still needed a repeat. This run gives the first clean follow-up on a production-class H100: same provider allocation, same pod, same workload, sequential protected and raw legs.

Protected throughput was 1,032.108 epochs/s. Raw throughput was 1,059.493 epochs/s. That places Glasshouse overhead at 2.58% for this 30-minute H100 row.
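The overhead figure follows from the two throughput numbers. A minimal Python sketch of the arithmetic, assuming overhead is defined as the throughput lost relative to the raw baseline (the function name is illustrative and not part of Glasshouse):

```python
def overhead_pct(protected_eps: float, raw_eps: float) -> float:
    """Throughput lost relative to the raw baseline, as a percentage."""
    if raw_eps <= 0:
        raise ValueError("raw throughput must be positive")
    return (raw_eps - protected_eps) / raw_eps * 100.0


# Throughput values reported for this 30-minute H100 row.
result = overhead_pct(protected_eps=1032.108, raw_eps=1059.493)
print(f"{result:.2f}%")  # 2.58%
```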

Measured result

The raw baseline ran after zeroized protected execution on the same pod.

Both legs ran for the same target duration. The protected leg emitted the expected lifecycle events and ended in zeroized state before the raw baseline started.

Leg	Elapsed	Epochs	Epochs/s	Attestation	Exit
Protected	`1,800.001s`	`1,857,796`	`1,032.108`	`verified`	`zeroized`
Raw baseline	`1,800.001s`	`1,907,088`	`1,059.493`	`not applicable`	`same pod`
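The Epochs/s column can be rechecked from the Elapsed and Epochs columns. The sketch below does that with the table values; the helper name and the dictionary layout are hypothetical, chosen only for this check.

```python
def epochs_per_sec(epochs: int, elapsed_s: float) -> float:
    """Average throughput over one measured leg."""
    return epochs / elapsed_s


# (epochs, elapsed seconds) copied from the table rows.
rows = {
    "protected": (1_857_796, 1800.001),
    "raw": (1_907_088, 1800.001),
}

rates = {leg: round(epochs_per_sec(e, t), 3) for leg, (e, t) in rows.items()}
print(rates)  # {'protected': 1032.108, 'raw': 1059.493}
```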

Methodology

This is the cleanest overhead row so far.

Same-instance testing removes the biggest problem in the earlier provider-overhead measurements: allocation variance. The protected and raw paths still run in sequence, so order effects can exist, but the comparison is no longer between two different rented GPUs.

Provider	`RunPod Secure Cloud`
GPU	`NVIDIA H100 80GB HBM3`
Comparison shape	`protected and raw ran sequentially on the same provider allocation`
Order	`protected first, raw second`
Measured unit	`epochs per second on the matched CUDA/PyTorch MLP workload`
Warm-up	`5 seconds before each measured segment`
Protected lifecycle	`attestation, gated key release, encrypted package load, execution evidence, zeroized cleanup`
Cleanup check	`RunPod reported zero active pods after completion`
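The warm-up and measured-segment shape described above can be sketched as a small timing loop. This is not the Glasshouse harness: `measure_leg`, `run_epoch`, and the injectable `clock` are assumptions made for illustration.

```python
import time
from typing import Callable


def measure_leg(run_epoch: Callable[[], None],
                warmup_s: float,
                duration_s: float,
                clock: Callable[[], float] = time.perf_counter) -> dict:
    """Run an uncounted warm-up, then count epochs for the target duration."""
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")

    # Warm-up segment: the workload runs but nothing is counted.
    warm_end = clock() + warmup_s
    while clock() < warm_end:
        run_epoch()

    # Measured segment: count whole epochs and record the actual elapsed time,
    # which can slightly exceed the target because the last epoch finishes.
    epochs = 0
    start = clock()
    while clock() - start < duration_s:
        run_epoch()
        epochs += 1
    elapsed = clock() - start

    return {"elapsed_s": elapsed, "epochs": epochs,
            "epochs_per_sec": epochs / elapsed}
```

For the row reported here, the equivalent call would use `warmup_s=5` and `duration_s=1800`, once for the protected leg and then once for the raw leg. On a GPU workload, `run_epoch` would also need to block until device work completes (in PyTorch, for example, by calling `torch.cuda.synchronize()`), otherwise the loop would time kernel launches instead of finished epochs.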

Scope

The result is strong, but still scoped.

This is the right number to use for the current Glasshouse overhead story because it comes from a same-allocation run on production-class GPU hardware. It should still be described with the boundaries below.

Still one sample	This is a cleaner result, not yet a distribution. The next step is repeating the same row.
Still synthetic MLP	The benchmark is not yet a sustained Qwen or Llama fine-tune. That remains the next production-workload test.
Not hardware TEE	This remains software-enforced protected execution on third-party GPU infrastructure.
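To make the "one sample" boundary concrete, here is a hedged sketch of how repeated rows could be summarized once they exist. The function name is hypothetical, and the only input used is the single overhead value measured so far; no additional results are implied.

```python
import statistics


def summarize_overhead(samples_pct: list[float]) -> dict:
    """Summarize repeated overhead measurements, given in percent."""
    if not samples_pct:
        raise ValueError("need at least one sample")
    summary = {"n": len(samples_pct), "mean": statistics.mean(samples_pct)}
    # The sample standard deviation is undefined for a single measurement,
    # so it is reported as None until at least two rows are available.
    summary["stdev"] = (statistics.stdev(samples_pct)
                        if len(samples_pct) >= 2 else None)
    return summary


# Only one same-instance H100 row exists so far.
print(summarize_overhead([2.58]))  # {'n': 1, 'mean': 2.58, 'stdev': None}
```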

Public artifact

The artifact keeps measurement fields and excludes secrets.

The public JSON keeps provider, GPU, duration, lifecycle result, throughput, and cleanup status. It excludes API keys, local tunnels, encrypted payload paths, and provider credentials.

{
  "provider": "runpod",
  "gpuModel": "NVIDIA H100 80GB HBM3",
  "mode": "same-instance",
  "sameInstanceOrder": "protected-first",
  "trainDurationSec": 1800,
  "protectedEpochsPerSec": 1032.108,
  "rawEpochsPerSec": 1059.493,
  "overheadPct": 2.58,
  "attestation": "verified",
  "runtimeState": "zeroized",
  "cleanup": "0 active pods"
}
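A reader can sanity-check an artifact like the one above with a short script. The sketch below is an assumption about how such a check might look, not a Glasshouse tool: it accepts only the field names visible in the public JSON and confirms that `overheadPct` matches the two throughput fields.

```python
import json

# Field names taken from the public JSON shown above.
ALLOWED_FIELDS = {
    "provider", "gpuModel", "mode", "sameInstanceOrder", "trainDurationSec",
    "protectedEpochsPerSec", "rawEpochsPerSec", "overheadPct",
    "attestation", "runtimeState", "cleanup",
}

ARTIFACT = """
{
  "provider": "runpod",
  "gpuModel": "NVIDIA H100 80GB HBM3",
  "mode": "same-instance",
  "sameInstanceOrder": "protected-first",
  "trainDurationSec": 1800,
  "protectedEpochsPerSec": 1032.108,
  "rawEpochsPerSec": 1059.493,
  "overheadPct": 2.58,
  "attestation": "verified",
  "runtimeState": "zeroized",
  "cleanup": "0 active pods"
}
"""


def check_artifact(text: str) -> list[str]:
    """Return a list of problems; an empty list means the artifact passed."""
    data = json.loads(text)
    problems = []

    # Any field outside the allow-list might carry something private.
    extra = set(data) - ALLOWED_FIELDS
    if extra:
        problems.append(f"unexpected fields: {sorted(extra)}")

    missing = ALLOWED_FIELDS - set(data)
    if missing:
        problems.append(f"missing fields: {sorted(missing)}")
        return problems

    # Recompute overhead relative to the raw baseline and compare at 2 decimals.
    raw = data["rawEpochsPerSec"]
    protected = data["protectedEpochsPerSec"]
    recomputed = round((raw - protected) / raw * 100.0, 2)
    if abs(recomputed - data["overheadPct"]) > 1e-9:
        problems.append(f"overheadPct mismatch: recomputed {recomputed}")

    return problems


print(check_artifact(ARTIFACT))  # []
```

An allow-list is used instead of a block-list of secret-looking names because it fails closed: a new field has to be reviewed before it can appear in a public artifact.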


Next tests

The next valuable step is production-model duration.

The H100 overhead number is now credible enough to publish. The next round of testing should move from the matched MLP workload to a longer Qwen or Llama LoRA run, then repeat it enough times to produce a small distribution.

Repeat the row	Run another 30-minute same-instance H100 or PRO 6000 sequence to turn the headline into a small distribution.
Move to Qwen duration	Extend Qwen LoRA from smoke/short rows to a longer train-step benchmark.
Scale model size	Use the stable Qwen path as the base for a longer 7B-class workload.

