Why this matters

Synthetic CUDA benchmarks are useful for repeatability, but buyers eventually ask whether the protected path survives a real model workload. This run answers that question, first for a small Qwen 0.5B LoRA job and then for Qwen 7B LoRA, with raw and protected execution kept on the same GPU allocation.

The clean rows show that Glasshouse can attest, release keys, execute, record evidence, and zeroize around a real LoRA training loop. Measured overhead across those rows ranged from -0.73% to 1.11%, which is within same-instance runtime noise.

Result table

The clean 0.5B 15-minute, 30-minute, and 60-minute rows measured between -0.45% and 0.86% overhead.

Each row used a fixed wall-clock target. Raw ran first, then the protected Glasshouse path ran on the same RunPod allocation.

Row	Raw elapsed	Protected elapsed	Raw steps	Protected steps	Measured overhead	Scope
`15m`	`900.351s`	`900.449s`	`9,143`	`9,065`	`0.86%`	clean model artifact
`30m`	`1,800.423s`	`1,800.405s`	`18,121`	`18,202`	`-0.45%`	clean model artifact
`60m`	`3,600.629s`	`3,600.377s`	`39,661`	`39,336`	`0.81%`	clean model artifact
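
The page does not state how "Measured overhead" is computed. One definition that reproduces the published figures is the relative drop in throughput (steps per second) from the raw run to the protected run. The sketch below assumes that definition; the function name `overhead_pct` is illustrative and not taken from the Glasshouse tooling.

```python
def overhead_pct(raw_s, raw_steps, prot_s, prot_steps):
    """Relative throughput drop of the protected run versus raw, in percent.

    Assumed definition (not confirmed by the source): a positive value means
    the protected run completed fewer steps per second than the raw run.
    """
    raw_throughput = raw_steps / raw_s
    prot_throughput = prot_steps / prot_s
    return (1.0 - prot_throughput / raw_throughput) * 100.0


# Elapsed seconds and step counts copied from the 0.5B table above.
rows = {
    "15m": (900.351, 9143, 900.449, 9065),
    "30m": (1800.423, 18121, 1800.405, 18202),
    "60m": (3600.629, 39661, 3600.377, 39336),
}

for label, (raw_s, raw_steps, prot_s, prot_steps) in rows.items():
    print(label, round(overhead_pct(raw_s, raw_steps, prot_s, prot_steps), 2))
```

Under this assumption the three rows round to 0.86, -0.45, and 0.81, matching the table. When both runs complete the same number of steps, as in a fixed step-count row, the expression reduces to a comparison of elapsed times.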

7B follow-up

The 7B smoke, duration, and step-count rows passed on RunPod A4000 and H100 allocations.

The 7B rows are the important step beyond the 0.5B duration sweep. Qwen/Qwen2.5-7B-Instruct loaded in BF16, trained LoRA adapters with finite losses, produced distinct raw and protected adapter digests, and finished with verified attestation and zeroized cleanup. The newest H100 repeat flips the order: the protected segment ran first and zeroized, then the raw segment completed on the same allocation.

Row	Raw elapsed	Protected elapsed	Raw steps	Protected steps	Measured overhead	Scope
`7B smoke`	`39.689s`	`39.924s`	`256`	`256`	`0.59%`	same-instance 7B fit and lifecycle check
`7B 15m`	`900.373s`	`900.387s`	`8,452`	`8,358`	`1.11%`	sustained 15-minute 7B LoRA row
`7B 30m`	`1,800.435s`	`1,800.379s`	`16,851`	`16,807`	`0.26%`	sustained 30-minute 7B LoRA row
`7B 30m H100`	`1,800.923s`	`1,801.216s`	`15,305`	`15,190`	`0.77%`	H100 repeat on a higher-end RunPod Secure allocation
`7B 30k-step H100 protected-first`	`2,161.061s`	`2,145.464s`	`30,000`	`30,000`	`-0.73%`	same H100 allocation, protected first, raw second

Methodology

Same allocation, real model, explicit target seconds.

This run intentionally fixes the biggest measurement problem from early GPU marketplace tests: provider allocation variance. Raw and protected segments used the same rented GPU allocation instead of comparing two different provider nodes.

Provider	`RunPod Secure Cloud`
GPU	`RTX 4090 for 0.5B 15m/30m; RTX 3090 for 0.5B 60m; RTX A4000 and H100 for 7B rows`
Comparison	`raw and protected ran sequentially on the same provider allocation`
Orders	`0.5B rows raw-first; sustained 7B rows protected-first after failed provider allocations were cleaned up`
Models	`Qwen/Qwen2.5-0.5B-Instruct and Qwen/Qwen2.5-7B-Instruct`
Workload	`LoRA training under a fixed wall-clock target`
60m stability fix	`bfloat16, 5e-5 learning rate, gradient clip 1.0, fail-on-nonfinite enabled`
7B settings	`bfloat16, batch size 1, max length 64, 1e-5 learning rate, gradient clip 1.0`
Protected path	`attestation, gated key release, protected execution evidence, zeroized cleanup`
TEE scope	`software-enforced protected execution on rented GPU infrastructure, not a hardware TEE claim`
Cleanup	`RunPod reported zero active pods after the latest H100 repeat`
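
A fixed wall-clock workload with a fail-on-nonfinite guard can be sketched as below. This is a minimal illustration under stated assumptions, not the harness used for these rows: `run_for_wall_clock` and `step_fn` are hypothetical names, and gradient clipping and the actual LoRA step are left inside `step_fn`.

```python
import math
import time


def run_for_wall_clock(step_fn, target_seconds, clock=time.monotonic):
    """Call step_fn repeatedly until target_seconds have elapsed.

    step_fn performs one training step and returns its loss as a float.
    A non-finite loss (NaN or infinity) aborts the run immediately.
    Returns (steps_completed, elapsed_seconds, final_loss).
    """
    start = clock()
    steps = 0
    loss = None
    # The deadline is checked before each step, so the last step may finish
    # slightly after the target.
    while clock() - start < target_seconds:
        loss = step_fn()
        if not math.isfinite(loss):
            raise FloatingPointError(
                f"non-finite loss at step {steps + 1}: {loss}"
            )
        steps += 1
    return steps, clock() - start, loss
```

Because the deadline is only checked between steps, a loop of this shape finishes slightly past its target. That is one plausible reason the elapsed times in the tables sit a few hundred milliseconds above 900, 1,800, and 3,600 seconds, and it is why step counts rather than elapsed time carry the comparison in the fixed-duration rows.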

Model artifacts

The clean rows emitted different raw and protected adapter digests.

Different adapter hashes are expected here: the raw and protected segments are separate training runs. The important part is that the public artifact records the digest and loss for each clean row, while the protected path also records the attested lifecycle.

Artifact	Final loss	Adapter SHA-256
15m protected	`3.720355e-05`	`520660edb6460102da5fea8625c9ff3808e55555365eb047c5438be9dff54bd3`
15m raw	`3.965292e-05`	`b4d36a0834745df14cb1795144dbe724e39f053d256edc040df3d79b2be1f003`
30m protected	`0.0001837593`	`0881acc7e59014485d36130e8923cb88baafea96e264a664c623956d5f7e28cf`
30m raw	`0.0001055617`	`567df32f71298a40c8e423668c7c74e7f581ec9a16a679cc96c163370107fec1`
60m protected	`4.541306e-08`	`279216e323ba3e7ab17f4e08018ff2f99ca032f86f9d2475d6cef426eee8c9b0`
60m raw	`3.405979e-08`	`64c46cc6cd8b0a99b3332bad3849efe544df0b0c92186eea137c6751e8776b0c`
7B smoke protected	`3.5975997`	`a7d0ec4093f47758b9158a0bebf69e68b765f660013c51ea6a73badece449d7e`
7B smoke raw	`3.6913116`	`eaa486847c69e2530aa989150dbca628631df450854f99ca6fbf300931019e47`
7B 15m protected	`1.299379e-06`	`3f8fea702cd6c3aeb860b0f158fd3c8d5609f8b27a78029a5382af77467cd1cb`
7B 15m raw	`1.573558e-06`	`12632d35fe001a8ce719c6b5ea417a61429a73b310bd1de4163c92a34d6f6753`
7B 30m protected	`5.960464e-08`	`6c3a1525aef8cd23c52f9fc274d4c8df1966aaec565505261810d00505d1a709`
7B 30m raw	`1.430511e-07`	`7bbd583188c858cc4608cb98df05e6f5b18431f750925abf689e15f1126cc18b`
7B 30m H100 protected	`7.152557e-08`	`47c0cc72e80599bedaa29097b992572dc104a280604a83ca05eb01e230dc56b1`
7B 30m H100 raw	`7.152557e-08`	`d8063f11a54340f81bb204abd1b78e2fa40bfcd47438b0ce3641a47e6e3797f9`
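
A digest like the ones in the table can be reproduced for any saved adapter file with the standard library. The sketch below is illustrative: the helper name and the file paths in the usage note are hypothetical, and the source does not say which file or files each published digest covers.

```python
import hashlib


def sha256_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes (EOF).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

For example, `sha256_file("raw/adapter.bin") != sha256_file("protected/adapter.bin")` would confirm that two separately trained adapters differ, which is the expected outcome for the clean rows.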

Caveat fixed

The stabilized 60-minute row is now clean model-workload evidence.

The original 60-minute row stayed public because the lifecycle proof passed, but the model artifact was not clean. The rerun fixed that caveat by switching to bfloat16, lowering the learning rate, clipping gradients, and failing immediately on non-finite loss.

Original issue	The first 60-minute row completed lifecycle proof, but final loss was null and raw/protected adapter digests matched.
Fix	The rerun used bfloat16, a lower learning rate, gradient clipping, and fail-on-nonfinite checks.
New 60m artifact	The stabilized row emitted finite losses, distinct adapter SHA-256 digests, verified attestation, and zeroized cleanup.
Still not claimed	The evidence now covers the 0.5B duration sweep plus 15-minute and 30-minute 7B LoRA rows on A4000 and H100. It does not yet include a distribution of hour-long 7B runs.
7B follow-up	The May 20 H100 repeat used the same allocation for raw and protected segments, verified attestation, zeroized cleanup, finite losses, and distinct adapter digests.
Protected-first repeat	The May 21 protected-first H100 repeat ran 30,000 protected Qwen 7B LoRA steps, zeroized, then ran 30,000 raw steps on the same allocation. The measured delta was -0.73%, which is best read as inside same-instance runtime variance.
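
The criteria that separate the original 60-minute row from the stabilized one can be written as a small check. This is a sketch under assumptions: the dictionary layout and field names (`final_loss`, `adapter_sha256`, `attestation_verified`, `runtime_state`) are hypothetical and chosen for illustration, not taken from the public artifact schema.

```python
import math


def clean_row_problems(row):
    """Return a list of reasons a row is not clean; empty means clean."""
    problems = []
    for side in ("raw", "protected"):
        loss = row[side].get("final_loss")
        if loss is None or not math.isfinite(loss):
            problems.append(f"{side} final loss is null or non-finite")
    if row["raw"]["adapter_sha256"] == row["protected"]["adapter_sha256"]:
        problems.append("raw and protected adapter digests match")
    if not row.get("attestation_verified", False):
        problems.append("attestation not verified")
    if row.get("runtime_state") != "zeroized":
        problems.append("runtime state is not zeroized")
    return problems


# Values for the stabilized 60-minute row, copied from the artifact table.
stabilized_60m = {
    "raw": {
        "final_loss": 3.405979e-08,
        "adapter_sha256": "64c46cc6cd8b0a99b3332bad3849efe544df0b0c92186eea137c6751e8776b0c",
    },
    "protected": {
        "final_loss": 4.541306e-08,
        "adapter_sha256": "279216e323ba3e7ab17f4e08018ff2f99ca032f86f9d2475d6cef426eee8c9b0",
    },
    "attestation_verified": True,
    "runtime_state": "zeroized",
}

print(clean_row_problems(stabilized_60m))
```

A row shaped like the original 60-minute run, with null final losses and identical digests, would fail the first two checks even though its lifecycle fields pass.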

Public artifact

The JSON keeps measurement fields and excludes secrets.

The public JSON keeps provider, GPU, model, timing, step counts, digest, attestation, runtime state, stability notes, and cleanup status. It does not include API keys, local tunnel URLs, encrypted payload paths, or provider credentials.

{
  "provider": "runpod",
  "sameAllocation": true,
  "rows": [
    { "model": "Qwen 0.5B", "label": "60m", "overheadPct": 0.81, "runtimeState": "zeroized" },
    { "model": "Qwen 7B", "label": "smoke", "overheadPct": 0.59, "runtimeState": "zeroized" },
    { "model": "Qwen 7B", "label": "15m", "overheadPct": 1.11, "runtimeState": "zeroized" },
    { "model": "Qwen 7B", "label": "30m A4000", "overheadPct": 0.26, "runtimeState": "zeroized" },
    { "model": "Qwen 7B", "label": "30m H100", "overheadPct": 0.77, "runtimeState": "zeroized" },
    { "model": "Qwen 7B", "label": "30k-step H100 protected-first", "overheadPct": -0.73, "runtimeState": "zeroized" }
  ],
  "cleanup": { "activeRunPodPodsAfterRun": 0 }
}
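
A reader who downloads the public JSON can spot-check both claims, that every row ended zeroized and that no secret-like fields are present. The sketch below uses the field names shown in the sample above (`rows`, `label`, `runtimeState`, `cleanup`, `activeRunPodPodsAfterRun`). The list of suspect key fragments is an assumption made for illustration, and a key-name scan is only a heuristic, not proof that no secret leaked.

```python
import json

# Hypothetical fragments that would suggest a secret-bearing field name.
SUSPECT_FRAGMENTS = ("apikey", "api_key", "token", "secret", "credential", "password", "tunnel")


def find_suspect_keys(node, path=""):
    """Recursively collect dotted paths of keys whose names look secret-like."""
    hits = []
    if isinstance(node, dict):
        for key, value in node.items():
            here = f"{path}.{key}" if path else key
            if any(fragment in key.lower() for fragment in SUSPECT_FRAGMENTS):
                hits.append(here)
            hits.extend(find_suspect_keys(value, here))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            hits.extend(find_suspect_keys(item, f"{path}[{index}]"))
    return hits


def check_artifact(text):
    """Summarize lifecycle and hygiene checks for a public artifact JSON string."""
    doc = json.loads(text)
    return {
        "suspect_keys": find_suspect_keys(doc),
        "not_zeroized": [
            row["label"] for row in doc["rows"] if row.get("runtimeState") != "zeroized"
        ],
        "pods_left": doc["cleanup"]["activeRunPodPodsAfterRun"],
    }
```

For the sample above, a passing result would have empty `suspect_keys` and `not_zeroized` lists and a `pods_left` value of 0.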


Glasshouse kept Qwen 7B H100 overhead inside same-instance variance.
