Glasshouse model workload / May 17, 2026 / updated May 21

Glasshouse kept Qwen 7B H100 overhead inside same-instance variance.

The previous Glasshouse overhead work used a matched CUDA/PyTorch benchmark. This follow-up moves the same-instance method onto a real transformer workload: Qwen 0.5B LoRA training on RunPod RTX 4090 and RTX 3090 allocations, then Qwen 7B LoRA training on RunPod RTX A4000 and H100 allocations. The clean 0.5B 15-minute, 30-minute, and stabilized 60-minute rows measured 0.86%, -0.45%, and 0.81% overhead. The new 7B smoke, 15-minute, and 30-minute rows measured 0.59%, 1.11%, and 0.26% on A4000, and the H100 wall-clock repeat measured 0.77%. A protected-first 30,000-step H100 repeat measured -0.73%. Both H100 results should be read as inside same-instance runtime variance, and both completed with verified attestation, distinct adapter digests, raw callback completion, and zeroized cleanup.

Why this matters

Synthetic CUDA benchmarks are useful for repeatability, but buyers eventually ask whether the protected path survives a real model workload. This run answers that question, first for a small Qwen LoRA job and then for Qwen 7B LoRA, without changing GPU allocations between raw and protected execution.

The clean rows show that Glasshouse can attest, release, execute, record, and zeroize around a real LoRA training loop while measured overhead stays within same-instance runtime noise.

Result table

The clean 0.5B 15-minute, 30-minute, and 60-minute rows stayed near zero measured overhead.

Each row used a fixed wall-clock target. Raw ran first, then the protected Glasshouse path ran on the same RunPod allocation.

| Row | Raw elapsed | Protected elapsed | Raw steps | Protected steps | Measured overhead | Scope |
| --- | --- | --- | --- | --- | --- | --- |
| 15m | 900.351s | 900.449s | 9,143 | 9,065 | 0.86% | clean model artifact |
| 30m | 1,800.423s | 1,800.405s | 18,121 | 18,202 | -0.45% | clean model artifact |
| 60m | 3,600.629s | 3,600.377s | 39,661 | 39,336 | 0.81% | clean model artifact |

7B follow-up

The 7B smoke, duration, and step-count rows passed on RunPod A4000 and H100 allocations.

The 7B rows are the important step beyond the 0.5B duration sweep. Qwen/Qwen2.5-7B-Instruct loaded in BF16, trained LoRA adapters with finite losses, produced distinct raw/protected adapter digests, and finished with verified attestation and zeroized cleanup. The newest H100 repeat flips the order: protected ran first, zeroized, and raw then completed on the same allocation.

| Row | Raw elapsed | Protected elapsed | Raw steps | Protected steps | Measured overhead | Scope |
| --- | --- | --- | --- | --- | --- | --- |
| 7B smoke | 39.689s | 39.924s | 256 | 256 | 0.59% | same-instance 7B fit and lifecycle check |
| 7B 15m | 900.373s | 900.387s | 8,452 | 8,358 | 1.11% | sustained 15-minute 7B LoRA row |
| 7B 30m | 1,800.435s | 1,800.379s | 16,851 | 16,807 | 0.26% | sustained 30-minute 7B LoRA row |
| 7B 30m H100 | 1,800.923s | 1,801.216s | 15,305 | 15,190 | 0.77% | H100 repeat on a higher-end RunPod Secure allocation |
| 7B 30k-step H100 protected-first | 2,161.061s | 2,145.464s | 30,000 | 30,000 | -0.73% | same H100 allocation, protected first, raw second |
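The source does not state how the measured-overhead column is derived. One reading that reproduces all eight published figures to two decimals is the relative drop in step throughput from raw to protected. The sketch below assumes that formula; the function name `overhead_pct` and the `ROWS` layout are illustrative and not part of Glasshouse.

```python
def overhead_pct(raw_steps, raw_seconds, protected_steps, protected_seconds):
    """Percent of step throughput lost on the protected path.

    Assumed formula, not stated in the source: 1 - protected_rate / raw_rate.
    A negative value means the protected segment was faster.
    """
    raw_rate = raw_steps / raw_seconds
    protected_rate = protected_steps / protected_seconds
    return (1.0 - protected_rate / raw_rate) * 100.0


# (raw steps, raw elapsed s, protected steps, protected elapsed s) from the tables.
ROWS = {
    "0.5B 15m": (9143, 900.351, 9065, 900.449),
    "0.5B 30m": (18121, 1800.423, 18202, 1800.405),
    "0.5B 60m": (39661, 3600.629, 39336, 3600.377),
    "7B smoke": (256, 39.689, 256, 39.924),
    "7B 15m": (8452, 900.373, 8358, 900.387),
    "7B 30m": (16851, 1800.435, 16807, 1800.379),
    "7B 30m H100": (15305, 1800.923, 15190, 1801.216),
    "7B 30k-step H100 protected-first": (30000, 2161.061, 30000, 2145.464),
}

for label, row in ROWS.items():
    print(f"{label}: {overhead_pct(*row):+.2f}%")
```

For the equal-step 30,000-step row, a plain elapsed-time ratio (protected / raw - 1) comes out near -0.72%, while the throughput form gives the published -0.73%, which is why the throughput form is assumed here.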

Methodology

Same allocation, real model, explicit target seconds.

This run intentionally fixes the biggest measurement problem from early GPU marketplace tests: provider allocation variance. Raw and protected segments used the same rented GPU allocation instead of comparing two different provider nodes.
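A minimal sketch of the same-allocation procedure, assuming each segment simply repeats a training step until a wall-clock budget is spent. The names `run_segment`, `compare_on_same_allocation`, `step_fn`, and the injectable `clock` are hypothetical and chosen for illustration; this is not the Glasshouse harness.

```python
import time


def run_segment(step_fn, target_seconds, clock=time.perf_counter):
    """Repeat step_fn until target_seconds of wall-clock time have elapsed.

    Returns (steps_completed, elapsed_seconds). The last step can run past
    the target, so elapsed is usually slightly above target_seconds.
    """
    start = clock()
    steps = 0
    while clock() - start < target_seconds:
        step_fn()
        steps += 1
    return steps, clock() - start


def compare_on_same_allocation(raw_step, protected_step, target_seconds,
                               clock=time.perf_counter):
    """Run the raw segment, then the protected segment, on the same machine."""
    raw = run_segment(raw_step, target_seconds, clock)
    protected = run_segment(protected_step, target_seconds, clock)
    return {"raw": raw, "protected": protected}


if __name__ == "__main__":
    # Tiny demo with a stand-in workload; the published rows used
    # 900 s, 1,800 s, and 3,600 s targets.
    work = lambda: sum(range(1000))
    print(compare_on_same_allocation(work, work, 0.05))
```

Because both segments share one allocation, differences in step count over the same budget reflect the execution path and run-to-run noise rather than differences between provider nodes.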

Provider: RunPod Secure Cloud
GPU: RTX 4090 for 0.5B 15m/30m; RTX 3090 for 0.5B 60m; RTX A4000 and H100 for 7B rows
Comparison: raw and protected ran sequentially on the same provider allocation
Orders: 0.5B rows raw-first; sustained 7B rows protected-first after failed provider allocations were cleaned up
Models: Qwen/Qwen2.5-0.5B-Instruct and Qwen/Qwen2.5-7B-Instruct
Workload: LoRA training under a fixed wall-clock target
60m stability fix: bfloat16, 5e-5 learning rate, gradient clip 1.0, fail-on-nonfinite enabled
7B settings: bfloat16, batch size 1, max length 64, 1e-5 learning rate, gradient clip 1.0
Protected path: attestation, gated key release, protected execution evidence, zeroized cleanup
TEE scope: software-enforced protected execution on rented GPU infrastructure, not a hardware TEE claim
Cleanup: RunPod reported zero active pods after the latest H100 repeat

Model artifacts

The clean rows emitted different raw and protected adapter digests.

Different adapter hashes are expected here: the raw and protected segments are separate training runs. The important part is that the public artifact records the digest and loss for each clean row, while the protected path also records the attested lifecycle.
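As an illustration of how such digests can be produced and compared, the sketch below streams each adapter file through SHA-256 using Python's standard `hashlib`. The helper names and the idea of one file per adapter are assumptions made for the example; the source does not describe how Glasshouse packages or hashes adapters.

```python
import hashlib


def sha256_file(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 and return the 64-character hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_distinct_adapters(raw_path, protected_path):
    """Return both digests, raising if the two adapter files are byte-identical."""
    raw_digest = sha256_file(raw_path)
    protected_digest = sha256_file(protected_path)
    if raw_digest == protected_digest:
        raise ValueError("raw and protected adapters have the same digest")
    return {"raw": raw_digest, "protected": protected_digest}
```

A check of this kind would flag the situation in the original 60-minute row, where the raw and protected digests matched.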

| Artifact | Final loss | Adapter SHA-256 |
| --- | --- | --- |
| 15m protected | 3.720355e-05 | 520660edb6460102da5fea8625c9ff3808e55555365eb047c5438be9dff54bd3 |
| 15m raw | 3.965292e-05 | b4d36a0834745df14cb1795144dbe724e39f053d256edc040df3d79b2be1f003 |
| 30m protected | 0.0001837593 | 0881acc7e59014485d36130e8923cb88baafea96e264a664c623956d5f7e28cf |
| 30m raw | 0.0001055617 | 567df32f71298a40c8e423668c7c74e7f581ec9a16a679cc96c163370107fec1 |
| 60m protected | 4.541306e-08 | 279216e323ba3e7ab17f4e08018ff2f99ca032f86f9d2475d6cef426eee8c9b0 |
| 60m raw | 3.405979e-08 | 64c46cc6cd8b0a99b3332bad3849efe544df0b0c92186eea137c6751e8776b0c |
| 7B smoke protected | 3.5975997 | a7d0ec4093f47758b9158a0bebf69e68b765f660013c51ea6a73badece449d7e |
| 7B smoke raw | 3.6913116 | eaa486847c69e2530aa989150dbca628631df450854f99ca6fbf300931019e47 |
| 7B 15m protected | 1.299379e-06 | 3f8fea702cd6c3aeb860b0f158fd3c8d5609f8b27a78029a5382af77467cd1cb |
| 7B 15m raw | 1.573558e-06 | 12632d35fe001a8ce719c6b5ea417a61429a73b310bd1de4163c92a34d6f6753 |
| 7B 30m protected | 5.960464e-08 | 6c3a1525aef8cd23c52f9fc274d4c8df1966aaec565505261810d00505d1a709 |
| 7B 30m raw | 1.430511e-07 | 7bbd583188c858cc4608cb98df05e6f5b18431f750925abf689e15f1126cc18b |
| 7B 30m H100 protected | 7.152557e-08 | 47c0cc72e80599bedaa29097b992572dc104a280604a83ca05eb01e230dc56b1 |
| 7B 30m H100 raw | 7.152557e-08 | d8063f11a54340f81bb204abd1b78e2fa40bfcd47438b0ce3641a47e6e3797f9 |

Caveat fixed

The stabilized 60-minute row is now clean model-workload evidence.

The original 60-minute row stayed public because the lifecycle proof passed, but the model artifact was not clean. The rerun fixed that caveat by lowering training aggressiveness and failing immediately on non-finite loss.
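The two safeguards named here, gradient clipping and failing on non-finite loss, can be sketched in plain Python. This is a simplified stand-in written for illustration, not the training code used in the run: a PyTorch job would more likely rely on the framework's own clipping utility, and the names `NonFiniteLossError`, `require_finite`, and `clip_by_global_norm` are hypothetical.

```python
import math


class NonFiniteLossError(RuntimeError):
    """Raised as soon as a training loss is NaN or infinite."""


def require_finite(loss, step):
    """Return the loss unchanged, or abort the run if it is not finite."""
    if not math.isfinite(loss):
        raise NonFiniteLossError(f"non-finite loss {loss!r} at step {step}")
    return loss


def clip_by_global_norm(grads, max_norm=1.0):
    """Scale a flat list of gradients so their L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm:
        return list(grads)
    scale = max_norm / norm
    return [g * scale for g in grads]
```

Failing at the first non-finite loss stops a diverged segment early instead of letting it finish the lifecycle and emit an unusable artifact.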

Original issue: The first 60-minute row completed lifecycle proof, but final loss was null and raw/protected adapter digests matched.
Fix: The rerun used bfloat16, a lower learning rate, gradient clipping, and fail-on-nonfinite checks.
New 60m artifact: The stabilized row emitted finite losses, distinct adapter SHA-256 digests, verified attestation, and zeroized cleanup.
Still not claimed: This is now 0.5B duration evidence plus 15-minute and 30-minute 7B LoRA rows on A4000 and H100, but not yet a 7B hour-long distribution.
7B follow-up: The May 20 H100 repeat used the same allocation for raw and protected segments, verified attestation, zeroized cleanup, finite losses, and distinct adapter digests.
Protected-first repeat: The May 21 protected-first H100 repeat ran 30,000 protected Qwen 7B LoRA steps, zeroized, then ran 30,000 raw steps on the same allocation. The measured delta was -0.73%, which is best read as inside same-instance runtime variance.

Public artifact

The JSON keeps measurement fields and excludes secrets.

The public JSON keeps provider, GPU, model, timing, step counts, digest, attestation, runtime state, stability notes, and cleanup status. It does not include API keys, local tunnel URLs, encrypted payload paths, or provider credentials.

sanitized excerpt
{
  "provider": "runpod",
  "sameAllocation": true,
  "rows": [
    { "model": "Qwen 0.5B", "label": "60m", "overheadPct": 0.81, "runtimeState": "zeroized" },
    { "model": "Qwen 7B", "label": "smoke", "overheadPct": 0.59, "runtimeState": "zeroized" },
    { "model": "Qwen 7B", "label": "15m", "overheadPct": 1.11, "runtimeState": "zeroized" },
    { "model": "Qwen 7B", "label": "30m A4000", "overheadPct": 0.26, "runtimeState": "zeroized" },
    { "model": "Qwen 7B", "label": "30m H100", "overheadPct": 0.77, "runtimeState": "zeroized" },
    { "model": "Qwen 7B", "label": "30k-step H100 protected-first", "overheadPct": -0.73, "runtimeState": "zeroized" }
  ],
  "cleanup": { "activeRunPodPodsAfterRun": 0 }
}
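One way to keep secrets out of a published record is an allowlist: copy only named public fields and drop everything else by default. The sketch below assumes that approach and reuses the field names visible in the excerpt above. The function name `sanitize` and the private keys in the usage example (`apiKey`, `tunnelUrl`) are hypothetical, and the source does not say how the public JSON is actually generated.

```python
import json

# Field names taken from the sanitized excerpt; anything else is dropped.
PUBLIC_TOP_FIELDS = {"provider", "sameAllocation", "rows", "cleanup"}
PUBLIC_ROW_FIELDS = {"model", "label", "overheadPct", "runtimeState"}


def sanitize(record):
    """Return a copy of record containing only allowlisted fields."""
    public = {k: v for k, v in record.items() if k in PUBLIC_TOP_FIELDS}
    public["rows"] = [
        {k: v for k, v in row.items() if k in PUBLIC_ROW_FIELDS}
        for row in record.get("rows", [])
    ]
    return public


if __name__ == "__main__":
    internal = {
        "provider": "runpod",
        "sameAllocation": True,
        "apiKey": "example-not-a-real-key",  # hypothetical private field
        "rows": [
            {"model": "Qwen 7B", "label": "30m H100", "overheadPct": 0.77,
             "runtimeState": "zeroized", "tunnelUrl": "http://localhost:0"},
        ],
        "cleanup": {"activeRunPodPodsAfterRun": 0},
    }
    print(json.dumps(sanitize(internal), indent=2))
```

An allowlist fails closed: a newly added internal field stays private until someone explicitly marks it public.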