Runtime-agnostic agent guardrails
Model-selected tool calls, native tools, adversarial plans, halt state, and proof across OpenClaw, Hermes, MCP, and Generic HTTP before dangerous function bodies run.
Eight chapters. One continuous agent-control story.
The path moves from the first OpenClaw payment block through adversarial replay, late-stage drift, native tool-body proof, second-runtime model turns through Hermes, Hermes parity, and a four-runtime parity proof, and ends with a clean-install CLI path for those adapters.
The first three reads in this boundary.
These notes give the clearest path from product behavior to measured evidence before the full archive.
A fresh install wrapped four agent runtimes with one CLI path.
The Imladri CLI was packed, installed in a fresh npm workspace, and used to initialize OpenClaw, Hermes, MCP, and Generic HTTP adapters; it then blocked 100/100 mixed-runtime prohibited attempts with zero body calls.
One policy controlled OpenClaw, Hermes, MCP, and a Generic HTTP agent.
A final joint demo runner put OpenClaw, Hermes, MCP, and a Generic HTTP agent behind one Imladri constitution, then passed 100/100 mixed-runtime blocks, 4/4 delegation checks, shared halt, fail-closed preflight, and schema checks with zero prohibited body calls.
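To make "blocked with zero body calls" concrete, here is a minimal sketch of a preflight guard that wraps a tool body and checks one shared policy before the body can run. Every name in it (`guard`, `Policy`, `GuardResult`, the runtime labels, `docs.search`) is hypothetical and chosen for illustration; it is not the Imladri API, and the real adapters may work differently. The sketch also assumes that a missing policy should block rather than allow, which is one reading of "fail-closed preflight".

```typescript
// Hypothetical names throughout; this is not the Imladri API.
type Verdict = "allow" | "block";

interface Policy {
  prohibited: ReadonlySet<string>;
}

interface GuardResult<T> {
  verdict: Verdict;
  value?: T;
}

// Wrap a tool body so the policy check runs first. A missing policy
// is treated as "block" (fail closed) rather than "allow".
function guard<A, T>(
  policy: Policy | undefined,
  action: string,
  body: (args: A) => T,
): (args: A) => GuardResult<T> {
  return (args: A) => {
    if (policy === undefined || policy.prohibited.has(action)) {
      return { verdict: "block" }; // the body is never invoked on this path
    }
    return { verdict: "allow", value: body(args) };
  };
}

// One policy object shared by tools registered under different runtimes.
const sharedPolicy: Policy = { prohibited: new Set(["payment.transfer"]) };

let bodyCalls = 0;
const transfer = (amount: number): string => {
  bodyCalls += 1; // counts every time the dangerous body actually runs
  return `sent ${amount}`;
};

// Illustrative runtime labels only; each gets the same guarded tool.
const runtimes = ["openclaw", "hermes", "mcp", "generic-http"];
const results = runtimes.map((runtime) => ({
  runtime,
  result: guard(sharedPolicy, "payment.transfer", transfer)(100),
}));

// A permitted action still reaches its body; no policy at all blocks.
const lookup = guard(sharedPolicy, "docs.search", (q: string) => `results for ${q}`)("halt state");
const noPolicy = guard(undefined, "docs.search", (q: string) => q)("anything");
```

In this sketch `bodyCalls` stays at zero after all four guarded calls, which is the property the "zero prohibited body calls" figure refers to: the check sits in front of the function body, not after it.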
Plan-time guardrails are not enough. The bad action appeared at turn eight.
OpenClaw 2026.5.12 passed a local-agent smoke test; five model surfaces selected late-drift plans, and Imladri blocked 35/35 scenarios plus 700/700 concurrent replays with zero body calls.
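The difference between a plan-time check and a per-call check can be sketched as follows. This is an illustrative toy, not the test harness behind the numbers above: the `Step` type, both functions, and the benign action names `docs.search` and `report.draft` are assumptions, and the sketch assumes the run stops at the first blocked call.

```typescript
// Hypothetical sketch; names are illustrative, not the Imladri API.
interface Step {
  turn: number;
  action: string;
}

const PROHIBITED: ReadonlySet<string> = new Set(["payment.transfer"]);

// Plan-time check: inspects only the plan declared up front.
function planTimeCheck(declaredPlan: readonly string[]): boolean {
  return declaredPlan.every((action) => !PROHIBITED.has(action));
}

// Per-call check: runs before every tool body and stops at the first block.
function runWithPerCallCheck(
  steps: readonly Step[],
): { executed: number[]; blockedAt: number | null } {
  const executed: number[] = [];
  for (const step of steps) {
    if (PROHIBITED.has(step.action)) {
      return { executed, blockedAt: step.turn }; // body for this turn never runs
    }
    executed.push(step.turn);
  }
  return { executed, blockedAt: null };
}

// The declared plan looks harmless, but the agent drifts at turn eight.
const declaredPlan = ["docs.search", "report.draft"];
const actualSteps: Step[] = [
  ...Array.from({ length: 7 }, (_, i) => ({ turn: i + 1, action: "docs.search" })),
  { turn: 8, action: "payment.transfer" },
];

const planOk = planTimeCheck(declaredPlan);
const outcome = runWithPerCallCheck(actualSteps);
```

Here `planOk` is true because the declared plan contains nothing prohibited, while the per-call check lets turns one through seven run and blocks at turn eight. A guard that only validates the initial plan would have approved the same run.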
Full archive for this boundary.
Hermes reached native tool-body parity with OpenClaw.
Hermes loaded the Imladri plugin through its real PluginManager, blocked three dangerous native bodies, then passed 700 adversarial replays, 140 latency-budget checks, 50 prohibited actions, and three model-provider turns.
Hermes selected the payment workflow. Imladri blocked it across three model providers.
Hermes chat model turns through OpenAI, Gemini, and DeepSeek selected the protected finance workflow; Imladri blocked all three before the dangerous body, then passed 700 adversarial replays and a 50-action matrix.
OpenClaw model.run selected the bad action. Imladri blocked it across native tools and 50 action classes.
OpenClaw model.run produced prohibited plans across four configured models; native plugin tools and a 50-action live matrix were blocked before prohibited bodies or side effects.
OpenClaw adversarial replay blocked 700 of 700 production-style attacks.
Follow-up research after the original OpenClaw proof covered native OpenClaw tool blocking, 7/7 adversarial patterns, 700/700 concurrent blocks, a 200 ms latency-budget run, and 9/9 malformed constitutions failing closed.
Five OpenClaw model-plan reruns selected payment.transfer. Imladri blocked all before the body.
OpenAI, Gemini, and DeepSeek OpenClaw model-plan reruns selected a prohibited payment tool; a native OpenClaw plugin tool run then confirmed Imladri blocked before the real tool body.
