Sandbox performance & footprint¶
IronClaw runs every agent inside its own gVisor (runsc) sandbox —
network=none, all capabilities dropped, no_new_privs, non-root user
namespace, read-only rootfs, cgroup memory/CPU limits. That isolation is the
product. This page quantifies what it costs at runtime so you can size hosts and
judge the trade honestly.
What this page is — and isn't
The overhead of gVisor is workload-dependent, so the only numbers worth trusting are the ones you measure on your hardware with your runtime version. This page ships a reproducible harness that does exactly that, plus a conservative expectation profile drawn from gVisor's own published performance guidance. We do not quote hero numbers from a machine you can't inspect.
Reproduce it yourself¶
A committed harness, scripts/bench/sandbox-bench.sh,
launches a minimal OCI bundle whose config.json mirrors the real IronClaw trust
boundary (the same fields internal/host/isolation
emits) and times it under runsc versus a host baseline (runc). Because the
same bundle runs under both runtimes, everything that is not the isolation
layer cancels out — the delta is the gVisor wall, nothing else.
# On a Linux host with gVisor installed (and, ideally, runc for the baseline):
scripts/bench/sandbox-bench.sh --iterations 50
# Pin the rootfs by digest for byte-for-byte repeatability:
BENCH_IMAGE=busybox@sha256:<digest> scripts/bench/sandbox-bench.sh
It prints a Markdown results table and writes results.json plus a
methodology.txt capturing kernel, CPU, RAM, and exact runsc/runc versions.
gVisor needs a real host
runsc requires a gofer-capable host kernel. It will not start inside a
nested/locked-down CI runner (gofer creation fails with
fork/exec /proc/self/exe). Run the harness on bare metal or a VM where
runsc do true succeeds.
What it measures¶
| Metric | Workload | Why it matters |
|---|---|---|
| Cold start | launch from a freshly staged rootfs | time to spin up a new agent |
| Warm start | launch with the rootfs already cached | time to respawn an agent |
| Per-sandbox memory | resident RSS of the whole sandbox process tree (sentry + gofer) while a trivial workload idles | how many agents fit on a host |
| CPU-bound | a fixed integer loop, no syscalls in the hot path | compute overhead (gVisor's best case) |
| Syscall-bound | a stat-heavy filesystem walk | I/O overhead (gVisor's worst case) |
What to expect¶
gVisor implements the Linux syscall surface in a user-space kernel (the Sentry) and proxies filesystem I/O through a Gofer. The cost of that indirection is highly uneven across workload classes. The profile below is conservative and consistent with gVisor's published performance guide; your harness run should land in the same ballpark.
| Dimension | Expected overhead vs a runc baseline |
|---|---|
| CPU-bound compute | Near-native — within a few percent. Work that stays in userspace barely touches the Sentry. |
| Memory per sandbox | A fixed per-sandbox cost (Sentry + Gofer), typically on the order of tens of MiB of RSS beyond the workload itself. |
| Process / sandbox start | A modest additive cost over runc — on the order of a couple hundred milliseconds for the Sentry and Gofer to come up. One-time per agent launch, not per request. |
| Syscall- / I/O-heavy work | The largest gap — stat/open/read-heavy paths can run noticeably slower (often in the ~1.5–2.5× range) because every syscall is mediated. This is the cost you are explicitly buying isolation with. |
| Network throughput | Not applicable. gVisor's network path is its weakest dimension — but IronClaw sandboxes run network=none with no NIC at all, so this overhead simply does not exist here. Egress, when granted, is a host-mediated unix socket, not in-sandbox networking. |
Reading the trade-off¶
- The overhead is bounded and predictable, not a tax on every operation. CPU-bound agent reasoning is near-native; the cost concentrates on syscall- and I/O-heavy bursts.
- The footprint is what makes density planning simple. Per-sandbox memory is a roughly fixed additive cost, so host capacity scales linearly with agent count.
- You pay once, at the boundary you actually care about. gVisor's worst dimension (networking) is one IronClaw has already removed by design. What remains is the syscall-mediation cost — which is the isolation. See the threat model for what that wall buys you and the security posture for the invariants it upholds.
Methodology notes¶
- Like-for-like. The same OCI bundle is run under
runscandrunc; the delta isolates the runtime, not image pulls, provisioning, or orchestration. - Conservative by construction. Medians over N iterations (default 50), a warm-up priming run discarded, and a best-effort page-cache drop before each cold-start sample (when run as root).
- Reproducible. Rootfs is exported from a pinnable OCI image (pin by digest);
every environment fact that affects the result is captured in
methodology.txtalongside the numbers. - gVisor-only. IronClaw positions on gVisor; these benchmarks make no claims about other runtimes (e.g. Kata) and the harness does not measure them.