Skip to content

Sandbox performance & footprint

IronClaw runs every agent inside its own gVisor (runsc) sandboxnetwork=none, all capabilities dropped, no_new_privs, non-root user namespace, read-only rootfs, cgroup memory/CPU limits. That isolation is the product. This page quantifies what it costs at runtime so you can size hosts and judge the trade honestly.

What this page is — and isn't

The overhead of gVisor is workload-dependent, so the only numbers worth trusting are the ones you measure on your hardware with your runtime version. This page ships a reproducible harness that does exactly that, plus a conservative expectation profile drawn from gVisor's own published performance guidance. We do not quote hero numbers from a machine you can't inspect.

Reproduce it yourself

A committed harness, scripts/bench/sandbox-bench.sh, launches a minimal OCI bundle whose config.json mirrors the real IronClaw trust boundary (the same fields internal/host/isolation emits) and times it under runsc versus a host baseline (runc). Because the same bundle runs under both runtimes, everything that is not the isolation layer cancels out — the delta is the gVisor wall, nothing else.

# On a Linux host with gVisor installed (and, ideally, runc for the baseline):
scripts/bench/sandbox-bench.sh --iterations 50

# Pin the rootfs by digest for byte-for-byte repeatability:
BENCH_IMAGE=busybox@sha256:<digest> scripts/bench/sandbox-bench.sh

It prints a Markdown results table and writes results.json plus a methodology.txt capturing kernel, CPU, RAM, and exact runsc/runc versions.

gVisor needs a real host

runsc requires a gofer-capable host kernel. It will not start inside a nested/locked-down CI runner (gofer creation fails with fork/exec /proc/self/exe). Run the harness on bare metal or a VM where runsc do true succeeds.

What it measures

Metric Workload Why it matters
Cold start launch from a freshly staged rootfs time to spin up a new agent
Warm start launch with the rootfs already cached time to respawn an agent
Per-sandbox memory resident RSS of the whole sandbox process tree (sentry + gofer) while a trivial workload idles how many agents fit on a host
CPU-bound a fixed integer loop, no syscalls in the hot path compute overhead (gVisor's best case)
Syscall-bound a stat-heavy filesystem walk I/O overhead (gVisor's worst case)

What to expect

gVisor implements the Linux syscall surface in a user-space kernel (the Sentry) and proxies filesystem I/O through a Gofer. The cost of that indirection is highly uneven across workload classes. The profile below is conservative and consistent with gVisor's published performance guide; your harness run should land in the same ballpark.

Dimension Expected overhead vs a runc baseline
CPU-bound compute Near-native — within a few percent. Work that stays in userspace barely touches the Sentry.
Memory per sandbox A fixed per-sandbox cost (Sentry + Gofer), typically on the order of tens of MiB of RSS beyond the workload itself.
Process / sandbox start A modest additive cost over runc — on the order of a couple hundred milliseconds for the Sentry and Gofer to come up. One-time per agent launch, not per request.
Syscall- / I/O-heavy work The largest gap — stat/open/read-heavy paths can run noticeably slower (often in the ~1.5–2.5× range) because every syscall is mediated. This is the cost you are explicitly buying isolation with.
Network throughput Not applicable. gVisor's network path is its weakest dimension — but IronClaw sandboxes run network=none with no NIC at all, so this overhead simply does not exist here. Egress, when granted, is a host-mediated unix socket, not in-sandbox networking.

Reading the trade-off

  • The overhead is bounded and predictable, not a tax on every operation. CPU-bound agent reasoning is near-native; the cost concentrates on syscall- and I/O-heavy bursts.
  • The footprint is what makes density planning simple. Per-sandbox memory is a roughly fixed additive cost, so host capacity scales linearly with agent count.
  • You pay once, at the boundary you actually care about. gVisor's worst dimension (networking) is one IronClaw has already removed by design. What remains is the syscall-mediation cost — which is the isolation. See the threat model for what that wall buys you and the security posture for the invariants it upholds.

Methodology notes

  • Like-for-like. The same OCI bundle is run under runsc and runc; the delta isolates the runtime, not image pulls, provisioning, or orchestration.
  • Conservative by construction. Medians over N iterations (default 50), a warm-up priming run discarded, and a best-effort page-cache drop before each cold-start sample (when run as root).
  • Reproducible. Rootfs is exported from a pinnable OCI image (pin by digest); every environment fact that affects the result is captured in methodology.txt alongside the numbers.
  • gVisor-only. IronClaw positions on gVisor; these benchmarks make no claims about other runtimes (e.g. Kata) and the harness does not measure them.