Know what your agent loads before it writes.

hibench measures the default context footprint of coding agents: system prompts, tools, skills, MCP servers, and sub-agents loaded before the first user request is answered.

View the rankings Explore agents

Top 5 + flop 5 token ranking

heaviest and lightest latest versions · total request tokens

1 Claude Code: 21,618 2 OpenClaw: 18,540 3 Cursor CLI: 16,379 4 Grok CLI: 15,744 5 Copilot CLI: 12,395 7 Kilo Code: 9,762 8 Codex CLI: 8,730 9 OpenCode: 6,785 10 Cline: 3,890 11 Pi: 1,255

Goal & philosophy

①

Default cost is real cost

Every tool schema, skill description and system instruction is sent on the very first turn. That baseline is paid on each request, before any useful work happens.

②

Measure, don't guess

We capture the first real outbound request and count it with a single fixed tokenizer so numbers stay comparable across agents, versions and models.

③

Track the evolution

Footprints drift over releases. hibench keeps one canonical capture per version so you can watch context grow (or shrink) over time.

How it works

1
Isolate

Run each agent in Docker inside a fresh, empty Git repo.
2
Intercept

Point it at a local recorder with a dummy key — no upstream model call.
3
Capture

Send the prompt Hi and record the first outbound request body.
4
Count

Tokenize every field with o200k_base and break it down.