namzu.ai
4 min read · Bahadır Arda

Agent kernel vs agent framework: where the line is

Frameworks decide what an agent is for. Kernels decide how it runs. Here is where the boundary sits in practice — and why getting it wrong is the most common reason agent codebases collapse under their own weight.

agent-kernel · architecture · comparison

Frameworks describe what an agent does. Kernels describe how it runs. That is the whole distinction. Everything in this post is about why the line matters and where it sits when you are building real systems.

The collapse pattern

Every agent codebase that grows beyond a single demo eventually ships the same release notes: "added retry logic for tool calls", "fixed a race when two agents touched memory at the same time", "we now persist conversation state to disk", "the deploy used to lose mid-flight runs and now it doesn't". These are not feature changes. They are the symptoms of a missing runtime layer.

The collapse pattern looks like this. You start with a framework — LangChain, LangGraph, Mastra, CrewAI, your own — and the framework does what it says on the tin: it lets you describe an agent. You ship one. It works. You ship a few more. They mostly work. Then you try to run twenty concurrently, or expose one to public input, or keep one alive across a deploy, and the runtime concerns start landing in your application code. Six months later you have re-implemented half of an OS, badly, in your handlers.

The fix is not "buy a better framework". It is "extract the runtime as its own layer". That layer is the kernel.

What each layer is responsible for

A useful test: take any concern in an agent codebase and ask which question it answers.

  • Frameworks answer "what does the agent do?" Composition, prompts, tool definitions, state graphs, message-routing patterns, retrieval logic, evaluation hooks.
  • Kernels answer "how does the agent run?" Process lifecycle, scheduling, memory boundaries, IPC, sandboxing, checkpoint/resume, observability hooks for runtime events.

If the answer is "what?", it belongs above the kernel line. If the answer is "how?", it belongs below it. Almost every agent-design argument that goes in circles for two hours can be ended with this question.

A few examples:

  • "Where does the conversation history live?" That is a runtime concern. Kernel.
  • "Should the planner decide which tool to call?" That is a composition concern. Framework.
  • "What happens when a tool times out?" Runtime. Kernel surfaces the timeout; framework decides the policy.
  • "Should we use ReAct or a state graph?" Framework. The kernel does not care which loop shape the framework picks.
  • "Can two agents run in the same process?" Runtime. Kernel.
  • "How is a tool result re-injected into the prompt?" Framework.
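The timeout example above can be sketched in code. This is an illustration only: `ToolTimeout`, `run_tool`, and `call_with_policy` are hypothetical names, not part of any real kernel or framework. The kernel side surfaces the fact that a tool exceeded its deadline; the framework side owns the retry policy.

```python
import time
from typing import Callable

class ToolTimeout(Exception):
    """Kernel-side event: reports *that* a tool exceeded its deadline."""
    def __init__(self, tool: str, elapsed: float) -> None:
        super().__init__(f"{tool} timed out after {elapsed:.3f}s")
        self.tool, self.elapsed = tool, elapsed

def run_tool(fn: Callable[[], str], timeout: float) -> str:
    """Kernel side: enforce the deadline and surface the event.
    (A toy after-the-fact check; a real kernel would preempt.)"""
    start = time.monotonic()
    result = fn()
    elapsed = time.monotonic() - start
    if elapsed > timeout:
        raise ToolTimeout(fn.__name__, elapsed)
    return result

def call_with_policy(fn: Callable[[], str], timeout: float, retries: int) -> str:
    """Framework side: the *policy* — retry a few times, then give up."""
    last: ToolTimeout | None = None
    for _ in range(retries + 1):
        try:
            return run_tool(fn, timeout)
        except ToolTimeout as exc:
            last = exc
    raise last  # policy exhausted; escalate the kernel's event
```

Note that neither function knows about the other's concern: `run_tool` has no opinion on retries, and `call_with_policy` never touches a clock.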

Why mixing the layers hurts

When framework and runtime are tangled, three predictable things happen.

The runtime gets re-invented per agent. Each agent codebase grows its own ad-hoc retry loop, its own conversation store, its own way of cancelling in-flight work. The cost compounds: you cannot move agents between projects without rewriting the runtime around them.

Sandboxing becomes wishful thinking. Frameworks default to running tools in-process because that is the easy path. Once you need real isolation — agents running untrusted code, multi-tenant deployments — there is nowhere to put it without ripping the framework apart. A kernel with process-level sandboxing makes this routine: tools execute in their own sandboxed process and the kernel mediates the boundary.
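The minimum viable version of that boundary is an OS process. A sketch, with a hypothetical `run_tool_sandboxed` helper — a real kernel would add resource limits, filesystem namespaces, and a mediated syscall surface on top of this:

```python
import subprocess
import sys

def run_tool_sandboxed(code: str, timeout: float = 5.0) -> str:
    """Execute tool code in a separate OS process, so a crash or hang
    in the tool cannot take down the agent that called it."""
    proc = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,  # subprocess.run kills the child on expiry
    )
    if proc.returncode != 0:
        # The tool died; the caller gets an error, not a crashed agent.
        raise RuntimeError(proc.stderr.strip())
    return proc.stdout.strip()
```

The point is where the code lives: once tools cross a process boundary, isolation is a configuration knob on the kernel side rather than a rewrite on the framework side.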

Vendor lock creeps in. Without a clean kernel/provider boundary, the agent code starts to know about a specific LLM vendor's quirks. When the time comes to swap providers, you discover that the framework's tool-calling primitive was modelled after one vendor's API and the rest of the code inherited the assumption.
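The clean version of that boundary is a narrow provider interface the agent code depends on, so vendor quirks stop at one adapter. A sketch with hypothetical names (`Provider`, `EchoProvider`, `agent_step` are illustrative, not any real SDK):

```python
from typing import Protocol

class Provider(Protocol):
    """The boundary: agent code talks to this shape, never to a vendor SDK."""
    def complete(self, prompt: str) -> str: ...

class EchoProvider:
    """Stand-in vendor adapter. Swapping vendors means swapping this class."""
    def complete(self, prompt: str) -> str:
        return f"echo: {prompt}"

class UpperProvider:
    """A second stand-in, to show the swap costs the agent code nothing."""
    def complete(self, prompt: str) -> str:
        return prompt.upper()

def agent_step(provider: Provider, prompt: str) -> str:
    # Depends on the Provider shape, not on any one vendor's API.
    return provider.complete(prompt)
```

With the boundary in place, the tool-calling primitive is modelled on the interface, not on whichever vendor shipped first.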

The right composition

The healthy stack looks like a Unix system. Kernel at the bottom, providing a small set of primitives. Framework above, providing composition. Application above that, deciding what the agent is for.

+----------------------+
|   Application logic  |   what this agent does
+----------------------+
|   Agent framework    |   how the loop is shaped
+----------------------+
|   Agent kernel       |   how the process runs
+----------------------+
|   OS / runtime       |   how the machine works
+----------------------+

A LangGraph state machine running on a Namzu kernel fits this picture. So does a CrewAI crew. So does a hand-rolled loop someone wrote in an afternoon. The kernel does not care; that is the point.
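The "hand-rolled loop on a kernel" case can be made concrete with a toy. `ToyKernel` below is an in-memory stand-in for the kernel layer — illustrative only, with no relation to Namzu's actual API — and the loop on top of it is the entire "framework". The kernel owns run state; the loop never touches persistence:

```python
import pickle
from typing import Any

class ToyKernel:
    """Toy in-memory kernel: lifecycle plus checkpoint/resume."""
    def __init__(self) -> None:
        self._agents: dict[str, dict[str, Any]] = {}

    def spawn(self, agent_id: str, state: dict) -> None:
        self._agents[agent_id] = state               # lifecycle

    def checkpoint(self, agent_id: str) -> bytes:
        return pickle.dumps(self._agents[agent_id])  # survive a deploy

    def resume(self, agent_id: str, snapshot: bytes) -> None:
        self._agents[agent_id] = pickle.loads(snapshot)

    def state(self, agent_id: str) -> dict:
        return self._agents[agent_id]

def hand_rolled_loop(kernel: ToyKernel, agent_id: str, steps: int) -> int:
    """Framework layer: shapes the loop, knows nothing about persistence."""
    for _ in range(steps):
        kernel.state(agent_id)["turn"] += 1
    return kernel.state(agent_id)["turn"]
```

A LangGraph state machine or a CrewAI crew would slot into the same position as `hand_rolled_loop`: different loop shapes, same primitives underneath.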

Where Namzu sits

Namzu is the kernel layer. The public surface is intentionally small: lifecycle, scheduling, memory, IPC, sandboxing, checkpoint/resume, providers. We deliberately do not ship a framework on top of it. Two reasons.

First, the kernel earns its keep by being thin. The interface should be small enough that you can keep it in your head, and stable enough that frameworks above it can move without fear of what breaks below. A bundled framework would compete with that goal.

Second, framework choice is opinion. Crew vs graph vs ReAct is a real debate; the kernel should not pick a side. Different teams want different shapes. We want all of them to compose cleanly with the same runtime.

If you have not yet hit the collapse point — if your agents are simple, single-tenant, and short-lived — a framework alone is enough. Once you hit it, the kernel is the layer that gets the runtime concerns out of your application code. The compare pages walk through how the kernel composes with LangChain, Mastra, Vercel AI SDK, CrewAI, and AutoGPT-style runtimes one by one.

The line is not subtle once you know it is there. It is just hard to see until something breaks.