An agent kernel is the runtime layer beneath agent frameworks. It is the component that decides how an AI agent is created, isolated, scheduled, paused, resumed, and persisted. Kernels do not write prompts and do not call tools. They run the process that the prompt and the tool-calling loop live inside.
The term is deliberately borrowed from operating systems. In a general-purpose OS, the kernel provides process isolation, scheduling, memory management, and IPC — the primitives everything else assumes. An agent kernel plays the same role for AI agents. This post unpacks what that means in practice and where the kernel line sits relative to frameworks like LangGraph, CrewAI, and Mastra.
Why we need a new layer
Early agent projects (AutoGPT, BabyAGI) shipped the loop, the prompt, the tools, and the runtime as one artefact. That works until you try to run more than one agent at a time, expose one to untrusted input, or keep one alive across a machine restart. At that point you discover that the "runtime" was an afterthought: a while loop inside a Python script.
Later agent frameworks (LangChain, LangGraph, CrewAI, Mastra) extracted composition — the graphs, chains, and crews — into a reusable layer. They got better at modelling what the agent does. But they mostly inherited the earlier runtime assumption: one process, one thread, one agent, for the life of an HTTP request.
An agent kernel extracts the runtime — the layer that was implicit before — into its own component. Once you separate it out, six concerns come into focus.
What a kernel owns
Process lifecycle
An agent is not a function call. It is a process: it allocates memory, holds state, and exists in time. A kernel defines the lifecycle transitions explicitly — init, run, pause, resume, stop — and makes sure that a crash, a SIGTERM, or a deployment rollover never corrupts mid-flight state.
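One way to make those transitions explicit is a small state machine that rejects anything not in the legal set. This is an illustrative sketch, not Namzu's actual API; the state names and class shape are invented here:

```python
from enum import Enum, auto

class AgentState(Enum):
    INIT = auto()
    RUNNING = auto()
    PAUSED = auto()
    STOPPED = auto()

# The legal lifecycle transitions. Anything outside this table is refused,
# so a crash or SIGTERM mid-flight can never leave an agent in an undefined state.
TRANSITIONS = {
    AgentState.INIT:    {AgentState.RUNNING, AgentState.STOPPED},
    AgentState.RUNNING: {AgentState.PAUSED, AgentState.STOPPED},
    AgentState.PAUSED:  {AgentState.RUNNING, AgentState.STOPPED},
    AgentState.STOPPED: set(),
}

class AgentProcess:
    def __init__(self) -> None:
        self.state = AgentState.INIT

    def transition(self, target: AgentState) -> None:
        if target not in TRANSITIONS[self.state]:
            raise RuntimeError(f"illegal transition {self.state} -> {target}")
        self.state = target
```

The point of the table is that "pause" and "stop" stop being comments in a README and become checked operations.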
Scheduling
Running one agent is easy; running a thousand is a systems problem. The kernel owns the scheduler: which agent runs, how many can run at once, what the cost model is, and how signals propagate when work has to be cancelled.
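A toy version of that scheduler fits in a dozen lines of asyncio: a concurrency cap plus a per-agent budget, with timeouts propagated as cancellation. Everything here (the cap, the timeout, the `run_agents` name) is illustrative:

```python
import asyncio

async def run_agents(agents, max_concurrent=8, timeout=30.0):
    """Run zero-arg coroutine factories with a concurrency cap and
    cooperative cancellation -- a toy stand-in for a kernel scheduler."""
    sem = asyncio.Semaphore(max_concurrent)

    async def supervised(agent):
        async with sem:                      # at most max_concurrent run at once
            try:
                return await asyncio.wait_for(agent(), timeout)
            except asyncio.TimeoutError:
                return None                  # wait_for cancels the task for us

    return await asyncio.gather(*(supervised(a) for a in agents))
```

A real kernel adds priorities and a cost model on top, but the shape is the same: admission, budget, cancellation.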
Memory
"Memory" for an agent is not one thing. There is the working set (the live tensors and Python objects during a turn), the conversation state (what the model has seen), and the long-term store (facts the agent has chosen to persist). A kernel distinguishes all three and gives each one a clean API. Frameworks tend to conflate them.
IPC
Multi-agent systems need a transport. The kernel defines how agents talk to each other and to the host environment — a message format, a flow-control discipline, a schema for signals. Without a kernel layer, IPC is ad-hoc and every team invents it differently.
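A minimal message envelope makes the idea concrete. The field names here are invented, not a wire standard; the only claim is that the kernel, not each team, fixes the shape:

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class Envelope:
    """A minimal inter-agent message envelope (illustrative schema)."""
    sender: str
    recipient: str
    kind: str        # e.g. "request", "reply", "signal"
    payload: dict

    def encode(self) -> bytes:
        return json.dumps(asdict(self)).encode()

    @staticmethod
    def decode(raw: bytes) -> "Envelope":
        return Envelope(**json.loads(raw))
```

Once every agent speaks the same envelope, flow control and signal handling can live in one place instead of being re-invented per project.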
Sandboxing
If your agent runs code or invokes tools from untrusted input, you need isolation the language runtime cannot provide. The kernel enforces process-level sandboxing — a real OS boundary, not a try/catch. This is where the kernel metaphor is most literal: the guarantee is the same one Unix gives you between two user-space processes.
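The smallest honest version of that boundary is a separate OS process. The sketch below runs untrusted Python in a child process with a wall-clock budget; a real kernel would add rlimits, namespaces, or seccomp on top, but even this much means a hang or segfault cannot take the kernel down:

```python
import subprocess
import sys

def run_untrusted(code: str, timeout: float = 5.0) -> str:
    """Execute code in a separate OS process. The isolation boundary is the
    process itself, not a try/except in the host interpreter."""
    result = subprocess.run(
        [sys.executable, "-I", "-c", code],  # -I: isolated mode, no user site dirs
        capture_output=True,
        text=True,
        timeout=timeout,   # raises subprocess.TimeoutExpired and kills the child
    )
    return result.stdout
```

Notice what the try/catch version cannot give you: if the child spins forever or corrupts its own interpreter, the parent is untouched.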
Checkpoint and resume
Long agent runs should survive. A kernel knows how to take a snapshot of the live state, write it durably, and bring it back. This is what lets an agent run for an hour across a deploy, or hand off between machines, without losing its place.
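The durable-write half of that story has a well-known shape: write to a temp file, then rename, so a crash mid-write never leaves a half-written checkpoint. The function names and JSON state format here are illustrative:

```python
import json
import os
import tempfile

def checkpoint(state: dict, path: str) -> None:
    """Snapshot state atomically: write a temp file, then rename over the
    target, so readers only ever see a complete checkpoint."""
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)  # atomic rename on POSIX and Windows

def resume(path: str) -> dict:
    """Bring a checkpointed agent back where it left off."""
    with open(path) as f:
        return json.load(f)
```

With this primitive, "survive a deploy" reduces to checkpointing after each turn and resuming on the new machine.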
Kernel vs framework
The line is clearest when you ask: if I threw away my framework, could I still describe the runtime? If yes, the two layers are cleanly separated.
A concrete case. Suppose your agent is expressed as a LangGraph state machine calling an OpenAI model through a set of tools. The state graph, the prompts, and the tool bindings are the framework's job. The while loop that evaluates that graph, enforces timeouts, owns the conversation store, and persists checkpoints is the kernel's job. The two are orthogonal. LangGraph plus a Namzu-style kernel compose cleanly because they speak about different things.
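That division of labour can be sketched as a kernel loop that knows nothing about what the step function does. In this sketch, `step` is the framework's contribution (graph evaluation, prompts, tools) and everything else is kernel territory; the signature is invented for illustration:

```python
import time

def kernel_loop(step, state, *, deadline_s=3600.0, checkpoint=lambda s: None):
    """Drive a framework-supplied `step` function to completion.
    The kernel owns timing, persistence, and termination; `step` owns
    prompts, tools, and graph evaluation."""
    start = time.monotonic()
    while not state.get("done"):
        if time.monotonic() - start > deadline_s:
            raise TimeoutError("agent exceeded its time budget")
        state = step(state)      # framework's job: one turn of the graph
        checkpoint(state)        # kernel's job: make that turn durable
    return state
```

Swap LangGraph for Mastra and only `step` changes; swap the kernel and only the loop changes. That is what "orthogonal" means here.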
Kernel and framework are not competitors; they are neighbours. Mastra and CrewAI fit the same way — see the compare hub for how each one aligns with the kernel layer.
What makes a kernel useful
If you are deciding whether an agent kernel is the right shape for a project, the question is not "does my agent work today?" It is: what breaks when the project grows?
- Running more than one agent concurrently — a kernel's scheduler and isolation guarantees are load-bearing.
- Running agents on untrusted input — sandboxing is non-negotiable; language-level defences are not enough.
- Keeping agents alive across deploys — checkpoint/resume makes this routine instead of heroic.
- Running the same agent on multiple machines — IPC is the primitive that makes it tractable.
- Swapping the LLM vendor — a kernel with a narrow LLMProvider interface protects the rest of the code from vendor-specific assumptions.
If none of those apply, a framework is enough. If any of them do, the kernel layer is where the hard problems live.
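That last point is worth making concrete. A narrow provider seam can be as small as one method; the signature below is illustrative, not a published Namzu interface, and the echo double exists only to show that callers depend on the seam rather than a vendor SDK:

```python
from typing import Protocol

class LLMProvider(Protocol):
    """A deliberately narrow provider seam (illustrative signature)."""
    def complete(self, messages: list[dict], *, max_tokens: int) -> str: ...

class EchoProvider:
    """A swap-in test double satisfying the same seam: it just echoes a
    prefix of the last message instead of calling any vendor API."""
    def complete(self, messages: list[dict], *, max_tokens: int) -> str:
        return messages[-1]["content"][:max_tokens]
```

The narrower the seam, the less of your codebase a vendor change can reach.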
Where to go from here
Namzu is one agent kernel. There will be others. The category is new enough that the word kernel is still being earned — six months ago most of this shipped as "agent runtime" or "agent platform". The naming will settle; the shape of the layer already has.
- Read the Namzu manifesto for the argument that the kernel has to be open source.
- Read Namzu vs LangChain for how kernel and framework compose.
- Browse the changelog for what has shipped.