Why memory is the missing piece
The first wave of agent deployments ran into the same wall, expressed in different ways. Customer service agents that greeted returning customers as if they'd never spoken. Coding agents that made the same architectural mistake on the third project that they'd made on the first. Research agents that re-discovered the same facts in every session, paying the same compute cost to learn the same things repeatedly. The common thread is the absence of memory.
This isn't a fundamental limitation of language models — it's a consequence of how most agent systems are architected. A language model's weights encode knowledge from training, but within a session, the only state available is the context window: the current conversation and whatever is explicitly loaded into it. When the session ends, that state disappears unless something outside the model persists it.
The architecture decision is what to persist, how to organise it, and how to retrieve it when it's needed. These decisions turn an amnesiac agent that restarts from zero on every run into a persistent agent that accumulates knowledge, learns from failures, and improves over time. The gap between these two is not subtle. A persistent agent is qualitatively more capable for any task that involves repetition, personalisation, or learning from experience — which describes almost every business application.
Cognitive science gives us a useful taxonomy. Human memory is not one thing — it's several distinct systems that store and retrieve different kinds of information. The same framework applies cleanly to agent memory design.
Episodic memory: what happened
Episodic memory is memory of specific events: what happened, when, and in what context. For an agent, this means the record of past runs — the tasks it attempted, the outcomes it produced, the errors it encountered, and the approaches it took.
The most direct value of episodic memory is in personalisation. A customer-service agent with episodic memory can retrieve the full history of every previous interaction with a returning customer — the issue they reported last month, the resolution that worked, the preference they expressed for communication style. It responds to that customer as if it remembers them, because it does.
Episodic memory also enables failure learning. If an agent tried three approaches to a SQL optimisation problem and only the third worked, that episode can be stored and retrieved the next time a similar problem appears. The agent doesn't repeat the first two failed approaches; it starts from the third. This is a significant efficiency gain in domains where the solution space is large and trial-and-error is expensive.
Implementation typically uses a structured database (relational or a document store) where each episode is recorded as a row or document with metadata: timestamp, task type, inputs, outputs, outcome status, and any annotations. Retrieval is by query: find episodes matching this task type, this customer ID, or this error pattern. The challenge is that as the episodic store grows, the signal-to-noise ratio of retrieval degrades. More on this in the forgetting problem section.
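A minimal sketch of such an episodic store follows. It uses Python's built-in `sqlite3` as the relational backend. The table layout and the function names (`open_store`, `record_episode`, `find_episodes`) are illustrative assumptions, not a prescribed schema. A document store would hold the same fields as documents instead of rows.

```python
import json
import sqlite3
from datetime import datetime, timezone
from typing import List, Optional

# One row per past run; the metadata columns are what retrieval filters on.
SCHEMA = """
CREATE TABLE IF NOT EXISTS episodes (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    ts TEXT NOT NULL,            -- ISO-8601 UTC timestamp
    task_type TEXT NOT NULL,
    customer_id TEXT,
    inputs TEXT NOT NULL,        -- JSON-encoded
    outputs TEXT NOT NULL,       -- JSON-encoded
    outcome TEXT NOT NULL,       -- e.g. 'success' or 'failure'
    notes TEXT                   -- free-form annotations
)
"""


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.row_factory = sqlite3.Row  # rows can be read by column name
    conn.execute(SCHEMA)
    return conn


def record_episode(conn: sqlite3.Connection, task_type: str, inputs: dict,
                   outputs: dict, outcome: str, customer_id: Optional[str] = None,
                   notes: Optional[str] = None) -> int:
    cur = conn.execute(
        "INSERT INTO episodes (ts, task_type, customer_id, inputs, outputs, outcome, notes)"
        " VALUES (?, ?, ?, ?, ?, ?, ?)",
        (datetime.now(timezone.utc).isoformat(), task_type, customer_id,
         json.dumps(inputs), json.dumps(outputs), outcome, notes),
    )
    conn.commit()
    return cur.lastrowid


def find_episodes(conn: sqlite3.Connection, task_type: Optional[str] = None,
                  customer_id: Optional[str] = None, outcome: Optional[str] = None,
                  limit: int = 5) -> List[sqlite3.Row]:
    # Only the filters that were supplied end up in the WHERE clause.
    filters = {"task_type": task_type, "customer_id": customer_id, "outcome": outcome}
    clauses = [f"{col} = ?" for col, val in filters.items() if val is not None]
    params = [val for val in filters.values() if val is not None]
    where = f" WHERE {' AND '.join(clauses)}" if clauses else ""
    # Newest episodes first.
    sql = f"SELECT * FROM episodes{where} ORDER BY id DESC LIMIT ?"
    return conn.execute(sql, (*params, limit)).fetchall()
```

Suppose the three SQL-optimisation attempts from the failure-learning example are recorded under a hypothetical task type such as `"sql_optimisation"`. A call like `find_episodes(conn, task_type="sql_optimisation", outcome="success")` then returns only the approach that worked. That is the retrieval the failure-learning pattern depends on.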
Semantic memory: what I know
Semantic memory is the distilled knowledge base — facts, concepts, relationships, and domain knowledge that persist independently of any specific event. For an agent, this is the repository of accumulated understanding about the domain it operates in.
The distinction from episodic memory is important. Episodic memory says "last Tuesday, the database query failed because the index was missing on the customer_id column." Semantic memory says "this schema is missing an index on customer_id — queries filtering on this column will be slow." The specific event is abstracted into a general fact.
Semantic memory is where agents build genuine expertise. A coding agent that has worked on a large codebase accumulates semantic knowledge about the architecture: which modules handle which concerns, which functions are performance-critical, which patterns the team uses consistently, and which areas of the code are fragile. This knowledge is rarely written down in the codebase documentation. It emerges from repeated interaction with the codebase and has to be explicitly maintained in a semantic store to persist across sessions.
Implementation typically uses a vector database: a system that stores information as numerical embeddings and retrieves it by semantic similarity rather than exact match. When an agent needs to know whether there's relevant context for the task it's about to attempt, it encodes a description of the task as a vector and retrieves the most similar items from the semantic store. The matches don't need to be textually identical to be relevant, because semantic similarity captures conceptual relationships that keyword search misses.
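The retrieval step can be sketched as top-k ranking by cosine similarity. The `SemanticStore` class below is a brute-force, in-memory stand-in for a vector database. `toy_embed` is a hypothetical placeholder that only counts a handful of keywords. Unlike a real embedding model, it will not match paraphrases, so the example demonstrates the retrieval mechanics rather than the semantics.

```python
import math
import re
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class SemanticStore:
    """Brute-force, in-memory stand-in for a vector database."""

    def __init__(self, embed: Callable[[str], Vector]):
        self.embed = embed  # any text -> vector function
        self.items: List[Tuple[str, Vector]] = []

    def add(self, fact: str) -> None:
        # Embed once at write time, so reads only need to embed the query.
        self.items.append((fact, self.embed(fact)))

    def search(self, query: str, k: int = 3,
               min_score: float = 0.0) -> List[Tuple[float, str]]:
        q = self.embed(query)
        scored = [(cosine(q, vec), fact) for fact, vec in self.items]
        hits = [s for s in scored if s[0] > min_score]
        hits.sort(key=lambda s: s[0], reverse=True)
        return hits[:k]


# HYPOTHETICAL toy embedding for the demo only: counts a few keywords.
# A real system would call an embedding model here instead.
VOCAB = ["index", "customer_id", "slow", "payment", "idempotency", "uuid"]


def toy_embed(text: str) -> List[float]:
    words = re.findall(r"[a-z_]+", text.lower())
    return [float(words.count(term)) for term in VOCAB]


store = SemanticStore(toy_embed)
store.add("This schema is missing an index on customer_id; filters on it are slow.")
store.add("The payment pipeline needs an idempotency key: always pass a UUID.")
print(store.search("Why is the customer_id lookup slow?", k=1))
```

The query shares no keywords with the payment fact, so only the schema fact is returned. A production vector database replaces the linear scan with an approximate nearest-neighbour index, but the ranking idea is the same.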
Procedural memory: how to do things
Procedural memory is knowledge of how to perform tasks — not facts about the world, but learned action sequences and decision patterns. For an agent, this is the accumulation of effective strategies, refined workflows, and calibrated heuristics.
Procedural memory is the hardest of the three to implement well, and the most valuable when it works. Consider a research agent that has run hundreds of research tasks. Over time, it learns which search strategies work well for different query types, which sources are reliable for which domains, and how to structure its output to be most useful for different downstream consumers. This is procedural knowledge: not facts to be retrieved, but patterns to be applied.
The most practical implementation stores procedural memory as structured documents: named procedures with descriptions of when to apply them, the steps involved, and the conditions under which the agent has found them effective. These are loaded into the agent's context at session start, or retrieved selectively based on task type. They function as the agent's learned playbook — refined through accumulated experience rather than written manually.
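One way to represent such a playbook is as a list of structured procedure records, each carrying a running success count. In this sketch, `Procedure`, `select_procedures`, and `render_procedures` are hypothetical names. The selection thresholds (`min_runs=3`, `min_rate=0.5`) are arbitrary defaults chosen for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Procedure:
    name: str
    applies_to: List[str]   # task types this procedure is relevant to
    when: str               # conditions under which it has worked
    steps: List[str]
    successes: int = 0
    failures: int = 0

    @property
    def runs(self) -> int:
        return self.successes + self.failures

    @property
    def success_rate(self) -> float:
        return self.successes / self.runs if self.runs else 0.0

    def record_outcome(self, succeeded: bool) -> None:
        # Called after each run so the playbook is refined by experience.
        if succeeded:
            self.successes += 1
        else:
            self.failures += 1


def select_procedures(playbook: List[Procedure], task_type: str,
                      min_runs: int = 3, min_rate: float = 0.5) -> List[Procedure]:
    """Pick procedures for this task type, best track record first.

    Untested procedures (fewer than `min_runs`) are kept so they can be tried;
    procedures with enough runs but a poor success rate are dropped.
    """
    relevant = [p for p in playbook if task_type in p.applies_to]
    kept = [p for p in relevant if p.runs < min_runs or p.success_rate >= min_rate]
    return sorted(kept, key=lambda p: p.success_rate, reverse=True)


def render_procedures(procedures: List[Procedure]) -> str:
    # Text block to load into the agent's context at session start.
    lines = []
    for p in procedures:
        lines.append(f"## {p.name} (succeeded {p.successes}/{p.runs})")
        lines.append(f"When: {p.when}")
        lines.extend(f"{i}. {step}" for i, step in enumerate(p.steps, start=1))
    return "\n".join(lines)
```

Keeping untested procedures in the selection is a deliberate choice. If only proven procedures were loaded, a new procedure would never get the runs it needs to prove itself.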
Implementation patterns
In practice, the three memory types are implemented as a layered system. The context window handles working memory — the immediate session state. External stores handle the three long-term memory types.
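The layering can be illustrated with a context-assembly step. It starts each session from the current task and then fills the remaining budget from the long-term stores in priority order. This sketch rests on simplifying assumptions. `rough_tokens` counts words rather than real tokens, and each layer's items are assumed to arrive already ranked by that store's own retrieval. The function names are hypothetical.

```python
from typing import Dict, List


def rough_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: one token per whitespace-separated word.
    return len(text.split())


def assemble_context(task: str, layers: Dict[str, List[str]], budget: int) -> str:
    """Build the working-memory context for one session.

    The current task is always included. `layers` maps a long-term store
    name to items already ranked by that store's retrieval; the dict's
    insertion order is the priority order. Items that would exceed
    `budget` are skipped (section headers are not counted, for brevity).
    """
    parts = [f"# task\n{task}"]
    used = rough_tokens(task)
    for name, items in layers.items():
        kept = []
        for item in items:
            cost = rough_tokens(item)
            if used + cost > budget:
                continue  # too big for what's left; a smaller item may still fit
            kept.append(item)
            used += cost
        if kept:
            parts.append(f"# {name}\n" + "\n".join(f"- {item}" for item in kept))
    return "\n\n".join(parts)
```

Putting procedures ahead of facts and episodes is one possible priority order, not a rule. The right order depends on which layer most often changes the agent's behaviour for a given task type.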
File-based memory
The simplest implementation. The agent reads and writes structured files — Markdown or JSON — that function as its memory store. Episodic records go in a dated log. Semantic facts go in a knowledge base. Procedural knowledge goes in a playbook file loaded at session start. This is appropriate for simple, low-volume agent deployments where the overhead of a database is not warranted and the total knowledge base fits in a manageable number of files.
```markdown
# memory/semantic/codebase.md
## Architecture notes
- Auth module: handles all JWT issuance and validation
- Do not write to the users table directly — always use UserService
- The payment pipeline uses idempotency keys; always pass a UUID
## Known issues
- Bulk import endpoint times out above ~5000 records (see issue #847)
- Rate limiting not applied to internal API routes
```
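Here is a sketch of the file-based pattern in Python, using only the standard library. The directory layout extends the example path above: `episodic/` for dated JSON Lines logs, `semantic/<topic>.md` for knowledge files, and `procedural/playbook.md` for the playbook. That layout is an assumption, and so are the helper names.

```python
import json
from datetime import date
from pathlib import Path
from typing import Optional


def append_episode(root: Path, record: dict, day: Optional[date] = None) -> Path:
    """Append one episode as a JSON line to a dated log: episodic/YYYY-MM-DD.jsonl."""
    day = day or date.today()
    path = root / "episodic" / f"{day.isoformat()}.jsonl"
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    return path


def add_fact(root: Path, topic: str, section: str, fact: str) -> None:
    """Add a bullet under a '## section' heading in semantic/<topic>.md."""
    path = root / "semantic" / f"{topic}.md"
    path.parent.mkdir(parents=True, exist_ok=True)
    lines = path.read_text(encoding="utf-8").splitlines() if path.exists() else []
    header, bullet = f"## {section}", f"- {fact}"
    if bullet in lines:
        return  # exact duplicate; nothing to add
    if header in lines:
        start = lines.index(header) + 1
        end = start
        while end < len(lines) and not lines[end].startswith("## "):
            end += 1  # find where this section ends
        while end > start and not lines[end - 1].strip():
            end -= 1  # keep the new bullet above any trailing blank lines
        lines.insert(end, bullet)
    else:
        lines += [header, bullet]
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")


def load_playbook(root: Path) -> str:
    """Return procedural/playbook.md for loading into context at session start."""
    path = root / "procedural" / "playbook.md"
    return path.read_text(encoding="utf-8") if path.exists() else ""
```

Plain files keep the memory human-readable and easy to review in version control. The trade-off is that retrieval means reading whole files, which is why this pattern fits only small knowledge bases.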
Vector database memory
For semantic memory at scale, a vector database — Pinecone, Weaviate, Qdrant, or similar — provides the retrieval infrastructure. The agent embeds incoming information as vectors at write time and retrieves by semantic similarity at read time. This pattern scales to millions of stored items and handles the open-ended retrieval problem that keyword search cannot: "what do I know that might be relevant to this task?" is a semantic question, not a keyword question.
Conversation summaries
For episodic memory, a practical pattern is summary-on-exit. When a session ends, the agent generates a structured summary of what happened (tasks attempted, outcomes, key decisions, notable findings) and stores it in the episodic database. On the next session involving the same context (same user, same project, same task type), the agent retrieves recent summaries and loads them into context. This compresses past episodes: a 10,000-token session becomes a 200-token summary that can be retrieved cheaply.
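The following is a minimal sketch of summary-on-exit backed by `sqlite3`. The `summarise` argument is a placeholder for whatever model call produces the structured summary. The `SessionSummary` fields mirror the tasks, outcomes, decisions, and findings described above. The `context_key` scheme (for example `"user:42"`) is a hypothetical way of expressing "same user, same project, same task type".

```python
import json
import sqlite3
from dataclasses import asdict, dataclass
from typing import Callable, List


@dataclass
class SessionSummary:
    context_key: str        # e.g. "user:42" or "project:billing"
    tasks: List[str]
    outcomes: List[str]
    decisions: List[str]
    findings: List[str]


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS summaries ("
                 "id INTEGER PRIMARY KEY AUTOINCREMENT, context_key TEXT, body TEXT)")
    return conn


def summarise_on_exit(transcript: str, context_key: str,
                      summarise: Callable[[str], dict]) -> SessionSummary:
    # `summarise` is a PLACEHOLDER for an LLM call that turns the full transcript
    # into a dict with "tasks", "outcomes", "decisions" and "findings" lists.
    return SessionSummary(context_key=context_key, **summarise(transcript))


def store_summary(conn: sqlite3.Connection, s: SessionSummary) -> None:
    conn.execute("INSERT INTO summaries (context_key, body) VALUES (?, ?)",
                 (s.context_key, json.dumps(asdict(s))))
    conn.commit()


def recent_summaries(conn: sqlite3.Connection, context_key: str,
                     n: int = 3) -> List[SessionSummary]:
    # Most recent first, only for the matching context.
    rows = conn.execute("SELECT body FROM summaries WHERE context_key = ? "
                        "ORDER BY id DESC LIMIT ?", (context_key, n)).fetchall()
    return [SessionSummary(**json.loads(body)) for (body,) in rows]


def render_for_context(summaries: List[SessionSummary]) -> str:
    # Compact text loaded at the start of the next session.
    blocks = []
    for s in summaries:
        blocks.append("\n".join([
            f"Previous session ({s.context_key}):",
            "  Tasks: " + "; ".join(s.tasks),
            "  Outcomes: " + "; ".join(s.outcomes),
            "  Decisions: " + "; ".join(s.decisions),
            "  Findings: " + "; ".join(s.findings),
        ]))
    return "\n\n".join(blocks)
```

Storing the summary as structured fields rather than free text makes it possible to filter or aggregate past sessions later, for example to collect every recorded decision for one project.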
The forgetting problem
The intuitive solution to memory is to store everything and retrieve everything. The practical problem with this approach is that it doesn't work. A memory system that stores every interaction without discrimination accumulates noise faster than signal. An agent that retrieves everything potentially relevant to a task will fill its context window with outdated, contradictory, or irrelevant information — degrading rather than improving its performance.
Forgetting is not a failure of memory systems. It's a necessary feature of effective ones. The challenge is deciding what to forget and when.
Several strategies address this. Time-based decay reduces the retrieval weight of older memories — recent episodes are more likely to be retrieved than distant ones, unless the distant ones have been explicitly marked as significant. Contradiction resolution removes or supersedes facts in the semantic store when new information contradicts them. Episodic pruning periodically removes low-value episode records — those that don't encode any learning that hasn't already been absorbed into the semantic or procedural stores. The result is a memory system that remains useful rather than growing into a liability.
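The three strategies can be sketched as small, independent pieces. Time-based decay here is exponential: a memory with similarity s and age a (in days) is weighted as s × 0.5^(a / h) for a chosen half-life h. Contradiction resolution assumes each fact can be keyed by a (subject, attribute) pair, so that a newer value supersedes an older one. Pruning assumes a hypothetical `absorbed` flag, set by whatever process distils episodes into the semantic or procedural stores. All names and defaults are illustrative.

```python
from dataclasses import dataclass
from typing import List, Optional

SECONDS_PER_DAY = 86_400


def decayed_score(similarity: float, age_days: float,
                  half_life_days: float = 30.0, significant: bool = False) -> float:
    """Time-based decay: retrieval weight halves every `half_life_days`,
    unless the memory was explicitly marked as significant."""
    if significant:
        return similarity
    return similarity * 0.5 ** (age_days / half_life_days)


@dataclass
class Fact:
    subject: str
    attribute: str
    value: str
    written_at: float          # e.g. a Unix timestamp
    superseded: bool = False


class FactStore:
    """Contradiction resolution by supersession: a newer fact with the same
    (subject, attribute) key but a different value marks older ones superseded."""

    def __init__(self) -> None:
        self.facts: List[Fact] = []

    def write(self, fact: Fact) -> None:
        for old in self.facts:
            if (not old.superseded and old.subject == fact.subject
                    and old.attribute == fact.attribute and old.value != fact.value
                    and old.written_at <= fact.written_at):
                old.superseded = True
        self.facts.append(fact)

    def current(self, subject: str, attribute: str) -> Optional[str]:
        # If writes arrive out of order, the newest live fact still wins.
        live = [f for f in self.facts if not f.superseded
                and f.subject == subject and f.attribute == attribute]
        return max(live, key=lambda f: f.written_at).value if live else None


def prune_episodes(episodes: List[dict], now: float,
                   max_age_days: float = 90.0) -> List[dict]:
    """Episodic pruning: keep an episode if it is recent, marked significant,
    or not yet absorbed into the semantic/procedural stores."""
    cutoff = now - max_age_days * SECONDS_PER_DAY
    return [e for e in episodes
            if e["ts"] >= cutoff or e.get("significant") or not e.get("absorbed")]
```

Superseded facts are flagged rather than deleted, which keeps an audit trail of what the agent used to believe. A store under tighter size limits could delete them outright.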
Agents that improve over time
The compound effect of well-designed memory is an agent that gets better the more it runs. Early sessions are slow, uncertain, exploratory. The agent tries approaches, encounters failures, and learns. As the episodic store fills with outcomes and the semantic store absorbs distilled knowledge, subsequent sessions become faster and more accurate. The agent stops re-discovering things it already knows. It applies patterns it has validated. It avoids approaches it has found to fail.
This is the capability that most agent deployments leave on the table. It requires more engineering than a stateless agent — the memory infrastructure has to be designed and maintained — but the return compounds. An agent that runs the same class of task ten thousand times and remembers the results of all of them is not ten thousand times better than an amnesiac agent attempting the same task for the first time. But it is meaningfully, measurably better — and that gap widens with every subsequent run.
The practical implication: when evaluating an agent system, the question isn't just "does it work today?" It's "does it improve?" A system that improves is an asset that appreciates. A system that doesn't is a service cost that stays flat.
The organisations building durable advantage in agent systems are the ones building memory into the foundation — not retrofitting it as an afterthought. The patterns described here are available now. The gap between those who implement them and those who don't is already measurable, and it will grow.