AI Primer

Researchers and builders ship external memory layers with recipe stores and 33% cheaper updates

A new MeMo paper and several community memory systems converged on keeping knowledge outside the base model through recipe files, semantic and autobiographical stores, and background reconsolidation. The pattern matters because engineers are treating context loss as a systems problem instead of only asking for larger context windows.


TL;DR

You can read the MeMo paper, browse Anthropic's Claude Code memory docs, and compare that official memory surface with testingcatalog's leaked Memory Files video. The weird bit is how similar the community prototypes already look: SCPnerd is storing reconstructive recipe files, MrAddams_LibraLogic is building semantic plus autobiographical stores, and daniel_mac8 is turning an Obsidian vault into cross-project Codex memory.

MeMo turns memory into a sidecar model

The paper's core move is simple: memory lives in its own trained model, not in the base model's weights and not in an ever-growing retrieval index. As AlphaSignalAI's summary describes it, MeMo extracts facts from documents, trains a dedicated memory model on those facts, then has the frozen LLM query that model through natural-language sub-questions.

That matters because the update path changes. The MeMo paper claims new knowledge can be merged into memory without retraining from scratch, cutting update compute by 33%, while retrieval cost stays constant regardless of corpus size.

The other notable detail is interface design. Because the frozen LLM only talks to memory through natural language, AlphaSignalAI's thread says the setup can plug into closed proprietary models as well as open ones.
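The query path described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's implementation: `answer_with_memory`, `decompose`, `memory`, and `frozen_llm` are hypothetical names, and both models are stubbed with plain Python callables so the control flow stays visible.

```python
from typing import Callable, Dict, List

# Hypothetical interface: both models are just text-in, text-out callables.
TextFn = Callable[[str], str]


def answer_with_memory(
    question: str,
    decompose: Callable[[str], List[str]],
    memory: TextFn,
    frozen_llm: TextFn,
) -> str:
    """Route a question through a sidecar memory before the frozen LLM answers."""
    # 1. Break the question into natural-language sub-questions.
    sub_questions = decompose(question)
    # 2. Ask the memory model each sub-question; only text crosses the boundary.
    recalled = [f"Q: {sq}\nA: {memory(sq)}" for sq in sub_questions]
    # 3. Hand the recalled facts to the frozen LLM, whose weights never change.
    prompt = "Recalled facts:\n" + "\n".join(recalled) + f"\n\nQuestion: {question}"
    return frozen_llm(prompt)


# Toy stand-ins (assumptions for illustration only).
FACTS: Dict[str, str] = {"Who wrote the report?": "Ada", "When was it filed?": "2021"}


def toy_decompose(question: str) -> List[str]:
    return list(FACTS)


def toy_memory(sub_question: str) -> str:
    return FACTS.get(sub_question, "unknown")


def toy_llm(prompt: str) -> str:
    return prompt  # echo, so the assembled prompt can be inspected


result = answer_with_memory(
    "Who filed the report and when?", toy_decompose, toy_memory, toy_llm
)
```

Because the boundary is plain text, swapping `toy_llm` for a call to any hosted model would not change the rest of the function, which is the property the thread highlights.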

Recipe files store seeds, not transcripts


Community builders are independently landing on a similar abstraction: store a compact artifact that can be expanded later, instead of trying to preserve every turn verbatim. In SCPnerd's ClaudeAI post, the unit of memory is a "recipe," a short structured object with tags, confidence, importance, pointers, and requires fields.

SCPnerd's key claim is that memory should be reconstructive. The stored object is a seed that a model can rebuild into context later, which their example says cuts token load roughly in half while producing context-adaptive output.
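A recipe of this shape might look like the sketch below. The field names `tags`, `confidence`, `importance`, `pointers`, and `requires` come from the post; the `seed` field, the 0.0 to 1.0 scales, the meaning assigned to `pointers` and `requires`, and the `reconstruction_prompt` helper are all assumptions made for illustration.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class Recipe:
    seed: str                                            # compact text a model expands later (assumed field)
    tags: List[str] = field(default_factory=list)
    confidence: float = 0.5                              # assumed scale: 0.0 to 1.0
    importance: float = 0.5                              # assumed scale: 0.0 to 1.0
    pointers: List[str] = field(default_factory=list)    # assumed: ids of related recipes
    requires: List[str] = field(default_factory=list)    # assumed: ids needed before expansion


def reconstruction_prompt(recipe: Recipe, task: str) -> str:
    """Build the prompt that asks a model to expand the seed for the current task."""
    return (
        f"Task: {task}\n"
        f"Memory seed: {recipe.seed}\n"
        f"Tags: {', '.join(recipe.tags)}\n"
        "Rebuild only the context this task needs."
    )


recipe = Recipe(
    seed="User prefers SQLite for prototypes",
    tags=["db", "preference"],
    confidence=0.8,
)
stored = json.dumps(asdict(recipe))        # what goes on disk
restored = Recipe(**json.loads(stored))    # what comes back later
prompt = reconstruction_prompt(restored, "Pick a database for the demo")
```

The stored JSON stays small and human-editable, and the expansion step receives the current task, which is one way the output could end up context-adaptive.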

daniel_mac8's Codex memory thread pushes the same idea into tooling. The Obsidian vault is separate from any one repo, with AGENTS.md, project notes, people notes, and open loops acting as editable durable state instead of hidden chat history. daniel_mac8's custom instructions are explicit about the rule: write small inspectable Markdown updates, not transcripts.

That design choice keeps showing up across stacks:

  • Recipe or note files are editable by humans.
  • Durable memory lives outside the thread that created it.
  • The artifact stores distilled state, not a raw chat dump.
  • Retrieval reconstructs what matters when needed.
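The "small inspectable Markdown updates" rule could be implemented with something as plain as the helper below. It is a sketch under assumptions: `append_update`, the one-bullet-per-update format, and the note names are hypothetical and are not taken from daniel_mac8's setup.

```python
from datetime import date
from pathlib import Path


def append_update(vault: Path, note: str, text: str, today: date) -> Path:
    """Append one dated bullet to a note, creating the note if needed."""
    path = vault / f"{note}.md"
    path.parent.mkdir(parents=True, exist_ok=True)
    if not path.exists():
        # New notes start with a title so they read well in a Markdown editor.
        path.write_text(f"# {Path(note).name}\n\n", encoding="utf-8")
    with path.open("a", encoding="utf-8") as handle:
        # One short line per update: distilled state, not a transcript.
        handle.write(f"- {today.isoformat()}: {text.strip()}\n")
    return path
```

Calling `append_update(Path("vault"), "open-loops", "Decide on cache layer", date.today())` would add a single line that a human can read, edit, or delete in any text editor.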

Graph memory starts with schema discipline

The graph-memory camp is making a different argument: the hard part is not retrieval plumbing, it is deciding what kinds of things exist and when two mentions are the same thing. pauliusztin_'s shorter post boils it down sharply, saying GraphRAG tutorials teach retrieval engineering while the real problem is ontology engineering.

In pauliusztin_'s longer thread, the recommended starting ontology is tiny: Person, Object, Location, Event, Organization. New subtypes get added only when the data exposes collisions, like "Claude Code" being extracted as a person or "agentic harness" needing to become a topic instead of a generic object.
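The five-type starting ontology and a simple collision check can be sketched as follows. The type names come from the thread; `find_type_collisions` and its "same name filed under several types" heuristic are assumptions, offered as one possible signal that a subtype is needed.

```python
from collections import defaultdict
from enum import Enum
from typing import Dict, Iterable, Set, Tuple


class EntityType(Enum):
    PERSON = "Person"
    OBJECT = "Object"
    LOCATION = "Location"
    EVENT = "Event"
    ORGANIZATION = "Organization"


def find_type_collisions(
    mentions: Iterable[Tuple[str, EntityType]],
) -> Dict[str, Set[EntityType]]:
    """Return names that the extractor filed under more than one type."""
    seen: Dict[str, Set[EntityType]] = defaultdict(set)
    for name, entity_type in mentions:
        # Case-insensitive match so "Claude Code" and "claude code" collide.
        seen[name.strip().lower()].add(entity_type)
    return {name: types for name, types in seen.items() if len(types) > 1}


mentions = [
    ("Claude Code", EntityType.PERSON),
    ("claude code", EntityType.OBJECT),
    ("Berlin", EntityType.LOCATION),
]
collisions = find_type_collisions(mentions)
```

A name that shows up in `collisions` is a candidate for review, and a recurring pattern of such names is the kind of evidence the thread says should justify a new subtype.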

pauliusztin_'s entity thread splits the identity problem into two stages:

  1. Resolution, which standardizes names such as "NYC" and "New York City."
  2. Deduplication, which decides whether two nodes are the same real-world entity.

The thread adds a rule for acting on matches: permission strength follows evidence strength. Weak evidence creates a node, strong evidence merges, and uncertain evidence goes to review.
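The two stages and the evidence rule can be sketched as a pair of small functions. The alias table, the 0.0 to 1.0 match score, and both thresholds are hypothetical values chosen for illustration; the thread does not specify them.

```python
from typing import Dict

# Hypothetical alias table; a real system would curate or learn this.
ALIASES: Dict[str, str] = {
    "nyc": "New York City",
    "new york city": "New York City",
}

# Hypothetical thresholds on a 0.0 to 1.0 match score.
MERGE_AT = 0.85
CREATE_BELOW = 0.30


def resolve(name: str) -> str:
    """Resolution: map surface forms to one canonical name."""
    cleaned = " ".join(name.split())  # collapse stray whitespace
    return ALIASES.get(cleaned.lower(), cleaned)


def decide(match_score: float) -> str:
    """Deduplication policy: the allowed action depends on evidence strength."""
    if match_score >= MERGE_AT:
        return "merge"    # strong evidence: treat as the same entity
    if match_score < CREATE_BELOW:
        return "create"   # weak evidence: keep as a separate node
    return "review"       # uncertain evidence: defer to a reviewer
```

Keeping the policy in one function makes the thresholds easy to audit and tune, which matters because a wrong merge is harder to undo than a duplicate node.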

The operational wrinkle is cost. pauliusztin_'s checkpointing post notes that extraction, embeddings, and deduplication are expensive enough that production graph memory needs caching and stage-level retries, not monolithic replay.
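Stage-level caching with retries might look like the sketch below. It assumes each stage takes and returns JSON-serializable data; `run_stage`, the file naming scheme, and the retry count are hypothetical and not taken from the post.

```python
import hashlib
import json
from pathlib import Path
from typing import Any, Callable


def run_stage(
    name: str,
    fn: Callable[[Any], Any],
    payload: Any,
    cache_dir: Path,
    retries: int = 2,
) -> Any:
    """Run one pipeline stage, reusing a cached result when the input is unchanged."""
    # Key the checkpoint on the stage name plus a stable hash of its input.
    digest = hashlib.sha256(
        (name + json.dumps(payload, sort_keys=True)).encode("utf-8")
    ).hexdigest()
    path = cache_dir / f"{name}-{digest[:16]}.json"
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))

    last_error = None
    for _ in range(retries + 1):
        try:
            result = fn(payload)
            break
        except Exception as exc:  # retry only this stage, not the whole pipeline
            last_error = exc
    else:
        raise RuntimeError(f"stage {name!r} failed after retries") from last_error

    path.write_text(json.dumps(result), encoding="utf-8")
    return result
```

Chaining calls such as `run_stage("extract", ...)`, then `run_stage("embed", ...)`, then `run_stage("dedupe", ...)` means a failure in the last stage does not force the first two to be paid for again.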

A nearby open-source design in _avichawla's Graphiti thread uses three layers: episode subgraphs for timestamped raw data, semantic entity subgraphs for versioned facts, and community subgraphs for clustered summaries. The thread links to Graphiti, which makes the same bet on temporal knowledge graphs as a memory substrate.
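To make the three layers concrete, here is a plain-Python sketch of the data shapes. It is not Graphiti's API or schema: the class names, the `valid_from` and `invalid_at` fields, and `fact_as_of` are assumptions that only illustrate what "timestamped raw data" and "versioned facts" could mean in practice.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class Episode:  # layer 1: timestamped raw data
    id: str
    observed_at: datetime
    text: str


@dataclass
class Fact:  # layer 2: versioned statement about an entity
    subject: str
    predicate: str
    value: str
    valid_from: datetime
    invalid_at: Optional[datetime] = None  # None means still valid
    source_episode: str = ""


@dataclass
class Community:  # layer 3: summary over a cluster of entities
    members: List[str]
    summary: str


def fact_as_of(
    facts: List[Fact], subject: str, predicate: str, when: datetime
) -> Optional[Fact]:
    """Return the version of a fact that was valid at the given time."""
    for fact in facts:
        if fact.subject != subject or fact.predicate != predicate:
            continue
        still_valid = fact.invalid_at is None or when < fact.invalid_at
        if fact.valid_from <= when and still_valid:
            return fact
    return None
```

Closing an old fact with `invalid_at` instead of overwriting it keeps history queryable, which is the main thing a temporal graph adds over a flat key-value memory.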

Product memory is turning into files

The consumer-product version of this trend looks less academic and more like note management. testingcatalog's leak says Claude is preparing "Memory Files," described as organized notes Claude writes during chats and reads back when relevant, with a switch between Memory Files and Classic memory.

That description is much closer to a file-backed working notebook than to invisible personalization state. WesRoth's follow-up repeats the same framing, and WesRoth's Grok report says xAI is also working on memory summaries users can view and edit directly.

The through-line across these systems is unusually concrete:

  • Memory becomes a first-class artifact, usually a file or note.
  • The model writes to it during conversation.
  • The user can inspect or edit it later.
  • Retrieval happens when the note becomes relevant, not on every turn.

That is basically the same shape as daniel_mac8's Obsidian setup and coreyganim's shared Gbrain docs, just sliding from DIY agent harnesses into product UI.

Caching makes memory placement a cost decision


One reason this is happening now is economic, not philosophical. In lawnguyen123's Reddit post, a Claude Code user pulls two numbers from Anthropic's prompt caching docs: cache writes cost 1.25 times the base input price, while cache reads cost 0.1 times the base input price. That makes a cache miss 12.5 times as expensive as a hit on the same tokens.
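The arithmetic behind those two multipliers is easy to check. The sketch below uses the 1.25 and 0.1 figures quoted in the post; `prefix_cost`, the relative price unit, and the simplification that every miss rewrites the whole prefix are assumptions, and the model ignores cache expiry and any tokens outside the shared prefix.

```python
CACHE_WRITE_MULTIPLIER = 1.25  # from the post: cache write vs base input price
CACHE_READ_MULTIPLIER = 0.10   # from the post: cache read vs base input price


def prefix_cost(
    turns: int, misses: int, prefix_tokens: int = 1, base_price: float = 1.0
) -> float:
    """Cost of sending one prefix for `turns` turns when `misses` turns rewrite the cache."""
    if not 0 <= misses <= turns:
        raise ValueError("misses must be between 0 and turns")
    hits = turns - misses
    per_token = misses * CACHE_WRITE_MULTIPLIER + hits * CACHE_READ_MULTIPLIER
    return per_token * prefix_tokens * base_price


miss_to_hit_ratio = CACHE_WRITE_MULTIPLIER / CACHE_READ_MULTIPLIER
stable_session = prefix_cost(turns=10, misses=1)     # one initial write, nine reads
churning_session = prefix_cost(turns=10, misses=10)  # every turn busts the cache
```

In relative units, the stable ten-turn session comes to about 2.15 and the churning one to 12.5, so a session that keeps invalidating its prefix pays several times more for the same context.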

That post turns the pricing into a concrete list of cache-busting actions: adding an MCP server mid-session, switching models, editing CLAUDE.md, toggling fast mode, or pasting an image. Anthropic's own memory docs and best-practices page already push users toward stable session configuration, but the Reddit breakdown makes the systems implication obvious.

If durable state lives in editable files, sidecar memories, or graph stores, engineers do not need to keep re-paying to stuff the same long prefix into every live turn. That is a different motivation from "make the model remember more," and it helps explain why research papers, local agent builders, and product teams all keep drifting toward external memory layers at the same time.
