workflowMay 17, 2026

Claude Code users launch `/goal`, Obsidian, and audit playbooks to fight long-session drift

Independent builders published Claude Code memory and workflow scaffolding, including a `/goal` prompting guide, Obsidian-backed knowledge capture, and audit tooling for long-running agents. This matters because context compaction and stale session memory are becoming practical bottlenecks for multi-session coding workflows.

6 min read

Claude Code users launch `/goal`, Obsidian, and audit playbooks to fight long-session drift

TL;DR

According to the nine-part /goal template, builders are turning long-running coding sessions into explicit termination loops, with measurable end states, verification commands, and stop rules instead of open-ended prompts.
In the six-agent setup thread, the Decoding AI writeup, and OpenClaw's automation inventory, the emerging pattern is role separation: one agent writes, another tests, another reviews, and a separate loop handles CI, memory, or triage.
Memory is getting externalized into files and sidecars, from the Third Brain V5 post and session-search tooling to INTERESTS.md, because the Cursor subreddit question and Robert C. Martin's long-run experiment both describe context loss as a day-two failure mode.
Cost control is now part of the workflow design: the cache-fix Reddit post tied Claude Code burn to prompt-cache invalidation, while the 2.1.143 changelog thread added projected token estimates and a fix for /goal evaluators firing before background work had actually finished.

You can read Unsloth's Claude Code note, browse the 2.1.143 release notes, and inspect cnighswonger's cache-fix repo. There is also Paulius Ztin's full writeup, Dicklesworthstone's session-search repo, and OpenClaw's clawpatch site for the bug-finding side of the story.

/goal

The cleanest pattern in this batch is treating the prompt as a contract, not a request.

According to _avichawla's template breakdown, a usable /goal prompt has nine parts: goal, context, constraints, priority, plan, done-when, verify, output, and stop rules. The important detail is the evaluator loop: one model does the work, then a cheaper model checks the transcript for completion, so a vague end condition burns expensive turns.

That same framing shows up in the Codex chatter. yacineMTB's short rule boiled it down to "never use a codex prompt without /goal," while kilocode's reply described /goal as the difference between generating an answer and iterating until an observable objective is complete.

The feature is still half-hidden. petergostev's screenshot claimed /goal already works in the Codex app with no UI, and his follow-up said it has to be enabled in a local config file. That is exactly the kind of weird, useful detail engineers pass around before the docs catch up.

Memory files

The second pattern is pushing durable context out of the chat window and into files the agent can keep re-reading.

r/ClaudeCode

Third Brain V5 Skills

0 comments

the Third Brain V5 post pitches an Obsidian-backed loop where documents and conversations are consolidated into a persistent knowledge base. Onur's INTERESTS.md screenshot shows the same instinct in plain text form: a durable map of projects, priorities, protocols, and routing preferences that sits beside USER.md, AGENTS.md, or CLAUDE.md.

A few variants are converging:

Repo-local instruction files: mattrickard's question asks how many AGENTS.md or CLAUDE.md files belong in one repo, with guidance moving down to folder level.
Durable interest maps: INTERESTS.md stores what should be surfaced across sessions, instead of repeating it in prompts.
Cross-session retrieval: the session-search tool describes "perfect memory across all previous sessions and all agents/harnesses."
Obsidian as external memory: jxnlco's morning-prep workflow saves recurring outputs into an Obsidian vault and reviews past notes before generating the next one.

The demand signal is blunt. the Cursor subreddit post says the agent learns a codebase during a session and then resets to zero when the session ends. That complaint is now producing its own little tooling market.

Audit loops

Long sessions are also getting broken into specialized loops instead of one agent doing everything.

In pauliusztin_'s six-agent setup, the rule is simple: no agent both writes the code and decides whether it is correct. The stack splits into PM, SWE, tester, PR reviewer, on-call CI fixer, and a human-gated self-improve agent.

steipete's OpenClaw thread scales the same idea outward. His team runs roughly 100 Codex instances for PR review, issue deduping, security checks, benchmark regression checks, spam filtering, and even meeting-driven PR creation. the clawpatch screenshot is the most concrete artifact in the thread, a review report that separates one runtime bug from several test gaps and maintenance findings.

Other builders are publishing the audit scaffolding directly. the agent-automation-creator post offers a workflow-auditing framework for long-running agents, and the UB audit screenshot shows an exhaustive multi-phase Rust safety sweep with section partitioning, anchored prior findings, and follow-on static and dynamic phases.

Drift

All of this scaffolding exists because people are watching long runs get weird.

r/OpenAI

WTF is up with Codex today?

6 comments

Robert C. Martin's report is the clearest drift description in the evidence pool: after repeated context compaction, agents lose their roles, skip handoffs, invent permission requirements, and silently drop expensive tests. He still says the swarm got a lot done, but only with babysitting.

The same failure mode shows up in smaller complaints. the OpenAI subreddit thread describes Codex repeatedly touching the wrong lines and breaking unrelated code, while onusoz's complaint says Codex /goal still underperforms a queued implementation prompt on large refactors because it takes shortcuts and declares partial scope complete.

That explains why so many of these playbooks over-specify verification:

the nine-part /goal prompt requires a binary "done when" plus a concrete verify command.
the six-agent setup separates implementation from adversarial testing and diff review.
clawpatch's report labels findings by severity and type instead of letting the model summarize its own work.
the UB audit flow breaks a massive sweep into numbered phases with explicit carry-forward artifacts.

Claude Code 2.1.143

Anthropic shipped a few fixes right into the middle of these operator workarounds.

According to ClaudeCodeLog's changelog summary, Claude Code 2.1.143 added projected context cost estimates in the plugin marketplace, preserved model and effort settings when background sessions wake, and fixed /goal evaluators firing while background shells or delegated subagents were still running.

The release also tightened several background-session edges that map directly onto current complaints: the detailed changelog says /bg now preserves MCP config, settings, plugin dirs, and fallback models across respawn, while worktree cleanup no longer falls back to rm -rf when git worktree remove fails. The linked release notes and prompt-stats diff also show prompt tokens up 4,117, with tools rising from 51.0% to 54.6% of the mix.

The more pointed operational fix came from the community. the Reddit cache-fix post cites Unsloth's documentation to argue that a changing attribution header can invalidate prefix caching every turn, and points to claude-code-cache-fix as a further workaround. Even if you discount the exact savings claim, the fact pattern is clear: memory drift, token burn, and evaluation mistakes are now serious enough that users are writing infrastructure around the agent, not just prompts for it.