AI Primer
workflow

Bram Cohen compares vibe coding with AI Level 6 workflows after Claude Code leak

Bram Cohen used the Claude Code leak to argue that prompt-only development produces bad software, while a separate writeup on the 250-hour syntaqlite build reported that the durable version arrived only after a Python-to-Rust rewrite. Practitioners say specs, tests, linters, repo skills, and codebase context are the controls that keep agent-written code maintainable.


TL;DR

  • In Bram Cohen's post and the main HN discussion, the core complaint is not that AI wrote code, but that a prompt-only operating model drifted into minimal inspection, minimal cleanup, and avoidable duplication.
  • Lalit Maganti's syntaqlite writeup describes the opposite arc: heavy delegation produced a working Python prototype, then a rewrite that moved most of the system to Rust produced the maintainable version, after roughly 250 hours over three months.
  • According to HN commenters reacting to Bram's piece, the controls people actually trust around coding agents are specs, end-to-end tests, custom linters, and reviewable scaffolding, not raw prompting.
  • Anthropic's Claude Code skills docs show that the official product model already assumes reusable instructions and repo-local automation, while Matt Lam's skills-audit post shows users turning that into explicit workflow hygiene.
  • One Reddit team comparison argued that the missing benchmark is codebase context depth: the better tool did not use the strongest base model; it won because it matched the team's middleware, response envelopes, and logging patterns.

Bram's essay says you're still building "plan files," skills, and rules even in so-called vibe coding, which makes the anti-structure version sound more like theater than method. Lalit Maganti's syntaqlite writeup is the cleaner case study: the fast AI pass produced spaghetti, while the durable pass kept the idea but changed the process. Anthropic's own skills docs and public skills repo make the same point from another angle, because the product is built around reusable playbooks. Then a Reddit thread on enterprise context adds the useful benchmark twist: tutorial-quality code and project-quality code are different products.

Bram Cohen's dogfooding critique

Hacker News: The Cult Of Vibe Coding Is Insane (605 upvotes · 502 comments)

Hacker News: Discussion around The cult of vibe coding is dogfooding run amok (605 upvotes · 502 comments)

Bram Cohen's sharpest line is that bad software is a choice. His complaint is aimed at a style where humans keep prompting but avoid looking under the hood, even though the work still depends on human-authored plans, rules, and frameworks.

The useful distinction in the HN thread summary is the operating model. One camp treats Claude as a cleanup and refactoring engine. Another adds enough process, tests, and specs that generation can be automated without turning the codebase into a haunted house.

Syntaqlite's Python-to-Rust rewrite

Hacker News: Eight years of wanting, three months of building with AI - Lalit Maganti (945 upvotes · 300 comments)

Hacker News: Discussion around Eight years of wanting, three months of building with AI (945 upvotes · 300 comments)

Maganti's build log is better evidence than any ideology. He used Claude Code heavily and shipped a parser, formatter, PerfettoSQL support, and a playground. He then reviewed the January codebase and found scattered functions, giant files, and a Python extraction pipeline he did not trust.

The rewrite kept the proof of feasibility and more than 500 generated tests from the first pass, but threw away the structure. According to his writeup, in the second pass he moved most of the system to Rust, took back architectural control, reviewed every change, and added linting, validation, and stronger tests.

Specs, tests, and skills

Anthropic's Claude Code skills docs describe skills as SKILL.md playbooks that Claude can load automatically or invoke with slash commands. The bundled skills list is revealing because it includes things like /batch, which decomposes work into 5 to 30 units with parallel agents, and /simplify, which spawns review agents to find reuse and quality problems.

That lines up almost perfectly with the HN comments on AI-native engineering. The recurring controls were:

  • upfront module or class design
  • long design specs before prompting
  • e2e tests and validation
  • custom linters
  • repo-specific skills and rules
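
To make the "custom linters" item concrete, here is a minimal sketch in Python. It is an illustration only, not a tool from any of the cited posts. It uses the standard `ast` module to flag two problems in the spirit of the complaints above: oversized units and avoidable duplication. The function name `lint_source` and the `MAX_FUNCTION_LINES` threshold are hypothetical.

```python
import ast

MAX_FUNCTION_LINES = 40  # hypothetical threshold; tune per repo


def lint_source(source: str, max_lines: int = MAX_FUNCTION_LINES) -> list[str]:
    """Return human-readable warnings for one Python source string."""
    tree = ast.parse(source)
    warnings = []
    seen_bodies = {}  # structural dump of a body -> first function that had it

    for node in ast.walk(tree):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue

        # Rule 1: the function spans too many source lines.
        length = node.end_lineno - node.lineno + 1
        if length > max_lines:
            warnings.append(f"{node.name}: {length} lines (limit {max_lines})")

        # Rule 2: the body is structurally identical to an earlier function.
        # ast.dump omits line numbers by default, so only structure is compared.
        body_key = "\n".join(ast.dump(stmt) for stmt in node.body)
        if body_key in seen_bodies:
            warnings.append(
                f"{node.name}: duplicates body of {seen_bodies[body_key]}"
            )
        else:
            seen_bodies[body_key] = node.name

    return warnings
```

A check like this could run in CI or as a pre-commit step, so that agent-generated changes are rejected mechanically rather than relying on someone noticing the drift during review.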

Matt Lam's skills-audit example sits at the small end of that same spectrum: a reusable repo skill for checking whether other skills are actually steering agent behavior.

Codebase context

r/ChatGPTCoding: Every ai code assistant comparison misses the actual difference that matters for teams (11 comments)

The most concrete benchmark idea in this batch came from a team comparing coding assistants inside a production service. Their claim was simple: one tool wrote a clean endpoint that compiled, but used the wrong auth middleware, error handling, response envelope, and logging format, while another matched the existing stack closely enough to cut editing time from roughly 15 minutes to about 3.

That thread is anecdotal, but it names a real evaluation gap. The post argues that model IQ, speed, and price miss the thing teams actually feel: whether the system writes code that belongs in this repo rather than code that would look fine in a tutorial.
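
One way to picture that gap is a small "context fit" score. The Python sketch below is an assumption-laden illustration, not the Reddit team's method: it checks a generated snippet against a list of repo conventions and reports which ones it missed. The convention names and regex patterns (`@require_auth`, `envelope(`, `log.info(`) are hypothetical stand-ins for a team's real auth middleware, response envelope, and logging format.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Convention:
    name: str
    pattern: str  # regex expected to appear in code that follows the convention


# Hypothetical conventions; a real team would derive these from its own repo.
REPO_CONVENTIONS = [
    Convention("auth middleware", r"@require_auth\b"),
    Convention("response envelope", r"\benvelope\("),
    Convention("structured logging", r"\blog\.(info|warning|error)\("),
]


def context_fit(code: str, conventions=REPO_CONVENTIONS) -> tuple[float, list[str]]:
    """Return (fraction of conventions followed, names of the ones missed)."""
    if not conventions:
        return 1.0, []
    missed = [c.name for c in conventions if not re.search(c.pattern, code)]
    followed = len(conventions) - len(missed)
    return followed / len(conventions), missed
```

Under these assumptions, a tutorial-style endpoint that prints and returns a bare dict would score 0.0 and list all three conventions as missed, while an endpoint that uses the decorator, the envelope helper, and the logger would score 1.0. Pattern matching is a crude proxy, but it measures something model-quality benchmarks do not.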

