workflow, April 6, 2026

Bram Cohen compares vibe coding with AI Level 6 workflows after Claude Code leak

Bram Cohen used the Claude Code leak to argue that prompt-only development produces bad software, while a separate 250-hour syntaqlite build said the durable version arrived only after a Python-to-Rust rewrite. Practitioners say specs, tests, linters, repo skills, and codebase context are the controls that keep coding agents maintainable.

TL;DR

In Bram Cohen's post and the main HN discussion, the core complaint is not that AI wrote code, but that a prompt-only operating model drifted into minimal inspection, minimal cleanup, and avoidable duplication.
Lalit Maganti's syntaqlite writeup describes the opposite arc: heavy delegation produced a working Python prototype, then a full rewrite in Rust produced the maintainable version after roughly 250 hours over three months.
According to HN commenters reacting to Bram's piece, the controls people actually trust around coding agents are specs, end-to-end tests, custom linters, and reviewable scaffolding, not raw prompting.
Anthropic's Claude Code skills docs show that the official product model already assumes reusable instructions and repo-local automation, while Matt Lam's skills-audit post shows users turning that into explicit workflow hygiene.
One Reddit team comparison argued that the missing benchmark is codebase context depth: the better tool did not use the strongest base model; it matched the team's middleware, envelopes, and logging patterns.

Bram's essay says you're still building "plan files," skills, and rules even in so-called vibe coding, which makes the anti-structure version sound more like theater than method. Lalit Maganti's syntaqlite writeup is the cleaner case study: the fast AI pass produced spaghetti, while the durable pass kept the idea but changed the process. Anthropic's own skills docs and public skills repo make the same point from another angle, because the product is literally built around reusable playbooks. A Reddit thread on enterprise context then adds the useful benchmark twist: tutorial-quality code and project-quality code are different products.

Bram Cohen's dogfooding critique

The Cult Of Vibe Coding Is Insane

Bram Cohen criticizes 'vibe coding' on Anthropic's Claude team following a source code leak that revealed poor code quality. He argues that extreme dogfooding, where developers avoid inspecting or contributing to the code beyond high-level prompts, leads to bad software full of duplication and inefficiencies. Cohen asserts that pure vibe coding is a myth because frameworks are still needed, and that AI excels when guided with audits and cleanups, but the team refuses even minimal reviews. Bad software is a deliberate choice; developers should own and improve it.

Discussion around The cult of vibe coding is dogfooding run amok

Thread discussion highlights:
- saulpw on AI coding styles: Claude Code is being produced at AI Level 7 (Human specced, bots coded), whereas the author is arguing that AI Level 6 (Bots coded, human understands somewhat) yields substantially better results.
- hibikir on code quality vs speed: My favorite uses of Claude code is to do code quality improvements ... Looking for repetitive patterns in unit tests/functional tests ... large chunks of duplication.
- abustamam on AI-native engineering process: IME AI-native engineering requires a lot of infrastructure to make it viable ... from e2e tests ... to custom linters ... to architectural spec sheets so the LLM doesn't try to do raw D...

Bram Cohen's sharpest line is that bad software is a choice. His complaint is aimed at a style where humans keep prompting but avoid looking under the hood, even though the work still depends on human-authored plans, rules, and frameworks.

The useful distinction in the HN thread is the operating model. One camp treats Claude as a cleanup and refactoring engine. Another adds enough process, tests, and specs that generation can be automated without turning the codebase into a haunted house.

Syntaqlite's Python-to-Rust rewrite

Eight years of wanting, three months of building with AI - Lalit Maganti

Lalit Maganti describes building syntaqlite, high-fidelity devtools for SQLite (parser, formatter, validator, LSP), after wanting them for eight years. He invested ~250 hours over three months using AI coding agents like Claude Code. He initially delegated heavily to AI, producing a functional but messy Python prototype, which he discarded and rewrote in Rust for better structure. AI overcame his inertia, enabled rapid prototyping, and facilitated shipping extras like editor extensions and docs, though it required human oversight, testing, and cleanup.

Discussion around Eight years of wanting, three months of building with AI

Thread discussion highlights:
- lalitmaganti on SQLite parser internals: They explain that extracting sources from SQLite meant compiling Lemon and running it against a custom `parse.y` implementation, because SQLite’s grammar actions bury how nodes like `id` and `nm` are interpreted in the source code.
- FpUser on Structured prompting: This commenter says they avoid the AI failure mode by first splitting the application into modules and designing the main classes and interactions by hand, then letting AI fill in the code.
- ang_cire on Planning before Claude: They argue it’s a mistake to start coding with Claude before mapping out the project in detail, and describe preparing ~3000 lines of design specs before using the agent.

Maganti's build log is better evidence than any ideology. He used Claude Code heavily, shipped a parser, formatter, PerfettoSQL support, and a playground, then reviewed the January codebase and found scattered functions, giant files, and a Python extraction pipeline he did not trust.

The rewrite kept the proof of feasibility and more than 500 generated tests from the first pass, but threw away the structure. According to his writeup, in the second pass he moved most of the system to Rust, took back architectural control, reviewed every change, and added linting, validation, and stronger tests.
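How the tests were carried across the rewrite is not described here. One plausible shape, sketched below under that assumption, is to keep test cases as plain input/expected pairs and run them against whichever implementation is current. The cases, `run_golden`, and `toy_formatter` are all hypothetical and are not taken from syntaqlite.

```python
from typing import Callable, List, Tuple

# Hypothetical golden cases: (input SQL, expected formatted SQL).
# These pairs are illustrative and are not taken from syntaqlite.
GOLDEN_CASES: List[Tuple[str, str]] = [
    ("select  1", "SELECT 1"),
    ("select a from t", "SELECT a FROM t"),
]


def run_golden(format_sql: Callable[[str], str],
               cases: List[Tuple[str, str]]) -> List[str]:
    """Run every case against one implementation and return failure messages."""
    failures = []
    for source, expected in cases:
        actual = format_sql(source)
        if actual != expected:
            failures.append(f"{source!r}: expected {expected!r}, got {actual!r}")
    return failures


def toy_formatter(sql: str) -> str:
    """Stand-in implementation: collapse whitespace, uppercase a few keywords."""
    keywords = {"select", "from", "where"}
    words = sql.split()
    return " ".join(w.upper() if w.lower() in keywords else w for w in words)


if __name__ == "__main__":
    # An empty list means the implementation under test passes every case.
    print(run_golden(toy_formatter, GOLDEN_CASES))
```

Because the cases depend only on inputs and outputs, the same list could in principle be pointed at a prototype or at a rewritten binary wrapped in a callable, which is what makes tests easier to keep than structure.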

Specs, tests, and skills

Anthropic's Claude Code skills docs describe skills as SKILL.md playbooks that Claude can load automatically or invoke with slash commands. The bundled skills list is revealing because it includes things like /batch, which decomposes work into 5 to 30 units with parallel agents, and /simplify, which spawns review agents to find reuse and quality problems.

That lines up almost perfectly with the HN comments on AI-native engineering. The recurring controls were:

upfront module or class design
long design specs before prompting
e2e tests and validation
custom linters
repo-specific skills and rules

Matt Lam's skills-audit example sits at the small end of that same spectrum: a reusable repo skill for checking whether other skills are actually steering agent behavior.
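As an illustration of the "custom linters" item in the list above, here is a minimal sketch of a check aimed at the duplication that both Cohen and the HN commenters complain about. It uses only Python's standard `ast` module. The function name `find_duplicate_functions` and the sample code are hypothetical, and a real repo linter would likely need normalization this sketch skips.

```python
import ast
from collections import defaultdict
from typing import Dict, List


def find_duplicate_functions(source: str) -> List[List[str]]:
    """Return groups of function names whose bodies are structurally identical."""
    tree = ast.parse(source)
    groups: Dict[str, List[str]] = defaultdict(list)
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # ast.dump omits line and column numbers by default, so two
            # bodies with the same structure and names produce the same key.
            key = "".join(ast.dump(stmt) for stmt in node.body)
            groups[key].append(node.name)
    return [sorted(names) for names in groups.values() if len(names) > 1]


# Hypothetical sample: two loaders share a body, the third differs slightly.
SAMPLE = '''
def load_user(path):
    with open(path) as f:
        return f.read().strip()

def load_config(path):
    with open(path) as f:
        return f.read().strip()

def load_raw(path):
    with open(path) as f:
        return f.read()
'''

if __name__ == "__main__":
    for group in find_duplicate_functions(SAMPLE):
        print("duplicate bodies:", ", ".join(group))
```

The check compares bodies only, so functions that differ just in signature are still grouped together. It also misses near-duplicates that rename variables, which is where a reviewing human or agent would still be needed.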

Codebase context

r/ChatGPTCoding

Every ai code assistant comparison misses the actual difference that matters for teams

11 comments

The most concrete benchmark idea in this batch came from a team comparing coding assistants inside a production service. Their claim was simple: one tool wrote a clean endpoint that compiled, but used the wrong auth middleware, error handling, response envelope, and logging format, while another matched the existing stack closely enough to cut edits from roughly 15 minutes to about 3.

That thread is anecdotal, but it names a real evaluation gap. The post argues that model IQ, speed, and price miss the thing teams actually feel: whether the system writes code that belongs in this repo rather than code that would look fine in a tutorial.
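The "belongs in this repo" idea can be turned into a rough, checkable score. The sketch below is one assumed way to do it, not the method used in the thread: it treats each repo convention as a regular expression and reports what fraction a generated snippet satisfies. Every name in it (`require_auth`, `envelope`, `log_event`, and both sample endpoints) is invented for illustration.

```python
import re
from typing import Dict

# Hypothetical repo conventions, expressed as regexes that a generated
# endpoint should match. All names here are invented for illustration.
CONVENTIONS: Dict[str, str] = {
    "auth middleware": r"@require_auth\b",
    "response envelope": r"\benvelope\(",
    "structured logging": r"\blog_event\(",
}


def conformance(code: str, conventions: Dict[str, str]) -> Dict[str, bool]:
    """Report, per convention, whether the generated code matches its pattern."""
    return {name: re.search(pattern, code) is not None
            for name, pattern in conventions.items()}


def score(code: str, conventions: Dict[str, str]) -> float:
    """Fraction of conventions the code follows, from 0.0 to 1.0."""
    results = conformance(code, conventions)
    return sum(results.values()) / len(results)


# Clean code that would look fine in a tutorial but ignores the repo's patterns.
TUTORIAL_STYLE = '''
@app.get("/users/{user_id}")
def get_user(user_id):
    print("fetching user")
    return {"id": user_id}
'''

# The same endpoint written against the hypothetical repo conventions.
REPO_STYLE = '''
@app.get("/users/{user_id}")
@require_auth
def get_user(user_id):
    log_event("user.fetch", user_id=user_id)
    return envelope({"id": user_id})
'''

if __name__ == "__main__":
    print(score(TUTORIAL_STYLE, CONVENTIONS))  # 0.0
    print(score(REPO_STYLE, CONVENTIONS))      # 1.0
```

Pattern matching like this is crude and only catches conventions that leave a textual trace, but it makes the comparison repeatable across tools instead of relying on how long the edits felt.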