llm-circuit-finder duplicates layer blocks at inference time and reports BBH logical deduction gains
The toolkit sweeps contiguous layer ranges in GGUF and llama.cpp-style setups to test whether duplicating them can unlock better reasoning without retraining. Treat the jump as a reproducible experiment rather than a settled mechanism: commenters in the HN thread dispute whether the effect reflects circuits, routing, or training artifacts.

TL;DR
- The new llm-circuit-finder toolkit packages an inference-time experiment for GGUF models in llama.cpp-style setups: sweep contiguous layer ranges, duplicate selected blocks, and measure whether reasoning improves without retraining.
- In the repo summary, duplicating layers 12-14 in Devstral-24B raised BBH logical deduction from 0.22 to 0.76, while duplicating layers 7-9 in Qwen2.5-32B improved reasoning by 17%.
- The project is framed as a reproducible workflow, not just a claim: it includes `sweep.py` for circuit discovery, `layer_path.py` for path modification, and evaluation scripts, according to the toolkit page.
- The HN discussion pushes back on the explanation, with commenters arguing the effect may reflect near-identity layers, routing or looping behavior, or training artifacts rather than a settled “reasoning circuit” mechanism.
What shipped, exactly?
The release is a small research toolkit around inference-time model surgery. The toolkit page says it replicates Ng's RYS method by duplicating specific contiguous layer blocks during inference, working with llama.cpp and GGUF models and tested on Mistral- and Qwen-family architectures.
That makes the engineering contribution concrete: instead of retraining or merging checkpoints, the workflow searches for layer spans that are worth repeating at runtime. The bundled tools cover three steps (search, path modification, and evaluation) via `sweep.py`, `layer_path.py`, and evaluation scripts, as described on the repo page and summarized in the Show HN post.
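To make that search loop concrete, here is a minimal, hypothetical sketch of sweeping contiguous spans and scoring each duplicated path. The names (`build_layer_path`, `sweep_spans`, `score_path`) are not the toolkit's actual API, and `score_path` would need to be wired to a real llama.cpp/GGUF backend and a BBH-style harness to produce real numbers.

```python
# Sketch only: hypothetical names, not the repo's sweep.py / layer_path.py API.
from typing import Callable, List, Tuple


def build_layer_path(n_layers: int, start: int, end: int) -> List[int]:
    """Layer execution order with the contiguous block [start, end] run twice.

    build_layer_path(8, 3, 4) -> [0, 1, 2, 3, 4, 3, 4, 5, 6, 7]
    """
    base = list(range(n_layers))
    return base[: end + 1] + base[start : end + 1] + base[end + 1 :]


def sweep_spans(
    n_layers: int,
    span_len: int,
    score_path: Callable[[List[int]], float],
) -> List[Tuple[Tuple[int, int], float]]:
    """Score every contiguous span of `span_len` layers, best first."""
    results = []
    for start in range(n_layers - span_len + 1):
        end = start + span_len - 1
        path = build_layer_path(n_layers, start, end)
        results.append(((start, end), score_path(path)))
    return sorted(results, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Illustrative only: a 40-layer stack with layers 12-14 duplicated.
    print(build_layer_path(40, 12, 14))
```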
How convincing are the gains?
Show HN: Duplicate 3 layers in a 24B LLM, logical deduction .22→.76. No training
262 upvotes · 81 comments
The headline result is large enough to get attention. In the HN summary, the author reports that duplicating three layers in a 24B model pushed logical deduction from 0.22 to 0.76, and the linked repo page gives a second example where duplicating layers 7-9 in Qwen2.5-32B improved reasoning by 17%.
But the mechanism is very much in dispute. One commenter says the proposed explanation “does not pass the smell test,” arguing duplicated layers may be “near-identity blocks” or may undo reasoning damage introduced during training or RLHF (skeptical thread). Another suggests the pattern looks more like a routing or looping effect, “a higher-level MoE-style routing problem” with paths such as 13,13,14,14,15,15,16, rather than evidence of a clean circuit interpretation (routing interpretation). A third comparison points to Solar 10.7B depth up-scaling, where repeated layers appeared during continued training, which makes this look more like an inference-time variant of a known idea than a wholly new primitive (prior-art note).
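The block-versus-loop distinction is easier to see as concrete layer orderings. A tiny illustrative sketch, with layer indices chosen only to mirror the commenter's 13,13,14,... example rather than taken from the repo:

```python
# Two ways of running extra passes over layers 13-15 in a 40-layer stack.
# Block repeat: one reading of "duplicate a contiguous block" at inference time.
block_repeat = list(range(16)) + [13, 14, 15] + list(range(16, 40))
# Per-layer looping: the 13,13,14,14,15,15,16 pattern from the routing comment.
per_layer_loop = list(range(13)) + [13, 13, 14, 14, 15, 15] + list(range(16, 40))

print(block_repeat[11:20])    # [11, 12, 13, 14, 15, 13, 14, 15, 16]
print(per_layer_loop[11:20])  # [11, 12, 13, 13, 14, 14, 15, 15, 16]
```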