Skip to content
AI Primer

BullshitBench

Measures whether models detect nonsense, call it out clearly, and avoid confidently continuing with invalid assumptions.

An AI benchmark that measures whether models detect nonsensical prompts, clearly challenge invalid premises, and avoid confidently continuing with bad assumptions; includes a public viewer and published datasets.

Recent stories

0 linked stories
No linked stories yet.
AI PrimerAI Primer

Your daily guide to AI tools, workflows, and creative inspiration.

© 2026 AI Primer. All rights reserved.