BullshitBench
Measures whether models detect nonsense, call it out clearly, and avoid confidently continuing with invalid assumptions.
An AI benchmark that measures whether models detect nonsensical prompts, clearly challenge invalid premises, and avoid confidently continuing with bad assumptions; includes a public viewer and published datasets.
Recent stories
0 linked stories
No linked stories yet.