Agent Psychometrics

Task-level performance prediction in agentic coding benchmarks

Open-source Python research code and data for the paper "Agent Psychometrics: Task-Level Performance Prediction in Agentic Coding Benchmarks." It implements experiments and item response theory (IRT) methods that predict whether a coding agent will succeed or fail on an individual task, drawing on task features and agent features, along with adaptive task selection.
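
As a rough illustration of the kind of IRT-based prediction described above, the sketch below uses the standard two-parameter logistic (2PL) model, in which the probability of success depends on an agent ability and a task difficulty and discrimination. It is not taken from the repository: the function names, the parameter values, and the use of Fisher information as the adaptive selection criterion are all assumptions made for this example.

```python
import math


def irt_success_probability(ability, difficulty, discrimination=1.0):
    """2PL IRT model: P(success) = sigmoid(discrimination * (ability - difficulty))."""
    return 1.0 / (1.0 + math.exp(-discrimination * (ability - difficulty)))


def fisher_information(ability, difficulty, discrimination=1.0):
    """Information a task carries about the agent's ability under the 2PL model."""
    p = irt_success_probability(ability, difficulty, discrimination)
    return discrimination ** 2 * p * (1.0 - p)


def pick_most_informative_task(ability, tasks):
    """Return the task name with the highest Fisher information.

    `tasks` maps a task name to a (difficulty, discrimination) pair.
    This is one common adaptive-selection heuristic, assumed here for illustration.
    """
    return max(tasks, key=lambda name: fisher_information(ability, *tasks[name]))


# Hypothetical task parameters, invented for this example.
example_tasks = {
    "easy_task": (-2.0, 1.0),
    "medium_task": (0.1, 1.0),
    "hard_task": (3.0, 1.0),
}

chosen = pick_most_informative_task(0.0, example_tasks)
```

Under this model, a task is most informative when its difficulty is close to the agent's current ability estimate, which is why an adaptive scheme tends to skip tasks that the agent would almost certainly pass or fail.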
