Agent Psychometrics
Task-level performance prediction in agentic coding benchmarks
Open-source Python research code and data for the paper "Agent Psychometrics: Task-Level Performance Prediction in Agentic Coding Benchmarks." It implements experiments and item response theory (IRT) methods that predict whether a coding agent will succeed or fail on an individual task, drawing on task features and agent features, and it also covers adaptive task selection.
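To make the IRT idea concrete, here is a minimal sketch of the simplest model in that family, the one-parameter (Rasch) model, in which the probability of success depends only on the gap between an agent's ability and a task's difficulty. It is not taken from the repository: the function names, the toy response data, and the plain gradient-ascent fit are all assumptions chosen for illustration, and the paper's actual models may differ (for example by conditioning on task and agent features).

```python
import math


def success_probability(ability, difficulty, discrimination=1.0):
    """Logistic IRT curve: P(success) = sigmoid(a * (theta - b))."""
    return 1.0 / (1.0 + math.exp(-discrimination * (ability - difficulty)))


def fit_rasch(responses, steps=500, lr=0.1):
    """Fit a 1PL (Rasch) model by gradient ascent on the Bernoulli log-likelihood.

    responses maps (agent, task) -> 1 for success, 0 for failure.
    Returns (theta, b): per-agent abilities and per-task difficulties.
    """
    agents = sorted({a for a, _ in responses})
    tasks = sorted({t for _, t in responses})
    theta = {a: 0.0 for a in agents}
    b = {t: 0.0 for t in tasks}
    for _ in range(steps):
        grad_theta = {a: 0.0 for a in agents}
        grad_b = {t: 0.0 for t in tasks}
        for (a, t), y in responses.items():
            residual = y - success_probability(theta[a], b[t])
            grad_theta[a] += residual  # d logL / d theta
            grad_b[t] -= residual      # d logL / d b
        for a in agents:
            theta[a] += lr * grad_theta[a]
        for t in tasks:
            b[t] += lr * grad_b[t]
        # Only theta - b is identified, so shift both to keep mean difficulty at 0.
        mean_b = sum(b.values()) / len(b)
        for t in tasks:
            b[t] -= mean_b
        for a in agents:
            theta[a] -= mean_b
    return theta, b


# Hypothetical toy data (not from the paper): two agents, two tasks.
toy_responses = {
    ("strong", "easy"): 1,
    ("strong", "hard"): 1,
    ("weak", "easy"): 1,
    ("weak", "hard"): 0,
}
theta, b = fit_rasch(toy_responses)
# Predicted chance that the weaker agent solves the harder task.
p_weak_hard = success_probability(theta["weak"], b["hard"])
```

With data this small, agents or tasks with perfect records have no finite maximum-likelihood estimate, so the fixed step count acts as a crude form of early stopping. A practical fit would likely add a prior or regularization term.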