AI Primer

Agent Psychometrics

Task-level performance prediction in agentic coding benchmarks

Open-source Python research code and data for the paper "Agent Psychometrics: Task-Level Performance Prediction in Agentic Coding Benchmarks." The repository implements experiments and item response theory (IRT) methods that predict whether a coding agent will succeed or fail on an individual task, drawing on task features, agent features, and adaptive task selection.
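
To make the idea concrete, here is a minimal sketch of how an item response theory (IRT) model can turn an agent ability and a task difficulty into a success probability, and how that probability can drive adaptive task selection. It assumes a generic two-parameter logistic form; the function names `success_probability` and `pick_most_informative_task` are hypothetical and are not taken from the repository, whose actual model and API may differ.

```python
import math


def success_probability(ability: float, difficulty: float,
                        discrimination: float = 1.0) -> float:
    """Generic 2PL IRT curve: P(success) = sigmoid(a * (theta - b)).

    Assumption: a textbook logistic form, not necessarily the paper's model.
    """
    return 1.0 / (1.0 + math.exp(-discrimination * (ability - difficulty)))


def pick_most_informative_task(ability: float, difficulties: list[float]) -> int:
    """Return the index of the task with the highest Fisher information.

    With discrimination fixed at 1, the information is p * (1 - p), which
    peaks when the task difficulty is closest to the agent's ability.
    """
    def information(difficulty: float) -> float:
        p = success_probability(ability, difficulty)
        return p * (1.0 - p)

    return max(range(len(difficulties)), key=lambda i: information(difficulties[i]))


if __name__ == "__main__":
    # Hypothetical agent ability and task difficulties, for illustration only.
    agent_ability = 0.4
    task_difficulties = [-2.0, 0.5, 3.0]
    for b in task_difficulties:
        print(f"difficulty={b:+.1f}  P(success)={success_probability(agent_ability, b):.3f}")
    print("next task index:", pick_most_informative_task(agent_ability, task_difficulties))
```

In this sketch, a task whose difficulty matches the agent's ability gives a success probability of 0.5 and is the most informative one to run next, which is the intuition behind adaptive task selection.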
