Open-source software engineering agent
AI software engineering product for code generation, debugging, and repository-level development workflows.
DeepSWE launched a coding benchmark built from 113 original tasks across 91 repositories and five languages, with GPT-5.5 posting the top score at 70%. The benchmark is designed to better reflect the repository search, multi-file edits, and verification steps found in real agent workflows.