TOOL 2 stories
OpenHands
Open source software engineering agent platform.
Stories
NEWS 1mo ago
OpenHands introduces the EvoClaw benchmark, where continuous-evolution scores top out at 38.03%
OpenHands introduced EvoClaw, a benchmark that reconstructs milestone DAGs from repo history to test continuous software evolution instead of isolated tasks. The first results show agents can clear single tasks yet still collapse under regressions and technical debt over longer runs.
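To make the milestone-DAG idea concrete, here is a minimal sketch in Python. It is not EvoClaw's implementation: the milestone names and the `attempt` and `check` callbacks are hypothetical stand-ins. It shows one way to walk milestones in dependency order and stop as soon as any previously completed milestone regresses.

```python
from graphlib import TopologicalSorter

# Hypothetical milestones; each key maps to the milestones it depends on.
MILESTONES = {
    "add-config-loader": set(),
    "add-cli": {"add-config-loader"},
    "add-plugin-api": {"add-config-loader"},
    "ship-plugin-cli": {"add-cli", "add-plugin-api"},
}


def evolution_order(dag):
    """Return the milestones in an order that respects their dependencies."""
    return list(TopologicalSorter(dag).static_order())


def run_evolution(dag, attempt, check):
    """Attempt milestones in order, re-checking all earlier ones each time.

    attempt(m) applies the agent's change for milestone m (assumed callback).
    check(m) returns True if milestone m's verifier passes (assumed callback).
    Returns (completed, failed); the run stops at the first failure.
    """
    completed = []
    for milestone in evolution_order(dag):
        attempt(milestone)
        # A regression is any earlier milestone whose check no longer passes.
        failed = [m for m in completed + [milestone] if not check(m)]
        if failed:
            return completed, failed
        completed.append(milestone)
    return completed, []
```

Re-checking every completed milestone after each step is what separates this from isolated-task scoring: an agent can pass the current milestone and still fail the run by breaking an earlier one.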
WORKFLOW 1mo ago
OpenHands compares 3 skill tasks and finds some skills reduce agent pass rates
OpenHands published a skill-eval recipe with bounded tasks, deterministic verifiers, and no-skill baselines, then showed some skills speed agents up while others make them brittle. Teams shipping skill libraries should measure them per task and model before rollout.
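As a rough illustration of measuring skills per task and model against a no-skill baseline, the sketch below computes the pass-rate difference for each (task, model) pair. The record layout and the function name `pass_rate_deltas` are assumptions made for this example, not part of the OpenHands recipe.

```python
from collections import defaultdict


def pass_rate_deltas(runs):
    """Compare skill vs. no-skill pass rates per (task, model).

    runs: iterable of (task, model, with_skill, passed) tuples, where
    `passed` is the result of a deterministic verifier (assumed layout).
    Returns {(task, model): skill_pass_rate - baseline_pass_rate}.
    Pairs without a no-skill baseline are skipped.
    """
    counts = defaultdict(lambda: [0, 0])  # key -> [passes, total runs]
    for task, model, with_skill, passed in runs:
        entry = counts[(task, model, with_skill)]
        entry[0] += int(passed)
        entry[1] += 1

    deltas = {}
    for (task, model, with_skill), (passes, total) in counts.items():
        if not with_skill:
            continue
        baseline = counts.get((task, model, False))
        if baseline is None:
            continue  # no baseline runs, so nothing to compare against
        deltas[(task, model)] = passes / total - baseline[0] / baseline[1]
    return deltas
```

A negative value means the skill lowered the pass rate for that task and model, which is the case the story warns about.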