
Software Engineering Agents on Real Repositories: SWE-Bench and the Debate Over Evaluation Scaffolding
SWE-Bench measures whether an agent can take a real GitHub issue, produce a patch, and get the repository's tests to pass, while SWE-agent shifts the debate toward Agent-Computer Interface (ACI) design. Throughout, claims verified against documentation are kept separate from speaker opinion.