A new AI coding agent clears 91% on SWE-bench Verified, effectively the entry-level engineering bar. The EU's first amendments targeting autonomous code generation land quietly. And GitHub Copilot Workspace exits beta for general availability. The software development industry is reorganizing this week.
Cognition AI released Devin 2, posting a 91.3% score on SWE-bench Verified, the industry benchmark for autonomous software engineering tasks derived from real-world GitHub issues. For context: the benchmark tasks include multi-file edits, debugging unfamiliar codebases, and writing tests that pass on the first run. Senior human engineers score around 93% on the same tasks when given the same context. That puts the gap at roughly two percentage points.
Three Fortune 500 engineering teams in the Devin 2 early-access program have already reported reducing junior QA headcount in Q2 planning. None of them said AI replaced those engineers. They said the role is being redefined as "agent supervision" rather than first-pass review.
cognition.ai ↗
The EU's AI Office quietly published Amendment 7b to the AI Act implementation guidelines, which formally classifies AI systems that autonomously write and deploy production code without human review as high-risk under the Act. The amendment doesn't ban the technology; it requires compliant operators to implement mandatory human-in-the-loop review checkpoints, decision audit logs, and conformance documentation before any agent-generated code reaches production in covered sectors (finance, healthcare, critical infrastructure).
The practical implication: any enterprise in the EU using a Devin-class agent in a regulated sector now needs a documented human review step before deployment, or they're out of compliance.
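To make the review requirement concrete, here is a minimal sketch of what a deployment gate with an audit log could look like. Everything in it is hypothetical and illustrative: the names `Change`, `DeployGate`, and `may_deploy`, the record fields, and the sector strings are assumptions for this example, not anything taken from the amendment text or from a real compliance tool.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional

# Assumption: sector labels mirror the covered sectors named in the article.
REGULATED_SECTORS = {"finance", "healthcare", "critical infrastructure"}


@dataclass
class Change:
    """A hypothetical record describing one proposed production change."""
    change_id: str
    sector: str
    agent_generated: bool
    reviewed_by: Optional[str] = None  # name of the human reviewer, if any


@dataclass
class DeployGate:
    """Hypothetical gate: blocks unreviewed agent code in regulated sectors."""
    audit_log: list = field(default_factory=list)

    def may_deploy(self, change: Change) -> bool:
        # Review is only mandatory for agent-generated code in a covered sector.
        needs_review = change.agent_generated and change.sector in REGULATED_SECTORS
        allowed = (not needs_review) or change.reviewed_by is not None
        # Every decision is logged, whether it was allowed or blocked.
        self.audit_log.append({
            "change_id": change.change_id,
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "needs_review": needs_review,
            "reviewed_by": change.reviewed_by,
            "allowed": allowed,
        })
        return allowed


gate = DeployGate()
unreviewed = Change("chg-001", "finance", agent_generated=True)
reviewed = Change("chg-002", "finance", agent_generated=True, reviewed_by="alice")
print(gate.may_deploy(unreviewed), gate.may_deploy(reviewed))
```

The design point is that the gate records every decision, not only the rejections, so the log can double as the decision audit trail. A real implementation would also need tamper-resistant storage and reviewer identity verification, which this sketch leaves out.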
GitHub announced general availability for Copilot Workspace, the feature that converts a GitHub Issue directly into a pull request using an autonomous multi-step reasoning agent. It's now available on all paid GitHub tiers, including Team and Enterprise. The agent reads the issue, explores the codebase, drafts a plan, writes the code, runs tests in a sandboxed environment, and opens a PR — all without developer intervention. Human review before merge is still required, but the first-draft work is now automated.
githubnext.com ↗
Three stories about code, playing out at three different layers: the capability frontier (Devin 2 at 91%), the regulatory response (EU Amendment 7b), and the mass-market tool (Copilot Workspace GA). They're the same story at three different speeds.
The pattern I keep watching is how fast "experimental" becomes "standard." Copilot autocomplete was experimental in 2021, default in 2023, assumed in 2025. Copilot Workspace is now GA. Devin-class autonomous agents will follow the same curve. The question for every engineering org isn't whether to adopt this — it's how fast you can redesign your review and governance workflows to absorb it.
For anyone building in the AI consulting space: the EU Amendment is worth reading in full. The human-in-the-loop requirement for autonomous code in regulated sectors is a clean, recurring engagement surface — and one that requires genuine expertise to implement correctly, not just a checkbox.