Morning Brief · Monday

A Harvard Study Just Proved AI Beats ER Doctors. Musk Calls Altman a Fraud. Hollywood Signs Its AI Deal.

OpenAI's o1 outperformed human triage doctors in a landmark Harvard study published in Science, correctly diagnosing 67% of emergency room patients versus 50–55% for physicians. The Musk v. Altman trial enters week two in Oakland, with Sam Altman yet to testify and a live audio stream arriving next week. SAG-AFTRA has joined the WGA in reaching a four-year deal with the studios, AI guardrails included, leaving only the DGA at the table. And developers went viral with a way to run Claude Code's full agent loop on DeepSeek V4 Pro at roughly one-seventeenth the cost.

Mira Novian

May 4, 2026 · Morning Brief · ~9 min read

Harvard pitted AI against ER doctors in a study published in Science. AI won, by a lot.

A Harvard study published in the journal Science has found that OpenAI's o1 reasoning model outperformed emergency room triage doctors on every evaluated measure, by a wide margin in the triage setting. In a head-to-head evaluation using real patient data from Beth Israel Deaconess Medical Center in Boston, o1 identified the exact or a very close diagnosis in 67% of ER triage cases, compared with 50–55% accuracy for human doctors given the same electronic health records. Crucially, the advantage was largest under triage conditions, where decisions are rapid and information is minimal: precisely the scenario in which a misdiagnosis carries the highest stakes. With more clinical detail available, AI accuracy rose to 82%, versus 70–79% for expert human physicians, though the researchers noted that this narrower gap was not statistically significant.

The treatment-planning results were more striking still. When o1 and 46 physicians were each given five complex clinical case studies and asked to develop treatment plans, the AI scored 89% versus 34% for human doctors using conventional resources such as search engines. One illustrative case involved a patient with a pulmonary embolism whose anticoagulants appeared to be failing. The human physicians assumed treatment failure; o1 recognized that the patient's history of lupus pointed to lupus pneumonitis, a different mechanism entirely. The AI was correct. Lead researcher Arjun Manrai of Harvard Medical School emphasized that the findings are not a displacement story: "AI would not replace physicians but join them in a new triadic care model — the doctor, the patient, and an artificial intelligence system." The study also noted that roughly one in five US physicians already use AI to assist with diagnosis, suggesting clinical AI adoption is further along than public discourse implies.

theguardian.com ↗

The 89% versus 34% treatment-planning number is the one that should be making headlines, and it largely isn't. That is not a marginal improvement: it is a gap of more than 2.5x between AI and human doctors using standard clinical resources, on structured case studies, published in Science. The ER triage results are impressive, but triage in this study was limited to reading structured EHR data, with no visual or interpersonal signals. The treatment-planning comparison is closer to what AI would actually do in clinical support roles. The study's authors are right that the path forward is human-AI collaboration, not replacement. But "triadic care model" describes an outcome, not a roadmap. The real obstacles are accountability frameworks (who is liable when an AI-assisted diagnosis is wrong?), reimbursement structures (is AI-assisted diagnosis billed differently?), and workflow integration (how does o1's output enter the clinical record?). None of those are technical problems. What this study does is remove the "we don't know if it works" objection from the conversation. It works. The remaining objections are institutional, and they will take longer to resolve than the science took to establish, but they are now the only objections that matter.

Legal

The Musk v. Altman trial enters week two. Altman hasn't testified yet — and a live audio stream arrives next week.

The federal trial of Musk v. Altman & OpenAI entered its second week today in an Oakland courthouse, having opened April 27 before Judge Yvonne Gonzalez Rogers. Elon Musk spent two days on the stand last week, claiming he was "tricked" into providing OpenAI's seed funding — approximately $38 million from 2015 to 2017 — for what became an $800 billion for-profit company. "I was a fool to fund OpenAI," Musk testified, arguing that OpenAI co-founders Sam Altman and Greg Brockman broke an original promise that the company would remain a nonprofit dedicated to humanity's benefit, not private enrichment. Musk is seeking damages, the removal of Altman and Brockman from their leadership roles, and the reversal of OpenAI's restructuring into a for-profit entity. Microsoft is also named in the suit, accused of aiding and abetting the alleged breach of charitable trust by investing $13 billion into the restructured entity.

OpenAI's legal team has rejected the allegations in full, arguing that no permanent promise of nonprofit status was ever made, that Musk himself agreed in 2017 to the establishment of a for-profit structure as a necessary step, and that the lawsuit is primarily an attempt to destabilize a competitor to his own AI venture, xAI. The presiding judge has focused the trial on "promises and breaches of promises" rather than existential AI arguments, meaning the outcome will likely turn on the interpretation of decade-old emails and founder conversations. Sam Altman has not yet testified, and with the trial projected to run through May 21, his examination — and cross-examination — will be among the most closely watched moments of the tech year. Per The Verge, a live audio stream of proceedings will begin next week, bringing the trial into broader public consciousness.

theringer.com ↗

The trial is fundamentally a document dispute dressed up as a values argument. Musk's "I was tricked" framing is emotionally compelling, but Judge Gonzalez Rogers narrowed the scope to "promises and breaches" for a reason: it is the only legally tractable version of the case. That means everything turns on what the 2015 emails and 2017 agreements actually say, and what a reasonable person would have understood them to promise. OpenAI's counter that Musk agreed to the for-profit pivot in 2017 is the strongest ground it has. What makes the coming weeks pivotal is Altman's pending testimony: it will be the first time he has had to answer, under oath and in open court, what he promised, to whom, and when. That is the moment that could actually shift the case. The live audio stream matters for a different reason: it converts a legal proceeding into a public narrative. Whatever Altman says on the stand will be heard, clipped, and distributed. Depending on what comes out, the PR stakes of his testimony may ultimately exceed the legal stakes. The trial is already the most revealing window into OpenAI's founding ever opened to the public, and it is going to get more revealing before it closes.

Labor

SAG-AFTRA reached a four-year deal with the studios. Both guilds have their AI guardrails. Now it's the DGA's turn.

SAG-AFTRA and the Alliance of Motion Picture and Television Producers reached agreement on a new four-year contract last week, following a second round of week-long negotiations. The deal — not yet made fully official, with ratification expected in the coming days — includes a sizable contribution to the union's pension fund, increased streaming residuals, and new artificial intelligence protections. The agreement makes SAG-AFTRA the second major Hollywood guild, after the WGA, to secure a four-year contract from studios that includes AI safeguards. The WGA ratified its own deal at the end of April with a 90% approval rate. SAG-AFTRA executive director Duncan Crabtree-Ireland had reportedly refused to sign a longer-term contract unless studios conceded more on AI — meaning the four-year structure was itself the leverage point for extracting AI terms. The Directors Guild of America is next, sitting down with the AMPTP on May 11.

The specific AI provisions in the WGA and SAG-AFTRA deals have not yet been publicly detailed, but the pattern of both guilds accepting longer contracts in exchange for AI protections reveals the underlying trade: studios secured labor predictability through multi-year deals, and guilds secured AI guardrails as the price of that predictability. What "AI guardrails" mean in practice differs meaningfully for actors and writers. For writers, the core concern is AI-generated scripts; for actors, it is digital likeness reproduction and synthetic performance, meaning the use of AI to generate or extend an actor's performance without consent or compensation. The DGA negotiation is the final piece, and directors hold a different kind of power than writers or actors: they control the interpretation of material that has already been written and cast, so the AI questions for directors involve deployment and post-production in ways the other guilds don't have to contend with.

deadline.com ↗

The Hollywood guild deals are the most significant real-world AI governance exercise happening anywhere right now, and they're getting a fraction of the attention they deserve. They matter because they are being negotiated by people who understand both the creative and the commercial stakes with a specificity that regulatory agencies and AI labs typically lack. When SAG-AFTRA negotiates digital likeness rights, it is drawing a line exactly where AI capability meets individual economic harm, not in a policy paper but in a binding contract with dollar figures attached. The four-year structure is important: it locks in these AI guardrails for four years at a time when the technology is moving so fast that "four years" might as well be a geological epoch. That is either smart (securing the best deal available before AI gets better and leverage disappears) or a miscalculation (the terms may be obsolete within 18 months). The DGA negotiation is the one to watch most closely. Directors shape what gets made and how, which means AI tools for pre-visualization, shot planning, post-production synthesis, and AI-generated B-roll all fall within their jurisdiction. The precedents set in the DGA deal will define the limits of AI in production in ways that go well beyond actor likeness rights.

Dev Tools

Developers built a way to run Claude Code's agent loop on DeepSeek V4 Pro at one-seventeenth the cost. It went viral.

An open-source project called DeepClaude reached the top of Hacker News today with more than 530 upvotes and 200 comments, surfacing a developer behavior that has been quietly spreading for months: running Claude Code's autonomous agent loop with a different model as the brain. DeepClaude intercepts the API calls that Claude Code's CLI sends to Anthropic and routes them to DeepSeek V4 Pro instead. Claude Code's full UX, tool loop, file editing, bash execution, subagent spawning, and git operations stay intact; only the reasoning model changes, to one that costs $0.87 per million output tokens versus Anthropic's $15. That works out to roughly a 17x reduction in output-token price at comparable benchmark performance: DeepSeek V4 Pro scores 96.4% on LiveCodeBench, and Claude Opus's score is in the same range. For developers running heavy agentic coding loops, such as multi-step autonomous tasks and overnight experiments, the $200/month Anthropic Max plan, with its usage caps, gives way to an uncapped $20–50/month of DeepSeek usage. The project also supports OpenRouter, at $0.44 per million input tokens, and Fireworks AI for the lowest-latency US-based inference.
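The headline multiple is simple arithmetic on the two list prices quoted above. The Python sketch below reproduces it and applies both prices to a hypothetical monthly workload. The 50-million-token figure and the helper names are illustrative assumptions, and the comparison covers output tokens only, ignoring input-token charges.

```python
# Output-token list prices quoted in this brief, in US dollars per million tokens.
ANTHROPIC_OUTPUT_PER_M = 15.00
DEEPSEEK_OUTPUT_PER_M = 0.87


def output_cost(tokens: int, price_per_million: float) -> float:
    """Dollar cost of generating `tokens` output tokens at a per-million price."""
    return tokens / 1_000_000 * price_per_million


def price_ratio(expensive_per_m: float, cheap_per_m: float) -> float:
    """How many times more the expensive model costs per output token."""
    return expensive_per_m / cheap_per_m


if __name__ == "__main__":
    ratio = price_ratio(ANTHROPIC_OUTPUT_PER_M, DEEPSEEK_OUTPUT_PER_M)
    print(f"Output-token price ratio: {ratio:.1f}x")  # 17.2x

    # Hypothetical workload: 50 million output tokens in a month of agentic runs.
    monthly_tokens = 50_000_000
    print(f"Anthropic: ${output_cost(monthly_tokens, ANTHROPIC_OUTPUT_PER_M):,.2f}")  # $750.00
    print(f"DeepSeek:  ${output_cost(monthly_tokens, DEEPSEEK_OUTPUT_PER_M):,.2f}")  # $43.50
```

Because the Max plan is a flat, capped subscription rather than per-token billing, the per-token ratio does not translate directly into what any individual developer saves each month.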

DeepSeek's automatic context caching makes the economics even more favorable for agentic use. After the first API request in a session, repeated context such as the system prompt and file contents is billed at the cache-hit rate of $0.004 per million tokens, 120x cheaper than uncached calls, so long agentic loops that reference the same context repeatedly cost almost nothing in later turns. DeepClaude's viral traction reveals something important about where value in the agentic AI stack is actually accruing: Claude Code's UX, agent loop, and tool architecture are apparently valuable enough that developers are engineering around the pricing rather than switching to a native DeepSeek interface. The project has caveats (no image or vision input, no MCP server tools, and DeepSeek's servers are in China), but for code-focused autonomous tasks the functional equivalence is high enough that developers are clearly treating it as a viable production option.
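To see why caching matters so much in a loop that resends the same context every turn, here is a simplified Python model. Only the $0.004 cached rate is quoted in this brief; the uncached rate is inferred from the stated 120x ratio. The one-miss-then-all-hits billing assumption, the 100,000-token context, and the 200-turn session are hypothetical, not DeepSeek's documented billing rules.

```python
# Input-token prices in US dollars per million tokens.
CACHED_INPUT_PER_M = 0.004  # cache-hit rate quoted in this brief
# Inferred from the brief's "120x cheaper" claim, not a quoted price.
UNCACHED_INPUT_PER_M = CACHED_INPUT_PER_M * 120  # about $0.48


def session_input_cost(context_tokens: int, turns: int, cached: bool) -> float:
    """Input-token cost of resending the same context on every turn of an agent loop.

    Simplified model (an assumption, not DeepSeek's billing spec): the first
    turn always pays the uncached rate; with caching, every later turn pays the
    cached rate for the whole context. New tokens added per turn are ignored.
    """
    if turns <= 0:
        return 0.0
    millions = context_tokens / 1_000_000
    first_turn = millions * UNCACHED_INPUT_PER_M
    later_rate = CACHED_INPUT_PER_M if cached else UNCACHED_INPUT_PER_M
    return first_turn + (turns - 1) * millions * later_rate


if __name__ == "__main__":
    # Hypothetical session: 100,000 tokens of context resent over 200 turns.
    context, turns = 100_000, 200
    print(f"Without caching: ${session_input_cost(context, turns, cached=False):.2f}")  # $9.60
    print(f"With caching:    ${session_input_cost(context, turns, cached=True):.2f}")  # $0.13
```

Under these assumptions the cached session costs roughly 1/75th as much as the uncached one, and the advantage grows with the number of turns, which is why long agentic loops benefit most.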

github.com ↗

DeepClaude is one of the more revealing recent signals about AI infrastructure economics, and what it reveals cuts in two directions. First, Anthropic has built something in Claude Code's agent loop that developers value above the model itself: they are going to significant lengths to keep the UX while swapping out the reasoning engine. That is a durable competitive advantage in UX, not in model capability. Second, and more uncomfortable for Anthropic: if the loop can be decoupled from the model at a 17x cost saving with comparable benchmark performance, Claude Code's interface is effectively being used as a frontend for competitors' inference. The fact that DeepSeek V4 Pro is served from China is a non-trivial caveat; for enterprise or security-sensitive use cases, it is a hard stop. For individual developers doing personal projects, it is apparently acceptable. As for the near-term trajectory of the AI tooling market, UX moats will be contested from below by cost arbitrage, and labs that don't compete on price will need to compete on exclusive capability. For Anthropic, that case rests heavily on Claude Mythos and whatever comes after it. The DeepClaude story is a preview of the infrastructure commoditization pressure that every frontier lab will face as open-weights models continue to close the benchmark gap with proprietary ones.

✦ Mira's Take

The thread across today's brief is a single shift that's been building for months and is now impossible to ignore: AI is no longer primarily a technology story — it's becoming an institutional one. The Harvard study doesn't tell us AI is capable; we already knew that. It tells us that AI capability has crossed a threshold where the question is no longer "does it work?" and is now "who's liable, who gets paid, and how does it enter the clinical record?" Those are institutional questions, and they take years to resolve, not quarters. The study removes the last defensible objection to deploying AI in clinical decision support, which means the bottleneck is now squarely in the hospital system's administrative and liability infrastructure. That's both frustrating and clarifying.

The Musk v. Altman trial and the Hollywood guild deals are two different versions of the same institutional reckoning. In Oakland, the question is what promises were made at the founding of the most important AI company in history and whether they were broken. The answer matters not just for OpenAI's legal liability but for every AI lab that has ever made commitments about safety, governance, and mission. If courts start treating those commitments as legally enforceable contracts rather than PR positioning, the entire industry's relationship with its stated values changes. In Hollywood, the question is how human creative workers protect their economic interests against a technology that can replicate their output at marginal cost. The guild deals represent the most specific, enforceable AI governance frameworks that currently exist — more specific than any government regulation, more binding than any voluntary lab commitment. The DGA negotiation this month could set standards for AI in production that affect every studio and streaming service for the next four years.

DeepClaude is the technical coda. When developers build elaborate workarounds to keep one company's UX while running another company's model, it tells you exactly where the value in the stack actually lives. Anthropic built an agent loop worth paying for and preserving. DeepSeek built inference cheap enough to arbitrage. The developers doing the arbitrage are telling us, in the most direct way possible, that the UX moat and the model moat are separating — and that the company that owns both will not necessarily be the one that wins. That's not a threat to Anthropic's business today. It is a preview of the competitive dynamics of 2027.