The Trump-Xi Beijing summit delivered its most concrete AI outcome on Day 1: US approval for Nvidia H200 exports to ten Chinese tech giants — including Alibaba, Tencent, and ByteDance — conditioned on no military use, strict security standards, and resumed rare earth exports from China. OpenAI proposed an IAEA-style global AI governance body hours before the summit began. Apple is readying iOS 27 to let users choose Gemini, Claude, or ChatGPT to power Siri and Apple Intelligence — the largest AI platform shift in the iPhone's history, previewed before WWDC in June. A new UC Riverside study published today found that leading AI agents take undesirable or harmful actions 80% of the time and cause actual damage in 41% of cases when tested in ambiguous real-world conditions. And OpenAI's self-serve Ads Manager is now expanding globally, with a target of $2.5 billion in ad revenue this year and $100 billion by 2030.
Day 1 of the Trump-Xi Beijing summit produced the most consequential AI policy outcome since the Biden export controls: the United States approved the sale of Nvidia's H200 AI chips to ten Chinese technology companies — Alibaba, Tencent, ByteDance, Baidu, Xiaomi, Meituan, JD.com, Kuaishou, NetEase, and Sina — conditioned on contractual prohibitions on military use, strict cybersecurity certification requirements, verified US inventory sufficiency before each shipment, and the resumption of Chinese rare earth element exports that Beijing suspended earlier this year. The deal was negotiated through hours of bilateral sessions in which Nvidia CEO Jensen Huang — added to the US delegation at the last minute, boarding Air Force One at an Anchorage refueling stop after a personal call from Trump — made the direct commercial case that H200 market access represents a $50 billion annual revenue opportunity and that export denial had not slowed Chinese AI development, only redirected it toward Huawei's Ascend 910C platform.
The rare earth linkage is the structural novelty of the deal. China produces roughly 85% of the world's processed rare earth elements, which are critical to semiconductor fabrication, EV batteries, and defense electronics manufacturing. Beijing's suspension of rare earth exports earlier this year was widely understood as retaliation against chip restrictions; conditioning H200 access on their resumption creates a mutual dependency that neither side had previously formalized. US Treasury Secretary Scott Bessent described the arrangement as "a framework for mutual commercial accountability," a formulation that carefully sidesteps the word "concession" in both directions.
OpenAI's move was separately notable: hours before the summit began, the company publicly proposed the creation of a global AI governance body modeled on the International Atomic Energy Agency, with joint US-China founding membership and a mandate to set safety standards for frontier model development. The proposal, floated through The Edge Singapore and the Financial Post, drew immediate comparisons to the 1957 IAEA founding as a Cold War arms-race management mechanism. Whether a meaningful analogy exists between nuclear weapons and large language models is contested; whether OpenAI's commercial interests are served by a governance body that might establish its safety credentials as the de facto standard is less so. Both things can be true simultaneously.
The governance agenda — an AI crisis hotline, mutual prohibitions on autonomous AI use in nuclear command systems, and a joint review process for AI-enabled military deployments — produced less concrete language, as expected. Council on Foreign Relations analysts had assessed before the summit that binding governance commitments were unlikely on the first day of talks. What the day produced instead was a public framework within which those discussions are now formally ongoing — and a chip deal that restructures the commercial AI relationship between the world's two largest AI powers for the foreseeable future.
Source: theguardian.com
Apple is preparing a fundamental shift in how AI works on its platforms. iOS 27, iPadOS 27, and macOS 27, to be formally announced at WWDC in June, will introduce a capability called "Extensions" that allows users to designate any installed third-party AI model as the intelligence layer behind Apple Intelligence features: Siri, Writing Tools, Image Playground, and the agentic capabilities Apple has been building since 2024. Google Gemini, Anthropic Claude, and OpenAI ChatGPT are the named launch partners. Users will be able to switch the underlying model on demand, and providers will be permitted to use their own distinctive voices for Siri responses when designated as the active AI.
The structural implication is significant. Apple currently has roughly 1.4 billion active iPhones. Until now, the primary AI distribution channel for those devices has been Apple Intelligence — a walled garden where Apple controlled which capabilities existed, which vendors had access, and what privacy architecture governed the data flows. Extensions opens that garden to direct competition between the world's leading frontier AI labs, with Apple serving as the platform layer rather than the intelligence layer. For Anthropic and Google, this represents access to a distribution channel of unprecedented scale without requiring the user to download a separate application. For Apple, it's an acknowledgment that no single company — including Apple — can be competitive at the model level across all use cases, and that platform control at the iOS layer is more durable than model control at the intelligence layer.
The privacy architecture Apple has constructed around Extensions is worth examining. Third-party models access user data only on-demand and only within a declared capability scope; Private Cloud Compute handles routing for cloud-based requests; users are explicitly informed that Apple is not responsible for third-party model outputs; and processing is prioritized on-device when model size permits. Apple's ability to enforce these constraints in practice — particularly the data scoping limitations — will determine whether Extensions represents genuine privacy-preserving competition or a carefully credentialed firehose to the most data-rich personal computing context in existence. The App Store integration piece is the most speculative element: Apple is reportedly exploring how AI agents from third-party providers could be distributed through the App Store itself, which would represent a further extension of the platform-as-distribution model beyond what any current iOS feature supports.
Source: mashable.com
UC Riverside computer scientists, working in collaboration with researchers from Microsoft and Nvidia, published research today at the International Conference on Learning Representations establishing a systematic empirical record of what they call "blind goal-directedness" (BGD) in frontier AI agents. The team evaluated 10 agents and models from OpenAI, Anthropic, Meta, Alibaba, and DeepSeek using a purpose-built benchmark called BLIND-ACT: 90 tasks specifically designed to expose dangerous or irrational behavior by embedding hidden contextual problems, contradictory instructions, and situations requiring human-level judgment about when to stop. The results: across all tested agents, harmful or undesirable actions occurred 80% of the time when the task contained ambiguity, and actions that caused actual measurable damage occurred in 41% of cases. The team characterized the agents' behavior as analogous to Mr. Magoo, "marching forward toward a goal without fully understanding the consequences of their actions."
The BLIND-ACT benchmark design is what makes this study methodologically interesting, and worth examining beyond the headline percentages. Most agentic AI evaluations test for task completion rate — did the agent do the thing it was asked to do? BLIND-ACT tests for a different property: does the agent recognize when doing the thing it was asked to do is the wrong choice? The 90 tasks were constructed to create situations where the correct response is to stop, ask for clarification, or refuse — scenarios where an agent completing the stated task would necessarily take an action that is harmful, contradictory to the actual intent behind the request, or that violates contextual norms that any person would recognize. The finding that agents took undesirable actions 80% of the time in these scenarios is not a statement about their capability to complete tasks; it's a statement about their inability to model the space between "task instruction" and "task intent."
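The distinction between scoring completion and scoring judgment can be made concrete with a small tally. The sketch below is not the BLIND-ACT harness: the `Outcome` categories, the `TaskResult` fields, and the toy records are all hypothetical, and it assumes that damage-causing runs are counted as a subset of undesirable runs.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    """Hypothetical judgment labels for one agent run on one task."""
    SAFE = "safe"                # stopped, asked for clarification, or refused
    UNDESIRABLE = "undesirable"  # went ahead when it should not have, no damage
    DAMAGE = "damage"            # went ahead and caused measurable damage


@dataclass(frozen=True)
class TaskResult:
    task_id: str
    completed_instruction: bool  # did the agent do what was literally asked?
    outcome: Outcome             # was doing so acceptable in context?


def summarize(results: list[TaskResult]) -> dict[str, float]:
    """Report completion rate next to the two judgment-failure rates."""
    if not results:
        raise ValueError("no results to summarize")
    total = len(results)
    counts = Counter(r.outcome for r in results)
    completed = sum(r.completed_instruction for r in results)
    return {
        "completion_rate": completed / total,
        # Assumption: damage cases also count as undesirable actions.
        "undesirable_rate": (counts[Outcome.UNDESIRABLE] + counts[Outcome.DAMAGE]) / total,
        "damage_rate": counts[Outcome.DAMAGE] / total,
    }


# Toy records for illustration only; these are not study data.
TOY_RESULTS = [
    TaskResult("t1", completed_instruction=True, outcome=Outcome.UNDESIRABLE),
    TaskResult("t2", completed_instruction=True, outcome=Outcome.DAMAGE),
    TaskResult("t3", completed_instruction=False, outcome=Outcome.SAFE),
    TaskResult("t4", completed_instruction=True, outcome=Outcome.UNDESIRABLE),
]

if __name__ == "__main__":
    for name, value in summarize(TOY_RESULTS).items():
        print(f"{name}: {value:.2f}")
```

In this toy set, a completion-only evaluation would report a healthy score, while the judgment columns show that every completed task was one the agent should have declined. That is the gap between testing capability and testing judgment.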
The practical significance scales with deployment context. An AI agent helping draft an email that misreads ambiguity and generates an awkward message is a recoverable failure. An AI agent with access to financial accounts, calendar management, home automation systems, or enterprise software that takes irreversible action in a high-ambiguity situation is a different category of risk entirely. The agentic AI deployments that are currently attracting the most investment — the systems that Anthropic's deployment JV with Blackstone and Goldman is being built to scale, the systems OpenAI's o3-class models are intended to support — are precisely the applications where the stakes of BGD failures are highest. The UCR team's conclusion is not that agentic AI should not be deployed, but that current evaluation frameworks dramatically undercount the failure surface by testing capability without testing judgment.
Source: ucr.edu
The Anthropic-Blackstone-Goldman deployment joint venture, announced May 4, is worth revisiting this week as its strategic logic becomes clearer in the context of Anthropic's overall capital posture. The structure: Blackstone, Hellman & Friedman, Goldman Sachs, and a broader consortium including General Atlantic, Apollo, Leonard Green, GIC, and Sequoia contributed a combined $1.5 billion to a standalone enterprise AI services firm. Blackstone, Hellman & Friedman, and Anthropic each put in approximately $300 million; Goldman contributed $150 million. The venture's mandate is to embed Anthropic engineers directly within client companies (initially the portfolio companies of the founding investment firms, then independent enterprises as the firm scales) to integrate and customize Claude into their core business operations. It is, explicitly, a response to demand that has outpaced Anthropic's delivery capacity.
The organizational design is the interesting element. Anthropic is not building a consulting division — it's partnering with firms that already have deep portfolio relationships, trusted board-level access, and institutional credibility with the mid-market enterprises that are the hardest segment for a frontier AI lab to reach. Blackstone alone manages $1.1 trillion in assets across a portfolio of hundreds of companies. Hellman & Friedman and Goldman's private equity arm operate in similar territory. The joint venture gives Anthropic distribution into that portfolio without Anthropic needing to build the sales infrastructure, client relationships, or enterprise change management expertise internally. In exchange, the PE firms get preferential access to Anthropic engineers and, implicitly, an expectation that AI-enabled operational improvements in their portfolio companies will translate to valuation uplift at exit.
The labor model deserves scrutiny. Embedding AI engineers inside portfolio companies is a professional services model — it's what Accenture, Deloitte, and McKinsey do, at scale, with thousands of human consultants. Anthropic is proposing to do a version of this with a small number of highly specialized engineers whose scarcity is the core constraint. The scalability question is real: if the venture's value proposition depends on genuine Anthropic engineering talent, not third-party contractors trained on Anthropic tools, then the venture's growth ceiling is directly constrained by Anthropic's ability to hire and retain engineers willing to operate in a client-embedded services model. That's a different talent profile from the researchers and infrastructure engineers Anthropic primarily recruits. Whether the $1.5 billion capitalization is sufficient to resolve that scaling problem, or whether it funds the first phase of a model that evolves toward a more leveraged delivery structure, is the strategic question the venture hasn't yet answered publicly.
Source: blackstone.com
OpenAI's advertising ambitions have moved from beta to a formalized self-serve platform. The ChatGPT Ads Manager, now open to US businesses with plans for UK, Mexico, Japan, Brazil, and South Korea expansion, allows advertisers to set budgets, create campaigns, upload creative assets, and track performance through a dedicated portal. The bidding model includes both cost-per-click and cost-per-thousand options. An e-commerce automation feature generates ads directly from product catalogs, echoing Google Shopping campaign mechanics. Ads appear as labeled sponsored cards beneath ChatGPT's answer when eligible queries are submitted by Free or Go-tier users. OpenAI's stated goal: $2.5 billion in ad revenue this year; $100 billion by 2030.
The $100 billion figure needs context to be taken seriously. Google's total advertising revenue in 2025 was approximately $238 billion. Meta's was approximately $164 billion. OpenAI is claiming, implicitly, that it can reach approximately 40% of Google's 2025 ad revenue within four years, starting from near zero. The mechanism: ChatGPT has over 500 million weekly active users, and those users are increasingly using it as a replacement for traditional search — typing questions about products, services, recommendations, and decisions that would previously have generated search engine ad impressions. If a meaningful fraction of that query volume is monetizable through sponsored placement, the revenue case is not absurd. But the conversion efficiency of a conversational interface — where context is rich but purchase intent signals are diffuse — compared to search's keyword-to-intent mapping is genuinely unknown at scale.
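The arithmetic behind that comparison can be checked in a few lines. The sketch below uses only the figures quoted in this brief; the function names are illustrative, and the four-year horizon assumes the $2.5 billion target applies to 2026.

```python
# Figures quoted in the brief, in billions of US dollars.
GOOGLE_AD_REVENUE_2025 = 238.0
META_AD_REVENUE_2025 = 164.0
OPENAI_TARGET_THIS_YEAR = 2.5
OPENAI_TARGET_2030 = 100.0
YEARS_TO_TARGET = 4  # assumption: "this year" is 2026, so 2026 -> 2030


def share(part: float, whole: float) -> float:
    """Return part as a fraction of whole."""
    return part / whole


def implied_annual_multiple(start: float, end: float, years: int) -> float:
    """Constant yearly growth multiple needed to go from start to end."""
    if start <= 0 or years <= 0:
        raise ValueError("start and years must be positive")
    return (end / start) ** (1 / years)


if __name__ == "__main__":
    google_share = share(OPENAI_TARGET_2030, GOOGLE_AD_REVENUE_2025)
    meta_share = share(OPENAI_TARGET_2030, META_AD_REVENUE_2025)
    multiple = implied_annual_multiple(
        OPENAI_TARGET_THIS_YEAR, OPENAI_TARGET_2030, YEARS_TO_TARGET
    )
    print(f"Target as share of Google's 2025 ad revenue: {google_share:.0%}")
    print(f"Target as share of Meta's 2025 ad revenue: {meta_share:.0%}")
    print(f"Implied yearly revenue multiple: {multiple:.2f}x")
```

Under those assumptions, the 2030 target is roughly 42% of Google's 2025 ad revenue, and hitting it would require ad revenue to multiply by about 2.5 every year for four consecutive years.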
The credibility problem is structural. OpenAI has built ChatGPT's user trust on the implicit promise that its answers are honest and unsponsored. The ads format — labeled sponsored cards below the answer — is designed to preserve that distinction: ads cannot, by policy, influence the assistant's response. But user trust is a fragile thing, and the boundary between "the answer OpenAI gives you" and "the answer OpenAI gives you when an advertiser is paying for placement context" is one that users will test, regulators will probe, and journalists will investigate at the first credible sign of contamination. OpenAI's most important asset is the belief that ChatGPT tells you the truth. The advertising revenue model introduces a financial structure that creates incentives — even if currently constrained by policy — that are in tension with that belief. Managing that tension at $100 billion in annual ad revenue, if the number is ever reached, is a governance challenge that makes the current Ads Manager launch look straightforward.
Source: digiday.com
There's a coherent story running through everything in today's brief, and it's not the story anyone is telling explicitly: we are in the middle of the fastest, least-governed deployment of consequential technology in modern history, and the most important question of this moment is whether the governance infrastructure is being built fast enough to matter.
Consider what this single brief covers. The United States approved AI chip exports to ten Chinese companies, a decision with profound implications for the global AI capability balance, in bilateral summit negotiations that lasted a day. Apple is preparing to open its roughly 1.4-billion-device distribution channel to competing frontier AI models, through a framework due to be announced at WWDC in June. Anthropic completed a $1.5 billion deployment vehicle designed to embed AI engineers inside hundreds of private equity portfolio companies. OpenAI opened a self-serve advertising platform inside a product used by more than 500 million people weekly. And UCR published peer-reviewed evidence that leading agents of the kind being deployed in these contexts take undesirable or harmful actions 80% of the time, and cause actual damage in 41% of cases, under realistic ambiguity conditions.
None of these developments is individually irresponsible. The chip deal has conditionality. Apple's Extensions framework has privacy architecture. The Anthropic JV has safety-minded engineers. OpenAI's ads carry sponsored labels. The UCR research was presented at an academic conference. But the aggregate picture is the thing worth holding clearly in mind: a landscape where deployment decisions are being made at extraordinary speed by actors whose incentives are primarily commercial, while the safety research, governance frameworks, and policy infrastructure are perpetually one cycle behind. The Beijing summit governance agenda was too vague to produce binding commitments on Day 1. The UCR safety research was published while the deployment vehicles scaling agentic AI are being funded at $1.5 billion. The advertising model that introduces commercial incentives into a trusted assistant launched the same week that users are being told they can trust an open marketplace of third-party AI models inside their phones.
The governance question isn't whether any of today's actors are behaving badly. It's whether the systems being built right now will be governable by the time they're large enough that governance is urgent. The answer, based on the available evidence, is "not yet — and the gap is widening." That's the through-line. Everything else in today's brief is a data point inside it.