After Google I/O, Publishers Are Out of Time. But There Is a Roadmap.

The pay-per-crawl web already has its toll collectors. Publishers don’t yet have a broker — and the window to build one is closing.

May 25, 2026

(credit: Google)

The week the architecture became shockingly visible

Last week at Google I/O 2026, Head of Search Elizabeth Reid announced the most significant redesign of Google’s search box in twenty-five years. The query field now expands to accept longer conversational prompts. AI Mode, already at over a billion monthly users, becomes more deeply integrated into the default search surface. Information agents will monitor topics over time and push updates to users. Autonomous workflows can be initiated from the same box where users used to type three-word queries. Reid’s framing was unambiguous: “AI search through and through.”

For the publishing industry and for analysts that have been along for the ride, the I/O announcements weren’t a surprise so much as a confirmation. Zero-click searches now account for roughly 60 percent of queries. AI Mode itself shows a 93 percent zero-click rate. Pew Research found click rates with AI summaries drop to 8 percent from 15 percent without — a 46.7 percent relative decline measured against a controlled baseline. Individual publishers report damage from severe to, dare we say, catastrophic. HubSpot estimates a 70 to 80 percent collapse in organic traffic. DMG Media documented drops of up to 89 percent for affected queries. NPR called it “an extinction-level event” for online news publishers. You get the idea.

In brutal terms, this is the demand-side harm — what users see, what gets quantified in click-through rates, what regulators and antitrust authorities focus on. It is real and substantial. But it is also, I want to argue, only one of three layers of enclosure currently being built around the open web, and not the only one where durable structural shift is happening.

A week before I/O, on May 6, Thomson Reuters CEO Steve Hasker articulated his AI licensing framework publicly: archive-only scope, maximum pricing, short contract terms designed to force renegotiation. Reuters has commercial agreements with major AI companies. Reuters also, Hasker confirmed, sees its breaking news content surfaced in those same AI products without compensation. The chief executive of one of the world’s largest news organizations was publicly acknowledging that the deals don’t cover the actual extraction happening.

If even Reuters cannot stop live content extraction with the leverage it has, the deals are not the architecture. They are the visible part of an architecture whose less visible parts are doing more work. The tip of the iceberg, let’s say.

Three months earlier, in February, Microsoft launched its Publisher Content Marketplace, a structured platform where publishers can license their work to Microsoft AI products under usage-based reporting terms. In April, Cloudflare expanded the x402 Foundation — now jointly governed with Google, Microsoft, Amazon, Coinbase, and the major payment networks — to handle machine-to-machine billing at protocol level. Both are brokerage architectures. Both are owned and governed by the companies whose pricing power publishers are trying to escape.

And beneath both, a third layer: a roughly $1 billion market for on-demand publisher content operated by venture-backed scraper firms — Firecrawl, Exa, Tavily, Bright Data, Apify, Zyte, and at least eighteen others — selling extraction-as-a-service to AI labs, consultancies and enterprises with no licensing conversation at any point in the pipeline. According to industry analyst Matthew Scott Goldstein, of twenty-one such vendors profiled in his recent report, zero had agreements with the publishers whose content they re-sell.

The publishing industry’s response to all of this remains, for the most part, individual. A handful of elite players negotiate bilateral deals. SPUR coordinates among five major UK publishers. CoMP is a technical protocol from the IAB Tech Lab. None of these, on its own, is the missing piece.

What’s missing is a publisher-side broker.

Three layers, one architecture

The dominant frame for the publisher crisis treats it as a story about Google, or about AI Overviews, or about the failure of bilateral licensing. The frame is too narrow. The architecture that’s being built has three layers, and each layer extracts value differently from publishers who lack the leverage to push back.

[Figure taken from my paper: “The New AI Scraper Economy and the Three-Layer Enclosure of the Open Web” (April 2026)]

Layer 1: The hyperscalers. Amazon Web Services, Microsoft Azure, and Google Cloud have built full-stack AI offerings that integrate infrastructure, foundation models, and application-level tooling. Their convening power over AI is structural — model developers, enterprise customers, and the industrialization of AI across sectors all depend on cloud infrastructure controlled by three firms. This is the layer where antitrust attention has concentrated. It is also the layer where the most visible licensing deals are happening: News Corp and OpenAI ($250M+ over five years), Reuters and various AI companies, AP and Google, NYT and Amazon (reported at $20–25M annually). For publishers with the scale and legal apparatus to negotiate at this layer, real revenue is flowing. For everyone else, this layer is structurally inaccessible.

Layer 2: The content-delivery tier. This is the layer that requires its own theorization, and the one this piece is centering. Cloudflare handles traffic for roughly 20 percent of the web by its own account. Since mid-2024, it has constructed a parallel apparatus of marketplace intermediation, automated payment, and ecosystem cultivation that mirrors the hyperscaler playbook on a smaller footprint but in a strategically decisive position. The chronology is diagnostic.

In July 2024, Cloudflare launched a one-click AI bot blocker. In September and December, it added granular per-crawler controls. On July 1, 2025 — branded “Content Independence Day” — it made blocking the default for new customers and introduced Pay Per Crawl, reviving HTTP 402 (a payment-required status code defined in the original HTTP/1.1 specification but unimplemented for three decades) as the channel through which AI crawlers could obtain paid access. Between July and early December 2025, Cloudflare logged 416 billion blocked AI bot requests. In March 2026, Cloudflare released its own Crawl API — a tool that can scrape an entire website with a single request — and customers initially found they could not block it through Cloudflare’s own managed controls.

The protector had become, in infrastructural terms, a toll collector on both sides of the road.

Two weeks before the first version of this paper was written, Google, Microsoft, and Amazon formally joined the x402 Foundation alongside Cloudflare, Coinbase, and the major payment networks. What had been presented to publishers as Cloudflare’s independent defense of the open web had resolved into co-governance with the hyperscalers of a jointly-owned payment architecture for machine-mediated content access.

Layer 3: The scraper economy. This is the least-discussed layer and the one with the most aggressive bypass logic. The firms in this category — over twenty in Goldstein’s report, with the content-licensing tracker TollBit identifying approximately forty in total — have built a $1 billion-plus market for on-demand publisher content, projected to double to $2 billion by 2030. Their customers, documented in Goldstein’s research, include BCG, IBM, Cohere, AWS, Salesforce, Apple, PwC, Shopify, and Alibaba. Their raw material, as Goldstein puts it bluntly, is “the same articles, analysis, and journalism that publishers spend billions producing every year.”

The structurally novel feature of this layer is its distribution model. Nineteen of the twenty-one vendors profiled require no sales call, no contract, no licensing conversation. A developer signs up with a credit card and begins scraping. The buyer is an engineer building an AI system rather than an executive making a procurement decision. TollBit’s recent “State of the Bots” report found that roughly 30 percent of AI scrapes violate explicit robots.txt directives. None of the twenty-one vendors had licensing agreements with the publishers whose content they re-sell.

This layer completes the architecture. It is structurally downstream of both hyperscalers and CDN providers, yet it bypasses both Big Tech’s data-partnership regime and Cloudflare’s pay-per-crawl toll system. Firecrawl and its peers are themselves the entities doing the scraping that Cloudflare aims to block, and they re-sell what they extract as a clean-data-as-a-service product. The CDN Akamai reports a 300 percent surge in AI bot activity in 2025, with publishers representing 40 percent of all media-related AI bot traffic. According to Goldstein’s summary, publishers receive no payment from any of the twenty-one vendors identified.

Three layers. One direction of extraction. No publisher-side institution that operates across them.

The existing publisher responses

The response architecture currently being built reflects what each institution behind it is structurally positioned to do. Each intervention does real work; together, they don’t yet add up to a counterweight at the scale of the problem because the institutional form that would carry the full response hasn’t yet been built.

Bilateral licensing deals work for the publishers who can negotiate them. News Corp signed with OpenAI in May 2024 for an estimated $250M+ over five years. The New York Times signed with Amazon at $20–25M annually. Reuters with several. AP with Google. Axel Springer with OpenAI. Microsoft signed deals with Hearst, Condé Nast, and Dotdash Meredith in 2025. As of Q1 2026, USA Today Co. and IAC began reporting “meaningful” AI licensing revenue for the first time, driven by Meta deals signed in December 2025. These are real. They are also concentrated entirely at the top of the publisher hierarchy. The pattern is consistent: AI companies want major-brand premium content and will pay for it; everyone else competes for free.

SPUR — the Standards for Publisher Usage Rights coalition launched in February 2026 by the BBC, Financial Times, Guardian, Telegraph Media Group, and Sky News — is genuinely a publisher-side coordination effort. It aims to establish shared licensing standards for AI access to journalism. But its membership is, by design, elite UK publishers. It is a coordination forum among institutions that already have leverage; it does not, by itself, generate leverage for those who don’t.

CoMP — the IAB Tech Lab’s Content Monetization Protocol, released for public comment in March 2026 — is a technical specification for machine-readable content compensation terms. It is a protocol, not an institution. Protocols require implementation, and implementation requires an actor with the standing to insist that AI companies adopt them. CoMP does not name the actor. It assumes one will emerge.

Article 102 competition law — pursued by Cohen and Davies, whose recent analysis argues that Google’s AI Overview constitutes a clear abuse of dominant position under European law — targets the demand-side harm sharply but cannot reach the rest. An Article 102 case is procedurally bound to the specific dispute between one giant and its victims. Even a perfectly designed remedy against Google would leave the Cloudflare-x402 toll architecture and the scraper economy untouched. Cohen and Davies acknowledge this; their proposed remedies (an opt-in regime, a cease-and-desist requiring Google to “un-launch” AI Overview, a mandatory licence fee) all target Google specifically because the procedural architecture of EU competition law forces them to.

The Microsoft Publisher Content Marketplace and Cloudflare’s pay-per-crawl are the only operational, multi-publisher marketplaces currently functioning. They run on infrastructure-layer logic — designed by infrastructure companies, governed by infrastructure companies, optimized for the efficiency goals of infrastructure companies. The question is not whether they should exist. They do, and they probably should. The question is whether publishers will also build the marketplace that sits on their side of the same transactions.

Each of these interventions is necessary. But none of them, alone or together, is sufficient. The gap they leave is institutional: there is no publisher-owned entity with the scale, legal capacity, and AI-side relationships to negotiate collectively on behalf of publishers who lack individual leverage.

Think of the scraper economy as rising sea level. A seawall on one property doesn’t matter when the water keeps rising; what’s needed is the coordinated community and state response that operates at the scale of the problem. That gap is what a broker would fill.

What the missing piece is (and how it already exists elsewhere)

A publisher-side broker is, at its core, the AI-era version of an institutional pattern that has solved variants of this problem before. When individual rights holders face counterparties with vastly more leverage, aggregation through a collective is the response that has consistently worked.

Music publishers organized into performance rights organizations — ASCAP, BMI, SESAC, and their international equivalents — to collectively license songwriting royalties when individual songwriters could not negotiate with broadcasters one by one. The Associated Press was founded in 1846 as a wire service cooperative because individual newspapers couldn’t justify the cost of reporting from distant locations alone. Getty Images aggregated photographer licensing because individual photographers couldn’t negotiate with magazines, advertisers, and (eventually) every commercial user of imagery in the world. Factiva, Dow Jones’s own content licensing operation, aggregates content from thousands of publishers and licenses it to enterprise customers under negotiated terms — a model running successfully for decades.

The AI version of this pattern doesn’t exist. There is no collective licensing body for AI training. There is no aggregator that can present a unified front to OpenAI, Anthropic, Google, Meta, Microsoft, Amazon, and the rest of the major AI companies. There is no institution that can say to a scraper economy vendor: you cannot resell the content of our 200 member publishers, and here is the legal and technical apparatus we will deploy if you try.

A publisher-side broker, in operational terms, would do four things:

Aggregate anchor inventory. The broker would assemble a corpus of high-value content from member publishers — archives, reference material, premium journalism, specialized verticals — and present it to AI companies as a single licensable product. Individual publishers contribute their content; the broker handles the commercial relationship. The economic logic is straightforward: AI companies want quality data, and a hundred-publisher consortium is a more efficient counterparty to negotiate with than a hundred publishers individually. The broker takes a margin; publishers receive a share proportional to usage.

Enforce at the technical layer. Aggregation is meaningless without enforcement. The broker would operate or partner with AI crawler detection and gating infrastructure — the kind of capability that companies like Cloudflare and smaller specialized vendors currently provide separately. Standardized content metadata (CoMP, News Atom, schema.org for journalism) provides the substrate for machine-readable terms; enforcement against scraper-economy bypasses provides the teeth. Content from member publishers would be technically protected, with the broker’s legal weight standing behind the enforcement. A scraper vendor selling Firecrawl-style API access to consortium content would face not a single publisher’s cease-and-desist but the consortium’s combined legal capacity.

Negotiate at scale across all three layers. Bilateral deals with hyperscalers happen at the top. Negotiated access through standardized terms (CoMP, or a successor protocol the broker shapes) handles the middle layer. Legal action against scraper economy vendors operates at the bottom. The broker’s value is operating across all three rather than only one.

Maintain governance on the publisher side. The structural difference between a publisher-side broker and the Microsoft Publisher Content Marketplace or Cloudflare’s pay-per-crawl isn’t the existence of a marketplace — it’s who governs it. Pricing, terms, exclusion criteria, dispute resolution, and the basic question of who gets paid how much all need to sit with publishers. This is the precondition that makes the broker model different in kind from infrastructure-owned brokerage.

Who could actually build this? The characteristics required narrow the field considerably. The broker would need existing multi-publisher content licensing infrastructure (not built from scratch). It would need established AI-side business development relationships at the senior level — not the engineers buying Firecrawl access, but the C-suite of OpenAI, Anthropic, Google, Microsoft, Amazon. It would need legal capacity sufficient to litigate against well-funded scraper economy vendors and to negotiate against the largest AI companies in the world. It would need brand authority sufficient to convene publishers who are themselves competitors. And it would need anchor inventory of its own — a sufficiently valuable internal corpus to make membership attractive from day one.

In the world’s media economy, perhaps three to five institutions meet these criteria. The clearest operational expression of the pattern, as of early 2026, is Dow Jones’s Factiva. The platform has expanded to more than 8,000 licensed sources for generative AI use since launching Factiva Smart Summary in late 2024, with recent additions including The Atlantic, USA Today Co., Fast Company, McClatchy Media, The Globe and Mail, and Hong Kong Economic Times. Factiva general manager Emma O’Brian has framed the expansion explicitly as collective negotiation on behalf of small and mid-size publishers, telling Digiday in late 2025 that the unit is positioning itself as an “effective negotiator of AI licensing rights, to unlock royalties for small and mid-size publishers on a usage basis.” Her governance test — “we would never sign a deal we wouldn’t ourselves sign” — is precisely the publisher-side governance principle this piece argues is missing from infrastructure-owned brokerages.

Factiva is not the only experiment in this direction. The News/Media Alliance has launched parallel collective-licensing arrangements with AI partners Bria and ProRata, on a structurally different model: NMA provides a templated agreement, and member publishers contract bilaterally with the AI company, with 50 percent revenue share apportioned by attribution. NMA president Danielle Coffey describes the model as triangular — “we work on the legal terms… members contract individually.” That implies a different theory of where publisher leverage comes from than Factiva’s centralized aggregation: templated bilaterals preserve publisher autonomy at some cost to collective bargaining power; centralized aggregation creates leverage but requires publishers to trust the aggregator. Whether these parallel collective approaches converge, differentiate, or compete for the same publisher supply is itself an open question.

The harder question Factiva’s existence sharpens, rather than answers, is one of scale. 8,000 sources is a meaningful corpus; it is not yet the scale at which a broker can shift the political economy of AI-to-publisher commerce. The major AI labs are not yet primarily licensing through Factiva. The scraper economy is not yet meaningfully constrained by Factiva’s enforcement capacity. The x402 architecture is consolidating on a parallel track. The question is whether Factiva at its current scope can grow into the role this piece describes, or whether the role requires a different kind of institutional move — a consortium structure with multiple anchor publishers, deeper enforcement infrastructure, an explicit posture against the infrastructure-layer brokerages — that goes beyond what any single publishing company has yet attempted.

The market is watching to see whether the move materializes at the scale the moment requires.

Why it hasn’t happened yet

The reason a publisher-side broker doesn’t exist isn’t that publishers haven’t thought of the idea. It’s that several genuine structural challenges have to be overcome to build one, and the timing window is narrow.

Publisher competitive dynamics are real and well-understood by the people involved. A broker has to be structured so that no single publisher dominates governance, which probably means a consortium with leading participants rather than a single-owner platform. That is closer to the cooperative model of the Associated Press than the platform model of Spotify. That structure is harder to organize, and it requires publishers to subordinate competitive instinct to collective interest at a moment when most publisher leadership teams are focused on their own bilateral negotiations.

AI companies will resist. The whole logic of the scraper economy is that AI companies prefer no licensing conversations to negotiated ones. A broker forces those conversations back into the equation, and AI companies will route around it where they can — through scraper economy vendors, through legal challenges to the broker’s legitimacy, through holding out and waiting for publishers to capitulate. The broker has to be large enough and authoritative enough that AI companies cannot simply ignore it. Below that threshold, it doesn’t work.

The chicken-and-egg problem. The broker needs scale to attract AI buyers but needs AI buyers to attract publishers. The solution is anchor inventory — the founding publisher (or consortium of founding publishers) has to bring enough content of its own that AI companies want to negotiate from day one, before the broader publisher membership has materialized. This is why the broker probably has to be founded by a publisher that already has substantial AI licensing relationships and a major archive, rather than by a startup or a coalition of mid-tier publishers.

Google’s position is likely the biggest structural obstacle. Joshua Benton at Nieman Lab has argued, persuasively, that no meaningful publisher licensing market will materialize in 2026 because Google refuses to separate its search-indexing crawler from its AI-training crawler. As long as participation in Google’s search index implies participation in its training data, the price for that data is effectively zero. Every other AI company sees this and asks: why pay for what our largest competitor gets free? Benton’s argument is that until regulatory action forces the separation, the licensing market is structurally stalled.

He is right about the constraint. The broker model is precisely the institutional precondition for the regulatory pressure that would lift it. Individual publishers cannot push back against Google’s crawler entanglement at sufficient scale. Bilateral lawsuits move at the speed (and cost) of litigation. The Penske Media antitrust filing from February 2026 documented Daily Mail click-through rate collapsing from 25 percent to under 3 percent when an AI Overview surfaces above its link — but as a single-publisher case, its remedy can only address Penske’s specific harms. An organized publisher block — a broker with hundreds of members and significant content market share — could change the political economy of the Google crawler question entirely. The broker doesn’t solve Benton’s problem directly. It is the actor that could plausibly force the regulatory solution Benton (correctly) says is needed but isn’t currently happening.

Timing. The x402 architecture is consolidating now. Microsoft’s PCM is operational. Cloudflare’s marketplace is moving from private beta to broader availability. The infrastructure-layer brokerage models are establishing the terms by which AI-to-publisher commerce happens. A publisher-side broker that emerges in 2027 inherits an industry where the major AI companies have already learned to operate through Cloudflare and Microsoft’s terms. A broker that emerges in late 2026 still has room to shape the protocols. Past that, the costs of restructuring rise sharply.

These are real challenges. They are not, individually or collectively, prohibitive. They are the reasons the move requires deliberate institutional action rather than emerging spontaneously, and the reason it hasn’t already happened.

What this all means

For publishers below the elite tier, the live choice is one of three paths. The first is bilateral negotiation with AI companies on whatever terms the AI companies are willing to offer — which, for publishers without major-brand leverage, typically means very little or nothing at all. The second is comprehensive blocking and a strategic retreat into direct audience relationships — subscriptions, newsletters, specialized events — accepting that AI-mediated discovery will happen without you. This is a coherent strategy that some publishers have chosen deliberately; whether it scales beyond brands with established trust is the open question, and there’s a reasonable counter-argument that the loss of discoverability in AI-driven search itself becomes a major issue. The third path is collective negotiation through a publisher-side institution that doesn’t yet exist. The argument here is not that the third path is the only viable one. It is that it is the only one that addresses the supply-side enclosure for publishers who cannot individually weather the structural isolation of the second.

For AI companies, the choice is between engaging now with a publisher-side broker on terms negotiated jointly, or facing a fragmented and increasingly hostile environment of bilateral negotiations, scraper economy legal exposure, and regulatory pressure that grows in proportion to the asymmetry of the current situation. The companies that engage early on collective terms will pay more per piece of content than they currently do through the scraper economy bypass. They will also gain access to a stable, authoritative, contractually clean corpus that the scraper economy cannot provide, and they will have done the political work to insulate themselves from the regulatory wave that’s building.

For regulators, the supply-side plumbing is where policy attention should follow once the AI Overview cases settle. Article 102 against Google is necessary but procedurally bounded. The harder regulatory work is establishing that infrastructure-layer brokerage — Cloudflare’s x402, Microsoft’s PCM — is itself a form of market power that requires governance attention. The EU’s Digital Markets Act framework was designed for a previous moment; it does not capture the layer where the current consolidation is happening. The vocabulary for that capture has yet to be developed.

For readers and the open web more broadly, the question is whether journalism, research, and the broader information commons remain economically viable as inputs to AI systems that have learned to extract their value without compensating their production. The licensing deals that have materialized so far flow to a handful of major players. The scraper economy operates without compensation at all. If a publisher-side broker doesn’t emerge in the next twelve to eighteen months, the answer to that question gets decided by default — and the default is enclosure.

The infrastructure is the politics

Google I/O 2026 will be remembered as the inflection point at which the search-as-portal model gave way to search-as-destination at scale. NPR’s “extinction-level event” framing captures the demand-side moment. But the demand-side moment is the visible part of a structural shift whose less visible parts — the toll architecture, the scraper economy, the infrastructure-layer brokerage — will determine the long-term shape of the open web more than the AI Overview interface ever could.

The flat-rate web rewarded publishers asymmetrically but preserved their formal autonomy. The pay-per-crawl web automates a transfer of rent-setting authority from the people who create content to the companies that own the plumbing. That transfer is, right now, happening by default — not because publishers have lost the argument, but because the institutional response that would carry the argument hasn’t yet been built.

The infrastructure layer has already decided publishers will be paid through its toll booth. Publishers haven’t yet decided whether they’ll be paid on those terms or their own.

The window to decide is closing.

This piece draws on a longer academic argument developed at the University of Amsterdam’s Media Studies department: “The New AI Scraper Economy and the Three-Layer Enclosure of the Open Web” (April 2026).

Leandro Oliva

Discussion about this post

Ready for more?