From Clicks to Citations – Building Attribution Infrastructure for AI and LLMs

Alex Spring, Senior Director, AI Strategy and Partnerships at impact.com and Director of OpenAttribution, addresses the urgent need for AI citation so that publishers, brands and creators can measure – and remunerate – their influence

AI’s biggest impact on the media, search, and consumer spaces has come through large language models (LLMs). The likes of ChatGPT and Google’s AI Overviews are in the process of inverting the way we search for everything, from factual research to shopping. Increasingly, the information on the internet is brought to us by agents rather than fetched by us as browsers. Around 60% of searches no longer result in a click, and it is easy to see why: when a Google search query is answered with an AI Overview, clicks on organic results collapse by 61%.

In other words, LLMs are doing a good job of telling us what we want to know. But where do they get their information from?

No prizes for guessing that it is scraped from publishers, brands and creators, who aren’t necessarily cited (or remunerated) for their work. The information is taken and presented to consumers without attribution – to the almost exclusive benefit of the AI platforms themselves.

New attribution frameworks are now appearing to address this, creating the measurement infrastructure for LLMs to show where they source information, and potentially offering remuneration models for anyone – including publishers, creators and brands – that produces the content that fuels them.

Value chain destabilisation

To understand how we got here, we have to understand where we’ve come from. Historically, search engines indexed publisher content of many different kinds, and sent traffic in return. These publishers monetised the traffic, and brands paid for ads and affiliate content because they could see what was happening and understand the ROI. The value chain was imperfect, but it was legible, and the tensions in the system were clear.

AI and LLMs have broken that value chain, because they don’t do a good enough job of citing the sources they scrape for information. The likes of ChatGPT, Gemini and Perplexity simply digest publisher content, reprocess it, and take full credit for the value delivered. The user gets an answer built from publisher and brand content, while the publishers and brands get no visibility in return: no visits, no measurable clicks, and no attributable revenue.

This is not just a broken version of the old value exchange; it is an entirely new model, and there is no infrastructure to support it.

Today, publishers have no way of knowing which AI systems are using their content, no way to measure which content drove which outcome, and no mechanism for value to flow back. Brands, in turn, have no way of understanding how their products are discussed, what content drives that opinion, and how potential customers are learning about their products.

Exacerbating the problem is the fact that publishers of all kinds are culturally wired to make their content scrapeable. Historically, this has roots in the belief that making content SEO-friendly would lead Google to rank it higher in its results. That was when the value exchange offered something in return.

Today, everyone is blindly chasing generative engine optimisation (GEO), yet in the new AI/LLM order nothing comes back. Publishers make their content easy for AI to scrape, and LLMs take it without so much as an acknowledgement.

New attribution infrastructure

So what should content producers do to rectify this wrong? Do they put their hands up and say, ‘This horse has already bolted, we just have to accept the new normal’? Are they brave (and big) enough to get litigious? Or is there another route to a fairer future?

New attribution infrastructures are beginning to make the latter possible. Initiatives such as OpenAttribution are creating open standards for AI agent identity and content attribution, so the measurement layer exists for value to return to the people who create it.

OpenAttribution – both open source and non-profit – creates ‘telemetry’: standardised reporting signals from agentic conversations that track when content is retrieved and cited. If publishers require telemetry as a licensing condition, it creates a measurable link between content and outcome. Content can then be compensated at the time of citation or on a performance basis, similar to an affiliate model.
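The telemetry specification is still emerging, so as an illustration only, a retrieval or citation event might look something like the sketch below. The field names (`agent_id`, `content_url`, `event_type`) are hypothetical, not taken from the OpenAttribution standard:

```python
import json
from datetime import datetime, timezone

def make_telemetry_event(agent_id, content_url, event_type):
    """Build a hypothetical OpenAttribution-style telemetry event.

    event_type is "retrieval" (content pulled into the model's context)
    or "citation" (content surfaced in the answer shown to the user).
    """
    assert event_type in ("retrieval", "citation")
    return {
        "agent_id": agent_id,        # which AI system accessed the content
        "content_url": content_url,  # the source being credited
        "event_type": event_type,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }

event = make_telemetry_event(
    agent_id="example-assistant/1.0",
    content_url="https://publisher.example/reviews/best-headphones",
    event_type="citation",
)
print(json.dumps(event, indent=2))
```

Note that the event carries only a content identifier and an event type – enough to link content to outcome, without exposing anything about the conversation itself.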

The middle agentic layer

The core need here is the ability to measure content influence. The pattern has a direct precedent in advertising: ads.txt solved a billion-dollar fraud problem by letting publishers declare who was authorised to sell their inventory. OpenAttribution works on the same principle – content owners register with a trusted telemetry server that resolves retrieval and citation events back to whoever created the content. Whether that content was accessed directly from a publisher’s site or served through a marketplace like Amazon, the measurement flows back to the right owner.
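ads.txt works through a plain-text file of declared authorisations that anyone can fetch and parse. An OpenAttribution analogue might follow the same shape: each line naming a trusted telemetry server and the content owner’s account with it. The file name, record format and relationship labels below are hypothetical, modelled on ads.txt rather than taken from any published standard:

```python
def parse_declaration(text):
    """Parse a hypothetical ads.txt-style attribution declaration.

    Each record line: <telemetry-server-domain>, <account-id>, <relationship>
    Lines starting with '#' are comments; blank lines are ignored.
    """
    records = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # strip comments and whitespace
        if not line:
            continue
        server, account, relationship = [f.strip() for f in line.split(",")]
        records.append({"server": server, "account": account,
                        "relationship": relationship})
    return records

sample = """
# attribution.txt -- hypothetical, modelled on ads.txt
telemetry.example.org, pub-12345, DIRECT
"""
print(parse_declaration(sample))
```

The appeal of this pattern is the same as with ads.txt: it is trivially cheap for a content owner to publish, and trivially cheap for any agent or marketplace to verify.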

The strongest commercial incentive sits in the ‘middle agentic layer’ – such as Amazon’s Rufus and other on-site shopping assistants and GPT-style apps.

In retail, LLMs are already working hard to push conversions based on consumer searches. Should Amazon adopt the OpenAttribution standard, Rufus would emit a signal whenever it cites scraped content, benefiting the publishers, brands, affiliates and affiliate networks responsible for that content that have implemented OpenAttribution. These entities can then be remunerated depending on the outcome of the interaction.

This process is entirely privacy-safe and requires no context from the customer. There is no need to know what the user said to the LLM or what the bot replied. It is simply a counter that records which scraped content was cited, and measures its influence.
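The ‘simple counter’ idea can be sketched in a few lines: aggregate citation events per content URL, with no conversation text anywhere in the data. The event shape here is the same hypothetical one assumed above, not a published schema:

```python
from collections import Counter

def count_citations(events):
    """Tally citation events per content URL.

    Privacy-safe by construction: each event carries only a content
    identifier and an event type, so influence is measured without
    ever seeing what the user or the bot said.
    """
    return Counter(e["content_url"] for e in events
                   if e.get("event_type") == "citation")

events = [
    {"content_url": "https://publisher.example/a", "event_type": "citation"},
    {"content_url": "https://publisher.example/a", "event_type": "citation"},
    {"content_url": "https://publisher.example/b", "event_type": "retrieval"},
]
print(count_citations(events))  # retrievals are excluded from the tally
```

Retrieval events are deliberately left out of the tally here; a real settlement model might weight retrievals and citations differently, but either way the conversation content is never needed.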

A critical moment for LLMs

The path to agentic commerce is proving nonlinear. OpenAI recently killed Instant Checkout, a feature that let users buy products without leaving ChatGPT. Product data was too fragmented, tax collection wasn’t built, and it had only integrated a dozen Shopify merchants. The in-chat checkout layer is gone.

But the behaviour it was trying to capture is very much alive. Google’s Universal Commerce Protocol is pushing ahead. John Lewis just announced a significant investment in AI-powered shopping. And Alibaba’s Qwen app already completes purchases inside a conversational interface at scale – because Alibaba owns the AI model, the marketplace, the payment rails, and the logistics. What failed was one company trying to be both the pipe and the toll booth without owning the stack.

This makes content attribution more urgent, not less. As AI shopping splinters across retailers’ own apps, embedded agents, and platform-native experiences, tracking which content influenced which outcome becomes harder – and more valuable.

That’s what open attribution standards aim to address. Things are moving. The House of Lords Communications and Digital Committee published its report on AI and copyright this week, recommending a ‘licensing-first’ approach and calling for technical standards for rights reservation, data provenance, and content labelling. SPUR (BBC, FT, Guardian, Sky News, Telegraph) launched last week to develop shared technical standards and licensing frameworks for AI’s use of journalism.

But rights without infrastructure are just words on paper. Publishers may be getting the legal framework, but what they still lack is the measurement layer that tracks what actually happens to their content once it enters an AI system – which content was retrieved, which was cited, and what outcome it influenced. That’s the piece that needs building.

Tags: AI