This is the live page for our 2026 citation-share study. We track a fixed set of 1,200 prompts across the five generative engines that matter for the brands we work with, and report on which domains are absorbing the citation share.
What we track
- Five engines. ChatGPT, Perplexity, Claude, Gemini, Microsoft Copilot. We sample each twice daily; we do not publish results from any engine for which we have fewer than 30 days of data.
- Twelve verticals. SaaS, e-commerce, fintech, healthtech, edtech, B2B services, media, travel, automotive, real estate, legal, government. 100 prompts per vertical.
- Three intent layers per prompt. Informational (“what is…”), comparative (“X vs Y”), transactional (“best X for Y under $Z”).
Every prompt is human-curated and re-reviewed for relevance every six weeks. The set is deliberately sized: small enough that every prompt can be defended individually, large enough to be statistically meaningful.
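To make the shape of the set concrete, here is a minimal sketch of how a prompt set with this taxonomy could be represented and validated. It is illustrative only; the class names, field names, and vertical slugs are ours, not the study's internal schema.

```python
from dataclasses import dataclass
from enum import Enum

class Intent(Enum):
    INFORMATIONAL = "informational"  # "what is X"
    COMPARATIVE = "comparative"      # "X vs Y"
    TRANSACTIONAL = "transactional"  # "best X for Y under $Z"

@dataclass(frozen=True)
class Prompt:
    vertical: str  # one of the 12 verticals, e.g. "fintech"
    intent: Intent
    text: str

VERTICALS = [
    "saas", "ecommerce", "fintech", "healthtech", "edtech", "b2b_services",
    "media", "travel", "automotive", "real_estate", "legal", "government",
]

def validate_prompt_set(prompts: list[Prompt]) -> None:
    """Enforce the study's shape: 12 verticals x 100 prompts = 1,200 total."""
    assert len(prompts) == 1200, "expected exactly 1,200 prompts"
    for vertical in VERTICALS:
        count = sum(p.vertical == vertical for p in prompts)
        assert count == 100, f"{vertical}: expected 100 prompts, got {count}"
```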
Methodology, in one paragraph
Each prompt is run cold, with no system prompt and no persona, against each engine’s default model on the public web tier. Citations are extracted from the response either via the engine’s structured citation output (where one exists) or via URL detection inside the answer body. Deduplication is done at the eTLD+1 (registrable-domain) level, so blog.example.com and example.com are counted as the same domain. The full methodology, including how we handle citation order and weighting, is documented at /methodology.
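For readers who want to see what the fallback path looks like in practice, here is a hedged sketch of URL detection plus eTLD+1 collapsing, assuming the open-source tldextract library; the regex and function name are illustrative, not the study's production pipeline.

```python
import re

import tldextract  # resolves a hostname to its registrable domain (eTLD+1)

# Loose URL matcher for answer bodies; stops at whitespace and common delimiters.
URL_RE = re.compile(r"https?://[^\s)\"'<>\]]+")

def extract_citation_domains(answer_text: str) -> set[str]:
    """Detect URLs in an answer body and collapse each to eTLD+1,
    so blog.example.com and example.com count as the same domain."""
    domains = set()
    for url in URL_RE.findall(answer_text):
        ext = tldextract.extract(url)
        if ext.registered_domain:  # e.g. "example.com", "example.co.uk"
            domains.add(ext.registered_domain)
    return domains
```

On the text "See https://blog.example.com/post and https://example.com/faq", this returns {"example.com"}: two citations, one domain after deduplication.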
Headline findings, current quarter
The numbers below are placeholders for the live study. We refresh them on the first business day of each month.
- Wikipedia is the most-cited source in 8 of 12 verticals, but its share has declined in 5 of those 8 over the past year as engines diversify.
- Reddit continues to climb. It is now among the top 5 most-cited sources in 9 of 12 verticals, up from 6 verticals last year.
- First-party brand domains capture roughly 12% of citations in transactional prompts, but under 4% in informational prompts. The gap is the GEO opportunity.
- Small specialist publishers (under 50,000 monthly visitors) capture about 18% of all citations in long-tail prompts. The “long-tail GEO” thesis holds.
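Numbers like the roughly-12% versus under-4% gap above reduce to simple share arithmetic over the deduplicated citation records. A minimal sketch, assuming each record is a (domain, intent) pair; the record shape is ours, not the study's storage format.

```python
from collections import Counter

def citation_share(citations: list[tuple[str, str]], intent: str) -> dict[str, float]:
    """Fraction of citations each eTLD+1 domain captures within one intent layer.

    `citations` holds one (domain, intent) pair per extracted citation.
    """
    in_layer = [domain for domain, layer in citations if layer == intent]
    total = len(in_layer)
    return {domain: n / total for domain, n in Counter(in_layer).items()} if total else {}
```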
How this fits the GEO model
The study is the empirical backbone of the GEO ranking: the data we use to validate which tools’ citation-discovery capabilities actually map to citation reality. Tools that show you what is being cited but not who is being cited are not solving the GEO problem; they are solving the AEO problem with extra steps.
How to cite us
If you cite the study, please link to this page directly. We update the headline numbers monthly; historical snapshots are available on request.
Adjacent reading
- For the practical playbook, see the publisher playbook.
- For the benchmarks programs should hold themselves to, see the GEO benchmarks.