
How ChatGPT, Gemini, and Claude Read and Cite Your Content (2026)

The three engines don't choose citations the same way — one casts a wide net and cites a sliver, one reads almost nothing yet cites nearly all of it, and one reads deeply and verifies twice. Here is how each engine actually reads and cites, and the per-engine playbook we use to win the citation, not just the crawl.

By Vijay Vasu, Founder of Indexable — first SEO hire at Uber Eats, former Director of SEO at Zendesk. Published June 25, 2026.

The short answer

ChatGPT, Gemini, and Claude cite differently: ChatGPT reads each candidate through a 100–200 word window and cites about 5% of what it retrieves, Gemini cites nearly everything it retrieves but reads almost nothing, and Claude reads full pages and re-verifies on a second pass (citing about 64%). Optimize per engine, and measure source→cite conversion, not just whether you were mentioned.

ChatGPT, Gemini, and Claude do not choose citations the same way. They are three different machines: ChatGPT casts a wide net and cites a sliver, Gemini reads almost nothing yet cites nearly all of it, and Claude reads deeply and re-verifies its citations on a second pass. Optimizing for “AI search” as one target means optimizing for an average none of the three engines uses.

The reason this matters: being retrieved is not the same as being cited, and the gap between them is where brands lose. In one instrumented test, ChatGPT pulled 39 pages for a query and cited only 2 — a 5% citation rate (Petrovic, 2026). Getting crawled was easy; surviving the citation cut was not. That cut works differently on every engine, so here is how each one reads and cites — and the per-engine playbook we use at Indexable to win the citation, not just the crawl.

How do ChatGPT, Gemini, and Claude decide what to cite?

Each engine runs a different retrieval-and-citation pipeline. The instrumented figures below come from a 2026 study by researcher Dan Petrovic (DEJAN) that measured all three engines on matched queries:

Dimension	ChatGPT	Gemini	Claude
Citation rate (cited ÷ retrieved)	~5% (a ~20:1 cut)	~100% (cites ~1:1)	~64%
How much it reads	100–200 word sliding window	Snippets only (lightest read)	Full pages (heaviest read)
Citation pass	Single, fast	Single	Two passes — re-verifies
Source URLs	Direct + UTM (trackable)	Redirect-wrapped (masked)	Direct

The headline: ChatGPT reads a 100–200 word window, Gemini reads a snippet, and Claude reads the full page and then re-verifies its citations on a second pass (Petrovic, 2026). A page engineered for one engine is not automatically right for the others. (See the caveat below: this is one study, so treat the architecture differences as durable and the exact percentages as directional.)

How does ChatGPT choose what to cite?

ChatGPT retrieves broadly, then filters hard. According to Petrovic (2026), it sourced 39 pages for a single query and cited only 2 — a 5% citation rate — and it evaluates each candidate through a 100–200 word sliding window rather than the full page. That window is the whole game.
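Petrovic reports the window's size but not how it is positioned on the page, so the sketch below assumes the simplest model: the engine may read any contiguous run of up to 200 words. Under that assumption, a setup and its payoff can be read together only if the span from the first to the last of them fits inside one window. The function name and the whitespace tokenization are illustrative choices, not DEJAN's method.

```python
def co_visible(text: str, setup: str, payoff: str, window: int = 200) -> bool:
    """True if one window of `window` whitespace-separated words can hold both phrases.

    Assumes the engine may read any contiguous run of up to `window` words
    (a simplification; the real window placement is not published).
    """
    words = text.split()

    def positions(phrase: str) -> list[tuple[int, int]]:
        # (start, end) word offsets of every exact occurrence of the phrase.
        target = phrase.split()
        n = len(target)
        return [(i, i + n) for i in range(len(words) - n + 1) if words[i:i + n] == target]

    for s_start, s_end in positions(setup):
        for p_start, p_end in positions(payoff):
            # Words needed to cover both phrases in a single slice.
            span = max(s_end, p_end) - min(s_start, p_start)
            if span <= window:
                return True
    return False


near = "Our CDN migration cut latency. " + "filler " * 100 + "It improved by 40%."
far = "Our CDN migration cut latency. " + "filler " * 300 + "It improved by 40%."
print(co_visible(near, "CDN migration", "improved by 40%."))  # True
print(co_visible(far, "CDN migration", "improved by 40%."))   # False
```

The "far" draft is the setup-then-payoff-three-scrolls-later case: no single 200-word slice contains both halves of the thought.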

Petrovic measured the window; the rule we draw from it is Indexable's own: a citable claim has to be a self-contained, quotable unit inside about 200 words. If a statistic is set up in one paragraph and paid off three scrolls later, ChatGPT's window never sees the complete thought, and the claim loses the cut. Front-loading is not enough — the claim must be atomic: subject, claim, and number in one sentence. “Indexable cut client time-to-first-byte 40%” survives the window; “It improved by 40%” does not, because the window may not contain what “it” refers to. This single behavior explains why thin, well-structured pages routinely out-cite richer pages that bury their best claims. On ChatGPT, structure beats volume because the reader is a narrow window, not a full crawl.
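A crude screen for the "It improved by 40%" failure is sketched below. It is a hypothetical heuristic, not Indexable's atomic-claim check: it treats a sentence as a citable claim if it contains a digit, and flags it as dependent if its first word is a back-referencing pronoun such as "It" or "This". Catching subtler missing subjects would need a real parser.

```python
import re

# Hypothetical list of openers that usually point back to an earlier sentence,
# so the claim cannot stand alone inside a short reading window.
DANGLING_OPENERS = {"it", "this", "that", "these", "those", "they", "he", "she"}

# Crude sentence splitter: break after ., ! or ? followed by whitespace.
SENTENCE_SPLIT = re.compile(r"(?<=[.!?])\s+")


def is_atomic_claim(sentence: str) -> bool:
    """Heuristic: a numeric sentence that does not open with a dangling pronoun."""
    words = re.findall(r"[A-Za-z']+", sentence)
    if not words:
        return False
    has_number = bool(re.search(r"\d", sentence))
    starts_dangling = words[0].lower() in DANGLING_OPENERS
    return has_number and not starts_dangling


def flag_dependent_claims(text: str) -> list[str]:
    """Return numeric sentences that lean on an earlier sentence for their subject."""
    flagged = []
    for sentence in SENTENCE_SPLIT.split(text.strip()):
        if re.search(r"\d", sentence) and not is_atomic_claim(sentence):
            flagged.append(sentence)
    return flagged


draft = "We rebuilt the CDN setup. It improved by 40%. Indexable cut client time-to-first-byte 40%."
print(flag_dependent_claims(draft))  # ['It improved by 40%.']
```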

Being retrieved is table stakes; being cited is the product. ChatGPT retrieves about 20 pages to cite one — so “we're getting crawled” is not a result.

How does Gemini choose what to cite?

Gemini cites nearly everything it retrieves — close to a 1:1 ratio (Petrovic, 2026) — and it reads the least of the three engines, leaning on snippets instead of full pages. Its retrieval set is small and binary: a page is either in it or invisible.

That makes Gemini a first-chunk-or-nothing engine. The answer and the structured data in your opening section carry the entire decision, because a snippet reader never reaches anything further down the page. Lead with the direct answer, mark it up with JSON-LD schema, and assume nothing below the fold will be read. Gemini also sets a measurement trap: its source URLs are redirect-wrapped (Petrovic, 2026), so Gemini referrals are masked in analytics. You cannot prove Gemini visibility from a GA4 referrer report; it has to be measured by probing the engine directly, which is why a brand can be cited heavily by Gemini and see almost none of it in standard analytics.
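To make the "mark it up with JSON-LD schema" step concrete, here is a minimal sketch that builds a schema.org FAQPage block from question-and-answer pairs and wraps it in the script tag that goes in the page's HTML. The helper name and sample text are illustrative; the type and property names (FAQPage, Question, Answer, mainEntity, acceptedAnswer) are schema.org's.

```python
import json


def faq_jsonld(pairs: list[tuple[str, str]]) -> str:
    """Build a schema.org FAQPage JSON-LD <script> tag from (question, answer) pairs."""
    data = {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }
    # ensure_ascii=False keeps characters such as "→" readable in the page source.
    body = json.dumps(data, indent=2, ensure_ascii=False)
    # Escape "</" so answer text can never close the <script> element early;
    # "\/" is a valid JSON escape for "/".
    body = body.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{body}\n</script>'


print(faq_jsonld([
    ("Can I see Gemini citations in Google Analytics?",
     "Largely no. Gemini wraps source URLs in redirects, so its referrals are masked in GA4."),
]))
```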

How does Claude choose what to cite?

Claude reads the most and is the most deliberate about what it cites. According to Petrovic (2026), it pulls full pages, fires a single query, and then re-analyzes its citations on a second pass, ending up citing about 64% of what it sourced: far more selective than Gemini, far more generous than ChatGPT.

Because Claude verifies twice, it rewards depth and internal consistency: comprehensive coverage, claims that are independently supported, and evidence that holds up on a re-read. Thin or self-contradictory pages that slip through a single-pass engine get caught on Claude's second look. The implication for writers is that Claude is the hardest engine to fool and the most worth writing thoroughly for — a page that genuinely answers the question, with sourced claims that do not contradict each other, is what survives the verification pass. If ChatGPT rewards atomic structure and Gemini rewards a strong first chunk, Claude rewards substance.

What is the metric that actually matters?

The metric is source→cite conversion: of the pages an engine retrieved for a query, how many did it cite — and was yours one of them? “We are getting retrieved” is not a result. ChatGPT retrieves about 20 pages to cite 1 (Petrovic, 2026), so retrieval is table stakes and citation is the product.

This reframes the diagnosis entirely. A brand that is retrieved but not cited has an extractability problem — its content is findable but not quotable — and the fix is structural. A brand that is never retrieved has an authority or relevance problem, and the fix is different. Most AI-visibility tools only report whether you were mentioned; they cannot tell you which of those two problems you have, which means they cannot tell you what to do next. Separating “not retrieved” from “retrieved but not cited” is the single most useful thing a measurement system can do, and it is what Indexable was built to measure and then fix.
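The metric and the two-way diagnosis are simple enough to compute by hand; the sketch below shows the arithmetic. It assumes you have already recorded, per probe, which URLs an engine retrieved and which it cited (how you capture those lists depends on your probing setup and is not shown). With 39 retrieved and 2 cited, conversion is 2/39, about 5.1%, matching the ratio in Petrovic's ChatGPT test. ProbeResult, source_to_cite, on_domain, and diagnose are illustrative names, not part of any tool.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass
class ProbeResult:
    """URLs one engine retrieved and cited for one query (hypothetical schema)."""
    engine: str
    retrieved_urls: list[str]
    cited_urls: list[str]


def source_to_cite(result: ProbeResult) -> float:
    """Cited ÷ retrieved, counting only citations that were in the retrieved set."""
    retrieved = set(result.retrieved_urls)
    if not retrieved:
        return 0.0
    cited = retrieved & set(result.cited_urls)
    return len(cited) / len(retrieved)


def on_domain(url: str, domain: str) -> bool:
    """True if the URL's host is `domain` or one of its subdomains."""
    host = urlparse(url).hostname or ""
    return host == domain or host.endswith("." + domain)


def diagnose(result: ProbeResult, domain: str) -> str:
    """Separate 'never retrieved' from 'retrieved but not cited' for one brand."""
    if any(on_domain(u, domain) for u in result.cited_urls):
        return "cited"
    if any(on_domain(u, domain) for u in result.retrieved_urls):
        return "retrieved, not cited: extractability problem"
    return "not retrieved: authority or relevance problem"


# Reproduce the ratio from Petrovic's ChatGPT test: 39 retrieved, 2 cited.
pages = [f"https://site{i}.example/guide" for i in range(39)]
run = ProbeResult("ChatGPT", pages, pages[:2])
print(f"{source_to_cite(run):.1%}")  # 5.1%
print(diagnose(run, "site5.example"))  # retrieved, not cited: extractability problem
```

The two non-cited outcomes map to different fixes, which is the point of keeping them apart rather than reporting a single "mentioned or not" flag.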

How do you write content that gets cited across all three engines?

Write each part of the page for the engine that is strictest about it; a page that clears the strictest reader on each dimension generally clears the others too. The per-engine playbook below has four moves, and all four can live on a single page:

For ChatGPT — atomic claims. Make every citable claim a standalone sentence (subject + claim + number) that survives a 100–200 word window. No claim that depends on a paragraph above it.
For Gemini — win the first chunk. Put the direct answer and JSON-LD schema in the opening section; treat everything below as unread by the lightest reader.
For Claude — depth that survives a second read. Cover the topic completely, attribute every data point to a source, and keep claims internally consistent.
For all three — structure for extraction. Use question-format headings that match real queries, answer-first sections, tables, lists, and schema (FAQPage and BreadcrumbList each correlate with markedly higher citation rates in AirOps's 2026 Fan-Out analysis).
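Two of the structural moves, question-format headings and answer-first sections, can be linted before publishing. The sketch below is a hypothetical pre-publish check, not Indexable's gate; it assumes the draft has already been split into (heading, first paragraph) pairs, and its 30-word limit on the opening sentence is an arbitrary default chosen for illustration.

```python
import re

# Crude sentence splitter: break after ., ! or ? followed by whitespace.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def check_sections(sections: list[tuple[str, str]], max_answer_words: int = 30) -> list[str]:
    """Flag non-question headings and sections whose opening sentence runs long.

    `sections` holds (heading, first_paragraph) pairs; `max_answer_words` is an
    arbitrary budget for the answer-first sentence, not a published threshold.
    """
    problems = []
    for heading, first_paragraph in sections:
        heading = heading.strip()
        if not heading.endswith("?"):
            problems.append(f"not a question: {heading!r}")
        first_sentence = SENTENCE_END.split(first_paragraph.strip(), maxsplit=1)[0]
        n_words = len(first_sentence.split())
        if n_words > max_answer_words:
            problems.append(f"answer sentence has {n_words} words: {heading!r}")
    return problems


print(check_sections([
    ("How does Gemini choose what to cite?", "Gemini cites nearly everything it retrieves. It reads the least."),
    ("The honest caveat", "The figures come from a single 2026 study."),
]))  # ["not a question: 'The honest caveat'"]
```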

Indexable's Content Engineer enforces these as an automated pre-publish gate, including an atomic-claim check that flags any citable claim too buried or too dependent to survive ChatGPT's window. Use this playbook on your next page, then re-check it against the gate before you publish.

How should you measure whether AI engines are citing you?

Measure per engine, and do not trust analytics to tell the whole story. Because ChatGPT passes trackable UTM-tagged links (Petrovic, 2026), its referrals appear in GA4; because Gemini's links are redirect-wrapped, its citations do not, so Gemini visibility has to come from direct probing.
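The attribution gap can be shown in a few lines. The sketch below assumes ChatGPT clicks land with utm_source=chatgpt.com, the value commonly observed in practice (confirm it in your own logs), and checks a landing-page URL for that parameter. A redirect-wrapped Gemini click carries no such marker, so the same check returns None. The mapping table and function name are hypothetical.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse

# Assumed mapping of utm_source values to engines; extend it from your own logs.
UTM_SOURCE_TO_ENGINE = {"chatgpt.com": "ChatGPT"}


def engine_from_landing_url(url: str) -> Optional[str]:
    """Return the AI engine named by a landing URL's utm_source, or None."""
    params = parse_qs(urlparse(url).query)
    for source in params.get("utm_source", []):
        engine = UTM_SOURCE_TO_ENGINE.get(source.strip().lower())
        if engine is not None:
            return engine
    return None  # no recognizable marker: the visit cannot be attributed this way


print(engine_from_landing_url("https://example.com/guide?utm_source=chatgpt.com"))  # ChatGPT
print(engine_from_landing_url("https://example.com/guide"))  # None
```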

Probe each engine the way a buyer would. Ask the questions your customers ask, repeat each one several times, and record whether you were retrieved, whether you were cited, and who beat you. Then track source→cite conversion over time, per engine, either manually in a spreadsheet or with a tool that probes the engines for you. That longitudinal, per-engine record is the difference between guessing and knowing, and because the engines change versions often, it has to be continuous rather than a one-time audit. For a hands-on method, see our ChatGPT visibility tracker guide; for the broader playbook, see how to rank in ChatGPT.
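For the spreadsheet route, the sketch below shows one way to roll a probe log up into per-engine, per-week conversion. The ProbeRow fields are a hypothetical schema mirroring what to record for each run; "who beat you" is left out to keep it short. Grouping by ISO week is an arbitrary choice; any fixed period works as long as it stays consistent across runs.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass
class ProbeRow:
    """One run of one buyer question on one engine (hypothetical log schema)."""
    day: date
    engine: str
    query: str
    retrieved: bool  # our page appeared in the engine's sources
    cited: bool      # our page was credited in the answer


def weekly_conversion(rows: list[ProbeRow]) -> dict[tuple[str, str], float]:
    """Map (engine, ISO week) to the share of our retrievals that became citations."""
    retrieved: dict[tuple[str, str], int] = defaultdict(int)
    cited: dict[tuple[str, str], int] = defaultdict(int)
    for row in rows:
        year, week, _ = row.day.isocalendar()
        key = (row.engine, f"{year}-W{week:02d}")
        if row.retrieved:
            retrieved[key] += 1
            if row.cited:  # count a citation only when we were also retrieved
                cited[key] += 1
    # Engines with no retrievals in a week are omitted rather than shown as 0/0.
    return {key: cited[key] / retrieved[key] for key in retrieved}


log = [
    ProbeRow(date(2026, 6, 1), "ChatGPT", "best ai seo agency", True, True),
    ProbeRow(date(2026, 6, 2), "ChatGPT", "best ai seo agency", True, False),
    ProbeRow(date(2026, 6, 2), "Gemini", "best ai seo agency", False, False),
]
print(weekly_conversion(log))
```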

The honest caveat

The instrumented figures here come from a single 2026 study by Dan Petrovic (DEJAN) on a small set of queries, and model versions change quickly. Treat the architecture differences (ChatGPT's narrow window, Gemini's snippet-only read, Claude's full-page read with re-verification) as the durable insight, and the exact ratios as directional. The per-engine behaviors are stable enough to engineer for; the precise percentages will drift. Several things would move the numbers: a model version update, a different query category (a developer question retrieves a different source set than a shopping question), and the brand's own authority, which changes whether it gets retrieved at all. So read the percentages as a snapshot of how each engine behaves, not as fixed constants. This is exactly why measurement has to be continuous rather than one-and-done: the only way to know your real source→cite conversion on today's models is to measure it on today's models, then keep measuring as they change.

Frequently asked questions

Is being retrieved by an AI engine the same as being cited?

No. Retrieval means the engine pulled your page as a candidate; citation means it used and credited you. ChatGPT retrieves about 20 pages to cite 1 — a 5% citation rate (Petrovic, 2026) — so retrieval alone has little value. The citation is what drives visibility and referral traffic.

Why does ChatGPT cite some pages and not others?

ChatGPT reads candidates through a 100–200 word sliding window and cites about 5% of what it retrieves (Petrovic, 2026). Pages whose key claims are self-contained within that window — subject, claim, and number in one sentence — survive the cut far more often than pages where the claim is spread across the page.

Can I see Gemini citations in Google Analytics?

Largely no. Gemini wraps source URLs in redirects (Petrovic, 2026), so its referrals are masked in GA4. Gemini visibility is best measured by probing the engine directly rather than reading an analytics referrer report.

Which AI engine is easiest to get cited by?

It depends on your content. Gemini cites close to 100% of what it retrieves but retrieves very little, so the bar is getting into its small set. ChatGPT retrieves widely but cites about 5%, so the bar is surviving the cut. Claude reads deeply and rewards comprehensive, consistent pages.

Know your source→cite conversion

Indexable's AI-visibility agents probe ChatGPT, Gemini, and Claude the way your buyers do — then show you which pages win or lose the citation, per engine, and how to fix the gap. Get a free AI search audit.
