
Documentation Index

Fetch the complete documentation index at: https://docs.anymorph.ai/llms.txt

Use this file to discover all available pages before exploring further.

How AI Engines Find and Cite Brands

When a user asks an AI assistant a question, the response doesn’t come from a single search. The engine runs a multi-stage pipeline that transforms one question into dozens of searches, retrieves hundreds of candidate pages, then discards 97–98% of them before generating a cited answer. Understanding this pipeline is key to GEO. Each stage has different filtering criteria, and your content needs to survive all of them to get cited.

Stage 1: Query Fan-Out

The engine doesn’t search your exact question. It decomposes it into 8–12 parallel sub-queries, each targeting a different angle: definitions, comparisons, pricing, recent developments, expert opinions, alternatives. For example, “What’s the best CRM for small teams?” might become:
  • “best CRM software small business 2026”
  • “CRM comparison small teams pricing”
  • “CRM features for teams under 50”
  • “HubSpot vs Salesforce small business”
  • and several more variants
Only 27% of these sub-queries are stable across repeated searches (iPullRank, 2025). This means content breadth matters more than single-keyword depth. A page that covers multiple angles of a topic has more chances to be retrieved by at least one sub-query.
Google AI Mode fires hundreds of sub-queries for complex “Deep Search” requests using a custom Gemini model specifically designed for query fan-out (Google, May 2025).
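The fan-out idea can be sketched in a few lines. This is an illustrative toy, not any engine's actual implementation: real fan-out is driven by a model, not fixed templates, and the `ANGLE_TEMPLATES` list and `fan_out` function here are hypothetical names.

```python
# Toy sketch of query fan-out: one user topic is expanded into parallel
# sub-queries, each covering a different angle (superlatives, pricing,
# features, alternatives, expert opinion).

ANGLE_TEMPLATES = [
    "best {topic} {year}",
    "{topic} comparison pricing",
    "{topic} features",
    "{topic} alternatives",
    "{topic} expert review",
]

def fan_out(topic: str, year: int = 2026) -> list[str]:
    """Expand a topic into one sub-query per angle template."""
    return [t.format(topic=topic, year=year) for t in ANGLE_TEMPLATES]

for q in fan_out("CRM for small teams"):
    print(q)
```

The GEO implication falls out of the loop: a page covering several of these angles can match several sub-queries, while a single-keyword page matches at most one.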

Stage 2: Web Retrieval

Each sub-query hits the engine’s internal search backend:
| Engine | Search Backend |
| --- | --- |
| ChatGPT | OpenAI’s own search index (transitioned from Bing in 2024–2025, now primary) |
| Perplexity | Proprietary index (built in-house since 2023) |
| Google AI Overview | Google Search index + Knowledge Graph |
| Gemini | Google Search index |
| Claude | Undisclosed (Anthropic has not publicly confirmed which search provider is used) |
This stage retrieves dozens to hundreds of candidate documents across all sub-queries combined. The exact number varies by engine and query complexity. At this point, your page just needs to be findable by one of the sub-queries. Traditional SEO signals (domain authority, backlinks, keyword relevance) still influence this step.

Stage 3: Semantic Ranking and Filtering

The engine runs multi-layer machine learning ranking on all retrieved documents:
  1. Semantic similarity scoring: Documents are compared to the original query using embedding models. Pages with cosine similarity of 0.88+ get 7.3x more citations (Wellows, Dec 2025).
  2. E-E-A-T gate: A binary pass/fail filter. 96% of eventually-cited content passes this gate; below-threshold content is excluded regardless of relevance. Author credentials, schema markup, and domain reputation matter here.
  3. LLM re-ranking: A separate model re-ranks surviving documents at the passage level, not the page level.
After this stage, ~50–100 candidates remain. This is where the pool drops most dramatically.
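The similarity-then-threshold step can be sketched with a toy bag-of-words "embedding" and cosine similarity. This is a minimal illustration of the mechanism only: production engines use dense neural embeddings, and `embed`, `cosine`, and `filter_candidates` are hypothetical names, not any engine's API.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words vector; real engines use dense neural embeddings.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b.get(t, 0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def filter_candidates(query: str, docs: list[str], threshold: float = 0.88) -> list[str]:
    """Keep only documents whose similarity to the query clears the threshold,
    best first. The 0.88 default mirrors the citation study quoted above."""
    scored = [(cosine(embed(query), embed(d)), d) for d in docs]
    return [d for s, d in sorted(scored, reverse=True) if s >= threshold]
```

The key design point survives the simplification: filtering is a hard cut on a similarity score, so near-miss pages are dropped entirely rather than ranked lower.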

Stage 4: Passage Extraction

AI engines don’t read entire pages. They extract specific passages of 134–167 words (the optimal cited passage length from large-scale analysis). What determines which passage gets selected:
  • Information density: Factually dense, no preamble or filler
  • Self-containment: The passage answers the sub-query without needing surrounding context
  • Structural signals: Headings, FAQ blocks, comparison tables, and definition boxes signal chunk boundaries to the parser
  • Position on page: 44.2% of all citations come from the first 30% of a page’s content. Content in the bottom 10% earns only 2.4–4.4% of citations (AirOps, 2026)
This is why the Quotability attribute (self-contained answer capsules after each H2) matters so much. It directly addresses how passage extraction works.
The “answer in the first 100 words” pattern (BLUF, “bottom line up front”) is not a writing style preference. 90% of top-cited content follows it because passage extractors evaluate early content first and assign it higher relevance scores.
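Passage extraction and position weighting can be sketched as follows. This is an illustrative simplification under two assumptions: passages are fixed-size word windows (real parsers also use headings and block boundaries), and position weight decays linearly (the exact decay is not public). `split_passages` and `position_score` are hypothetical names.

```python
def split_passages(text: str, target_words: int = 150) -> list[str]:
    """Split a page into roughly passage-sized word chunks
    (~150 words, in the 134-167 word range cited above)."""
    words = text.split()
    return [" ".join(words[i:i + target_words])
            for i in range(0, len(words), target_words)]

def position_score(index: int, total: int) -> float:
    """Earlier passages score higher, mirroring the observation that
    the first 30% of a page earns the bulk of citations."""
    return 1.0 - index / max(total, 1)
```

Run against a long page, the first chunk always starts with the highest position score, which is why burying the answer at the bottom costs citations even when the content itself is strong.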

Stage 5: Prompt Assembly (Pre-Citation Embedding)

This is the most misunderstood stage. Citations are not added after the response is written. They are structurally embedded into the LLM’s prompt before it generates a single token. The orchestration engine assembles:
  • Citation markers linked to specific source URLs
  • Ranked passage excerpts from surviving documents
  • Publication dates and authority metadata
  • Contradiction resolution between conflicting sources
The LLM then generates prose that is architecturally bound to these pre-embedded sources. It cannot invent a URL it was not given.
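A minimal sketch of what "structurally embedded" means in practice: the citation markers and source metadata are part of the prompt text before generation starts. The passage schema (`url`, `text`, `date`) and the `assemble_prompt` function are hypothetical; real orchestration formats differ per engine and are not public.

```python
def assemble_prompt(question: str, passages: list[dict]) -> str:
    """Build a generation prompt in which each citation marker [n] is
    already bound to a specific source URL and dated excerpt.
    Passage schema (assumed): {"url": ..., "text": ..., "date": ...}."""
    lines = [
        "Answer the question using ONLY the sources below.",
        "Cite sources inline as [n].",
        "",
        f"Question: {question}",
        "",
    ]
    for i, p in enumerate(passages, start=1):
        lines.append(f"[{i}] ({p['date']}) {p['url']}")
        lines.append(p["text"])
        lines.append("")
    return "\n".join(lines)
```

Because the URL list is fixed in the prompt, the model can only reuse markers it was given, which is the mechanism behind "it cannot invent a URL".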

Stage 6: Response Generation

The final LLM generates the response, constrained by the assembled evidence. Each inline citation number corresponds to a specific passage excerpt from Stage 4. The result: Out of all initially retrieved candidates, only 5–15 are cited in the final response. The vast majority of retrieved content is discarded.

Why This Matters for GEO

Each pipeline stage requires different content qualities:
| Stage | What your content needs | GEO lever |
| --- | --- | --- |
| Fan-out | Broad topical coverage | Cover multiple query angles per page |
| Retrieval | Findability by search engines | Technical SEO, schema markup |
| Semantic ranking | High relevance to query | Intent-aligned headings, direct answers |
| E-E-A-T gate | Trust signals | Author credentials, verifiable data, structured data |
| Passage extraction | Self-contained, information-dense passages | Answer capsules, BLUF pattern, FAQ blocks |
| Prompt assembly | Fresh, authoritative, non-contradictory | Temporal grounding, entity consistency |
A page that’s strong at Stage 2 (good SEO rank) but weak at Stage 4 (no extractable passages) will be retrieved but never cited. This is exactly the gap that GEO strategy addresses.

What Drives AI Citations

Research across 129,000+ domains and 680M+ citations reveals six key factors:
  • Domain Authority: The strongest predictor of AI citations. Brands with deeper, higher-trust backlink profiles are cited far more often. This overlaps with traditional SEO but carries even more weight in GEO.
  • E-E-A-T Signals: Experience, Expertise, Authoritativeness, and Trustworthiness. AI systems prioritize content that demonstrates real-world outcomes and verifiable expert credentials.
  • Content Freshness: Cited pages skew newer. Explicit dates (“Last updated: 2026-04-07”) and current statistics signal freshness to AI engines.
  • Entity Consistency: Consistent brand name, positioning, and facts across your website, social profiles, directories, and third-party sources reduce ambiguity for AI retrieval.
  • Third-Party Mentions: Independent mentions on Reddit, G2, Trustpilot, and industry publications are disproportionately present in AI-cited sources. Even unlinked brand mentions contribute.
  • Structured Data: Organization, Product, FAQ, and Review schema markup helps AI engines extract and verify brand facts during retrieval.

What You Can Actually Influence

Of these six factors, some are within your direct control and some require longer-term investment:
| Factor | Your control level | How to improve |
| --- | --- | --- |
| Domain Authority | Indirect (long-term) | Publish quality content, earn backlinks naturally, fix technical SEO issues in SEO Health |
| E-E-A-T Signals | High | Add author credentials, case studies, verifiable data to your pages |
| Content Freshness | High | Keep pages updated with dates, refresh statistics quarterly |
| Entity Consistency | High | Align brand info across website, profiles, and directories via Brand Settings |
| Third-Party Mentions | Low (and risky) | Organic community presence; paid/automated approaches carry significant risks |
| Structured Data | High | Add schema markup, FAQ blocks, logical heading hierarchy |
The factors you can directly control (E-E-A-T, freshness, entity consistency, structured data) are exactly what GEO page optimization targets. This is why content optimization delivers measurable results faster than third-party strategies.

Engine-Specific Behavior

Each AI engine has different source preferences:
| Engine | Primary Sources | Strategy |
| --- | --- | --- |
| ChatGPT | OpenAI search index, Wikipedia heavily weighted | Structured content + Wikipedia presence |
| Perplexity | Proprietary index, Reddit heavily weighted, real-time search | Community engagement + fresh content |
| Google AI Overview | Google organic results (strongly correlated with traditional rankings) | Traditional SEO + schema markup |
| Claude | Undisclosed, E-E-A-T signals | Fact-based, authoritative content |
| Gemini | Google Search index | Google SEO + structured data |
Optimizing for multiple engines simultaneously is more effective than targeting a single one. Anymorph tracks all engines in parallel so you can identify cross-engine opportunities.

How Anymorph Measures Visibility

PAWC (Position-Adjusted Word Count)

Anymorph uses PAWC, the same metric from the Princeton KDD 2024 research paper, as its core visibility measurement:
positionWeight = exp(-relativePosition / (mentionSpan - 1))
PAWC = wordCount × positionWeight
This means a brand mention at the top of an AI response is weighted exponentially higher than one at the bottom. It’s not just about being mentioned; it’s about where you appear.
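The formula above translates directly into code. One edge case is hedged here as an assumption: when `mention_span` is 1, the denominator `mentionSpan - 1` is zero, so this sketch treats a single-mention span as full weight.

```python
import math

def pawc(word_count: int, relative_position: float, mention_span: int) -> float:
    """Position-Adjusted Word Count, per the formula above:
    weight = exp(-relativePosition / (mentionSpan - 1)), PAWC = wordCount * weight.
    Assumption: a single-mention span (mention_span == 1) gets weight 1.0
    to avoid division by zero."""
    if mention_span <= 1:
        return float(word_count)
    weight = math.exp(-relative_position / (mention_span - 1))
    return word_count * weight
```

A 100-word mention at the very top of a response (`relative_position = 0`) keeps its full weight, while the same mention at the end of a 5-mention span is scaled by e⁻¹, roughly a third of the words, which is the exponential penalty the paragraph above describes.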

GEO Score (0–100)

Your overall AI visibility is expressed as a GEO Score, a weighted composite of six sub-scores:
| Sub-score | Weight | What it measures |
| --- | --- | --- |
| Visibility | 30% | PAWC-based visibility index |
| SOV Rank | 20% | Share of Voice ranking vs competitors |
| Sentiment | 15% | Positive vs negative brand mentions |
| Citation Coverage | 15% | % of citations pointing to your domain |
| Consistency | 10% | Uniformity across different AI engines |
| Intent Coverage | 10% | Visibility spread across search intents |
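The composite is a straightforward weighted sum of the six sub-scores. A minimal sketch, using the weights from the table (the `WEIGHTS` dict and `geo_score` function are illustrative names, not Anymorph's internal API):

```python
# Weights from the GEO Score table; each sub-score is on a 0-100 scale.
WEIGHTS = {
    "visibility": 0.30,
    "sov_rank": 0.20,
    "sentiment": 0.15,
    "citation_coverage": 0.15,
    "consistency": 0.10,
    "intent_coverage": 0.10,
}

def geo_score(sub_scores: dict[str, float]) -> float:
    """Weighted composite of the six sub-scores; result is on 0-100."""
    return sum(WEIGHTS[k] * sub_scores[k] for k in WEIGHTS)
```

Because the weights sum to 1.0, a brand scoring 100 on every sub-score lands exactly at a GEO Score of 100.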

Grade Scale

| Grade | Score | Meaning |
| --- | --- | --- |
| A | 80+ | Strong AI position. Maintain and expand. |
| B | 60–79 | Good visibility. Target 1–2 areas for optimization. |
| C | 40–59 | Average. Systematic GEO strategy needed. |
| D | 20–39 | Low visibility. Foundational content work needed. |
| F | < 20 | Nearly invisible to AI. Full strategy review required. |

Share of Voice

Anymorph calculates SOV in two ways:
  • PAWC-based SOV: Academically precise, considers mention position and length
  • Mention-based SOV: Intuitive count of brand mentions vs competitors
Both are displayed in the Visibility dashboard to give you complementary perspectives.
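The two calculations differ only in what they aggregate. A minimal sketch (function names are illustrative, not Anymorph's API):

```python
def mention_sov(brand_mentions: int, total_mentions: int) -> float:
    """Mention-based SOV: your share of all brand mentions, as a percentage."""
    return 100.0 * brand_mentions / total_mentions if total_mentions else 0.0

def pawc_sov(brand_pawc: float, total_pawc: float) -> float:
    """PAWC-based SOV: your share of total position-adjusted word count,
    so early, lengthy mentions count for more than late, brief ones."""
    return 100.0 * brand_pawc / total_pawc if total_pawc else 0.0
```

The two can diverge in telling ways: a brand mentioned often but always at the bottom of responses will show a healthy mention-based SOV and a much weaker PAWC-based SOV.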

Benchmarks

SOV Benchmarks

  • < 5%: Nearly absent from AI responses
  • 5–15%: Present but lacking authority
  • 15–30%: Competitive presence
  • 30%+: Category leader in AI responses

Citation Drift

Monthly citation variation of ~55% is normal. AI engines are non-deterministic, so some fluctuation is expected. Consistent trends over 30+ days are more meaningful than day-to-day changes.

Next Steps

GEO Strategy

Learn the 9 attributes that make content citable by AI.

Visibility Dashboard

See your GEO Score and metrics in action.