
Documentation Index

Fetch the complete documentation index at: https://docs.anymorph.ai/llms.txt

Use this file to discover all available pages before exploring further.

How AI Engines Find and Cite Brands

When a user asks an AI assistant a question, the response doesn’t come from a single search. The engine runs a multi-stage pipeline that transforms one question into dozens of searches, retrieves hundreds of candidate pages, then discards 97–98% of them before generating a cited answer. Understanding this pipeline is key to GEO. Each stage has different filtering criteria, and your content needs to survive all of them to get cited.

Stage 1: Query Fan-Out

The engine doesn’t search your exact question. It decomposes it into 8–12 parallel sub-queries, each targeting a different angle: definitions, comparisons, pricing, recent developments, expert opinions, alternatives. For example, “What’s the best CRM for small teams?” might become:
  • “best CRM software small business 2026”
  • “CRM comparison small teams pricing”
  • “CRM features for teams under 50”
  • “HubSpot vs Salesforce small business”
  • and several more variants
Only 27% of these sub-queries are stable across repeated searches (iPullRank, 2025). This means content breadth matters more than single-keyword depth. A page that covers multiple angles of a topic has more chances to be retrieved by at least one sub-query.
Google AI Mode fires hundreds of sub-queries for complex “Deep Search” requests using a custom Gemini model specifically designed for query fan-out (Google, May 2025).
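The fan-out idea can be sketched in a few lines. This is an illustrative toy, not any engine's actual implementation: real fan-out is driven by a model, not fixed templates, and the `ANGLE_TEMPLATES` list and `fan_out` function here are hypothetical names.

```python
# Toy sketch of query fan-out: one user topic is expanded into parallel
# sub-queries, each covering a different angle (superlatives, pricing,
# features, alternatives, expert opinion).

ANGLE_TEMPLATES = [
    "best {topic} {year}",
    "{topic} comparison pricing",
    "{topic} features",
    "{topic} alternatives",
    "{topic} expert review",
]

def fan_out(topic: str, year: int = 2026) -> list[str]:
    """Expand a topic into one sub-query per angle template."""
    return [t.format(topic=topic, year=year) for t in ANGLE_TEMPLATES]

for q in fan_out("CRM for small teams"):
    print(q)
```

The GEO implication falls out of the loop: a page covering several of these angles can match several sub-queries, while a single-keyword page matches at most one.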

Stage 2: Web Retrieval

Each sub-query hits the engine’s internal search backend:
| Engine | Search Backend |
| --- | --- |
| ChatGPT | OpenAI’s own search index (transitioned from Bing in 2024–2025, now primary) |
| Perplexity | Proprietary index (built in-house since 2023) |
| Google AI Overview | Google Search index + Knowledge Graph |
| Gemini | Google Search index |
| Claude | Undisclosed (Anthropic has not publicly confirmed which search provider is used) |
This stage retrieves dozens to hundreds of candidate documents across all sub-queries combined. The exact number varies by engine and query complexity. At this point, your page just needs to be findable by one of the sub-queries. Traditional SEO signals (domain authority, backlinks, keyword relevance) still influence this step.

Stage 3: Semantic Ranking and Filtering

The engine runs multi-layer machine learning ranking on all retrieved documents:
  1. Semantic similarity scoring: Documents are compared to the original query using embedding models. Pages with cosine similarity of 0.88+ get 7.3x more citations (Wellows, Dec 2025).
  2. E-E-A-T gate: A binary pass/fail filter. 96% of eventually-cited content passes this gate; below-threshold content is excluded regardless of relevance. Author credentials, schema markup, and domain reputation matter here.
  3. LLM re-ranking: A separate model re-ranks surviving documents at the passage level, not the page level.
After this stage, ~50–100 candidates remain. This is where the pool drops most dramatically.
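The similarity-then-threshold step can be sketched with a toy bag-of-words "embedding" and cosine similarity. This is a minimal illustration of the mechanism only: production engines use dense neural embeddings, and `embed`, `cosine`, and `filter_candidates` are hypothetical names, not any engine's API.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words vector; real engines use dense neural embeddings.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b.get(t, 0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def filter_candidates(query: str, docs: list[str], threshold: float = 0.88) -> list[str]:
    """Keep only documents whose similarity to the query clears the threshold,
    best first. The 0.88 default mirrors the citation study quoted above."""
    scored = [(cosine(embed(query), embed(d)), d) for d in docs]
    return [d for s, d in sorted(scored, reverse=True) if s >= threshold]
```

The key design point survives the simplification: filtering is a hard cut on a similarity score, so near-miss pages are dropped entirely rather than ranked lower.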

Stage 4: Passage Extraction

AI engines don’t read entire pages. They extract specific passages of 134–167 words (the optimal cited passage length from large-scale analysis). What determines which passage gets selected:
  • Information density: Factually dense, no preamble or filler
  • Self-containment: The passage answers the sub-query without needing surrounding context
  • Structural signals: Headings, FAQ blocks, comparison tables, and definition boxes signal chunk boundaries to the parser
  • Position on page: 44.2% of all citations come from the first 30% of a page’s content. Content in the bottom 10% earns only 2.4–4.4% of citations (AirOps, 2026)
This is why the Quotability attribute (self-contained answer capsules after each H2) matters so much. It directly addresses how passage extraction works.
The “answer in the first 100 words” pattern (BLUF, “bottom line up front”) is not a writing style preference. 90% of top-cited content follows it because passage extractors evaluate early content first and assign it higher relevance scores.
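Passage extraction and position weighting can be sketched as follows. This is an illustrative simplification under two assumptions: passages are fixed-size word windows (real parsers also use headings and block boundaries), and position weight decays linearly (the exact decay is not public). `split_passages` and `position_score` are hypothetical names.

```python
def split_passages(text: str, target_words: int = 150) -> list[str]:
    """Split a page into roughly passage-sized word chunks
    (~150 words, in the 134-167 word range cited above)."""
    words = text.split()
    return [" ".join(words[i:i + target_words])
            for i in range(0, len(words), target_words)]

def position_score(index: int, total: int) -> float:
    """Earlier passages score higher, mirroring the observation that
    the first 30% of a page earns the bulk of citations."""
    return 1.0 - index / max(total, 1)
```

Run against a long page, the first chunk always starts with the highest position score, which is why burying the answer at the bottom costs citations even when the content itself is strong.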

Stage 5: Prompt Assembly (Pre-Citation Embedding)

This is the most misunderstood stage. Citations are not added after the response is written. They are structurally embedded into the LLM’s prompt before it generates a single token. The orchestration engine assembles:
  • Citation markers linked to specific source URLs
  • Ranked passage excerpts from surviving documents
  • Publication dates and authority metadata
  • Contradiction resolution between conflicting sources
The LLM then generates prose that is architecturally bound to these pre-embedded sources. It cannot invent a URL it was not given.
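A minimal sketch of what "structurally embedded" means in practice: the citation markers and source metadata are part of the prompt text before generation starts. The passage schema (`url`, `text`, `date`) and the `assemble_prompt` function are hypothetical; real orchestration formats differ per engine and are not public.

```python
def assemble_prompt(question: str, passages: list[dict]) -> str:
    """Build a generation prompt in which each citation marker [n] is
    already bound to a specific source URL and dated excerpt.
    Passage schema (assumed): {"url": ..., "text": ..., "date": ...}."""
    lines = [
        "Answer the question using ONLY the sources below.",
        "Cite sources inline as [n].",
        "",
        f"Question: {question}",
        "",
    ]
    for i, p in enumerate(passages, start=1):
        lines.append(f"[{i}] ({p['date']}) {p['url']}")
        lines.append(p["text"])
        lines.append("")
    return "\n".join(lines)
```

Because the URL list is fixed in the prompt, the model can only reuse markers it was given, which is the mechanism behind "it cannot invent a URL".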

Stage 6: Response Generation

The final LLM generates the response, constrained by the assembled evidence. Each inline citation number corresponds to a specific passage excerpt from Stage 4. The result: Out of all initially retrieved candidates, only 5–15 are cited in the final response. The vast majority of retrieved content is discarded.

Why This Matters for GEO

Each pipeline stage requires different content qualities:
| Stage | What your content needs | GEO lever |
| --- | --- | --- |
| Fan-out | Broad topical coverage | Cover multiple query angles per page |
| Retrieval | Findability by search engines | Technical SEO, schema markup |
| Semantic ranking | High relevance to query | Intent-aligned headings, direct answers |
| E-E-A-T gate | Trust signals | Author credentials, verifiable data, structured data |
| Passage extraction | Self-contained, information-dense passages | Answer capsules, BLUF pattern, FAQ blocks |
| Prompt assembly | Fresh, authoritative, non-contradictory | Temporal grounding, entity consistency |
A page that’s strong at Stage 2 (good SEO rank) but weak at Stage 4 (no extractable passages) will be retrieved but never cited. This is exactly the gap that GEO strategy addresses.

What Drives AI Citations

Research across 129,000+ domains and 680M+ citations reveals six key factors:
  • Domain Authority: The strongest predictor of AI citations. Brands with deeper, higher-trust backlink profiles are cited far more often. This overlaps with traditional SEO but carries even more weight in GEO.
  • E-E-A-T Signals: Experience, Expertise, Authoritativeness, and Trustworthiness. AI systems prioritize content that demonstrates real-world outcomes and verifiable expert credentials.
  • Content Freshness: Cited pages skew newer. Explicit dates (“Last updated: 2026-04-07”) and current statistics signal freshness to AI engines.
  • Entity Consistency: Consistent brand name, positioning, and facts across your website, social profiles, directories, and third-party sources reduce ambiguity for AI retrieval.
  • Third-Party Mentions: Independent mentions on Reddit, G2, Trustpilot, and industry publications are disproportionately present in AI-cited sources. Even unlinked brand mentions contribute.
  • Structured Data: Organization, Product, FAQ, and Review schema markup helps AI engines extract and verify brand facts during retrieval.

What You Can Actually Influence

Of these six factors, some are within your direct control and some require longer-term investment:
| Factor | Your control level | How to improve |
| --- | --- | --- |
| Domain Authority | Indirect (long-term) | Publish quality content, earn backlinks naturally, fix technical SEO issues in SEO Health |
| E-E-A-T Signals | High | Add author credentials, case studies, verifiable data to your pages |
| Content Freshness | High | Keep pages updated with dates, refresh statistics quarterly |
| Entity Consistency | High | Align brand info across website, profiles, and directories via Brand Settings |
| Third-Party Mentions | Low (and risky) | Organic community presence; paid/automated approaches carry significant risks |
| Structured Data | High | Add schema markup, FAQ blocks, logical heading hierarchy |
The factors you can directly control (E-E-A-T, freshness, entity consistency, structured data) are exactly what GEO page optimization targets. This is why content optimization delivers measurable results faster than third-party strategies.

Engine-Specific Behavior

Each AI engine has different source preferences:
| Engine | Primary Sources | Strategy |
| --- | --- | --- |
| ChatGPT | OpenAI search index, Wikipedia heavily weighted | Structured content + Wikipedia presence |
| Perplexity | Proprietary index, Reddit heavily weighted, real-time search | Community engagement + fresh content |
| Google AI Overview | Google organic results (strongly correlated with traditional rankings) | Traditional SEO + schema markup |
| Claude | Undisclosed, E-E-A-T signals | Fact-based, authoritative content |
| Gemini | Google Search index | Google SEO + structured data |
Optimizing for multiple engines simultaneously is more effective than targeting a single one. Anymorph tracks all engines in parallel so you can identify cross-engine opportunities.

How Anymorph Measures Visibility

PAWC (Position-Adjusted Word Count)

Anymorph uses PAWC, the same metric from the Princeton KDD 2024 research paper, as its core visibility measurement:
positionWeight = exp(-relativePosition / (mentionSpan - 1))
PAWC = wordCount × positionWeight
This means a brand mention at the top of an AI response is weighted exponentially higher than one at the bottom. It’s not just about being mentioned; it’s about where you appear.
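The formula above translates directly into code. One edge case is hedged here as an assumption: when `mention_span` is 1, the denominator `mentionSpan - 1` is zero, so this sketch treats a single-mention span as full weight.

```python
import math

def pawc(word_count: int, relative_position: float, mention_span: int) -> float:
    """Position-Adjusted Word Count, per the formula above:
    weight = exp(-relativePosition / (mentionSpan - 1)), PAWC = wordCount * weight.
    Assumption: a single-mention span (mention_span == 1) gets weight 1.0
    to avoid division by zero."""
    if mention_span <= 1:
        return float(word_count)
    weight = math.exp(-relative_position / (mention_span - 1))
    return word_count * weight
```

A 100-word mention at the very top of a response (`relative_position = 0`) keeps its full weight, while the same mention at the end of a 5-mention span is scaled by e⁻¹, roughly a third of the words, which is the exponential penalty the paragraph above describes.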

GEO Score (0–100)

Your overall AI visibility is expressed as a GEO Score, a weighted composite of six sub-scores:
| Sub-score | Weight | What it measures |
| --- | --- | --- |
| Visibility | 30% | PAWC-based visibility index |
| SOV Rank | 20% | Share of Voice ranking vs competitors |
| Sentiment | 15% | Positive vs negative brand mentions |
| Citation Coverage | 15% | % of citations pointing to your domain |
| Consistency | 10% | Uniformity across different AI engines |
| Intent Coverage | 10% | Visibility spread across search intents |
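The composite is a straightforward weighted sum of the six sub-scores. A minimal sketch, using the weights from the table (the `WEIGHTS` dict and `geo_score` function are illustrative names, not Anymorph's internal API):

```python
# Weights from the GEO Score table; each sub-score is on a 0-100 scale.
WEIGHTS = {
    "visibility": 0.30,
    "sov_rank": 0.20,
    "sentiment": 0.15,
    "citation_coverage": 0.15,
    "consistency": 0.10,
    "intent_coverage": 0.10,
}

def geo_score(sub_scores: dict[str, float]) -> float:
    """Weighted composite of the six sub-scores; result is on 0-100."""
    return sum(WEIGHTS[k] * sub_scores[k] for k in WEIGHTS)
```

Because the weights sum to 1.0, a brand scoring 100 on every sub-score lands exactly at a GEO Score of 100.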

Grade Scale

| Grade | Score | Meaning |
| --- | --- | --- |
| A | 80+ | Strong AI position. Maintain and expand. |
| B | 60–79 | Good visibility. Target 1–2 areas for optimization. |
| C | 40–59 | Average. Systematic GEO strategy needed. |
| D | 20–39 | Low visibility. Foundational content work needed. |
| F | < 20 | Nearly invisible to AI. Full strategy review required. |

Share of Voice

Anymorph calculates SOV in two ways:
  • PAWC-based SOV: Academically precise, considers mention position and length
  • Mention-based SOV: Intuitive count of brand mentions vs competitors
Both are displayed in the Visibility dashboard to give you complementary perspectives.
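The two calculations differ only in what they aggregate. A minimal sketch (function names are illustrative, not Anymorph's API):

```python
def mention_sov(brand_mentions: int, total_mentions: int) -> float:
    """Mention-based SOV: your share of all brand mentions, as a percentage."""
    return 100.0 * brand_mentions / total_mentions if total_mentions else 0.0

def pawc_sov(brand_pawc: float, total_pawc: float) -> float:
    """PAWC-based SOV: your share of total position-adjusted word count,
    so early, lengthy mentions count for more than late, brief ones."""
    return 100.0 * brand_pawc / total_pawc if total_pawc else 0.0
```

The two can diverge in telling ways: a brand mentioned often but always at the bottom of responses will show a healthy mention-based SOV and a much weaker PAWC-based SOV.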

Benchmarks

SOV Benchmarks

  • < 5%: Nearly absent from AI responses
  • 5–15%: Present but lacking authority
  • 15–30%: Competitive presence
  • 30%+: Category leader in AI responses

Citation Drift

Monthly citation variation of ~55% is normal. AI engines are non-deterministic, so some fluctuation is expected. Consistent trends over 30+ days are more meaningful than day-to-day changes.

Next Steps

GEO Strategy

Learn the 9 attributes that make content citable by AI.

Visibility Dashboard

See your GEO Score and metrics in action.