How AI Engines Find and Cite Brands
When a user asks an AI assistant a question, the response doesn’t come from a single search. The engine runs a multi-stage pipeline that transforms one question into dozens of searches, retrieves hundreds of candidate pages, then discards 97–98% of them before generating a cited answer. Understanding this pipeline is key to GEO. Each stage has different filtering criteria, and your content needs to survive all of them to get cited.

Stage 1: Query Fan-Out
The engine doesn’t search your exact question. It decomposes it into 8–12 parallel sub-queries, each targeting a different angle: definitions, comparisons, pricing, recent developments, expert opinions, alternatives. For example, “What’s the best CRM for small teams?” might become:

- “best CRM software small business 2026”
- “CRM comparison small teams pricing”
- “CRM features for teams under 50”
- “HubSpot vs Salesforce small business”
- and several more variants
Google AI Mode fires hundreds of sub-queries for complex “Deep Search” requests using a custom Gemini model specifically designed for query fan-out (Google, May 2025).
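To make the decomposition concrete, here is a minimal sketch. Real engines use trained models for fan-out rather than hand-written templates; the `fan_out` function, its parameters, and the template set below are illustrative assumptions, not any engine’s actual behavior.

```python
def fan_out(question: str, topic: str, year: int = 2026) -> list[str]:
    """Expand one user question into angle-specific sub-queries (illustrative)."""
    angles = [
        f"best {topic} {year}",           # recency + superlative angle
        f"{topic} comparison pricing",    # comparison / pricing angle
        f"{topic} features",              # definition / feature angle
        f"{topic} alternatives",          # alternatives angle
        f"{topic} expert review",         # expert-opinion angle
    ]
    return [question] + angles

print(fan_out("What's the best CRM for small teams?", "CRM for small teams"))
```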
Stage 2: Web Retrieval
Each sub-query hits the engine’s internal search backend:

| Engine | Search Backend |
|---|---|
| ChatGPT | OpenAI’s own search index (transitioned from Bing in 2024–2025, now primary) |
| Perplexity | Proprietary index (built in-house since 2023) |
| Google AI Overview | Google Search index + Knowledge Graph |
| Gemini | Google Search index |
| Claude | Undisclosed (Anthropic has not publicly confirmed which search provider is used) |
Stage 3: Semantic Ranking and Filtering
The engine runs multi-layer machine learning ranking on all retrieved documents:

- Semantic similarity scoring: Documents are compared to the original query using embedding models. Pages with a cosine similarity of 0.88+ get 7.3x more citations (Wellows, Dec 2025); a sketch of this gate follows the list.
- E-E-A-T gate: A binary pass/fail filter. 96% of eventually-cited content passes this gate; below-threshold content is excluded regardless of relevance. Author credentials, schema markup, and domain reputation matter here.
- LLM re-ranking: A separate model re-ranks surviving documents at the passage level, not the page level.
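A minimal sketch of the similarity gate, assuming precomputed embedding vectors (no engine publicly discloses its embedding model); the 0.88 threshold mirrors the figure cited above.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def filter_by_similarity(query_vec: np.ndarray,
                         doc_vecs: list[np.ndarray],
                         threshold: float = 0.88) -> list[int]:
    """Return indices of documents that clear the similarity gate."""
    return [i for i, d in enumerate(doc_vecs)
            if cosine_similarity(query_vec, d) >= threshold]
```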
Stage 4: Passage Extraction
AI engines don’t read entire pages. They extract specific passages of 134–167 words (the optimal cited-passage length from large-scale analysis). What determines which passage gets selected (see the chunking sketch after this list):

- Information density: Factually dense, no preamble or filler
- Self-containment: The passage answers the sub-query without needing surrounding context
- Structural signals: Headings, FAQ blocks, comparison tables, and definition boxes signal chunk boundaries to the parser
- Position on page: 44.2% of all citations come from the first 30% of a page’s content. Content in the bottom 10% earns only 2.4–4.4% of citations (AirOps, 2026)
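A toy chunker illustrating the length constraint. Real extractors also score density and self-containment, which plain word-windowing ignores; all names here are hypothetical.

```python
TARGET_MIN, TARGET_MAX = 134, 167  # optimal cited-passage length range (words)

def chunk_passages(sections: list[str]) -> list[str]:
    """Split heading-delimited sections into passages near the target length."""
    passages = []
    for section in sections:
        words = section.split()
        # Walk the section in windows capped at the upper length bound.
        for start in range(0, len(words), TARGET_MAX):
            chunk = words[start:start + TARGET_MAX]
            # Keep a short trailing fragment only when it is the whole section.
            if len(chunk) >= TARGET_MIN or start == 0:
                passages.append(" ".join(chunk))
    return passages
```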
Stage 5: Prompt Assembly (Pre-Citation Embedding)
This is the most misunderstood stage. Citations are not added after the response is written. They are structurally embedded into the LLM’s prompt before it generates a single token. The orchestration engine assembles (sketched after this list):

- Citation markers linked to specific source URLs
- Ranked passage excerpts from surviving documents
- Publication dates and authority metadata
- Contradiction resolution between conflicting sources
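A sketch of what an assembled prompt might look like. The field names and prompt wording are invented for illustration; no engine publishes its actual orchestration format.

```python
def assemble_prompt(question: str, passages: list[dict]) -> str:
    """Build a generation prompt with numbered, URL-bound evidence excerpts."""
    evidence = "\n".join(
        f"[{i}] {p['url']} ({p['date']}): {p['excerpt']}"
        for i, p in enumerate(passages, start=1)
    )
    return (
        "Answer the question using ONLY the evidence below, "
        "citing sources inline as [n].\n\n"
        f"Evidence:\n{evidence}\n\n"
        f"Question: {question}"
    )
```

Because each marker [n] is bound to a URL before generation starts, the model can only cite sources it was actually given; a passage that never survives Stage 4 can never be cited.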
Stage 6: Response Generation
The final LLM generates the response, constrained by the assembled evidence. Each inline citation number corresponds to a specific passage excerpt from Stage 4. The result: out of all initially retrieved candidates, only 5–15 are cited in the final response. The vast majority of retrieved content is discarded.

Why This Matters for GEO
Each pipeline stage requires different content qualities:

| Stage | What your content needs | GEO lever |
|---|---|---|
| Fan-out | Broad topical coverage | Cover multiple query angles per page |
| Retrieval | Findability by search engines | Technical SEO, schema markup |
| Semantic ranking | High relevance to query | Intent-aligned headings, direct answers |
| E-E-A-T gate | Trust signals | Author credentials, verifiable data, structured data |
| Passage extraction | Self-contained, information-dense passages | Answer capsules, BLUF pattern, FAQ blocks |
| Prompt assembly | Fresh, authoritative, non-contradictory | Temporal grounding, entity consistency |
What Drives AI Citations
Research across 129,000+ domains and 680M+ citations reveals six key factors:

1. Referring Domain Authority

Domains with strong backlink profiles and established reputations are cited disproportionately often. Of the six factors, this is the slowest to influence.
2. E-E-A-T Signals
Experience, Expertise, Authoritativeness, and Trustworthiness. AI systems prioritize content that demonstrates real-world outcomes and verifiable expert credentials.
3. Content Freshness
Cited pages skew newer. Explicit dates (“Last updated: 2026-04-07”) and current statistics signal freshness to AI engines.
4. Entity Consistency
Consistent brand name, positioning, and facts across your website, social profiles, directories, and third-party sources reduce ambiguity for AI retrieval.
5. Third-Party Mentions
Independent mentions on Reddit, G2, Trustpilot, and industry publications are disproportionately present in AI-cited sources. Even unlinked brand mentions contribute.
6. Structured Data
Organization, Product, FAQ, and Review schema markup helps AI engines extract and verify brand facts during retrieval.
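For example, a minimal Organization schema emitted as JSON-LD (all values are placeholders):

```python
import json

# Minimal Organization schema. The "sameAs" links tie your site to
# third-party profiles, which also supports the entity-consistency factor.
organization_schema = {
    "@context": "https://schema.org",
    "@type": "Organization",
    "name": "Example Brand",
    "url": "https://example.com",
    "sameAs": [
        "https://www.linkedin.com/company/example-brand",
        "https://www.g2.com/products/example-brand",
    ],
}

# Embed the output in a <script type="application/ld+json"> tag on your pages.
print(json.dumps(organization_schema, indent=2))
```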
What You Can Actually Influence
Of these six factors, some are within your direct control and some require longer-term investment:

| Factor | Your control level | How to improve |
|---|---|---|
| Domain Authority | Indirect (long-term) | Publish quality content, earn backlinks naturally, fix technical SEO issues in SEO Health |
| E-E-A-T Signals | High | Add author credentials, case studies, verifiable data to your pages |
| Content Freshness | High | Keep pages updated with dates, refresh statistics quarterly |
| Entity Consistency | High | Align brand info across website, profiles, and directories via Brand Settings |
| Third-Party Mentions | Low (and risky) | Organic community presence; paid/automated approaches carry significant risks |
| Structured Data | High | Add schema markup, FAQ blocks, logical heading hierarchy |
Engine-Specific Behavior
Each AI engine has different source preferences:

| Engine | Primary Sources | Strategy |
|---|---|---|
| ChatGPT | OpenAI search index, Wikipedia heavily weighted | Structured content + Wikipedia presence |
| Perplexity | Proprietary index, Reddit heavily weighted, real-time search | Community engagement + fresh content |
| Google AI Overview | Google organic results (strongly correlated with traditional rankings) | Traditional SEO + Schema markup |
| Claude | Undisclosed, E-E-A-T signals | Fact-based, authoritative content |
| Gemini | Google Search index | Google SEO + structured data |
How Anymorph Measures Visibility
PAWC (Position-Adjusted Word Count)
Anymorph uses PAWC, the metric from the Princeton KDD 2024 research paper, as its core visibility measurement.
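A minimal sketch of the idea, assuming the exponential position decay described in the GEO paper (words in brand-citing sentences count more when they appear early in the response); Anymorph’s production normalization may differ.

```python
import math

def pawc(sentences: list[str], cites_brand: list[bool]) -> float:
    """Position-adjusted word count for one AI response.

    sentences   -- the response split into sentences, in order
    cites_brand -- parallel flags: does sentence i cite the brand?
    """
    n = len(sentences)
    total_words = sum(len(s.split()) for s in sentences)
    score = sum(
        len(s.split()) * math.exp(-pos / n)  # later sentences decay
        for pos, (s, cited) in enumerate(zip(sentences, cites_brand))
        if cited
    )
    return score / total_words if total_words else 0.0
```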
GEO Score (0–100)

Your overall AI visibility is expressed as a GEO Score, a weighted composite of six sub-scores:

| Sub-score | Weight | What it measures |
|---|---|---|
| Visibility | 30% | PAWC-based visibility index |
| SOV Rank | 20% | Share of Voice ranking vs competitors |
| Sentiment | 15% | Positive vs negative brand mentions |
| Citation Coverage | 15% | % of citations pointing to your domain |
| Consistency | 10% | Uniformity across different AI engines |
| Intent Coverage | 10% | Visibility spread across search intents |
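A direct encoding of the weights above; each sub-score is assumed to be normalized to 0–100 before weighting.

```python
WEIGHTS = {
    "visibility": 0.30,
    "sov_rank": 0.20,
    "sentiment": 0.15,
    "citation_coverage": 0.15,
    "consistency": 0.10,
    "intent_coverage": 0.10,
}

def geo_score(sub_scores: dict[str, float]) -> float:
    """Combine six 0-100 sub-scores into the overall GEO Score."""
    return sum(WEIGHTS[k] * sub_scores[k] for k in WEIGHTS)

print(geo_score({"visibility": 70, "sov_rank": 60, "sentiment": 80,
                 "citation_coverage": 50, "consistency": 65,
                 "intent_coverage": 55}))  # -> 64.5
```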
Grade Scale
| Grade | Score | Meaning |
|---|---|---|
| A | 80+ | Strong AI position. Maintain and expand. |
| B | 60–79 | Good visibility. Target 1–2 areas for optimization. |
| C | 40–59 | Average. Systematic GEO strategy needed. |
| D | 20–39 | Low visibility. Foundational content work needed. |
| F | < 20 | Nearly invisible to AI. Full strategy review required. |
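The grade boundaries encode directly as a lookup, continuing the example above:

```python
def grade(score: float) -> str:
    """Map a 0-100 GEO Score to its letter grade."""
    if score >= 80:
        return "A"
    if score >= 60:
        return "B"
    if score >= 40:
        return "C"
    if score >= 20:
        return "D"
    return "F"

print(grade(64.5))  # -> "B": good visibility, target 1-2 areas
```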
Share of Voice
Anymorph calculates SOV in two ways (a sketch follows the list):

- PAWC-based SOV: Academically precise, considers mention position and length
- Mention-based SOV: Intuitive count of brand mentions vs competitors
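Both variants reduce to a share-of-total calculation; only the per-brand quantity differs. A minimal sketch (brand names and counts are made up):

```python
def share_of_voice(values_by_brand: dict[str, float], brand: str) -> float:
    """A brand's share of the total, as a percentage. Works for either
    variant: pass summed PAWC values or raw mention counts."""
    total = sum(values_by_brand.values())
    return 100.0 * values_by_brand[brand] / total if total else 0.0

mentions = {"YourBrand": 12, "CompetitorA": 30, "CompetitorB": 18}
print(share_of_voice(mentions, "YourBrand"))  # -> 20.0 (competitive presence)
```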
Benchmarks
SOV Benchmarks
- < 5%: Nearly absent from AI responses
- 5–15%: Present but lacking authority
- 15–30%: Competitive presence
- 30%+: Category leader in AI responses
Citation Drift
Monthly citation variation of ~55% is normal. AI engines are non-deterministic, so some fluctuation is expected. Consistent trends over 30+ days are more meaningful than day-to-day changes.

Next Steps
GEO Strategy
Learn the 9 attributes that make content citable by AI.
Visibility Dashboard
See your GEO Score and metrics in action.