Research · March 18, 2026 · 18 min read

AI Visibility Benchmark Report 2026: How 10 Brands Score Across 13 AI Models

We ran 300 queries across 13 AI models — ChatGPT, Perplexity, Gemini, Claude, Grok, and more — to score 10 leading SaaS brands on AI visibility. Here's what we found, why it matters, and what you can do about it.

300 queries run · 13 AI models tested · 10 brands scored

What Are the Key Findings from the AEO Benchmark?

Surfaced's 2026 AEO Benchmark tested 10 SaaS brands across 300 queries and 13 AI models. Figma led at 70, Linear trailed at 47. Third-party mentions are the #1 visibility predictor, citation rates are shockingly low (5–15%), and Perplexity behaves fundamentally differently from GPT and Claude.

AI search is no longer a future trend. ChatGPT handles over 400 million queries per week. Perplexity is growing at triple-digit rates. Gemini is embedded in Google's search results. When a prospect asks “what's the best project management tool?” or “which payment processor should I use for my startup?” — your brand is either in the answer or it isn't.

To quantify this, Surfaced ran the first large-scale AI visibility benchmark of the SaaS industry. We selected 10 established brands across different categories, fired 300 queries across 13 AI models, and scored every response using our AEO scoring algorithm. The result is the most comprehensive data set on AI brand visibility available anywhere.

Key findings at a glance:

  1. Figma leads with an AEO score of 70. Linear trails at 47 — a 23-point gap within the same competitive landscape.
  2. Third-party mentions (reviews, press, community discussions) are the single strongest predictor of AI visibility.
  3. Citation rates are shockingly low across all models: 5–15%. Being mentioned ≠ being cited.
  4. Perplexity behaves fundamentally differently from GPT and Claude — it retrieves in real time, while others rely on training data.
  5. Positioning quality (how AI describes your brand) matters more than raw mention frequency.
  6. A 10-point AEO score improvement is achievable in 90 days with the right content strategy.

The brands that score highest share a common pattern: they've accumulated years of third-party mentions, structured content, and community-driven discussions that AI models use as signal. The good news: this is learnable, measurable, and improvable. That's why we built Surfaced.

How Was the AEO Benchmark Conducted?

We selected 10 established SaaS brands across categories, fired 300 queries (5 types: category, comparison, use-case, problem, recommendation) across 13 AI models (GPT-4o, Claude Sonnet, Gemini Pro, Perplexity Pro, Grok, and more), then scored every response on 5 dimensions using Surfaced's AEO algorithm.

Brands Selected

We selected 10 brands that represent a range of SaaS categories, company sizes, and ages. All brands are established players — none are brand-new startups. This was intentional: we wanted to measure AI visibility among brands that should have meaningful visibility, to see what drives the differences between them.

The 10 brands: Figma (design), Airtable (productivity/database), Shopify (e-commerce), Stripe (payments), Datadog (monitoring), Notion (productivity), Vercel (developer platform), Canva (design), Slack (communications), and Linear (project management).

Query Design

We ran 300 queries across 5 query types designed to reflect real user behavior in AI search:

Category queries (60): "What's the best design tool for teams?" "What project management software do developers use?"

Comparison queries (60): "Figma vs Sketch" "Notion vs Airtable" "Stripe vs PayPal"

Use-case queries (60): "What tool should I use to build a design system?" "How do startups handle payments?"

Problem queries (60): "How do I improve team collaboration?" "What's the best way to monitor infrastructure?"

Recommendation queries (60): "Recommend a platform for my online store" "What do developers use for deployment?"

AI Models Tested

We tested 13 AI models, covering the full spectrum from training-data-only models to real-time retrieval-augmented systems. Each query was run on each model, and responses were scored independently.

Models: ChatGPT-4o, ChatGPT-4o mini, and GPT-4 Turbo (OpenAI); Claude 3.5 Sonnet and Claude 3 Haiku (Anthropic); Gemini 1.5 Pro and Gemini Flash (Google); Perplexity Pro and Perplexity Sonar (Perplexity AI); Grok and Grok Pro (xAI); and Cohere Command R+ (Cohere).
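
To make the mechanics concrete, here is a minimal sketch of what a benchmark harness like this can look like. The `query_model` wrapper, model identifiers, and query structure are illustrative assumptions rather than Surfaced's actual pipeline; each vendor's real SDK call would replace the stub.

```python
# Minimal benchmark-harness sketch. `query_model` is a hypothetical
# provider-agnostic wrapper; each vendor's real API and auth differ.
import itertools

MODELS = ["gpt-4o", "claude-3-5-sonnet", "gemini-1.5-pro", "perplexity-pro"]
QUERIES = [
    {"type": "category", "text": "What's the best design tool for teams?"},
    {"type": "comparison", "text": "Figma vs Sketch"},
]

def query_model(model: str, prompt: str) -> str:
    # Stub: swap in the provider SDK call for each model here.
    return f"[{model} response to: {prompt}]"

# Run every query on every model; each response is scored independently.
responses = [
    {"model": m, "query": q, "response": query_model(m, q["text"])}
    for m, q in itertools.product(MODELS, QUERIES)
]
print(len(responses), "responses collected")  # 4 models x 2 queries = 8
```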

The AEO Scoring Algorithm

Each AI response was scored across five dimensions using Surfaced's proprietary AEO scoring algorithm. The composite score runs from 0–100.

Dimension            | Weight | What It Measures
Mention Rate         | 35%    | % of relevant queries where the brand appears
Citation Rate        | 20%    | % of mentions that include a URL citation
Sentiment Score      | 20%    | Positive/neutral/negative framing of the mention
Position in Response | 15%    | First mention vs. buried in a list vs. absent
Model Coverage       | 10%    | # of distinct models where the brand appears

We deliberately weighted mention rate highest because it's the most direct measure of AI visibility. Citation rate was the second-largest factor, as a citation signals a much stronger endorsement than a passing mention. Sentiment and position capture quality dimensions that raw mention counts miss entirely.
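
To illustrate, here's a minimal sketch of how a weighted composite like this can be computed. The weights come straight from the table above; the 0–1 normalization of each input and the example values for sentiment, position, and model coverage are our assumptions, not Surfaced's published method.

```python
# Weighted-composite sketch using the published dimension weights.
WEIGHTS = {
    "mention_rate": 0.35,   # % of relevant queries where the brand appears
    "citation_rate": 0.20,  # % of mentions that include a URL citation
    "sentiment": 0.20,      # positive/neutral/negative framing, mapped to 0-1
    "position": 0.15,       # first mention vs. buried vs. absent, mapped to 0-1
    "model_coverage": 0.10, # share of distinct models where the brand appears
}

def aeo_score(dimensions: dict[str, float]) -> float:
    """Composite on a 0-100 scale; every input dimension is a 0-1 value."""
    assert abs(sum(WEIGHTS.values()) - 1.0) < 1e-9
    return 100 * sum(WEIGHTS[k] * dimensions[k] for k in WEIGHTS)

# Figma-like inputs: mention and citation rates from the benchmark table;
# sentiment, position, and coverage values are purely illustrative.
print(round(aeo_score({
    "mention_rate": 0.92, "citation_rate": 0.14, "sentiment": 0.90,
    "position": 0.70, "model_coverage": 0.85,
})))  # -> 72
```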

What Are the Overall AEO Benchmark Scores?

Figma leads at 70, followed by Airtable (69), Shopify (68), Stripe (66), Datadog (66), Notion (65), Vercel (64), Canva (62), Slack (54), and Linear (47). The 23-point spread among established brands reveals clear, actionable drivers of AI visibility — not random variation.

Here are the composite AEO scores for all 10 brands, ranked from highest to lowest. Scores represent the average across all 13 models and 300 queries.

Rank | Brand    | Category           | AEO Score | Mention | Citation | Sentiment
1    | Figma    | Design             | 70        | 92%     | 14%      | Very positive
2    | Airtable | Productivity       | 69        | 89%     | 13%      | Very positive
3    | Shopify  | E-commerce         | 68        | 91%     | 15%      | Positive
4    | Stripe   | Payments           | 66        | 87%     | 12%      | Very positive
5    | Datadog  | Monitoring         | 66        | 84%     | 11%      | Positive
6    | Notion   | Productivity       | 65        | 88%     | 10%      | Positive
7    | Vercel   | DevOps             | 64        | 81%     | 13%      | Very positive
8    | Canva    | Design             | 62        | 83%     | 9%       | Positive
9    | Slack    | Communications     | 54        | 74%     | 7%       | Mixed
10   | Linear   | Project management | 47        | 61%     | 5%       | Positive

AEO Score Distribution

[Bar chart: AEO scores on a 0–100 axis, from Figma (70) down to Linear (47); values match the table above.]

The spread is striking. All 10 brands are established, well-funded companies with mature content and marketing operations. Yet there's a 23-point gap between the top and the bottom of the ranking, and scores fall off sharply for Slack and Linear. This isn't random — the patterns reveal clear, actionable drivers of AI visibility.

Finding 1

Why Are Third-Party Mentions the #1 Predictor of AI Visibility?

Third-party mentions — G2 reviews, press coverage, Reddit discussions, YouTube tutorials, GitHub and Stack Overflow content — are the single strongest predictor of AI visibility. AI training data captures distributed ecosystems of mentions more reliably than any brand's own website content.

The strongest predictor of a high AEO score isn't the brand's own website content. It's the volume and authority of third-party mentions — reviews on G2 and Capterra, press coverage in TechCrunch and The Verge, Reddit discussions, YouTube tutorials, Stack Overflow answers, and community-generated content on GitHub.

Figma (70) has an enormous third-party footprint. Designers debate Figma vs. Sketch on Twitter. Developers discuss Figma plugins on GitHub. Design educators publish Figma tutorials on YouTube. Community members write about Figma in their personal blogs. This distributed ecosystem of mentions is exactly what AI training data captures most reliably.

Compare this to Linear (47). Linear has a passionate user base and strong word-of-mouth, but it's newer, more niche, and has fewer years of accumulated third-party content. When we analyzed why AI models mentioned Linear less frequently, the answers consistently traced back to the relative scarcity of third-party sources discussing it in depth.

Third-Party Signal Sources That Drive AI Visibility

  • G2 / Capterra Reviews (Very High): AI models treat structured reviews as authoritative signals.
  • Reddit Discussions (High): authentic peer recommendations are heavily indexed.
  • Press Coverage, e.g. TechCrunch (High): high-DA publications carry significant weight.
  • YouTube Tutorials (Medium-High): brand mentions in educational content earn trust.
  • GitHub / Stack Overflow (High for dev tools): technical sources dominate AI training data.
  • Community Blogs / Personal Sites (Medium): the volume of smaller sites creates an aggregate signal.

The implication: your AI visibility strategy can't be confined to your own website. It must include a deliberate effort to generate and amplify third-party mentions. Every satisfied customer you convert into a G2 reviewer, every developer you turn into a blogger, every user you engage on Reddit — these create the distributed signal that AI models use to determine authority.

Shopify (68) demonstrates this well despite being an older, less “hot” brand. The sheer volume of Shopify content across the web — tutorials, merchant stories, developer documentation, partner posts — creates an overwhelming signal that AI models reliably pick up and surface.

Finding 2

How Big Is the AI Visibility Gap Between Leaders and Emerging Brands?

A 40+ point gap separates established leaders from emerging brands. Companies with 7–15 years of accumulated content, reviews, and press score 64–70, while newer brands score 35–53. However, the gap is not destiny — intentional AEO optimization can compress the timeline significantly.

The 23-point spread across our 10 benchmark brands is significant — but it understates the real gap between established players and newer or more niche companies. When we extended our analysis to include emerging tools and recent entrants to competitive categories, the gap widened to over 40 points.

Brands with AEO scores above 65 — Figma, Airtable, Shopify, Stripe, and Datadog — share a common trait: they've been building their digital presence for 7–15 years. The compounding effect of years of content, mentions, reviews, press coverage, and community discussion creates an AI visibility advantage that newer brands simply cannot shortcut.

AEO Score by Brand Age vs. Visibility

  • Established Leaders (7–15 years): Figma, Shopify, Stripe, Datadog, Airtable. Scores 64–70.
  • Mid-Market (5–9 years): Notion, Vercel, Canva, Slack. Scores 54–65.
  • Emerging Players (<5 years or niche): Linear and typical newer SaaS tools. Scores 35–53.

The 40-point gap is not destiny. It's a starting point. Newer brands can compress this timeline by being intentional about AI visibility from day one — rather than hoping it accumulates passively. Brands that start tracking and optimizing their AEO score early can gain ground much faster than brands that let the gap compound for another five years.

Vercel (64) is a good example of a newer brand punching above its weight class. Founded in 2015, Vercel has built extraordinary AI visibility for its age — driven by intense developer community engagement, GitHub presence, and technical content that AI models treat as highly authoritative signals for developer tool queries.

Finding 3

Why Are AI Citation Rates So Low (5–15%)?

Even when AI models mention a brand, they almost never include a URL citation. The average citation rate across all models was just 9.4% (range: 5–15%). Perplexity had the highest at 38% due to its retrieval architecture. GPT-4o had just 4%, generating primarily from training data.

One of the most striking findings in our benchmark: even when AI models mention a brand, they almost never include a specific URL citation. Across all 13 models and 300 queries, the average citation rate was just 9.4%. The range per brand was 5% (Linear) to 15% (Shopify).

This matters because citations are far more valuable than mentions. A citation means the AI is actively directing the user to a specific URL — your website, your product page, your pricing page. A mention means the AI said your name but may not have linked anywhere, leaving users to Google the brand themselves (if they bother).

Citation Rates by Brand

[Bar chart: citation rates by brand, from Shopify (15%) down to Linear (5%); x-axis scaled to 20% max for readability. Values match the benchmark table above.]

The brands with the highest citation rates (Shopify 15%, Figma 14%, Airtable 13%) share a common trait: their content is structured in a way that AI models can directly reference. Product documentation, integration pages, and structured landing pages with clear schema markup are more likely to be cited than generic blog posts.

Citation rates also vary dramatically by AI model. Perplexity Pro had the highest citation rate at 38% — nearly 4× the benchmark average — because Perplexity's architecture actively retrieves and cites sources. GPT-4o had just 4%, as it primarily generates from training data rather than retrieving URLs in real time.

The strategic implication: if citations are where the real value lies, then Perplexity optimization deserves disproportionate attention relative to its current market share. Winning citations on Perplexity is achievable through specific, retrievable content — and it translates to direct traffic in a way that most AI mentions don't.

Finding 4

How Does Perplexity Behave Differently from GPT and Claude?

Perplexity uses retrieval-augmented generation (RAG) — actively searching the web and citing URLs at a 38% rate. GPT and Claude rely on training data, with 2–4% citation rates and a 6–12 month lag before brand changes show up. This creates a two-track AEO strategy: fresh retrievable content for Perplexity, deep content libraries for training-based models.

The most practically important finding in our benchmark is the behavioral difference between retrieval-augmented models (Perplexity) and training-data-only models (GPT, Claude, Grok). This distinction determines which brands win visibility on which platforms — and why.

🔍 Retrieval-Augmented Models

Perplexity Pro, Perplexity Sonar

  • Pull live web results at query time
  • Cite specific URLs (38% citation rate)
  • Favor recently published content
  • Respond to SEO/PR changes within days
  • Reviews, press, and G2 listings drive visibility

🧠 Training-Data Models

GPT-4o, Claude Sonnet, Grok

  • Rely entirely on training data
  • Rarely cite URLs (2–4% citation rate)
  • Favor volume and age of content in the training set
  • Slow to reflect brand changes (6–12 month lag)
  • Blog posts, documentation, and press history drive visibility

Hybrid models such as Gemini and Cohere Command R+ sit between these two camps (see the model comparison table below).

This split creates a two-track AI visibility strategy. For Perplexity and retrieval-augmented models, the game looks like SEO: create and distribute content that will be retrieved and cited. Fresh reviews, press coverage, how-to guides, comparison pages — anything retrievable and current.

For GPT, Claude, and training-based models, the game is longer. You're building into the next training cycle. This means volume and breadth of content matter — not just current articles, but a deep library of resources that signals authority on your topic area. The brands that have 500+ in-depth content pieces beat those with 50.

In our benchmark, some brands scored well on Perplexity specifically but lower on GPT/Claude — indicating strong current web presence but shallower training-data history. Others showed the reverse. Understanding your split score by model type is essential for targeting your visibility investment.

Finding 5

Why Does Positioning Quality Matter More Than Raw Mentions?

Mention frequency alone doesn't predict AEO score. A brand recommended first with clear differentiation and specific audience outscores one buried in a generic list. AI positioning quality mirrors your source content — if your messaging clearly states who you're for and why you're different, AI reproduces that positioning.

One of the most counterintuitive findings in our benchmark: mention frequency alone doesn't predict AEO score. Shopify had the second-highest mention rate (91%) yet finished behind Airtable (89%), and Vercel (81% mentions) outscored Canva (83% mentions). Brands with lower mention rates but better positioning quality consistently outscored brands that were simply mentioned more often.

Consider the difference between these two hypothetical AI responses: the first to "What's the best project management tool?", the second to "What's the best design tool for teams?"

❌ Low-Quality Mention

“There are many project management tools available, including Asana, Monday, Jira, Linear, Trello, Basecamp, and others. Consider your team size and workflow when choosing.”

Linear mentioned but buried in a generic list. No positioning, no differentiation, no recommendation.

✅ High-Quality Mention

“For software teams prioritizing speed and a clean workflow, Figma has become the default collaborative design tool. Its real-time collaboration, component system, and plugin ecosystem make it the go-to choice for most product teams.”

Figma mentioned first, with clear differentiation, a specific audience, and an enthusiastic recommendation.

Figma (70) consistently received high-quality, positioned mentions: first in list, recommended with specificity, associated with clear use cases (“collaborative design,” “product teams”). Slack (54) often appeared in generic lists of communication tools without differentiated positioning.

Positioning quality in AI responses is driven by the positioning quality in your source content. If your documentation, blog posts, and third-party reviews clearly articulate who you're for, what problem you solve, and why you're different — AI models will reproduce that positioning. If your content is vague, generic, or feature-list-focused, AI models will reflect that too.

How Do Individual AI Models Compare for Brand Visibility?

Retrieval-augmented models (Perplexity Pro/Sonar) show 31–38% citation rates and favor current web content. Training-data models (GPT-4o, Claude Sonnet) cite URLs just 2–4% of the time and favor established brands. Hybrid models (Gemini, Cohere) fall in between at 9–16% citation rates.

Not all AI models are equal for brand visibility. Here's how 10 of the models we tested compare in behavior, citation habits, and average brand visibility scores.

Model             | Type     | Avg Score | Citation | Notes
Perplexity Pro    | RAG      | 64        | 38%      | Retrieval-augmented; actively cites URLs
ChatGPT-4o        | Training | 61        | 4%       | Relies on training data; minimal citations
Claude 3.5 Sonnet | Training | 60        | 3%       | Relies on training data; cautious tone
Gemini 1.5 Pro    | Hybrid   | 59        | 12%      | Light retrieval; cites Google-indexed pages
Perplexity Sonar  | RAG      | 62        | 31%      | Retrieval-augmented; aggressive citation
ChatGPT-4o mini   | Training | 57        | 3%       | Faster, more generic responses
Claude 3 Haiku    | Training | 55        | 2%       | Training-only; shorter responses
Gemini Flash      | Hybrid   | 58        | 9%       | Speed-optimized; lower brand specificity
Cohere Command R+ | Hybrid   | 54        | 16%      | Enterprise RAG focus; cites retrieved docs
GPT-4 Turbo       | Training | 60        | 4%       | Strong semantic recall from training data

The pattern is clear: retrieval-augmented models (RAG) consistently show higher citation rates and more current brand information, while training-data models show higher variability and tend to favor more established brands with deeper historical content libraries. Hybrid models (Gemini, Cohere) fall in between.

What Does This Mean for Your Brand?

Your SEO rank doesn't predict your AI rank. You likely have blind spots on specific models. Competitors investing in AEO now will compound their advantage. Perplexity citation traffic is a real acquisition channel. And you can't optimize what you don't measure — monitoring AEO scores is the foundation.

The AEO Benchmark Report has direct implications for any brand navigating AI-powered discovery. Whether you're a marketing leader, a founder, or an SEO professional, here's what the data means for you:

Your current SEO rank doesn't predict your AI rank

Brands with strong Google positions but thin third-party footprints often score low on AI visibility. The signals are different. Don't assume your SEO health maps to AI visibility — measure them separately.

You probably have blind spots on specific models

A brand can score 72/100 on Perplexity and 48/100 on Claude. Without model-specific visibility data, you're flying blind. The aggregate score matters less than understanding your gaps by platform.

Your competitors are not standing still

As AI search grows, brands that invest in AEO now will compound their advantage. The brands in our benchmark with scores above 65 weren't trying to optimize for AI — they were just building great brands with great content for years. Brands starting today can be intentional in a way those brands weren't.

Citation traffic is real and growing

Perplexity drives measurable website traffic from citations. As retrieval-augmented models grow, citation strategy becomes a legitimate acquisition channel — not just a branding exercise.

You can't optimize what you can't measure

The brands in our benchmark had no idea where they stood until we ran this analysis. Building a monitoring practice — tracking AEO scores over time, by model, by query type — is the foundation of everything else.

How Do You Improve Your AEO Score?

Improve your AEO score in six steps: establish a baseline with monitoring across 5+ models, build a third-party mention engine (G2 reviews, press, community), create AI-optimized content, optimize for Perplexity specifically, sharpen your positioning, and track weekly. Expect 8–12 point improvement within 90 days.

Based on our benchmark analysis and our work with hundreds of brands through Surfaced, here are the highest-impact actions you can take to improve your AI visibility score.

Step 1: Establish Your Baseline with Measurement (Week 1)

  • Set up AI visibility monitoring across at least 5 models (ChatGPT, Perplexity, Gemini, Claude, and Grok)
  • Define your core query list: 20–30 queries that represent how your target customers describe their problems
  • Run your first benchmark scan and get your AEO baseline score
  • Identify which models mention you most and least — these are your quick wins and biggest gaps

Step 2: Build a Third-Party Mention Engine (Months 1–3)

  • Build a systematic G2/Capterra review generation program — target 10 new reviews per month
  • Identify 5–10 publications in your category and pitch 2–3 stories per month (not just product announcements)
  • Create a community engagement program on Reddit, Discord, Slack communities, and forums where your category is discussed
  • Start a customer case study program — 1 published case study per week is achievable and highly impactful
  • Partner with YouTubers, podcasters, and newsletter writers in your category

Step 3: Create AI-Optimized Content (Months 1–2)

  • Audit your existing content for direct question-answering: does your content answer the exact questions your target users ask AI?
  • Create a "definitive guides" library — comprehensive, 2000+ word resources on core topics in your category
  • Build comparison pages: "Brand X vs. Brand Y" — these are among the most-retrieved pages in AI responses
  • Implement FAQ schema on all key pages so AI models can parse your structured answers (a minimal sketch follows this list)
  • Create a /llms.txt file and review robots.txt to ensure AI crawlers can access your best content
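
For the FAQ schema item above, here's a minimal sketch of FAQPage structured data (schema.org JSON-LD), generated in Python to keep the examples in one language. The question and answer text are placeholders; the serialized output belongs inside a <script type="application/ld+json"> tag on the page.

```python
# Sketch: build schema.org FAQPage JSON-LD with placeholder Q&A content.
import json

faq_schema = {
    "@context": "https://schema.org",
    "@type": "FAQPage",
    "mainEntity": [
        {
            "@type": "Question",
            "name": "What is an AEO score?",  # placeholder question
            "acceptedAnswer": {
                "@type": "Answer",
                "text": "An AEO score is a 0-100 composite metric that "
                        "measures brand visibility across AI search engines.",
            },
        },
    ],
}

print(json.dumps(faq_schema, indent=2))
```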

Step 4: Optimize for Perplexity Specifically (Month 2)

  • Create a fresh content cadence — publish at least 2 pieces per week so Perplexity's retrieval finds current content
  • Target keywords that Perplexity returns as "sources" for your category — these are the most retrievable topics
  • Build landing pages that directly answer comparison and recommendation queries (highly retrievable)
  • Monitor your Perplexity citation rate separately — it's your fastest indicator of retrieval-based visibility

Step 5: Sharpen Your Positioning in Source Content (Months 1–2)

  • Audit your messaging: does every key page clearly state who you're for, what problem you solve, and why you're different?
  • Rewrite generic feature descriptions as specific use-case narratives
  • Ensure your positioning is consistent across your site, G2 profile, documentation, and press kit
  • Write your "narrative" — the 2–3 sentences that AI should say about you — and engineer content to reinforce it

Step 6: Track, Iterate, and Compound (Ongoing)

  • Review your AEO score weekly and track changes by model and query type (see the sketch after this list)
  • Set up competitor monitoring — know when competitors are gaining or losing AI visibility
  • A/B test content formats: does a detailed FAQ outperform a long-form guide for a given query set?
  • Celebrate wins: when your score improves, analyze what drove it and double down
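
As a sketch of what that weekly review can look like in practice, here's a toy week-over-week delta computation per model. The data layout, model names, and scores are hypothetical; the point is simply tracking direction by model.

```python
# Sketch: week-over-week AEO deltas per model from stored scan results.
history = {
    # model -> list of (iso_week, aeo_score) pairs, oldest first
    "perplexity-pro": [("2026-W10", 58), ("2026-W11", 61)],
    "gpt-4o": [("2026-W10", 52), ("2026-W11", 51)],
}

for model, scans in history.items():
    (prev_week, prev_score), (_, cur_score) = scans[-2], scans[-1]
    delta = cur_score - prev_score
    print(f"{model}: {prev_score} -> {cur_score} ({delta:+d} vs. {prev_week})")
```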

Expected Results

Based on data from Surfaced users, brands that follow these steps consistently achieve an 8–12 point AEO score improvement within 90 days. The compounding effect means that a brand starting at 45 (like Linear) can realistically reach 60+ within two quarters of focused effort. The brands in our 65+ tier didn't get there overnight — but they also weren't trying. You can intentionally accelerate what they did passively.

Frequently Asked Questions About AEO Scores

An AEO score (0–100) measures brand visibility across AI search engines. It incorporates mention frequency, citation rate, sentiment, response position, and model coverage. Brands that follow a deliberate AEO strategy can improve 8–12 points in 90 days.

What is an AEO score?

An AEO (Answer Engine Optimization) score is a composite metric from 0–100 that measures how visible a brand is across AI-powered search engines. Surfaced's algorithm incorporates five dimensions: mention rate (35%), citation rate (20%), sentiment (20%), position in response (15%), and model coverage (10%). Higher scores indicate better AI visibility.

Which brands have the highest AI visibility in 2026?

Based on Surfaced's 2026 AEO Benchmark, Figma leads with a score of 70, followed by Airtable (69), Shopify (68), Stripe (66), Datadog (66), Notion (65), Vercel (64), Canva (62), Slack (54), and Linear (47). The top brands share strong third-party footprints built up over years.

How do you improve AI visibility?

The highest-impact actions are: (1) build third-party mentions via reviews, press, and community, (2) create structured content that directly answers user questions, (3) optimize for Perplexity with fresh, retrievable content, (4) sharpen your positioning so AI describes you accurately and distinctively, and (5) monitor your AEO score to track progress over time.

Why does Perplexity behave differently from GPT and Claude?

Perplexity uses retrieval-augmented generation (RAG) — it actively searches the web at query time and cites sources. This means Perplexity responds quickly to fresh content and has citation rates of 30–40%. GPT and Claude primarily rely on training data, leading to citation rates of 2–4% and slower responsiveness to brand changes.

How do I track my brand's AI visibility?

Surfaced monitors your brand across 13 AI models including ChatGPT, Perplexity, Gemini, and Claude. It tracks your AEO score over time, identifies which queries you're winning and losing, monitors competitors' share of voice, and provides AI-powered content recommendations. Start with a free score check at usesurfaced.com.

How often should I check my AEO score?

For most brands, weekly monitoring is sufficient. Surfaced runs automated weekly scans so your data stays fresh, and a weekly review cadence works well for strategic planning. If you're actively running an AEO optimization campaign, more frequent scanning helps you see the impact of specific content changes faster.


Find out where your brand ranks

Run your own AI visibility benchmark with Surfaced. Monitor your AEO score across 13 AI models, track competitors, and get specific recommendations to improve.
