1. Why Single-Query Tests Are Unreliable
2. What Is Multi-Sampling Methodology?
3. How Citation Rates Are Calculated
4. Applying Multi-Sampling to Contractor GEO
5. Tools and Approaches for Multi-Sampling
6. What High Citation Rates Actually Require
7. Get Your Citation Rate Measured
Why Single-Query Tests Are Unreliable
The most common mistake business owners make when testing AI visibility is asking ChatGPT one question and treating the answer as definitive. AI language models introduce randomness into their outputs by design, controlled by a sampling parameter called temperature. The same question asked three times in a row can produce three different recommendations. A single test tells you what the model said once, under one set of conditions, in one session. It does not tell you whether your business is reliably cited, how frequently you appear relative to competitors, or whether your citations are consistent across different AI platforms.
A single query is a sample size of one, which carries essentially no statistical confidence. Researchers use multi-sampling methodology, running 50 to 200+ identical queries across fresh sessions, to produce statistically meaningful citation rate data.
What Is Multi-Sampling Methodology?
Multi-sampling methodology is a structured research approach used in GEO (Generative Engine Optimization) to measure how consistently an AI system cites a specific business or entity in response to a defined query. Instead of running a query once, researchers run it dozens or hundreds of times — across fresh browser sessions, different times of day, and multiple AI platforms — then aggregate the results to calculate a reliable citation rate.
Session Independence
Each sampling run must begin in a fresh session — no conversation history, no cached context, no prior messages that might bias the model's response. AI systems sometimes use conversation context to shape subsequent answers. Multi-sampling methodology eliminates this by starting each run completely fresh, ensuring that each measurement reflects the model's base knowledge rather than a context-contaminated output.
Temporal Distribution
Citation rates can fluctuate as AI models receive updates, as training data is refreshed, and as competitor content changes. Robust multi-sampling methodology distributes samples across time — not just across sessions. Running 100 samples over a single day gives you a point-in-time reading. Running 20 samples per week over five weeks gives you a trend line. The trend line is far more valuable for GEO strategy because it reveals whether your visibility is improving, plateauing, or declining.
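The difference between a point-in-time reading and a trend line can be sketched in a few lines. The weekly counts below are illustrative placeholders, not real measurement data, and the least-squares slope is just one simple way to quantify the trend.

```python
def weekly_rates(weekly_counts):
    """Each entry is (citations, total_samples) for one week of sampling."""
    return [cited / total for cited, total in weekly_counts]

def trend_slope(rates):
    """Least-squares slope of rate vs. week: positive = visibility improving."""
    n = len(rates)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(rates) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, rates))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den

# 20 samples per week over five weeks (illustrative numbers):
counts = [(8, 20), (9, 20), (11, 20), (12, 20), (14, 20)]
rates = weekly_rates(counts)
print([round(r, 2) for r in rates])  # weekly citation rates
print(round(trend_slope(rates), 3))  # per-week change in citation rate
```

A single 100-sample day would have produced only one of these numbers; the weekly series is what makes the direction of change visible.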
Platform Coverage
The three primary AI platforms with distinct training signals are ChatGPT (OpenAI), Perplexity AI, and Google AI Overviews. Each uses different underlying models, different training corpora, and different retrieval architectures. A business might be cited frequently by Perplexity but rarely by ChatGPT, or vice versa. Complete multi-sampling methodology runs the full query set across all three major platforms and compares citation rates to identify platform-specific visibility gaps.
How Citation Rates Are Calculated
Citation rate is expressed as a percentage: the number of times your business appeared in the AI's response, divided by the total number of query runs. A citation rate of 60% means that in 60 out of 100 fresh sessions, the AI mentioned your business by name in response to the target query. Citation rates below 25% indicate weak or unstable visibility. Rates above 70% indicate strong, reliable visibility. Rates above 85% indicate dominant share-of-model position in that query cluster.
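The calculation itself is straightforward. A minimal sketch, using the thresholds stated above (the "moderate" label for the 25-70% middle band is my own placeholder, not a term from the methodology):

```python
def citation_rate(mentioned, total_runs):
    """Fraction of fresh-session runs in which the AI named the business."""
    if total_runs == 0:
        raise ValueError("need at least one run")
    return mentioned / total_runs

def visibility_band(rate):
    """Bands from the thresholds described above; 'moderate' is a placeholder
    label for the unnamed middle range."""
    if rate > 0.85:
        return "dominant"
    if rate > 0.70:
        return "strong"
    if rate < 0.25:
        return "weak"
    return "moderate"

rate = citation_rate(60, 100)
print(rate)                   # 0.6
print(visibility_band(rate))  # moderate
```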
Citation Depth vs. Citation Frequency
Multi-sampling methodology also distinguishes between citation frequency and citation depth. Citation frequency is how often you appear. Citation depth is where and how prominently you appear — are you the first recommendation, the third, or buried in a list of ten? A business cited first 60% of the time has fundamentally different AI visibility than one cited tenth 60% of the time. Depth scoring is part of complete citation rate analysis and directly affects the real-world conversion impact of your AI visibility.
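One way to fold depth into a single score is to weight each citation by its rank. The 1/rank weighting below is an illustrative convention I am assuming for the sketch, not the scoring formula the methodology itself prescribes:

```python
def depth_weighted_score(positions, total_runs):
    """positions: 1-based rank of the business in each run where it was cited
    (one entry per cited run). A first-place mention counts fully; lower
    ranks count less. The 1/rank weighting is an illustrative convention."""
    return sum(1 / p for p in positions) / total_runs

# Two businesses, each cited in 60 of 100 runs:
always_first = [1] * 60
always_tenth = [10] * 60
print(round(depth_weighted_score(always_first, 100), 3))  # 0.6
print(round(depth_weighted_score(always_tenth, 100), 3))  # 0.06
```

Both businesses have a 60% citation frequency, but the depth-weighted scores differ by an order of magnitude, which mirrors the conversion-impact point above.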
Applying Multi-Sampling to Contractor GEO
For local service contractors, multi-sampling methodology works best when organized around service-location pairs. Rather than running generic queries like 'best contractor,' researchers build query sets organized around specific service-location combinations: 'best roofer in Denver,' 'top HVAC contractor in Denver,' 'licensed plumber Denver,' and so on. This structure allows citation rates to be segmented by service line and geography, revealing exactly where the contractor is strong and where they are invisible.
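Building the query set is a simple cross-product of services, locations, and phrasing templates. The services and templates below are examples taken from the queries quoted above:

```python
def build_query_set(services, cities, templates):
    """Cross every service-location pair with each phrasing template."""
    return [
        t.format(service=s, city=c)
        for s in services
        for c in cities
        for t in templates
    ]

services = ["roofer", "HVAC contractor", "plumber"]
templates = [
    "best {service} in {city}",
    "top {service} {city}",
    "licensed {service} {city}",
]

queries = build_query_set(services, ["Denver"], templates)
print(len(queries))  # 9
print(queries[0])    # best roofer in Denver
```

Because each query carries a known service and city, the resulting citation rates can be grouped by either dimension to find the weak segments.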
Sample Size Requirements
For a statistically reliable citation rate, GEO researchers recommend a minimum of 50 samples per query per platform. A 50-sample run at ChatGPT, 50 at Perplexity, and 50 at Google AI Overviews gives you 150 data points for a single query. Pooled across platforms, the margin of error on a 60% citation rate at 150 samples is approximately ±8 percentage points at 95% confidence; any single 50-sample platform run on its own is closer to ±14 points. For high-stakes competitive markets, researchers recommend 100-sample runs per platform to tighten the confidence interval further.
Competitive Benchmarking
The most actionable output of multi-sampling methodology is competitive benchmarking — running the same query set for your top 3 to 5 competitors and comparing citation rates directly. If you have a 30% citation rate and your top competitor has a 75% citation rate on the same query cluster, you have a clear, quantified gap to close. This competitive citation gap analysis is the foundation of effective GEO strategy for contractors.
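The gap analysis reduces to a comparison against the best-performing competitor on the same query set. A small sketch with illustrative rates (the 30% vs. 75% figures are the example from the paragraph above):

```python
def citation_gap(rates, you):
    """Gap to the strongest competitor, in percentage points."""
    best_name, best_rate = max(
        ((name, r) for name, r in rates.items() if name != you),
        key=lambda item: item[1],
    )
    return best_name, round((best_rate - rates[you]) * 100, 1)

rates = {"you": 0.30, "competitor_a": 0.75, "competitor_b": 0.55}
print(citation_gap(rates, "you"))  # ('competitor_a', 45.0)
```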
Tools and Approaches for Multi-Sampling
Manual multi-sampling is time-intensive but accessible. A researcher can run each query 50 times, opening a fresh Incognito window for each run and recording results in a spreadsheet. For larger sample sizes, automated approaches using API access to language models allow programmatic query distribution, response parsing, and citation rate calculation. Market Disruptors uses a hybrid approach: automated sampling for bulk citation rate measurement, and manual review for citation depth analysis and competitive comparison.
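The automated pipeline's aggregation logic can be shown independently of any particular model API. In the sketch below, `query_fn` stands in for a live API call (which the source does not specify), so the example uses canned responses; the simple substring detector is also an assumption, and a production pipeline would need fuzzier matching:

```python
import re

def citation_detector(business_name):
    """Case-insensitive whole-phrase match. Real pipelines need fuzzier
    matching (abbreviations, possessives, partial names, etc.)."""
    pattern = re.compile(re.escape(business_name), re.IGNORECASE)
    return lambda text: bool(pattern.search(text))

def run_samples(query_fn, query, n, detect):
    """query_fn(query) returns one fresh-session response as text.
    In practice this would call a model API; here it is injected so the
    aggregation logic can be shown on its own."""
    hits = sum(1 for _ in range(n) if detect(query_fn(query)))
    return hits / n

# Canned responses standing in for live API calls:
canned = iter([
    "Call Acme Roofing for that.",
    "Try BetaRoof or GammaShingle.",
    "Acme Roofing is a solid choice.",
])
rate = run_samples(lambda q: next(canned), "best roofer in Denver", 3,
                   citation_detector("Acme Roofing"))
print(rate)  # two of three responses cite the business
```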
The CitationIQ™ platform runs 300+ query samples per client per market at monthly intervals, producing citation rate trends, competitive benchmarks, and platform-specific visibility scores — the data layer that drives every GEO strategy we build.
What High Citation Rates Actually Require
Multi-sampling reveals the gap, but it does not close it automatically. High citation rates are the output of a specific set of inputs: authoritative content structured as answer capsules, technical schema markup that signals entity type and service area, consistent NAP (name, address, phone) data across all citation sources, and third-party validation signals like reviews, directory listings, and media mentions. Contractors who want to improve their citation rates must address all four input categories simultaneously — optimizing only one produces marginal improvement at best.
Get Your Citation Rate Measured
Market Disruptors can run a multi-sampling audit of your current citation rate across your key service-location query pairs. The audit covers ChatGPT, Perplexity, and Google AI Overviews, and benchmarks your citation rate against your top local competitors. Book a free strategy call to see exactly where you stand and what your citation rate would need to be to dominate your market.
Kristina Shrider
National Growth Architect & Behavioral CMO
Kristina is the founder of Market Disruptors Agency and an independent AI marketing researcher. Her published work includes From Automation to Judgment (18 independent citations) and the MAD-M™ governance framework. The GEO methodology and CitationIQ™ measurement platform used across this research library are based on her original work.