On this page
Basics
Data & access
Analysis & methodology
Visualization & UMAP
Outputs & fixes
Technical capabilities
Pricing & beta
Want help implementing the fixes? Request a walkthrough.
Basics
Topical drift 101
Topical drift is when a page slowly stops representing the topic and intent it used to win for.
It often happens after months of edits, new sections, internal linking changes, or “helpful” expansions that pull the page off-center.
We measure drift using semantic distance — lower means tighter topical alignment; higher means more drift.
Content decay is typically about freshness, SERP changes, stronger competitors, or better alternatives showing up.
Drift is about meaning mismatch: the page content, the internal link context pointing at it, and the queries it actually attracts
stop agreeing. Many pages “decay” because they first drift. Our report shows you where meaning is off and what to fix.
Link meaning mismatch is when an internal link’s meaning in context doesn’t match the destination page’s meaning.
Most tools stop at anchor text. We analyze the anchor + surrounding text + the container/heading context to estimate what the link is “claiming,”
then compare that to the destination page embedding. High mismatch can send confusing topical signals.
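To make that comparison concrete, here is a minimal sketch, assuming the link context and destination page have already been embedded (the short vectors below are placeholders for real embedding vectors):

```python
import numpy as np

def cosine_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine distance = 1 - cosine similarity."""
    return 1.0 - float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Placeholder vectors standing in for embeddings of:
#   link_context     -> anchor text + surrounding sentence + container/heading context
#   destination_page -> cleaned main content of the linked page
link_context_vec = np.array([0.9, 0.1, 0.0])
destination_vec = np.array([0.1, 0.2, 0.95])

mismatch = cosine_distance(link_context_vec, destination_vec)
print(f"link meaning mismatch: {mismatch:.3f}")  # high distance = confusing topical signal
```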
Actual distances (default) are cosine distances computed from embeddings
(cosine distance ranges from 0–2 in theory; in most same-site content comparisons it often falls closer to ~0–1).
We use these for trend tracking over time because the scale is stable across scans.
Normalized distances rescale a site’s distances into a 0–1 range for easier relative comparison within a site (and for composite scoring). Normalized mode is helpful when you want percentile-style zones or you’re weighting multiple signals (like links + engagement).
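As an illustration, here is a small sketch of the two modes, assuming you have per-page cosine distances from the topical centroid; min-max rescaling is used here as one plausible normalization, not necessarily the exact method in the tool:

```python
import numpy as np

# Assumed: actual-mode cosine distances from the site's topical centroid, one per page.
actual = np.array([0.21, 0.34, 0.48, 0.62, 0.81])

# Normalized mode: rescale within the site to 0-1 for relative comparison.
normalized = (actual - actual.min()) / (actual.max() - actual.min())

print(actual)      # stable scale across scans -> good for trend tracking
print(normalized)  # relative ranking within this site -> good for composite scoring
```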
Data & access
What we use, what we store, and what we don’t
We use your sitemap URL inventory, cleaned main page content (headings + body text),
internal link context (anchor + surrounding text + container/heading context),
and (optionally) Google Search Console performance to ground the analysis in real query intent.
We generate embeddings from the cleaned text and use UMAP to visualize semantic neighborhoods.
The scan starts from your sitemap. In most cases that’s enough because it reflects your intended indexable inventory.
If you have important pages not in the sitemap, add them (recommended) or include them via an upload/override in the app.
We fetch HTML, extract main content, and generate embeddings for each page.
Read-only access for the property you want to analyze. We use clicks/impressions/position signals to
identify pages whose actual queries are drifting away from what their content suggests they should attract.
GSC is optional, but it makes drift detection and prioritization far more accurate.
We store analysis results and derived representations (e.g., embeddings, scores, aggregates, and report outputs).
We avoid storing raw HTML whenever possible. Some features (like “placement snippets”) may store small extracted excerpts
needed to render the report (not full pages). You can delete analysis data at any time. We do not sell or share your data.
Analysis & methodology
How the drift signal is computed
Embeddings are vectors that represent semantic meaning. We use OpenAI’s
text-embedding-3-small to create embeddings and then:
- Measure page-to-page similarity (cosine similarity) and distance (cosine distance)
- Group pages into semantic clusters (e.g., k-means on embeddings)
- Project embeddings to 2D for visualization (UMAP)
- Compare internal link context vs destination page meaning
- Calculate distance from your topical “center” (a centroid/center-of-mass embedding)
Note: UMAP is used for visualization only; similarity, distance, and SDI are computed in the original embedding space.
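The steps above roughly correspond to the following sketch; random vectors stand in for real text-embedding-3-small embeddings, and the specific parameters are illustrative:

```python
import numpy as np
from sklearn.cluster import KMeans                    # semantic clusters
from sklearn.metrics.pairwise import cosine_distances
import umap                                           # pip install umap-learn

# Stand-in for real page embeddings (in practice, one text-embedding-3-small
# vector per page of cleaned main content).
rng = np.random.default_rng(42)
page_embeddings = rng.normal(size=(40, 256))

# 1) Page-to-page cosine distances (the drift math stays in this space).
dist_matrix = cosine_distances(page_embeddings)

# 2) Semantic clusters via k-means on the embeddings.
clusters = KMeans(n_clusters=3, random_state=42, n_init=10).fit_predict(page_embeddings)

# 3) 2D projection for visualization only.
coords_2d = umap.UMAP(n_components=2, random_state=42).fit_transform(page_embeddings)

# 4) Distance of each page from the topical "center" (centroid embedding).
centroid = page_embeddings.mean(axis=0, keepdims=True)
drift = cosine_distances(page_embeddings, centroid).ravel()
```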
Raw HTML includes nav, footers, widgets, “related posts,” and template repetition.
Embedding all of that makes vectors represent your template, not your topic.
Main-content extraction keeps what matters (headings H1–H6, body text, lists, core containers) and reduces boilerplate.
That makes clustering cleaner and drift detection more precise.
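A simplified sketch of that kind of extraction using BeautifulSoup; the production pipeline may use more robust boilerplate detection:

```python
from bs4 import BeautifulSoup  # pip install beautifulsoup4

def extract_main_content(html: str) -> str:
    """Keep headings and body text, drop obvious template boilerplate."""
    soup = BeautifulSoup(html, "html.parser")
    # Strip common template containers before collecting text.
    for tag in soup.find_all(["nav", "header", "footer", "aside", "script", "style", "form"]):
        tag.decompose()
    parts = [el.get_text(" ", strip=True)
             for el in soup.find_all(["h1", "h2", "h3", "h4", "h5", "h6", "p", "li"])]
    return "\n".join(p for p in parts if p)
```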
SDI is a composite score designed for prioritization. A common default is:
60% semantic distance (topical alignment)
30% link penalty (internal linking weakness / authority proxy)
10% engagement penalty (GSC performance signals)
You can customize these weights in the visualization depending on your workflow (pure content audit, link-first fixes, etc.). SDI is most useful in normalized mode, where scores are comparable within a site.
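With the default weights, the composite looks roughly like this (a sketch assuming all three inputs are already normalized to 0-1):

```python
def sdi(semantic_distance: float, link_penalty: float, engagement_penalty: float,
        weights=(0.6, 0.3, 0.1)) -> float:
    """Composite Semantic Drift Index with the default 60/30/10 weighting."""
    w_dist, w_link, w_eng = weights
    return w_dist * semantic_distance + w_link * link_penalty + w_eng * engagement_penalty

# A page that is fairly off-topic, weakly linked, and underperforming in GSC:
print(sdi(semantic_distance=0.72, link_penalty=0.55, engagement_penalty=0.40))  # ≈ 0.637
```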
Drift usually changes after meaningful updates — publishing new pages, editing sections, restructuring internal links, or consolidating content —
not hour-to-hour. Daily scans tend to add noise without changing decisions.
Also, Google Search Console metrics are delayed and can fluctuate day-to-day, so “daily drift dashboards” often create false alarms. Weekly or monthly scans create a stable rhythm that’s easier to act on.
With actual distance mode, you can track whether your fixes reduce drift over time: lower distances = tighter topical focus. If you’re in an active optimization sprint, run a quick “before / after” scan — otherwise monthly is the sweet spot.
In Actual Distance mode (default):
• Core: ≤ 0.300 (excellent topical alignment)
• Focus: 0.300–0.500 (strong alignment, generally “on topic”)
• Expansion: 0.500–0.700 (moderate drift — review recommended)
• Peripheral: ≥ 0.700 (significant drift — needs attention)
These are default heuristics and can be tuned as we calibrate across different site types and content mixes.
In Normalized mode: Zones are percentile-based (0–0.3 = best 30%, etc.) for relative comparison within your site.
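In code, the default actual-distance zones map roughly as follows (boundary handling at the exact thresholds is an assumption here):

```python
def zone(distance: float) -> str:
    """Map an actual-mode cosine distance to a default drift zone (thresholds are tunable)."""
    if distance <= 0.300:
        return "Core"
    if distance <= 0.500:
        return "Focus"
    if distance <= 0.700:
        return "Expansion"
    return "Peripheral"

for d in (0.21, 0.48, 0.65, 0.83):
    print(d, zone(d))  # Core, Focus, Expansion, Peripheral
```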
Visualization & UMAP
Interactive radial map and semantic projection
UMAP (Uniform Manifold Approximation and Projection) reduces high-dimensional embeddings to 2D while preserving semantic neighborhoods.
We use those 2D coordinates to visualize relationships and to derive a consistent “around-the-circle” semantic ordering (and we can stabilize it with fixed seeds).
Pages positioned near each other represent similar meaning — which often indicates linking opportunities and shared topical intent.
Radius (distance from center): Shows drift. Pages farther from your topical center are more off-topic (higher distance).
You can toggle between actual distances (stable across scans) or normalized (0–1 for relative ranking).
Angle (position around the circle): Uses semantic projection by default (derived from UMAP’s 2D coordinates), so pages that are similar in meaning sit near each other around the circle. You can also switch the angle mode to group by cluster, distribute evenly, or group by problem type.
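A rough sketch of how radius and angle could combine into radial coordinates; the arctan2-around-the-centroid step is an illustrative stand-in for the tool's semantic-projection ordering:

```python
import numpy as np

# Assumed inputs: per-page drift distance (used as the radius) and UMAP 2D
# coordinates (used to order pages around the circle).
drift = np.array([0.22, 0.47, 0.68, 0.85])
coords_2d = np.array([[1.2, 0.3], [-0.4, 1.1], [-1.0, -0.8], [0.9, -1.3]])

# Angle: direction of each page from the 2D centroid, so pages that sit near
# each other in UMAP space also sit near each other around the circle.
centered = coords_2d - coords_2d.mean(axis=0)
theta = np.arctan2(centered[:, 1], centered[:, 0])

# Radial layout: radius encodes drift, angle encodes semantic neighborhood.
x, y = drift * np.cos(theta), drift * np.sin(theta)
```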
Linking opportunities are pairs of pages that are semantically close but don’t currently link.
The tool surfaces high-confidence candidates using proximity thresholds (tunable) plus intent and context signals.
Adding contextual internal links between close neighbors helps search engines understand your structure and can strengthen topical authority.
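Conceptually, the candidate search looks something like this sketch; the 0.25 threshold and the pair representation are assumptions, and the tool layers intent and context signals on top:

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_distances

def linking_opportunities(embeddings, urls, existing_links, threshold=0.25):
    """Semantically close page pairs with no link in either direction.
    `existing_links` is a set of (source_url, target_url) tuples; `threshold` is tunable."""
    dist = cosine_distances(np.asarray(embeddings))
    pairs = []
    for i in range(len(urls)):
        for j in range(i + 1, len(urls)):
            already_linked = ((urls[i], urls[j]) in existing_links
                              or (urls[j], urls[i]) in existing_links)
            if dist[i, j] <= threshold and not already_linked:
                pairs.append((urls[i], urls[j], float(dist[i, j])))
    return sorted(pairs, key=lambda p: p[2])  # closest (most confident) pairs first
```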
Yes. You can:
• Toggle between actual/normalized distances
• Choose angle mode (semantic projection, by cluster, uniform, by problem type)
• Color by cluster, zone, SDI score, or severity
• Adjust node size based on traffic
• Show/hide labels (none/all/drifting only)
• Filter by cluster
• Zoom, pan, and click nodes to open pages
• Export as PNG for sharing
Node size: By default, mapped to traffic (GSC clicks). Larger nodes = more traffic.
Node opacity: By default, mapped to internal link strength (e.g., inlinks / authority proxy). Brighter nodes are better connected; faint nodes are weakly linked (often linking opportunities or near-orphans). Both mappings are configurable in the UI.
No — UMAP only affects the layout of the visualization (where points appear on the 2D map).
Drift scoring uses cosine distance computed in the original embedding space.
Outputs & fixes
What you get — and what to do with it
You get:
• Drift report with severity rankings
• Semantic clusters and hub structure
• Interactive radial map with semantic projection (UMAP) + distance zones
• Internal link meaning mismatch list
• Linking opportunities (similar pages that should link)
• Zone distribution dashboard
• CSV exports for the main datasets
• Prioritized checklist / action plan
The fastest wins are usually:
• Adjusting link context (anchor + nearby text) to match destination pages
• Moving links into better semantic sections (where the surrounding paragraph supports the target)
• Re-centering headings/content to match intent
• Adding internal links between semantically close pages (from linking opportunities)
• Consolidating overlapping/cannibalizing pages (merge, redirect, or re-scope)
• Strengthening hub pages (missing entities/sections aligned to the winning cluster)
• Reviewing peripheral pages (high distance) for consolidation, re-scope, noindex, or redirect — depending on intent and value
With actual distance mode, you can track whether your fixes reduce drift:
• Run a baseline scan → note distances for problematic pages
• Implement fixes (content updates, link changes)
• Run a follow-up scan (monthly is typical)
• Compare distances and zone movement over time
Lower distances = tighter topical focus. For example, moving from 0.650 to 0.480 indicates measurable improvement in topical alignment. The zone distribution dashboard shows how many pages are in each zone over time.
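A tiny before/after comparison might look like this (URLs and distances are hypothetical):

```python
# Hypothetical before/after distances for pages flagged in the baseline scan.
baseline = {"/guide-a": 0.650, "/guide-b": 0.710}
followup = {"/guide-a": 0.480, "/guide-b": 0.690}

for url in baseline:
    delta = followup[url] - baseline[url]
    print(f"{url}: {baseline[url]:.3f} -> {followup[url]:.3f} ({delta:+.3f})")
# Negative deltas mean the page moved toward the topical center after your fixes.
```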
Yes — if you want, I can review your report and turn it into a concrete implementation plan (or handle execution).
That includes prioritization, rewriting link context where needed, identifying consolidation opportunities,
and mapping linking strategies based on semantic neighborhoods.
Contact me.
Technical capabilities
Under the hood
We use the same embedding model and configuration within a scan so comparisons are valid. If we change the embedding
model/version in the future, we’ll label that so historical comparisons stay honest.
We cluster page embeddings into topic groups (commonly using k-means).
The goal is to reveal hub structure: which pages form cohesive neighborhoods, which pages are off-cluster, and which pages behave like bridges.
The report uses clusters to summarize your content architecture and prioritize fixes by group.
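One way to surface off-cluster and bridge-like pages from a k-means fit, sketched here with stand-in embeddings (the exact heuristics in the report may differ):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import cosine_distances

rng = np.random.default_rng(0)
page_embeddings = rng.normal(size=(30, 256))   # stand-in for real page embeddings

km = KMeans(n_clusters=4, random_state=0, n_init=10).fit(page_embeddings)

# Distance of each page to its own cluster centroid: high values flag
# off-cluster pages and potential "bridge" pages between topics.
d_to_own_centroid = np.array([
    cosine_distances(page_embeddings[[i]], [km.cluster_centers_[km.labels_[i]]])[0, 0]
    for i in range(len(page_embeddings))
])
most_off_cluster = np.argsort(d_to_own_centroid)[-5:]   # five least cohesive pages
```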
We use UMAP because it’s fast and does a strong job preserving semantic neighborhoods at scale (hundreds of pages).
t-SNE can be slow and can distort global structure; PCA is fast but often collapses meaningful local similarity.
UMAP gives a practical balance for “what should link to what?” visualization.
Yes. You can export CSV for core datasets like:
• Page drift data (URL, distance, zone, cluster, metrics)
• Cluster membership and statistics
• Internal link meaning mismatches with similarity scores
• UMAP coordinates / semantic positioning
• Linking opportunities
You can also view the report JSON structure for programmatic use.
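For example, the page drift export can be sliced with pandas; the file and column names below are hypothetical, so check the headers in your actual export:

```python
import pandas as pd

# Hypothetical file and column names; check the headers in your actual export.
df = pd.read_csv("page_drift.csv")  # e.g. columns: url, distance, zone, cluster, clicks

peripheral = df[df["zone"] == "Peripheral"].sort_values("distance", ascending=False)
print(peripheral[["url", "distance", "cluster"]].head(10))
```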
Mostly yes (and we can make it reproducible):
• Embeddings are deterministic for the same input/model
• Similarity and distance calculations are deterministic
• Clustering and UMAP can vary slightly unless we fix random seeds
With fixed seeds, repeated scans on unchanged content produce the same cluster structure and a stable map layout. Without seeds, results remain directionally consistent, but you may see small layout or assignment changes.
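Pinning seeds is the usual recipe; for instance, with scikit-learn and umap-learn (stand-in embeddings shown):

```python
import numpy as np
from sklearn.cluster import KMeans
import umap  # pip install umap-learn

SEED = 42
emb = np.random.default_rng(SEED).normal(size=(50, 256))  # stand-in embeddings

# With the seed pinned, repeat runs on unchanged embeddings return identical
# cluster assignments and a stable 2D layout.
labels = KMeans(n_clusters=5, random_state=SEED, n_init=10).fit_predict(emb)
layout = umap.UMAP(n_components=2, random_state=SEED).fit_transform(emb)
```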
Pricing & beta
Plans, limits, and what happens after beta
During the beta, scans are free and unlimited (fair use), with up to 500 pages per site.
You get clustering, the interactive radial map with semantic projection (UMAP), linking opportunities, exports, and an action plan.
No credit card required.
You can run scans whenever you need them — no artificial monthly limits.
“Fair use” means reasonable usage for legitimate site analysis (not abuse or automated mass-rescanning).
Many teams run 1–2 scans per site per month, but you can scan more often while iterating on fixes.
Yes. You can add extra pages as add-on blocks or scan large sites in phases (e.g., blog section first, then product/service pages).
For agencies managing larger portfolios, we’ll offer higher page limits and multi-site plans after beta.
You’ll be able to keep scanning on a monthly subscription plan.
Plans will be based on number of sites and page limits per site.
Beta users will receive early-access pricing. Core features (visualization, linking opportunities, exports, action plans) remain available.
Yes. During beta you can scan multiple sites (up to 500 pages each, unlimited scans with fair use).
After beta, plans will be based on number of sites — with an agency tier designed for multi-client workflows.
Drift detection should match your workflow — not arbitrary calendar limits.
You may want to run a baseline, implement changes, and validate quickly.
Unlimited scans let you iterate without being penalized, while fair use prevents abuse.
After beta, we expect a limited free tier so you can try the tool (e.g., smaller page limit and fewer sites).
Full features (higher page limits, multi-site workflows, agency usage) will be part of paid plans.
Ready to scan your site?
Start with a free beta scan. Unlimited scans (fair use) during beta.
You’ll get the drift report, interactive semantic visualization (UMAP),
linking opportunities, clusters, and a prioritized action plan.