Topical Drift Analyzer FAQ: plans, scans, maps, and audits

Topical Drift Analyzer finds pages that “don’t belong” — where internal link meaning, on-page content, and Google Search Console queries don’t line up. It clusters pages into semantic neighborhoods, visualizes them with UMAP, and turns the results into a prioritized action plan.

This FAQ explains what you get on each plan (including page limits and locked features), how scans work, and when a Topical Drift Audit makes sense.

What’s included (varies by plan)
  • Free plan available
  • Scan limits and page caps vary by plan
  • Interactive clustering + UMAP visualization
  • Action plan + exports (some options may be locked)
  • Advanced diagnostics & page-level details on paid plans
  • Optional: Topical Drift Audit (done-for-you blueprint)

Topical drift 101

Topical drift is when a page slowly stops representing the topic and intent it used to win for. It often happens after months of edits, new sections, internal linking changes, or “helpful” expansions that pull the page off-center. We measure drift using semantic distance — lower means tighter topical alignment; higher means more drift.

Content decay is typically about freshness, SERP changes, stronger competitors, or better alternatives showing up. Drift is about meaning mismatch: the page content, the internal link context pointing at it, and the queries it actually attracts stop agreeing. Many pages “decay” because they first drift. Our report shows you where meaning is off and what to fix.

A link context mismatch is when an internal link’s meaning in context doesn’t match the destination page’s meaning. Most tools stop at anchor text. We analyze the anchor + surrounding text + the container/heading context to estimate what the link is “claiming,” then compare that to the destination page embedding. High mismatch can send confusing topical signals.

Actual distances (default) are cosine distances computed from embeddings (cosine distance ranges from 0–2 in theory; in most same-site content comparisons it often falls closer to ~0–1). We use these for trend tracking over time because the scale is stable across scans.

Normalized distances rescale a site’s distances into a 0–1 range for easier relative comparison within a site (and for composite scoring). Normalized mode is helpful when you want percentile-style zones or you’re weighting multiple signals (like links + engagement).
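For the curious, here is roughly what both modes look like in code. This is a simplified pure-Python sketch, not the tool’s implementation; the function names are illustrative:

```python
import math

def cosine_distance(a, b):
    # Cosine distance = 1 - cosine similarity.
    # 0 = same direction (identical meaning), 2 = opposite direction.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)

def normalize(distances):
    # Rescale one site's distances into 0-1 for relative comparison.
    # Unlike actual distances, this scale shifts between scans.
    lo, hi = min(distances), max(distances)
    if hi == lo:
        return [0.0 for _ in distances]
    return [(d - lo) / (hi - lo) for d in distances]
```

The key practical difference: `cosine_distance` values are comparable across scans (trend tracking), while `normalize` output is only comparable within a single scan of a single site.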

What we use, what we store, and what we don’t

We use your sitemap URL inventory, cleaned main page content (headings + body text), internal link context (anchor + surrounding text + container/heading context), and (optionally) Google Search Console performance to ground the analysis in real query intent. We generate embeddings from the cleaned text and use UMAP to visualize semantic neighborhoods.

The scan starts from your sitemap. In most cases that’s enough because it reflects your intended indexable inventory. If you have important pages not in the sitemap, add them (recommended) or include them via an upload/override in the app. We fetch HTML, extract main content, and generate embeddings for each page.

For Google Search Console, we need read-only access to the property you want to analyze. We use clicks/impressions/position signals to identify pages whose query reality is drifting away from what their content would suggest they should attract. GSC is optional, but it makes drift detection and prioritization far more accurate.

We store analysis results and derived representations (e.g., embeddings, scores, aggregates, and report outputs). We avoid storing raw HTML whenever possible. Some features (like “placement snippets”) may store small extracted excerpts needed to render the report (not full pages). You can delete analysis data at any time. We do not sell or share your data.

How the drift signal is computed

Embeddings are vectors that represent semantic meaning. We use OpenAI’s text-embedding-3-small to create embeddings and then:
  • Measure page-to-page similarity (cosine similarity) and distance (cosine distance)
  • Group pages into semantic clusters (e.g., k-means on embeddings)
  • Project embeddings to 2D for visualization (UMAP)
  • Compare internal link context vs destination page meaning
  • Calculate distance from your topical “center” (a centroid/center-of-mass embedding)
Note: UMAP is used for visualization only; similarity, distance, and SDI are computed in the original embedding space.
You don’t need the math — the report turns it into actions.
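To make the “distance from your topical center” step concrete, here is a minimal pure-Python sketch (function names are illustrative, not the tool’s API): the centroid is the per-dimension mean of all page embeddings, and each page’s drift is its cosine distance from that centroid.

```python
import math

def cosine_distance(a, b):
    # 1 - cosine similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)

def drift_from_center(embeddings):
    # Topical "center of mass": the per-dimension mean of all page embeddings.
    n, dims = len(embeddings), len(embeddings[0])
    center = [sum(e[i] for e in embeddings) / n for i in range(dims)]
    # Each page's drift = its cosine distance from the centroid.
    return [cosine_distance(e, center) for e in embeddings]
```

A page pointing in the same direction as the site’s average meaning gets a low score; an outlier gets a high one.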

Raw HTML includes nav, footers, widgets, “related posts,” and template repetition. Embedding all of that makes vectors represent your template, not your topic. Main-content extraction keeps what matters (headings H1–H6, body text, lists, core containers) and reduces boilerplate. That makes clustering cleaner and drift detection more precise.
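To illustrate the idea, here is a minimal boilerplate-stripping sketch using Python’s standard-library `HTMLParser`. Real main-content extraction (ours included) is more involved; the tag list and class names here are illustrative assumptions:

```python
from html.parser import HTMLParser

# Containers whose text is template boilerplate, not topical content.
BOILERPLATE = {"nav", "header", "footer", "aside", "script", "style"}

class MainContentExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.skip_depth = 0   # >0 while inside a boilerplate container
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in BOILERPLATE:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in BOILERPLATE and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Keep headings/body text; drop text inside boilerplate containers.
        if self.skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())

def extract_main_text(html):
    parser = MainContentExtractor()
    parser.feed(html)
    return " ".join(parser.chunks)
```

Embedding `extract_main_text(html)` instead of the raw page is what keeps vectors about your topic rather than your template.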

SDI is a composite score designed for prioritization. A common default weighting is:
  • 60% semantic distance (topical alignment)
  • 30% link penalty (internal linking weakness / authority proxy)
  • 10% engagement penalty (GSC performance signals)

You can customize these weights in the visualization depending on your workflow (pure content audit, link-first fixes, etc.). SDI is most useful in normalized mode, where scores are comparable within a site.
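The composite can be sketched in a few lines. The weights below mirror the defaults above; the function name and signature are illustrative, not the tool’s API:

```python
def sdi(semantic_distance, link_penalty, engagement_penalty,
        weights=(0.60, 0.30, 0.10)):
    # All inputs assumed normalized to the 0-1 range; higher = worse.
    # Default weights: 60% semantic, 30% links, 10% engagement.
    w_sem, w_link, w_eng = weights
    return (w_sem * semantic_distance
            + w_link * link_penalty
            + w_eng * engagement_penalty)
```

For a link-first workflow you might pass something like `weights=(0.40, 0.50, 0.10)`; the point is that the blend is tunable, not fixed.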

Drift usually changes after meaningful updates — publishing new pages, editing sections, restructuring internal links, or consolidating content — not hour-to-hour. Daily scans tend to add noise without changing decisions.

Also, Google Search Console metrics are delayed and can fluctuate day-to-day, so “daily drift dashboards” often create false alarms. Weekly or monthly scans create a stable rhythm that’s easier to act on.

With actual distance mode, you can track whether your fixes reduce drift over time: lower distances = tighter topical focus. If you’re in an active optimization sprint, run a quick “before / after” scan — otherwise monthly is the sweet spot.

In Actual Distance mode (default):
Core: ≤ 0.300 (excellent topical alignment)
Focus: 0.300–0.500 (strong alignment, generally “on topic”)
Extended: 0.500–0.700 (moderate drift — review recommended)
Peripheral: ≥ 0.700 (significant drift — needs attention)

These are default heuristics and can be tuned as we calibrate across different site types and content mixes.

In Normalized mode: Zones are percentile-based (0–0.3 = best 30%, etc.) for relative comparison within your site.
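In code, the actual-distance zones above amount to a simple threshold lookup. A sketch with the default cutoffs (boundary handling is a choice; tune the cutoffs as noted above):

```python
def zone(distance):
    # Default actual-distance cutoffs; boundaries assigned to the tighter zone.
    if distance <= 0.300:
        return "Core"        # excellent topical alignment
    if distance <= 0.500:
        return "Focus"       # strong alignment, generally on topic
    if distance <= 0.700:
        return "Extended"    # moderate drift, review recommended
    return "Peripheral"      # significant drift, needs attention
```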

Interactive radial map and semantic projection

UMAP (Uniform Manifold Approximation and Projection) reduces high-dimensional embeddings to 2D while preserving semantic neighborhoods. We use those 2D coordinates to visualize relationships and to derive a consistent “around-the-circle” semantic ordering (and we can stabilize it with fixed seeds). Pages positioned near each other represent similar meaning — which often indicates linking opportunities and shared topical intent.

Radius (distance from center): Shows drift. Pages farther from your topical center are more off-topic (higher distance). You can toggle between actual distances (stable across scans) or normalized (0–1 for relative ranking).

Angle (position around the circle): Uses semantic projection by default (derived from UMAP’s 2D coordinates), so pages that are similar in meaning sit near each other around the circle. You can also switch the angle mode to group by cluster, distribute evenly, or group by problem type.
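Conceptually, radius and angle can be derived like this. A simplified sketch assuming you already have UMAP 2D coordinates and per-page drift distances; names are illustrative:

```python
import math

def radial_coordinates(umap_xy, drift_distances):
    # Center the 2D layout on its mean, then take each point's angle
    # around that center. Radius comes from drift distance, not UMAP,
    # so UMAP determines only the ordering around the circle.
    cx = sum(x for x, _ in umap_xy) / len(umap_xy)
    cy = sum(y for _, y in umap_xy) / len(umap_xy)
    out = []
    for (x, y), radius in zip(umap_xy, drift_distances):
        theta = math.atan2(y - cy, x - cx) % (2 * math.pi)
        out.append((theta, radius))
    return out
```

Pages that UMAP placed near each other get similar angles, so semantic neighbors end up adjacent around the circle while their drift still controls how far out they sit.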

Linking opportunities are pairs of pages that are semantically close but don’t currently link. The tool surfaces high-confidence candidates using proximity thresholds (tunable) plus intent and context signals. Adding contextual internal links between close neighbors helps search engines understand your structure and can strengthen topical authority.
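A minimal version of the opportunity search looks like this. A pure-Python sketch with a hypothetical `threshold` default; the real tool also weighs intent and context signals, not just proximity:

```python
import math

def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)

def linking_opportunities(pages, existing_links, threshold=0.25):
    # pages: {url: embedding}; existing_links: set of (source, target) tuples.
    urls = sorted(pages)
    found = []
    for i, a in enumerate(urls):
        for b in urls[i + 1:]:
            if (a, b) in existing_links or (b, a) in existing_links:
                continue  # already linked in either direction
            d = cosine_distance(pages[a], pages[b])
            if d <= threshold:
                found.append((a, b, round(d, 3)))
    # Closest (most confident) candidates first.
    return sorted(found, key=lambda t: t[2])
```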

Yes. The map is customizable — you can:
• Toggle between actual/normalized distances
• Choose angle mode (semantic projection, by cluster, uniform, by problem type)
• Color by cluster, zone, SDI score, or severity
• Adjust node size based on traffic
• Show/hide labels (none/all/drifting only)
• Filter by cluster
• Zoom, pan, and click nodes to open pages
• Export as PNG for sharing

Node size: By default, mapped to traffic (GSC clicks). Larger nodes = more traffic.
Node opacity: By default, mapped to internal link strength (e.g., inlinks / authority proxy). Brighter nodes are better connected; faint nodes are weakly linked (often linking opportunities or near-orphans). Both mappings are configurable in the UI.

UMAP does not affect drift scoring; it only affects the layout of the visualization (where points appear on the 2D map). Drift scoring uses cosine distance computed in the original embedding space.

What you get — and what to do with it

You get:
• Drift report with severity rankings
• Semantic clusters and hub structure
• Interactive radial map with semantic projection (UMAP) + distance zones
• Internal link meaning mismatch list
• Linking opportunities (similar pages that should link)
• Zone distribution dashboard
• CSV exports for the main datasets
• Prioritized checklist / action plan

The fastest wins are usually:
• Adjusting link context (anchor + nearby text) to match destination pages
• Moving links into better semantic sections (where the surrounding paragraph supports the target)
• Re-centering headings/content to match intent
• Adding internal links between semantically close pages (from linking opportunities)
• Consolidating overlapping/cannibalizing pages (merge, redirect, or re-scope)
• Strengthening hub pages (missing entities/sections aligned to the winning cluster)
• Reviewing peripheral pages (high distance) for consolidation, re-scope, noindex, or redirect — depending on intent and value

With actual distance mode, you can track whether your fixes reduce drift:
• Run a baseline scan → note distances for problematic pages
• Implement fixes (content updates, link changes)
• Run a follow-up scan (monthly is typical)
• Compare distances and zone movement over time

Lower distances = tighter topical focus. For example, moving from 0.650 to 0.480 indicates measurable improvement in topical alignment. The zone distribution dashboard shows how many pages are in each zone over time.
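The before/after comparison is simple enough to sketch (illustrative names; the tool’s report does this for you):

```python
def drift_delta(baseline, followup):
    # baseline/followup: {url: actual cosine distance} from two scans.
    # Only URLs present in both scans are compared.
    report = {}
    for url in baseline.keys() & followup.keys():
        before, after = baseline[url], followup[url]
        report[url] = {
            "before": before,
            "after": after,
            "improved": after < before,  # lower distance = tighter focus
        }
    return report
```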

If you want hands-on help, I can review your report and turn it into a concrete implementation plan (or handle execution). That includes prioritization, rewriting link context where needed, identifying consolidation opportunities, and mapping linking strategies based on semantic neighborhoods. Contact me.

Done-for-you remediation blueprint

A Topical Drift Audit is a done-for-you remediation blueprint. I review your scan outputs, validate the highest-impact issues, and deliver a prioritized plan: what to fix first, consolidation decisions, internal linking strategy, and a 30/60/90-day execution plan.

Choose an Audit if you want a clear execution sequence (and fewer “what do we do next?” decisions). If you prefer to execute in-house and just need visibility + tracking, use the tool and upgrade to unlock affected URLs and per-page diagnostics.

A Topical Drift Audit typically includes:
• Executive summary + highest-impact wins
• Prioritized roadmap (what to fix first + why)
• Consolidation guidance (merge/trim/redirect decisions)
• Internal linking strategy (hubs, anchors, contextual placement)
• 30/60/90-day implementation plan

A prior scan helps, but it’s not required. If you already have one, include it (or its exports) along with your goals. If not, I’ll tell you exactly what to run (sitemap + optional GSC) so the Audit is grounded in real intent signals.
Want a done-for-you plan?
Request a Topical Drift Audit and get a prioritized remediation blueprint + 30/60/90-day execution plan.

Under the hood

We use the same embedding model and configuration within a scan so comparisons are valid. If we change the embedding model/version in the future, we’ll label that so historical comparisons stay honest.

We cluster page embeddings into topic groups (commonly using k-means). The goal is to reveal hub structure: which pages form cohesive neighborhoods, which pages are off-cluster, and which pages behave like bridges. The report uses clusters to summarize your content architecture and prioritize fixes by group.

We use UMAP because it’s fast and does a strong job preserving semantic neighborhoods at scale (hundreds of pages). t-SNE can be slow and can distort global structure; PCA is fast but often collapses meaningful local similarity. UMAP gives a practical balance for “what should link to what?” visualization.

Yes. You can export CSV for core datasets like:
• Page drift data (URL, distance, zone, cluster, metrics)
• Cluster membership and statistics
• Internal link meaning mismatches with similarity scores
• UMAP coordinates / semantic positioning
• Linking opportunities

Complete exports are available on paid plans; the Free plan includes basic exports only.

Results are mostly deterministic, and we can make them fully reproducible:
• Embeddings are deterministic for the same input/model
• Similarity and distance calculations are deterministic
• Clustering and UMAP can vary slightly unless we fix random seeds

With fixed seeds, repeated scans on unchanged content produce the same cluster structure and a stable map layout. Without seeds, results remain directionally consistent, but you may see small layout or assignment changes.

Plans and limits

The Free plan includes unlimited scans (fair use) for up to 500 pages per site. You get clustering, the interactive radial map with semantic projection (UMAP), issues, and opportunities. No credit card required.
Some advanced details are locked on the Free plan.
Locked
  • Full issue drilldowns (all affected pages, clusters, and full tables)
  • Page-level diagnostics (exact URLs, CTR gaps, missed clicks, and prioritization)
  • Expanded exports (complete CSVs and bulk lists for remediation)
  • Advanced recommendations and “done-for-you” remediation blueprints

You can run scans whenever you need them — no artificial monthly limits. “Fair use” means reasonable usage for legitimate site analysis (not abuse or automated mass-rescanning). Many teams run 1–2 scans per site per month, but you can scan more often while iterating on fixes.

Yes. You can add extra pages as add-on blocks or scan large sites in phases (e.g., blog section first, then product/service pages). For agencies managing larger portfolios, we offer higher page limits and multi-site plans.

Yes. A Topical Drift Audit is a done-for-you remediation blueprint: what to fix first, consolidation decisions, internal linking strategy, and a 30/60/90-day plan. You can request one here: Request an Audit.

Yes. You can scan multiple sites (up to 500 pages each, with unlimited scans under fair use). Plans are based on the number of sites, with an agency tier designed for multi-client workflows.

Drift detection should match your workflow — not arbitrary calendar limits. You may want to run a baseline, implement changes, and validate quickly. Unlimited scans let you iterate without being penalized, while fair use prevents abuse.

We have a limited free tier so you can try the tool (e.g., smaller page limit and fewer sites). Full features (higher page limits, multi-site workflows, agency usage) will be part of paid plans.
Ready to scan your site?
Start with a free scan. Unlimited scans (fair use). You’ll get the drift report, interactive semantic visualization (UMAP), linking opportunities, clusters, and a prioritized action plan.