On this page
Basics
Data & access
Analysis & methodology
Visualization & UMAP
Outputs & fixes
Technical capabilities
Pricing & beta
Want help implementing the fixes? Request a walkthrough.
Basics
Topical drift 101
Topical drift is when a page slowly stops representing the topic and intent it used to win for.
It often happens after months of edits, new sections, internal linking changes, or “helpful” expansions that pull the page off-center.
We measure drift using semantic distance — lower means tighter topical alignment; higher means more drift.
Content decay is typically about freshness, SERP changes, stronger competitors, or better alternatives showing up.
Drift is about meaning mismatch: the page content, the internal link context pointing at it, and the queries it actually attracts
stop agreeing. Many pages “decay” because they first drift. Our report shows you where meaning is off and what to fix.
Link meaning mismatch is when an internal link’s meaning in context doesn’t match the destination page’s meaning.
Most tools stop at anchor text. We analyze the anchor + surrounding text + the container/heading context to estimate what the link is “claiming,”
then compare that to the destination page embedding. High mismatch can send confusing topical signals.
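To make that comparison concrete, here is a minimal sketch, assuming the link context and destination page have already been embedded (the short vectors below are placeholders for real embedding vectors):

```python
import numpy as np

def cosine_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine distance = 1 - cosine similarity."""
    return 1.0 - float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Placeholder vectors standing in for embeddings of:
#   link_context     -> anchor text + surrounding sentence + container/heading context
#   destination_page -> cleaned main content of the linked page
link_context_vec = np.array([0.9, 0.1, 0.0])
destination_vec = np.array([0.1, 0.2, 0.95])

mismatch = cosine_distance(link_context_vec, destination_vec)
print(f"link meaning mismatch: {mismatch:.3f}")  # high distance = confusing topical signal
```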
Actual distances (default) are cosine distances computed from embeddings
(cosine distance ranges from 0–2 in theory; in most same-site content comparisons it often falls closer to ~0–1).
We use these for trend tracking over time because the scale is stable across scans.
Normalized distances rescale a site’s distances into a 0–1 range for easier relative comparison within a site (and for composite scoring). Normalized mode is helpful when you want percentile-style zones or you’re weighting multiple signals (like links + engagement).
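As an illustration, here is a small sketch of the two modes, assuming you have per-page cosine distances from the topical centroid; min-max rescaling is used here as one plausible normalization, not necessarily the exact method in the tool:

```python
import numpy as np

# Assumed: actual-mode cosine distances from the site's topical centroid, one per page.
actual = np.array([0.21, 0.34, 0.48, 0.62, 0.81])

# Normalized mode: rescale within the site to 0-1 for relative comparison.
normalized = (actual - actual.min()) / (actual.max() - actual.min())

print(actual)      # stable scale across scans -> good for trend tracking
print(normalized)  # relative ranking within this site -> good for composite scoring
```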
Data & access
What we use, what we store, and what we don’t
We use your sitemap URL inventory, cleaned main page content (headings + body text),
internal link context (anchor + surrounding text + container/heading context),
and (optionally) Google Search Console performance to ground the analysis in real query intent.
We generate embeddings from the cleaned text and use UMAP to visualize semantic neighborhoods.
The scan starts from your sitemap. In most cases that’s enough because it reflects your intended indexable inventory.
If you have important pages not in the sitemap, add them (recommended) or include them via an upload/override in the app.
We fetch HTML, extract main content, and generate embeddings for each page.
Read-only access for the property you want to analyze. We use clicks/impressions/position signals to
identify pages whose actual queries are drifting away from what their content suggests they should attract.
GSC is optional, but it makes drift detection and prioritization far more accurate.
We store analysis results and derived representations (e.g., embeddings, scores, aggregates, and report outputs).
We avoid storing raw HTML whenever possible. Some features (like “placement snippets”) may store small extracted excerpts
needed to render the report (not full pages). You can delete analysis data at any time. We do not sell or share your data.
Analysis & methodology
How the drift signal is computed
Embeddings are vectors that represent semantic meaning. We use OpenAI’s
text-embedding-3-small to create embeddings and then:
- Measure page-to-page similarity (cosine similarity) and distance (cosine distance)
- Group pages into semantic clusters (e.g., k-means on embeddings)
- Project embeddings to 2D for visualization (UMAP)
- Compare internal link context vs destination page meaning
- Calculate distance from your topical “center” (a centroid/center-of-mass embedding)
Note: UMAP is used for visualization only; similarity, distance, and SDI are computed in the original embedding space.
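The steps above roughly correspond to the following sketch; random vectors stand in for real text-embedding-3-small embeddings, and the specific parameters are illustrative:

```python
import numpy as np
from sklearn.cluster import KMeans                    # semantic clusters
from sklearn.metrics.pairwise import cosine_distances
import umap                                           # pip install umap-learn

# Stand-in for real page embeddings (in practice, one text-embedding-3-small
# vector per page of cleaned main content).
rng = np.random.default_rng(42)
page_embeddings = rng.normal(size=(40, 256))

# 1) Page-to-page cosine distances (the drift math stays in this space).
dist_matrix = cosine_distances(page_embeddings)

# 2) Semantic clusters via k-means on the embeddings.
clusters = KMeans(n_clusters=3, random_state=42, n_init=10).fit_predict(page_embeddings)

# 3) 2D projection for visualization only.
coords_2d = umap.UMAP(n_components=2, random_state=42).fit_transform(page_embeddings)

# 4) Distance of each page from the topical "center" (centroid embedding).
centroid = page_embeddings.mean(axis=0, keepdims=True)
drift = cosine_distances(page_embeddings, centroid).ravel()
```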
Raw HTML includes nav, footers, widgets, “related posts,” and template repetition.
Embedding all of that makes vectors represent your template, not your topic.
Main-content extraction keeps what matters (headings H1–H6, body text, lists, core containers) and reduces boilerplate.
That makes clustering cleaner and drift detection more precise.
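A simplified sketch of that kind of extraction using BeautifulSoup; the production pipeline may use more robust boilerplate detection:

```python
from bs4 import BeautifulSoup  # pip install beautifulsoup4

def extract_main_content(html: str) -> str:
    """Keep headings and body text, drop obvious template boilerplate."""
    soup = BeautifulSoup(html, "html.parser")
    # Strip common template containers before collecting text.
    for tag in soup.find_all(["nav", "header", "footer", "aside", "script", "style", "form"]):
        tag.decompose()
    parts = [el.get_text(" ", strip=True)
             for el in soup.find_all(["h1", "h2", "h3", "h4", "h5", "h6", "p", "li"])]
    return "\n".join(p for p in parts if p)
```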
SDI is a composite score designed for prioritization. A common default is:
60% semantic distance (topical alignment)
30% link penalty (internal linking weakness / authority proxy)
10% engagement penalty (GSC performance signals)
You can customize these weights in the visualization depending on your workflow (pure content audit, link-first fixes, etc.). SDI is most useful in normalized mode, where scores are comparable within a site.
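With the default weights, the composite looks roughly like this (a sketch assuming all three inputs are already normalized to 0-1):

```python
def sdi(semantic_distance: float, link_penalty: float, engagement_penalty: float,
        weights=(0.6, 0.3, 0.1)) -> float:
    """Composite Semantic Drift Index with the default 60/30/10 weighting."""
    w_dist, w_link, w_eng = weights
    return w_dist * semantic_distance + w_link * link_penalty + w_eng * engagement_penalty

# A page that is fairly off-topic, weakly linked, and underperforming in GSC:
print(sdi(semantic_distance=0.72, link_penalty=0.55, engagement_penalty=0.40))  # ≈ 0.637
```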
Drift usually changes after meaningful updates — publishing new pages, editing sections, restructuring internal links, or consolidating content —
not hour-to-hour. Daily scans tend to add noise without changing decisions.
Also, Google Search Console metrics are delayed and can fluctuate day-to-day, so “daily drift dashboards” often create false alarms. Weekly or monthly scans create a stable rhythm that’s easier to act on.
With actual distance mode, you can track whether your fixes reduce drift over time: lower distances = tighter topical focus. If you’re in an active optimization sprint, run a quick “before / after” scan — otherwise monthly is the sweet spot.
In Actual Distance mode (default):
• Core: ≤ 0.300 (excellent topical alignment)
• Focus: 0.300–0.500 (strong alignment, generally “on topic”)
• Expansion: 0.500–0.700 (moderate drift — review recommended)
• Peripheral: ≥ 0.700 (significant drift — needs attention)
These are default heuristics and can be tuned as we calibrate across different site types and content mixes.
In Normalized mode: Zones are percentile-based (0–0.3 = best 30%, etc.) for relative comparison within your site.
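In code, the default actual-distance zones map roughly as follows (boundary handling at the exact thresholds is an assumption here):

```python
def zone(distance: float) -> str:
    """Map an actual-mode cosine distance to a default drift zone (thresholds are tunable)."""
    if distance <= 0.300:
        return "Core"
    if distance <= 0.500:
        return "Focus"
    if distance <= 0.700:
        return "Expansion"
    return "Peripheral"

for d in (0.21, 0.48, 0.65, 0.83):
    print(d, zone(d))  # Core, Focus, Expansion, Peripheral
```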
Visualization & UMAP
Interactive radial map and semantic projection
UMAP (Uniform Manifold Approximation and Projection) reduces high-dimensional embeddings to 2D while preserving semantic neighborhoods.
We use those 2D coordinates to visualize relationships and to derive a consistent “around-the-circle” semantic ordering (and we can stabilize it with fixed seeds).
Pages positioned near each other represent similar meaning — which often indicates linking opportunities and shared topical intent.
Radius (distance from center): Shows drift. Pages farther from your topical center are more off-topic (higher distance).
You can toggle between actual distances (stable across scans) or normalized (0–1 for relative ranking).
Angle (position around the circle): Uses semantic projection by default (derived from UMAP’s 2D coordinates), so pages that are similar in meaning sit near each other around the circle. You can also switch the angle mode to group by cluster, distribute evenly, or group by problem type.
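A rough sketch of how radius and angle could combine into radial coordinates; the arctan2-around-the-centroid step is an illustrative stand-in for the tool's semantic-projection ordering:

```python
import numpy as np

# Assumed inputs: per-page drift distance (used as the radius) and UMAP 2D
# coordinates (used to order pages around the circle).
drift = np.array([0.22, 0.47, 0.68, 0.85])
coords_2d = np.array([[1.2, 0.3], [-0.4, 1.1], [-1.0, -0.8], [0.9, -1.3]])

# Angle: direction of each page from the 2D centroid, so pages that sit near
# each other in UMAP space also sit near each other around the circle.
centered = coords_2d - coords_2d.mean(axis=0)
theta = np.arctan2(centered[:, 1], centered[:, 0])

# Radial layout: radius encodes drift, angle encodes semantic neighborhood.
x, y = drift * np.cos(theta), drift * np.sin(theta)
```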
Linking opportunities are pairs of pages that are semantically close but don’t currently link.
The tool surfaces high-confidence candidates using proximity thresholds (tunable) plus intent and context signals.
Adding contextual internal links between close neighbors helps search engines understand your structure and can strengthen topical authority.
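Conceptually, the candidate search looks something like this sketch; the 0.25 threshold and the pair representation are assumptions, and the tool layers intent and context signals on top:

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_distances

def linking_opportunities(embeddings, urls, existing_links, threshold=0.25):
    """Semantically close page pairs with no link in either direction.
    `existing_links` is a set of (source_url, target_url) tuples; `threshold` is tunable."""
    dist = cosine_distances(np.asarray(embeddings))
    pairs = []
    for i in range(len(urls)):
        for j in range(i + 1, len(urls)):
            already_linked = ((urls[i], urls[j]) in existing_links
                              or (urls[j], urls[i]) in existing_links)
            if dist[i, j] <= threshold and not already_linked:
                pairs.append((urls[i], urls[j], float(dist[i, j])))
    return sorted(pairs, key=lambda p: p[2])  # closest (most confident) pairs first
```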
Yes. You can:
• Toggle between actual/normalized distances
• Choose angle mode (semantic projection, by cluster, uniform, by problem type)
• Color by cluster, zone, SDI score, or severity
• Adjust node size based on traffic
• Show/hide labels (none/all/drifting only)
• Filter by cluster
• Zoom, pan, and click nodes to open pages
• Export as PNG for sharing
Node size: By default, mapped to traffic (GSC clicks). Larger nodes = more traffic.
Node opacity: By default, mapped to internal link strength (e.g., inlinks / authority proxy). Brighter nodes are better connected; faint nodes are weakly linked (often linking opportunities or near-orphans). Both mappings are configurable in the UI.
No — UMAP only affects the layout of the visualization (where points appear on the 2D map).
Drift scoring uses cosine distance computed in the original embedding space.
Outputs & fixes
What you get — and what to do with it
You get:
• Drift report with severity rankings
• Semantic clusters and hub structure
• Interactive radial map with semantic projection (UMAP) + distance zones
• Internal link meaning mismatch list
• Linking opportunities (similar pages that should link)
• Zone distribution dashboard
• CSV exports for the main datasets
• Prioritized checklist / action plan
The fastest wins are usually:
• Adjusting link context (anchor + nearby text) to match destination pages
• Moving links into better semantic sections (where the surrounding paragraph supports the target)
• Re-centering headings/content to match intent
• Adding internal links between semantically close pages (from linking opportunities)
• Consolidating overlapping/cannibalizing pages (merge, redirect, or re-scope)
• Strengthening hub pages (missing entities/sections aligned to the winning cluster)
• Reviewing peripheral pages (high distance) for consolidation, re-scope, noindex, or redirect — depending on intent and value
With actual distance mode, you can track whether your fixes reduce drift:
• Run a baseline scan → note distances for problematic pages
• Implement fixes (content updates, link changes)
• Run a follow-up scan (monthly is typical)
• Compare distances and zone movement over time
Lower distances = tighter topical focus. For example, moving from 0.650 to 0.480 indicates measurable improvement in topical alignment. The zone distribution dashboard shows how many pages are in each zone over time.
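A tiny before/after comparison might look like this (URLs and distances are hypothetical):

```python
# Hypothetical before/after distances for pages flagged in the baseline scan.
baseline = {"/guide-a": 0.650, "/guide-b": 0.710}
followup = {"/guide-a": 0.480, "/guide-b": 0.690}

for url in baseline:
    delta = followup[url] - baseline[url]
    print(f"{url}: {baseline[url]:.3f} -> {followup[url]:.3f} ({delta:+.3f})")
# Negative deltas mean the page moved toward the topical center after your fixes.
```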
Yes — if you want, I can review your report and turn it into a concrete implementation plan (or handle execution).
That includes prioritization, rewriting link context where needed, identifying consolidation opportunities,
and mapping linking strategies based on semantic neighborhoods.
Contact me.
Technical capabilities
Under the hood
We use the same embedding model and configuration within a scan so comparisons are valid. If we change the embedding
model/version in the future, we’ll label that so historical comparisons stay honest.
We cluster page embeddings into topic groups (commonly using k-means).
The goal is to reveal hub structure: which pages form cohesive neighborhoods, which pages are off-cluster, and which pages behave like bridges.
The report uses clusters to summarize your content architecture and prioritize fixes by group.
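One way to surface off-cluster and bridge-like pages from a k-means fit, sketched here with stand-in embeddings (the exact heuristics in the report may differ):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import cosine_distances

rng = np.random.default_rng(0)
page_embeddings = rng.normal(size=(30, 256))   # stand-in for real page embeddings

km = KMeans(n_clusters=4, random_state=0, n_init=10).fit(page_embeddings)

# Distance of each page to its own cluster centroid: high values flag
# off-cluster pages and potential "bridge" pages between topics.
d_to_own_centroid = np.array([
    cosine_distances(page_embeddings[[i]], [km.cluster_centers_[km.labels_[i]]])[0, 0]
    for i in range(len(page_embeddings))
])
most_off_cluster = np.argsort(d_to_own_centroid)[-5:]   # five least cohesive pages
```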
We use UMAP because it’s fast and does a strong job preserving semantic neighborhoods at scale (hundreds of pages).
t-SNE can be slow and can distort global structure; PCA is fast but often collapses meaningful local similarity.
UMAP gives a practical balance for “what should link to what?” visualization.
Yes. You can export CSV for core datasets like:
• Page drift data (URL, distance, zone, cluster, metrics)
• Cluster membership and statistics
• Internal link meaning mismatches with similarity scores
• UMAP coordinates / semantic positioning
• Linking opportunities
You can also view the report JSON structure for programmatic use.
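For example, the page drift export can be sliced with pandas; the file and column names below are hypothetical, so check the headers in your actual export:

```python
import pandas as pd

# Hypothetical file and column names; check the headers in your actual export.
df = pd.read_csv("page_drift.csv")  # e.g. columns: url, distance, zone, cluster, clicks

peripheral = df[df["zone"] == "Peripheral"].sort_values("distance", ascending=False)
print(peripheral[["url", "distance", "cluster"]].head(10))
```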
Mostly yes (and we can make it reproducible):
• Embeddings are deterministic for the same input/model
• Similarity and distance calculations are deterministic
• Clustering and UMAP can vary slightly unless we fix random seeds
With fixed seeds, repeated scans on unchanged content produce the same cluster structure and a stable map layout. Without seeds, results remain directionally consistent, but you may see small layout or assignment changes.
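Pinning seeds is the usual recipe; for instance, with scikit-learn and umap-learn (stand-in embeddings shown):

```python
import numpy as np
from sklearn.cluster import KMeans
import umap  # pip install umap-learn

SEED = 42
emb = np.random.default_rng(SEED).normal(size=(50, 256))  # stand-in embeddings

# With the seed pinned, repeat runs on unchanged embeddings return identical
# cluster assignments and a stable 2D layout.
labels = KMeans(n_clusters=5, random_state=SEED, n_init=10).fit_predict(emb)
layout = umap.UMAP(n_components=2, random_state=SEED).fit_transform(emb)
```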
Pricing & beta
Plans, limits, and what happens after beta
During the beta, scans are free and unlimited (fair use), with up to 500 pages per site.
You get clustering, the interactive radial map with semantic projection (UMAP), linking opportunities, exports, and an action plan.
No credit card required.
You can run scans whenever you need them — no artificial monthly limits.
“Fair use” means reasonable usage for legitimate site analysis (not abuse or automated mass-rescanning).
Many teams run 1–2 scans per site per month, but you can scan more often while iterating on fixes.
Yes. You can add extra pages as add-on blocks or scan large sites in phases (e.g., blog section first, then product/service pages).
For agencies managing larger portfolios, we’ll offer higher page limits and multi-site plans after beta.
You’ll be able to keep scanning on a monthly subscription plan.
Plans will be based on number of sites and page limits per site.
Beta users will receive early-access pricing. Core features (visualization, linking opportunities, exports, action plans) remain available.
Yes. During beta you can scan multiple sites (up to 500 pages each, unlimited scans with fair use).
After beta, plans will be based on number of sites — with an agency tier designed for multi-client workflows.
Drift detection should match your workflow — not arbitrary calendar limits.
You may want to run a baseline, implement changes, and validate quickly.
Unlimited scans let you iterate without being penalized, while fair use prevents abuse.
After beta, we expect a limited free tier so you can try the tool (e.g., smaller page limit and fewer sites).
Full features (higher page limits, multi-site workflows, agency usage) will be part of paid plans.
Ready to scan your site?
Start with a free beta scan. Unlimited scans (fair use) during beta.
You’ll get the drift report, interactive semantic visualization (UMAP),
linking opportunities, clusters, and a prioritized action plan.