Jonas hired an agency to scale his e-commerce store. The pitch was simple: thousands of backlinks, rapid growth. Within six weeks his referral graph looked impressive on paper. Then organic traffic started to fall. Cost per acquisition climbed. The agency shrugged and sent a spreadsheet full of "high volume" links, plus a single line about spam score: anything above 30 gets disqualified.
This led to panic. Jonas paid for links that seemed to harm rather than help. Meanwhile, he faced the hard choice every site owner conducting a backlink audit fears - rip out dozens of links, disavow, and hope Google rewards the clean-up. He brought me in after losing 40% of branded organic visits in two months. I'd cleaned up after dozens of similar botched campaigns, so I knew the first rule: metrics alone do not tell the full story.
The Hidden Risk of Trusting a Spam Score Threshold as a Safety Net
Most SEOs will tell you to avoid domains with a spam score above 30. The logic sounds reasonable: higher spam score equals higher risk. But problems arise when that rule becomes an automated filter that trashes links without context. Spam score is a heuristic. It captures signals like unnatural redirects, thin content, and link-farm behavior. It does not capture intent, topical fit, or the nuanced value a single link might hold for a given page.
As it turned out, Jonas had three important links flagged above 30 that were driving referral traffic and brand searches. Removing them because a tool set an arbitrary cutoff would have worsened the situation. This led to a new approach: stop doing blanket deletions and start quantifying how each link actually behaves in relation to rankings and conversions.
Why Removing Every Link Above 30 Often Does More Harm Than Good
There are a few reasons a single-number cutoff fails in practice:
- False positives: low-quality metrics can be assigned to legitimate niche sites or local directories that still funnel real customers.
- False negatives: a domain with a low spam score may still host manipulative links using clean-looking pages.
- Context matters: anchor text distribution, placement on a page, surrounding content, and traffic patterns change the equation.
- Network effects: removing many links at once can trigger ranking volatility that looks worse than the original problem.
Consider the thought experiment of two links. Both come from domains with a spam score of 32. One is buried in a sitewide footer across thousands of pages. The other sits inside a relevant editorial article, receives organic traffic, and produces referral visits. The automatic cutoff treats them the same. A human evaluating signals would not.

How One SEO Analyst Built a Data-First Test to Measure Backlink Value
I proposed a controlled cleanup for Jonas. The idea was simple: measure before you remove. We set up a three-step process built on data and small-scale experiments - not gut calls or blanket filters.
Step 1 - Baseline and classification
We pulled every referring domain from Google Search Console, Ahrefs, and Majestic. Then we tagged each link with:
- Spam score (tool-specific)
- Referring domain authority metrics (DR, TF)
- Estimated monthly referral visits
- Anchor text risk (commercial, exact match, brand)
- Placement type (editorial, footer, sidebar, comments)
- Date first seen
This produced a multidimensional view rather than a single fail/ok flag. As it turned out, some flagged domains had steady direct referral traffic - a warning sign against instant removal.
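If you want to script this tagging pass, here is a minimal sketch in Python with pandas, assuming the three exports have already been merged into one CSV. The file name, column names, and the brand keyword are illustrative placeholders, not the actual workbook we used for Jonas.

```python
import pandas as pd

# Assumed columns in the merged export: domain, spam_score, dr, tf,
# referral_visits_month, anchor_text, placement, first_seen
links = pd.read_csv("referring_domains_merged.csv", parse_dates=["first_seen"])

COMMERCIAL_TERMS = ("buy", "cheap", "discount", "best price")

def anchor_risk(anchor: str, brand: str = "jonas-store") -> str:
    """Rough anchor classification: brand, commercial, or partial match."""
    text = str(anchor).lower()
    if brand in text:
        return "brand"
    if any(term in text for term in COMMERCIAL_TERMS):
        return "commercial"
    return "partial"

links["anchor_risk"] = links["anchor_text"].apply(anchor_risk)

# Multidimensional view instead of a single pass/fail flag
links["flagged_by_spam_score"] = links["spam_score"] > 30
links["has_real_traffic"] = links["referral_visits_month"] > 0

# Domains a blanket cutoff would delete despite sending real visitors
keep_for_review = links[links["flagged_by_spam_score"] & links["has_real_traffic"]]
print(keep_for_review[["domain", "spam_score", "referral_visits_month", "placement"]])
```

The last filter is the one that saved Jonas' three valuable links: flagged by the tool, but still sending visitors.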
Step 2 - Cohort analysis and control groups
We then grouped pages by topical intent and traffic velocity. The key thought experiment was: if I remove certain links, which pages are most likely to show a causal ranking change? We created three cohorts:
- High-risk links: spam score > 30, toxic anchor text, sitewide placements.
- Moderate-risk links: spam score 20-40, mixed placement, low referral traffic.
- Control set: low-risk links and internal backlinks, left untouched.

We kept the control set to isolate the impact of link removals. This allowed a difference-in-differences view of ranking and traffic behavior - a light version of an A/B test for SEO.
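Here is a small sketch of that difference-in-differences view, assuming weekly click counts per page have been exported with a cohort label ("treated" for pages whose links were removed, "control" for untouched pages) and a before/after flag. All file and column names are illustrative.

```python
import pandas as pd

# Assumed long-format export: one row per page per week.
# columns: page, cohort ("treated" / "control"), period ("before" / "after"), clicks
df = pd.read_csv("weekly_clicks.csv")

means = df.groupby(["cohort", "period"])["clicks"].mean().unstack("period")

# Difference-in-differences: change in the treated cohort minus the change
# in the control cohort over the same window.
did = (means.loc["treated", "after"] - means.loc["treated", "before"]) - (
    means.loc["control", "after"] - means.loc["control", "before"]
)

print(f"Estimated effect of the removal batch: {did:+.1f} clicks/week per page")
```

Subtracting the control cohort's movement strips out seasonality and sitewide swings that have nothing to do with the links you touched.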
Step 3 - Small removals, measurement windows
Instead of disavowing hundreds of domains at once, we started with batches of 10-20 domains from the high-risk cohort. For each batch we:
- Requested manual removals where possible
- Filed targeted disavow entries for stubborn cases
- Tracked rankings, impressions, clicks, and conversions for 6-8 weeks
If removal caused further decline, we paused. If it stabilized and recovered, we continued. This method avoids the common agency error: doing everything at once and hoping Google will "reward" the cleanup.
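The pause-or-continue rule can be reduced to a few lines. This sketch compares mean weekly clicks across the windows around a batch; the 5% tolerance and the example numbers are illustrative, not a fixed standard.

```python
import pandas as pd

def batch_verdict(clicks_before: pd.Series, clicks_after: pd.Series,
                  tolerance: float = 0.05) -> str:
    """Compare mean clicks in the windows before and after a removal batch.

    Returns "pause" if the tracked pages declined by more than the tolerance,
    otherwise "continue" with the next batch.
    """
    before, after = clicks_before.mean(), clicks_after.mean()
    change = (after - before) / before
    return "pause" if change < -tolerance else "continue"

# Made-up weekly click counts for the pages tied to one batch
before = pd.Series([120, 118, 125, 122, 119, 121])
after = pd.Series([117, 119, 123, 126, 128, 131])
print(batch_verdict(before, after))  # "continue": stabilized, then recovered
```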
Advanced Techniques to Prove a Backlink Helps or Harms Rankings
Beyond the basic test above, I used advanced analytics to isolate signal from noise. These techniques require more setup but cut budget waste and reduce risk.
Time-series causality with rolling windows
Use a rolling-window correlation between link acquisition timestamps and ranking changes for target keywords. The aim is to spot temporal associations that standard snapshots miss. If a sudden influx of low-quality links coincides with rank drops across several pages, that strengthens the hypothesis that links contributed to the decline.
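A sketch of that rolling-window check, assuming a daily export of newly seen flagged links and the average rank of the target keyword set. The 28-day window and 7-day lag are illustrative choices, as are the file and column names.

```python
import pandas as pd

# Assumed daily series: flagged links first seen per day, and the average rank
# of the target keyword set (a higher rank value means a worse position).
data = pd.read_csv("daily_links_vs_rank.csv", parse_dates=["date"], index_col="date")
# columns: new_flagged_links, avg_rank

# 28-day rolling correlation between link influx and rank movement. A sustained
# positive correlation (more flagged links, worse rank) supports the hypothesis
# that the links contributed to the decline.
rolling_corr = data["new_flagged_links"].rolling(window=28).corr(data["avg_rank"])

# Optional lag: compare today's link influx with rank a week later, since
# ranking reactions rarely show up the same day.
lagged_corr = data["new_flagged_links"].rolling(28).corr(data["avg_rank"].shift(-7))

print(rolling_corr.dropna().tail(10))
print(lagged_corr.dropna().tail(10))
```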
Anchor text risk heatmaps
Create a heatmap that cross-tabulates anchor types with page performance. This surfaces patterns like “pages that received a lot of exact-match commercial anchors in month X dropped in rankings.” Anchor risk is often the strongest predictor of manual action or algorithmic devaluation.
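One way to build that heatmap, assuming a per-link export that records the target page, the anchor type, and the page's rank change over the review window. Column names are placeholders.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Assumed per-link export; rank_change is negative when the page dropped.
df = pd.read_csv("anchors_vs_pages.csv")  # columns: page, anchor_type, rank_change

# Cross-tabulate anchor type against page, averaging rank change per cell.
pivot = df.pivot_table(index="anchor_type", columns="page",
                       values="rank_change", aggfunc="mean")

fig, ax = plt.subplots(figsize=(10, 4))
im = ax.imshow(pivot, cmap="RdYlGn", aspect="auto")
ax.set_xticks(range(len(pivot.columns)), pivot.columns, rotation=45, ha="right")
ax.set_yticks(range(len(pivot.index)), pivot.index)
fig.colorbar(im, ax=ax, label="avg. rank change")
plt.tight_layout()
plt.show()
```

Red cells clustered in the exact-match commercial row are the pattern you are looking for.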
Referral traffic as an independent validator
A link that sends real users is more likely to be editorial and relevant. Use server logs, Google Analytics, and UTM-tagged campaigns to quantify referral quality. If a flagged domain drives converting traffic, consider retaining it even if the spam score is above your typical cutoff.
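Quantifying referral quality can be as simple as aggregating an analytics export by referring domain. This sketch assumes one row per session with a conversion flag and revenue; the column names are placeholders for whatever your export produces.

```python
import pandas as pd

# Assumed analytics export: one row per referral session.
sessions = pd.read_csv("referral_sessions.csv")  # columns: referrer_domain, converted, revenue

quality = sessions.groupby("referrer_domain").agg(
    sessions=("converted", "size"),
    conversions=("converted", "sum"),
    revenue=("revenue", "sum"),
)
quality["conv_rate"] = quality["conversions"] / quality["sessions"]

# Flagged domains that still convert are retention candidates, whatever the
# spam score says; cross-reference with the tagged link sheet from step 1.
print(quality.sort_values("revenue", ascending=False).head(20))
```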
Synthetic control page experiment
Create two near-identical pages. Direct a set of purchased or earned links at one page and not the other. Monitor both over several months. This thought experiment provides a near-causal look at how link acquisition affects rankings for specific content types.
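Once both pages are being tracked, the analysis is just the gap between the two rank trajectories. This sketch assumes a daily rank export for the treated and control pages; file and column names are illustrative.

```python
import pandas as pd

# Assumed tracking export: daily average position for the page that received
# links ("treated") and its near-identical twin that did not ("control").
ranks = pd.read_csv("twin_pages_ranks.csv", parse_dates=["date"], index_col="date")
# columns: treated_rank, control_rank  (lower value = better position)

# The gap between the trajectories is the quantity of interest: if the treated
# page pulls ahead after the links land while the control page holds steady,
# that is a near-causal signal that the links helped.
gap = ranks["control_rank"] - ranks["treated_rank"]
print(gap.rolling(7).mean().dropna().tail(14))  # weekly-smoothed advantage
```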

From Disaster to Recovery: How We Turned Jonas' Campaign Around
We removed only 42 domains in four batches. The first batch carried the highest-risk signals. As it turned out, rankings continued to fall for two weeks, then stabilized. The second batch produced no further decline; impressions began to plateau. The third and fourth batches coincided with steady recovery.
Over three months Jonas regained 85% of his lost branded searches and reduced non-branded organic drop to single-digit percentages. Conversions returned to previous levels. This led to a clearer budget plan: stop buying high-volume low-quality links and invest in a focused mix of PR, content partnerships, and targeted outreach that earned natural placements.
Observations that protect budgets
- Tools are guides, not judges. Spam score should be a signal, not an absolute.
- Measure before you act. Small experiments prevent expensive mistakes.
- Track conversions, not just rankings. Some links improve business metrics even if their SEO signal is murky.
Practical Framework: How to Decide What to Disavow
Here is a compact decision matrix you can apply. Score each link across five dimensions and prioritize based on total risk.
| Dimension | Low | Medium | High |
| --- | --- | --- | --- |
| Spam heuristic | 0-10 | 11-30 | 31+ |
| Referring traffic | Steady visitors | Occasional clicks | None |
| Anchor risk | Brand/neutral | Partial match | Exact-match commercial |
| Placement | Contextual editorial | Author bio/sidebar | Footer/comments/sitewide |
| Age & velocity | Old, stable | Recent spike | Rapid new acquisition |

High total risk items go into your removal queue. Moderate risk items become candidates for monitoring or outreach. Low risk items stay, even if spam score sits above 30 in isolation.
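If you want to operationalize the matrix, a small scoring function works. This sketch assigns 0, 1, or 2 risk points per dimension; the thresholds, category labels, and triage cutoffs are illustrative, not a fixed rule.

```python
# Minimal scoring sketch of the matrix above: each dimension contributes
# 0 (low), 1 (medium), or 2 (high) risk points, and the total drives triage.

def score_link(link: dict) -> int:
    score = 0
    score += 0 if link["spam_score"] <= 10 else 1 if link["spam_score"] <= 30 else 2
    score += 0 if link["monthly_referrals"] >= 10 else 1 if link["monthly_referrals"] > 0 else 2
    score += {"brand": 0, "partial": 1, "commercial": 2}[link["anchor_risk"]]
    score += {"editorial": 0, "sidebar": 1, "footer": 2}[link["placement"]]
    score += {"stable": 0, "recent_spike": 1, "rapid": 2}[link["velocity"]]
    return score

# Example: a flagged domain that still behaves like an editorial link
link = {"spam_score": 32, "monthly_referrals": 45, "anchor_risk": "brand",
        "placement": "editorial", "velocity": "stable"}

total = score_link(link)
action = "remove" if total >= 7 else "monitor" if total >= 4 else "keep"
print(total, action)  # a spam score above 30 alone does not push this link out
```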
Thought Experiments to Sharpen Your Decisions
Run these in meetings to align teams and agencies on real priorities.
The "10 Links" Test
Imagine you could only keep 10 external links to a high-value product page. Which ones would you choose and why? If you default to metrics only, you likely miss links that drive real buyers. Force teams to defend choices with conversion data or contextual relevance.
The "No Tools" Scenario
Imagine you cannot see spam scores or authority metrics for 48 hours. Which links would you remove first based on purely human signals - odd domain names, thin content, irrelevant language? This reduces blind dependence on a single metric and trains judgment.
Final Checklist Before Hitting Disavow
- Create a baseline snapshot of rankings, impressions, conversions.
- Classify and tag every link across multiple signals.
- Run small removal batches with a control group.
- Track results for a minimum of 6 weeks per batch.
- Keep manual outreach logs for removals you requested.
- Only disavow what you cannot remove and that passes your risk threshold after testing.
Why Agencies Promise Clean Cuts and What to Watch For
Many agencies sell the idea of "clean everything above 30" because it is simple to pitch and quick to execute. That promise sounds good in a report, but reality is messy. An aggressive, unmeasured disavow can remove useful links, unsettle Google's understanding of your site, and waste months of recovery time. Be skeptical when a provider wants to take a sledgehammer to your link graph without first proving a causal link to declining performance.
Final advice
Use spam score thresholds as a guardrail, not a commandment. Build experiments, use control groups, and let conversion data guide the most expensive decisions. Jonas' recovery did not come from blind cleaning alone; it came from targeted actions, patient measurement, and rejecting the false comfort of a single-metric rule. Protect your budget by treating each link as an investment to be tested, not a line item to be removed because a dashboard says so.