

SMX Munich - Ditch Keyword-Specific Pages – Build Topic Focus through Content Consolidation

My 2025 talk at SMX Munich focussed on a meta-analysis of 8,500 sites, asking whether content consolidation is a good way to build website traffic in a world shaped by Google's helpful content update. TL;DR: it is. This deck outlines the data, and you can see a breakdown of my approach to the data here:


Amanda King

May 18, 2025


Transcript

  1. What’s what:
    1. Keywords vs. topics for Google
    2. The real world
    3. The analysis
    4. How to implement this yourself
    5. Who’s this human speaking to us?
  2. Google acknowledges that query-only matching is pretty terrible. “Direct ‘Boolean’ matching of query terms has well known limitations, and in particular does not identify documents that do not have the query terms, but have related words [...] The problem here is that conventional systems index documents based on individual terms, rather than on concepts. Concepts are often expressed in phrases [...] Accordingly, there is a need for an information retrieval system and methodology that can comprehensively identify phrases in a large scale corpus, index documents according to phrases, search and rank documents in accordance with their phrases, and provide additional clustering and descriptive information about the documents. [...]” Source: Information retrieval system for archiving multiple document versions, granted 2017 (link)
  3. Queries very quickly become entities: “[...] identifying queries in query data; determining, in each of the queries, (i) an entity-descriptive portion that refers to an entity and (ii) a suffix; determining a count of a number of times the one or more queries were submitted” (patent granted in 2015, submitted in 2012). Sources: https://patents.google.com/patent/US9047278B1/en ; https://patents.google.com/patent/US20150161127A1/ ; https://patents.google.com/patent/US8032507B1/en
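As a toy illustration of the “entity-descriptive portion plus suffix” idea, the Python sketch below splits queries against a hypothetical, hard-coded entity list (`KNOWN_ENTITIES`) and counts how often each suffix was submitted. It is not Google's implementation. A real system would need a proper entity database, not prefix matching.

```python
from __future__ import annotations

from collections import Counter

# Hypothetical entity list for illustration; a real system needs an entity database.
KNOWN_ENTITIES = ["eiffel tower", "golden gate bridge", "louvre"]


def split_query(query: str, entities: list[str]) -> tuple[str, str] | None:
    """Split a query into (entity-descriptive portion, suffix) if it starts with a known entity."""
    q = " ".join(query.lower().split())  # normalise case and whitespace
    # Check longer names first so the most specific entity wins.
    for entity in sorted(entities, key=len, reverse=True):
        if q == entity or q.startswith(entity + " "):
            return entity, q[len(entity):].strip()
    return None


def count_suffixes(queries: list[str], entities: list[str]) -> Counter:
    """Count how many times each suffix was submitted after any known entity."""
    counts: Counter = Counter()
    for query in queries:
        parts = split_query(query, entities)
        if parts and parts[1]:  # skip bare entity queries with no suffix
            counts[parts[1]] += 1
    return counts


queries = ["eiffel tower height", "Eiffel Tower tickets", "louvre hours",
           "golden gate bridge height", "paris weather"]
print(count_suffixes(queries, KNOWN_ENTITIES))
```

Here “height” is counted twice, once each for two different entities. “paris weather” is ignored because Paris is not in the hypothetical list.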
  4. Parsing is intrinsically categorisation:
    • Semantic distance
    • Keyword-seed affinity
    • Category-seed affinity
    • Category-seed affinity to threshold
    Sources: https://patents.google.com/patent/US11106712B2 ; https://www.seobythesea.com/2021/09/semantic-relevance-of-keywords/
  5. It actually looks like they had a classification engine for entities as well. This patent was filed in 2010 and granted in 2014, and is likely a basis for the Knowledge Graph (US8838587B1): https://patents.google.com/patent/US8838587B1/en
  6. “Rather than simply searching for content that matches individual words, BERT comprehends how a combination of words expresses a complex idea.” Source: https://blog.google/products/search/how-ai-powers-great-search-results/ (2022)
  7. What, then, is “information gain”? Sources: Phrase-based searching in an information retrieval system, granted 2009 (link); “Contextual estimation of link information gain”, granted to Google in July 2024 (link)
    [Diagram: “Australian Shepherd” example comparing URL 1 and URL 2 on the terms “Aussie”, “Red merle”, “Blue merle” and “Tricolor”]
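The patents describe learned scoring. For intuition only, the Python sketch below uses a much cruder proxy: the share of a page's terms that the user has not already seen on a previous page. The term sets assigned to each URL are hypothetical, and this is not Google's method.

```python
from __future__ import annotations


def novelty_score(candidate: set[str], already_seen: set[str]) -> float:
    """Share of the candidate page's terms not already seen (0.0 to 1.0)."""
    if not candidate:
        return 0.0
    return len(candidate - already_seen) / len(candidate)


# Hypothetical term sets for two pages about Australian Shepherds.
url_1 = {"aussie", "red merle"}
url_2 = {"aussie", "blue merle", "tricolor"}

print(round(novelty_score(url_2, already_seen=url_1), 2))  # 0.67
```

Two of URL 2's three terms are new after reading URL 1. Under this toy measure, URL 2 adds more than a page that repeats URL 1's terms would.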
  8. But then there’s the whole concept of a consensus score. Mark Williams-Cook tested this using the Google exploit he analysed, and received a bounty for it. Source: https://www.youtube.com/watch?v=_AQ9UDqES80
  9. So for business owners online, there will always be a tension between creating unique content for customers and answering the same FAQs.
  10. One way to address this is to be as direct as possible: cut out historical ‘SEO content’ and other content bloat, and focus on matching user intent and topical relationships.
  11. Unfocused law firm
    • Over 700 thin blog posts
    • Poor search intent targeting: rankings for queries unrelated to core business offerings
    • Previous agency focused on audits without taking action
    Source: Firewire Digital, https://www.firewiredigital.com.au/case-studies/burke-mead-lawyers/
  12. Challenger bank in Europe wins via consolidation and authority
    • 817 pages (38%) removed due to irrelevance
    • 376 pages (17%) merged due to overlap
    • 202 pages (9%) to improve
    Source: direct communication with Precis Digital, Linus Alenius
  13. Informational financial sites
    • Site #1 pruned 15% of old content in Oct 2023
    • Site #2 migrated (with content consolidation) in May 2023, with ongoing consolidation in 2024
    [Traffic charts: Site #1, Site #2]
  14. The analysis: I wanted to see how this would all scale. A big thank you to the SEMRush team, particularly Yulia, the engineer who pulled all this data for me!
  15. Across 8,421 domains, I reviewed data to see whether reducing pages was a stable, sustainable choice for growth. Google updates over the period:
    • 2022: August, first helpful content update; December, helpful content update and link spam update
    • 2023: Feb, my data starts here; March, core update; August, core update; September, helpful content update; October, core update; November, core update and reviews update
    • 2024: March, core update and spam update; June, spam update; August, core update; November, core update; December, spam update and core update
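  16. I aimed to focus on “middle of the road” websites, not super massive ones.
    Classification codes: the first letter is M/S/F for more/same/fewer pages, and the third is M/S/L for more/same/less traffic (e.g. FPMT = fewer pages, more traffic).

    | Classification | Avg. start pages | Avg. end pages | Avg. absolute page change | Avg. relative page change | Avg. traffic change |
    |---|---|---|---|---|---|
    | MPMT | 16,130 | 26,910 | 10,780 | 81.40% | 74.11% |
    | MPST | 11,690 | 16,504 | 4,814 | 67.65% | -6.15% |
    | FPLT | 29,487 | 19,659 | -9,828 | -34.16% | -55.92% |
    | SPMT | 18,860 | 20,215 | 1,355 | 12.26% | 50.78% |
    | SPLT | 10,799 | 10,902 | 103 | 11.11% | -51.01% |
    | MPLT | 10,090 | 14,397 | 4,307 | 76.80% | -46.09% |
    | SPST | 15,222 | 15,532 | 310 | 10.45% | -8.91% |
    | FPST | 32,028 | 20,756 | -11,272 | -30.72% | -10.85% |
    | FPMT | 30,353 | 22,168 | -8,185 | -30.96% | 54.71% |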
  17. This was an interesting way to start the analysis. Fewer websites to work with isn’t necessarily a bad thing, though.

    | Classification | Count | Percent |
    |---|---|---|
    | Shutdown | 1,742 | 20.69% |
    | More pages more traffic | 1,445 | 17.16% |
    | More pages same traffic | 922 | 10.95% |
    | Fewer pages less traffic | 747 | 8.87% |
    | Same pages more traffic | 724 | 8.60% |
    | Same pages less traffic | 667 | 7.92% |
    | More pages less traffic | 662 | 7.86% |
    | Same pages same traffic | 653 | 7.75% |
    | Fewer pages same traffic | 468 | 5.56% |
    | Fewer pages more traffic | 391 | 4.64% |
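The deck does not say which cut-offs separate “more”, “same” and “fewer”. The Python sketch below shows how a domain could be bucketed into these classes under two assumptions. First, a change within ±10% counts as “same” (`SAME_BAND`). Second, a domain with no pages or no traffic left counts as “Shutdown”. Both are illustrative choices, not the analysis's actual rules.

```python
from __future__ import annotations

# Assumption: changes within +/-10% count as "same". The deck does not state
# which threshold the analysis actually used.
SAME_BAND = 0.10


def direction(start: float, end: float, up: str, down: str) -> str:
    """Label a relative change as `up`, `down` or "Same" using SAME_BAND."""
    if start <= 0:
        raise ValueError("starting value must be positive")
    change = (end - start) / start
    if change > SAME_BAND:
        return up
    if change < -SAME_BAND:
        return down
    return "Same"


def classify_domain(start_pages: int, end_pages: int,
                    start_traffic: float, end_traffic: float) -> str:
    """Return a label such as "Fewer pages more traffic"."""
    # Assumption: a domain with no pages or no traffic left counts as "Shutdown".
    if end_pages == 0 or end_traffic == 0:
        return "Shutdown"
    pages = direction(start_pages, end_pages, "More", "Fewer")
    traffic = direction(start_traffic, end_traffic, "more", "less")
    return f"{pages} pages {traffic.lower()} traffic"


def abbreviate(label: str) -> str:
    """Turn "Fewer pages more traffic" into the deck's short code "FPMT"."""
    if label == "Shutdown":
        return label
    pages, _, traffic, _ = label.split()
    return f"{pages[0].upper()}P{traffic[0].upper()}T"


label = classify_domain(961_615, 837_710, 339_573, 705_502)  # Pixels.com figures from the B2B slide
print(label, abbreviate(label))  # Fewer pages more traffic FPMT
```

Widening or narrowing `SAME_BAND` moves domains between the “same” and “more/fewer” classes, so class counts like those above depend on this choice.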
  18. Across the board, reducing your pages gave you slightly lower volatility in traffic changes than adding more.

    | Classification | Domains | Avg. traffic increase | Avg. page change | Avg. traffic volatility | Traffic to page ratio |
    |---|---|---|---|---|---|
    | MPMT | 1,445 | 74.11% | 81.40% | 25.29 | 5.86 |
    | FPMT | 391 | 54.71% | -30.96% | 22.86 | 5.93 |
    | SPMT | 724 | 50.78% | 12.26% | 21.84 | 15.67 |
  19. When we remove media sites, this becomes even clearer: FPMT volatility is in the 22-26 range, compared to 30+ for MPMT.
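The deck does not define how “traffic volatility” is calculated. One common way to express how erratic a traffic series is: the coefficient of variation (standard deviation divided by mean) of monthly traffic. The Python sketch below uses that measure, and the two series are hypothetical. The deck's own volatility figures may be computed differently.

```python
from __future__ import annotations

from statistics import mean, pstdev


def traffic_volatility(monthly_traffic: list[float]) -> float:
    """Coefficient of variation of a monthly traffic series, in percent.

    Assumption: the deck does not define its volatility metric; this is one
    common way to express how erratic a series is.
    """
    if len(monthly_traffic) < 2:
        raise ValueError("need at least two months of data")
    avg = mean(monthly_traffic)
    if avg == 0:
        raise ValueError("traffic series averages to zero")
    return pstdev(monthly_traffic) / avg * 100


# Hypothetical series: steady growth vs. a spike-and-crash pattern.
steady = [1000, 1050, 1100, 1150, 1200]
spiky = [1000, 2500, 600, 2200, 700]
print(round(traffic_volatility(steady), 1), round(traffic_volatility(spiky), 1))
```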
  20. Yet when reviewing media and non-media sites, media may actually benefit more from consolidation:
    • On average, media domains received 42% more incremental traffic than non-media domains when they reduced their website pages
    • 69% of domains that added pages were considered stable, whereas 73% of media sites that reduced pages were seen as successful
  21. Reviewing specific industries shows that publications and YMYL sites tried this and succeeded. If you’re in
    • B2B
    • Medical
    • Style/fashion
    • Auto
    you may particularly benefit from reducing your pages: these industries, on average, performed better when they reduced pages than when they added more.
  22. B2B
    • Median page reduction: -15.99%
    • Median traffic increase: 103.36%
    Pixels.com: page reduction -12.89% (961,615 → 837,710 pages); traffic increase +107.76% (339,573 → 705,502)
  23. Medical
    • Median page reduction: -8.99%
    • Median traffic increase: 57.11%
    Stability is remarkable: most of these sites have very low volatility (3-9%), indicating consistent growth rather than erratic traffic spikes.
  24. Fashion
    • Median page reduction: -31.95%
    • Median traffic increase: 93.34%
    Flaunt.com: page reduction -16.66% (10,945 → 9,122 pages); traffic increase +444.32% (48,033 → 261,451)
  25. Auto
    • Median page reduction: -21.68%
    • Median traffic increase: 54.09%
    whatcar.com: gradual decline from ~13K to ~8.5K pages by the end of 2024, coinciding with significant traffic growth.
  26. Less is more: if you’re not sure, be super targeted in what you consolidate.
    • Sweet spot: a -10% to -20% reduction in content shows the highest traffic gains, at 70.74%
    • Minimal page reductions (0-10%) produced substantial traffic gains (54.8%) with the highest stability rating (83.78%)
  27. Large sites (10K-100K pages) achieve dramatically higher traffic gains (77.67%) than smaller sites, despite reducing a smaller percentage of their content (-24.24%).
  28. Small sites typically require a 34-51% page reduction; large sites achieve better results with only about a 20% reduction.

    | Site size | Domains | Avg. traffic gain | Avg. page reduction | Avg. volatility |
    |---|---|---|---|---|
    | Very Small (<100 pages) | 10 | 47.01% | -51.43% | 25.36 |
    | Small (100-1K pages) | 82 | 36.79% | -35.85% | 19.37 |
    | Medium (1K-10K pages) | 158 | 46.55% | -34% | 20.39 |
    | Large (10K-100K pages) | 121 | 77.67% | -23.65% | 27.33 |
    | Very Large (>100K pages) | 20 | 57.59% | -20.95% | 28.33 |
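To reproduce this kind of size-band summary on your own domain list, the Python sketch below buckets domains by their starting page count. The band labels are from the table above. Treating each lower bound as inclusive (100 pages counts as “Small”) is an assumption. It then averages traffic gain and page change per band. The input tuples are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict
from statistics import mean

# Band labels from the size table; assumption: each lower bound is inclusive.
SIZE_BANDS = [
    (100, "Very Small (<100 pages)"),
    (1_000, "Small (100-1K pages)"),
    (10_000, "Medium (1K-10K pages)"),
    (100_000, "Large (10K-100K pages)"),
]


def size_band(start_pages: int) -> str:
    """Map a starting page count to its size band."""
    for upper, label in SIZE_BANDS:
        if start_pages < upper:
            return label
    return "Very Large (>100K pages)"


def summarise(domains: list[tuple[int, int, float, float]]) -> dict[str, dict[str, float]]:
    """Domain count, average traffic gain (%) and average page change (%) per size band.

    Each tuple is (start_pages, end_pages, start_traffic, end_traffic).
    """
    groups: dict[str, list[tuple[float, float]]] = defaultdict(list)
    for start_p, end_p, start_t, end_t in domains:
        page_change = (end_p - start_p) / start_p * 100
        traffic_gain = (end_t - start_t) / start_t * 100
        groups[size_band(start_p)].append((traffic_gain, page_change))
    return {
        band: {
            "domains": len(rows),
            "avg_traffic_gain": mean(r[0] for r in rows),
            "avg_page_change": mean(r[1] for r in rows),
        }
        for band, rows in groups.items()
    }


sample = [(50, 25, 100, 150), (20_000, 16_000, 1_000, 1_500), (30_000, 24_000, 1_000, 2_000)]
print(summarise(sample))
```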
  29. Based on the patterns I’m seeing, gradual, specific page reductions (likely content consolidation) are the more successful way to approach page reduction.
  30. Where YMYL starts, the rest of the Internet will likely follow: plan to consolidate your content by 10-20% in the next 18 months.
  31. So how do we do this ourselves? Here are the steps to implement so you land on the more favourable end of reducing the pages on your website.
  32. We know folks have failed at this, so:
    1. Find and resolve duplicate content
    2. Find and resolve irrelevant content
    3. Map and match user intent
    4. Consolidate any and all with proper redirects, 404s or 410s
    5. BONUS: E-E-A-T updates, particularly if in YMYL
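Consolidating in several rounds can leave redirect chains (A → B → C) or even loops. The Python sketch below handles step 4 under stated assumptions. It collapses each old URL to its final destination and then decides a status code per URL: 301 for merged pages, 410 for content deliberately removed, 200 otherwise. The URL paths and the `removed` set are hypothetical inputs.

```python
from __future__ import annotations


def resolve_redirects(redirects: dict[str, str]) -> dict[str, str]:
    """Point every old URL straight at its final destination, raising on loops."""
    resolved: dict[str, str] = {}
    for source in redirects:
        seen = {source}
        target = redirects[source]
        # Follow the chain until we reach a URL that is not itself redirected.
        while target in redirects:
            if target in seen:
                raise ValueError(f"redirect loop involving {target}")
            seen.add(target)
            target = redirects[target]
        resolved[source] = target
    return resolved


def status_for(url: str, redirects: dict[str, str], removed: set[str]) -> tuple[int, str | None]:
    """301 to the consolidated page, 410 for deliberately removed content, else 200."""
    if url in redirects:
        return 301, redirects[url]
    if url in removed:
        return 410, None
    return 200, None


chain = {"/old-guide": "/guide-2023", "/guide-2023": "/guide"}
final = resolve_redirects(chain)
print(final)  # {'/old-guide': '/guide', '/guide-2023': '/guide'}
```

Feed the resolved map, not the raw one, into your server's redirect configuration, so that each old URL takes a single hop.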
  33. Find your duplicate content. Do this at scale by using a combination of tools:
    • Screaming Frog to crawl URLs
    • BigQuery, Python and FAISS (Facebook AI Similarity Search, for finding similar items in a big list of embeddings)
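FAISS searches dense embedding vectors for nearest neighbours at scale. To keep the example dependency-free, the Python sketch below swaps in a much simpler technique: bag-of-words term counts with brute-force cosine similarity over every pair of pages. The 0.8 threshold and the page text are hypothetical. The pairwise loop grows quadratically with page count, which is the cost FAISS is designed to avoid on large sites.

```python
from __future__ import annotations

import math
import re
from collections import Counter
from itertools import combinations


def vectorise(text: str) -> Counter:
    """Bag-of-words term counts: a crude stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def near_duplicates(pages: dict[str, str], threshold: float = 0.8) -> list[tuple[str, str, float]]:
    """Return URL pairs whose similarity meets the (assumed) threshold, most similar first."""
    vectors = {url: vectorise(text) for url, text in pages.items()}
    pairs = []
    for (u1, v1), (u2, v2) in combinations(vectors.items(), 2):
        score = cosine(v1, v2)
        if score >= threshold:
            pairs.append((u1, u2, round(score, 3)))
    return sorted(pairs, key=lambda p: p[2], reverse=True)


pages = {
    "/blue-merle-aussie": "Blue merle Australian Shepherd coat colour guide",
    "/aussie-blue-merle": "Australian Shepherd blue merle coat colour guide",
    "/contact": "Contact our office by phone or email",
}
print(near_duplicates(pages))
```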
  34. Or, if you don’t have the time to do this programmatically yourself, use tools. Common tools include:
    • The duplicate content flag in Screaming Frog (this assumes you’re able to crawl your entire website in one go)
    • The “Duplicate, Google chose different canonical than user” report in Google Search Console
    • Site audits in SEMRush, Ahrefs or Siteliner (free up to 250 pages)
  35. Find your irrelevant content. Analyse Search Console data at scale in BigQuery. Define your terms for:
    • Brand
    • Product
    • Topics

        SELECT
          page,
          query,
          SUM(impressions) AS total_impressions,
          SUM(clicks) AS total_clicks
        FROM `your_project.your_dataset.gsc_data`
        WHERE NOT REGEXP_CONTAINS(LOWER(query), r'(yourbrand|brand|band)')  -- brand terms
          AND NOT REGEXP_CONTAINS(LOWER(query), r'(product1|product2|topic1)')  -- product and topic terms
        GROUP BY page, query
        HAVING total_impressions > 50  -- adjust thresholds as needed
        ORDER BY total_impressions DESC;
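If your Search Console data is in a CSV export rather than BigQuery, the same filter can be expressed in Python. The sketch below mirrors the query on this slide. It drops brand and core product/topic queries with `re.search`, which, like `REGEXP_CONTAINS`, matches anywhere in the string. It then sums impressions and clicks per page and query and keeps anything above 50 impressions. The patterns are the slide's placeholders, and the row format is an assumption.

```python
from __future__ import annotations

import re
from collections import defaultdict
from typing import Iterable

# Placeholder patterns from the slide's query; swap in your own brand, product and topic terms.
BRAND = re.compile(r"(yourbrand|brand|band)")
CORE = re.compile(r"(product1|product2|topic1)")


def off_topic_queries(
    rows: Iterable[tuple[str, str, int, int]], min_impressions: int = 50
) -> list[tuple[str, str, int, int]]:
    """Sum impressions and clicks per (page, query), dropping brand and core-topic queries.

    `rows` are assumed to be (page, query, impressions, clicks) tuples.
    """
    totals: dict[tuple[str, str], list[int]] = defaultdict(lambda: [0, 0])
    for page, query, impressions, clicks in rows:
        q = query.lower()
        if BRAND.search(q) or CORE.search(q):
            continue
        totals[(page, query)][0] += impressions
        totals[(page, query)][1] += clicks
    kept = [
        (page, query, imps, clicks)
        for (page, query), (imps, clicks) in totals.items()
        if imps > min_impressions
    ]
    return sorted(kept, key=lambda row: row[2], reverse=True)


rows = [
    ("/blog/old-post", "celebrity net worth", 40, 1),
    ("/blog/old-post", "celebrity net worth", 30, 0),
    ("/pricing", "yourbrand pricing", 500, 60),
    ("/guide", "product1 setup", 200, 20),
    ("/blog/misc", "random trivia", 10, 0),
]
print(off_topic_queries(rows))
```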
  36. But what if I’m not sure what my topics are? Use the topics report in SEMRush (or a similar tool) for direction.
  37. If you have more brainspace than I do, you could do this dynamically with automated relevance scoring: analyse your brand proposition copy against your query dataset using classifyText from Google’s Natural Language API.
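classifyText assigns text to Google's content categories, and calling it needs a Google Cloud project and credentials. The Python sketch below substitutes a much cruder offline technique so it runs anywhere: the share of each query's words that also appear in the brand proposition copy. The stopword list, 0.5 threshold and example copy are all hypothetical.

```python
from __future__ import annotations

import re

# Hypothetical stopword list; extend it for real data.
STOPWORDS = {"a", "an", "and", "are", "for", "in", "of", "our", "the", "to", "we", "with", "your"}


def tokens(text: str) -> set[str]:
    """Lowercase word tokens with stopwords removed."""
    return {t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS}


def relevance(query: str, proposition: str) -> float:
    """Share of the query's tokens that also appear in the proposition copy (0.0 to 1.0)."""
    q = tokens(query)
    if not q:
        return 0.0
    return len(q & tokens(proposition)) / len(q)


def low_relevance(queries: list[str], proposition: str, threshold: float = 0.5) -> list[str]:
    """Queries scoring below the (assumed) threshold: candidates for review."""
    return [q for q in queries if relevance(q, proposition) < threshold]


proposition = ("We are a family law firm helping clients with divorce, "
               "custody and property settlements in Sydney.")
print(low_relevance(["divorce lawyer sydney", "celebrity net worth"], proposition))
```

Word overlap misses synonyms (“lawyer” vs. “law firm” here). That gap is what a category- or embedding-based approach like classifyText is meant to close.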
  38. How Google classifies intent within the Search Quality Evaluator Guidelines:
    • Know query, some of which are Know Simple queries
    • Do query, when the user is trying to accomplish a goal or engage in an activity
    • Website query, when the user is looking for a specific website or webpage
    • Visit-in-person query, some of which are looking for a specific business or organization, some of which are looking for a category of businesses
    Refresh your memory of the Search Quality Evaluator Guidelines: https://static.googleusercontent.com/media/guidelines.raterhub.com/en//searchqualityevaluatorguidelines.pdf
  39. A simple regex to apply to your queries to map user intent (user intent: regex definition):
    • Visit-in-person: (?i)(near(\sby|\sme)?|directions(\sto)?|closest|nearest|local|address|hours|location)
    • Website: (?i)(.*\.(com|org|net|edu)|login|sign\sin|homepage|website)
    • Do: (?i)(how\sto\s.*|download|buy|purchase|get|watch|play|stream|calculate|sign\sup|install|create|make|build)
    • Know Simple: (?i)(what|who|when|where|how|why)\s.*\?|.*height.*|.*population.*|.*weather.*|.*salary.*|.*distance.*
    • Know: (?i)((information|about|history|learn|guide|tutorial|facts?)\s.*|reviews?|news)
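To try these patterns on a query list, the Python sketch below applies them with `re.search`. It checks intents in the order the slide lists them and returns the first match. That priority order and the "Unclassified" fallback are assumptions, not part of the slide. The patterns have no word boundaries, so short terms like `get` also match inside longer words such as "budget".

```python
import re

# Patterns from the slide, checked in the order listed; assumption: first match wins.
INTENT_PATTERNS = [
    ("Visit-in-person", r"(?i)(near(\sby|\sme)?|directions(\sto)?|closest|nearest|local|address|hours|location)"),
    ("Website", r"(?i)(.*\.(com|org|net|edu)|login|sign\sin|homepage|website)"),
    ("Do", r"(?i)(how\sto\s.*|download|buy|purchase|get|watch|play|stream|calculate|sign\sup|install|create|make|build)"),
    ("Know Simple", r"(?i)(what|who|when|where|how|why)\s.*\?|.*height.*|.*population.*|.*weather.*|.*salary.*|.*distance.*"),
    ("Know", r"(?i)((information|about|history|learn|guide|tutorial|facts?)\s.*|reviews?|news)"),
]
COMPILED = [(label, re.compile(pattern)) for label, pattern in INTENT_PATTERNS]


def classify_intent(query: str) -> str:
    """Return the first intent whose pattern matches anywhere in the query."""
    for label, pattern in COMPILED:
        if pattern.search(query):
            return label
    return "Unclassified"


for q in ["coffee shop near me", "acme login", "how to bake bread",
          "what is the population of france", "history of the roman empire", "blue merle"]:
    print(f"{q!r}: {classify_intent(q)}")
```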
  40. Folks have dug even deeper into how Google actually classifies intent in its search engine. Use this to classify your primary keyword set. …if you want to use it for more than a hundred or so queries, maybe pick up an AlsoAsked subscription or ping MW-C 👀 Or join the request for an API… Sources: https://rqpredictor.streamlit.app/ ; https://www.linkedin.com/posts/markseo_seo-activity-7298698401955627008-VRQ6
  41. And then, once we classify the queries, we need to check whether the topic aligns with the primary user intent for the query.
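One way to run this check: weight each page's queries by impressions, take the dominant intent, and compare it with the intent the page was built for. The Python sketch below does that. It assumes each query already carries an intent label, for example from the regex mapping on slide 39. The page paths, labels and `intended` map are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict


def dominant_intent(query_rows: list[tuple[str, str, int]]) -> dict[str, str]:
    """Impression-weighted primary intent per page.

    `query_rows` are (page, intent, impressions), with intent already assigned.
    """
    weights: dict[str, dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for page, intent, impressions in query_rows:
        weights[page][intent] += impressions
    return {page: max(counts, key=counts.get) for page, counts in weights.items()}


def misaligned_pages(query_rows: list[tuple[str, str, int]],
                     intended: dict[str, str]) -> dict[str, tuple[str, str]]:
    """Pages whose intended intent differs from the dominant intent of their queries."""
    dominant = dominant_intent(query_rows)
    return {
        page: (intended[page], actual)
        for page, actual in dominant.items()
        if page in intended and intended[page] != actual
    }


rows = [
    ("/services/divorce", "Do", 300),
    ("/services/divorce", "Know", 100),
    ("/blog/what-is-custody", "Know Simple", 50),
    ("/blog/what-is-custody", "Visit-in-person", 400),
]
intended = {"/services/divorce": "Do", "/blog/what-is-custody": "Know Simple"}
print(misaligned_pages(rows, intended))
```

A mismatch like the second page's flags it as a candidate for rewriting, merging into a better-matched page, or removal.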
  42. Unhelpful content vs. helpful content:

    | Unhelpful Content | Helpful Content |
    |---|---|
    | Low Brand Visibility | Good Brand Visibility |
    | Monetizes Clicks | Long-term Audience |
    | Poor UX | Unobtrusive UX |
    | Low-Effort | High-Effort |
    | Impersonal | Personal |
    | Easy to Replicate | Hard to Replicate |
    | Over-optimized SEO | Straightforward SEO |
  43. Amanda King is human
    • Fifteen years in the SEO industry
    • Business- and product-focussed
    • Traveled to 40+ countries
    • Knows CRO, data, UX
    • Always open to learning something new
    • Slightly obsessed with tea