Heros Not Villains - Modern SEO in the Generative IR Era

Dawn Anderson's talk from SEO Week in New York, April 2025. SEO has long been considered in many ways adversarial to information retrieval but has evolved over the years into a valuable force for good, particularly in this modern age of generative information retrieval, in which all sides of search are navigating challenging times and a huge paradigm shift. Here I talk about the value that SEO adds in bridging the gaps between information retrieval and industry, developers, generic in-house marketeers and more, and the opportunity available to us as we all move forward.

Dawn Anderson

April 30, 2025

Transcript

  1. At the heart of web search is the field of Information Retrieval (IR)
  2. What am I covering today?
    ► 01 Outdated perception of SEO by IR field / search engineers
    ► 02 Why search results have really got so bad
    ► 03 How the IR field is trying to resolve this
    ► 04 How SEO can step up in the brave new search world
  3. Even before the term 'Adversarial information retrieval' (Broder, 2000) was coined back in 2000 at the TREC IR conference
  4. At AIRWeb
    ► An information retrieval conference track dedicated to adversarial information retrieval
    ► Ran between 2005 and 2009
  5. Is this an outdated view of SEO by the Information Retrieval field?
    ► Or are we all really just the same old spammers?
  6. This is not like the other Google shifts
    ► Voice search
    ► Mobile first indexing
    ► AMP
    ► BERT
    ► Page Layout
    ► HCU
    ► Reviews
  7. It's bigger than the many times in which Google has completely rearchitected their entire index with users barely noticing
  8. This is a total paradigm shift to Generative Information Retrieval and a change to the fundamentals of information retrieval
  9. This goes back to the days of library science and indexing – it's much bigger than Google
  10. Retrieval systems have followed the same paradigm for the past 70 years (Even before search engines)
    ► User submits an information need
    ► A search is undertaken to identify suitable sources from a document corpus
    ► Results are presented to the user with a list of suitable sources ranked in relevance order and with snippets describing each result
    ► Prior to search engines this was predominantly in libraries / organisational systems
    ► With the advent of search engines this moved to '10 blue links'
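
The classic paradigm on this slide (an information need goes in, a ranked list of sources with snippets comes out) can be sketched in a few lines of Python. This is a toy illustration only: the three-document corpus, the term-overlap score and the function names `tokenize` and `search` are assumptions made up for the example, not anything from the talk or from a real search engine.

```python
from collections import Counter

# Hypothetical three-document corpus, invented for illustration.
CORPUS = {
    "doc1": "Library science organises books so readers can find them",
    "doc2": "Search engines rank web pages for a user query",
    "doc3": "Generative models answer a query directly without ranked pages",
}


def tokenize(text):
    """Lowercase and split on whitespace (no stemming, no stop words)."""
    return text.lower().split()


def search(query, corpus=CORPUS, snippet_len=40):
    """Return (doc_id, score, snippet) tuples ranked by simple term overlap."""
    query_terms = set(tokenize(query))
    results = []
    for doc_id, text in corpus.items():
        counts = Counter(tokenize(text))
        # Score = how often the query terms occur in the document.
        score = sum(counts[term] for term in query_terms)
        if score > 0:
            results.append((doc_id, score, text[:snippet_len]))
    # Highest score first; ties broken by doc id so the order is stable.
    results.sort(key=lambda r: (-r[1], r[0]))
    return results


for doc_id, score, snippet in search("rank web pages"):
    print(doc_id, score, snippet)
```

Real engines replace the overlap score with far richer ranking signals, but the shape of the loop (query, corpus scan or index lookup, ranked list with snippets) is the paradigm the slide describes.
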
  11. Classic IR - Complex Architecture Design Behind Search Engines
    Image attribution: Tang, Y., Zhang, R., Guo, J. and de Rijke, M., 2023
  12. Machine learning has been part of Google Search for years already
    ► Re-ranking and late stage re-ranking
    ► Crawl scheduling via importance prediction and spam swerving
    ► Feature engineering optimisation (understanding which factors should work together)
    ► Pairwise, pointwise and listwise ranking comparisons
    ► Learning to rank
    ► Index pruning
    ► Loads more
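
The difference between pointwise, pairwise and listwise ranking comparisons is easiest to see as three loss functions over the same scores. The sketch below is a minimal, pure-Python illustration: the function names are hypothetical, the pairwise loss is written in the style of RankNet and the listwise loss in the style of ListNet's top-one cross entropy, and none of it is claimed to be what Google uses.

```python
import math


def pointwise_loss(scores, labels):
    """Pointwise: judge each document alone (mean squared error vs. its label)."""
    return sum((s - y) ** 2 for s, y in zip(scores, labels)) / len(scores)


def pairwise_loss(scores, labels):
    """Pairwise: for each pair where doc i is more relevant than doc j,
    penalise the model when score_i is not above score_j (logistic loss)."""
    losses = [
        math.log(1 + math.exp(-(scores[i] - scores[j])))
        for i in range(len(scores))
        for j in range(len(scores))
        if labels[i] > labels[j]
    ]
    return sum(losses) / len(losses) if losses else 0.0


def softmax(values):
    """Numerically stable softmax."""
    top = max(values)
    exps = [math.exp(v - top) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def listwise_loss(scores, labels):
    """Listwise: compare the whole list at once (cross entropy between the
    softmax of the labels and the softmax of the scores)."""
    p_true = softmax(labels)
    p_pred = softmax(scores)
    return -sum(t * math.log(p) for t, p in zip(p_true, p_pred))


labels = [2.0, 1.0, 0.0]       # graded relevance for three documents
good_scores = [3.0, 2.0, 1.0]  # ranks the documents in the right order
bad_scores = [1.0, 2.0, 3.0]   # ranks them in reverse

print(pairwise_loss(good_scores, labels) < pairwise_loss(bad_scores, labels))
print(listwise_loss(good_scores, labels) < listwise_loss(bad_scores, labels))
```

Learning to rank, as listed on the slide, is the family of methods that trains a model by minimising losses of this kind.
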
  13. The Generative IR vision? "A sequence-to-sequence model is trained to directly map a query to its relevant document identifiers (i.e., docids)"
  14. One might describe this as Information Retrieval's existential crisis
    IS IT THE END FOR CLASSICAL INFORMATION RETRIEVAL?
  15. As in SEO, in IR there are many subdisciplines of the field including...
    ► NLP
    ► Recommender Systems
    ► Classical IR
    ► Ethics and Fairness in IR
    ► Medical IR
    ► Knowledge Graphs
  16. But it turns out NLP / LLMs are not very good on their own unchecked
  17. They hallucinate. Hallucination defined: "The generation of texts or responses that exhibit grammatical correctness, fluency, and authenticity, but deviate from the provided source inputs (faithfulness) or do not align with factual accuracy (factualness) (Ji et al., 2023)."
  18. Source: https://www.bbc.co.uk/news/world-us-canada-65735769
    ChatGPT threw up a tranche of past precedent cases which simply didn't exist. Bogus quotes, bogus decisions, bogus citations provided by ChatGPT. Lawyer of 30 years claimed he was "unaware that its content could be false."
  19. Delphic Costs (Broder 2023)
    ► Search is not free
    ► Comes from the notion of visits in ancient times to Delphi to seek knowledge
    ► There is a cost to the user
      o Query reformulation cost
      o Time cost
      o Effort cost
  20. Delphic Costs & Making Domain Experts (Broder, Metzler et al)
    ► Bring the knowledge to the user
    ► Reduce friction
    ► Instead of sending the user to research for themselves
  21. “Our goal is to make search effortless by synthesising multiple related queries into one clear, personalised response.” (NYC Event)
  22. Pushback in IR Against Generative IR & 'Rethinking Search'
    ► This paper was considered controversial and drew critics and push-back reviews when it was submitted for inclusion in a popular IR research conference.
  23. Shah and Bender (2024)
    In 'Envisioning Information Access Systems: What Makes for Good Tools and a Healthy Web?' Shah and Bender claim LLMs: "take away transparency and user agency, further amplify the problems associated with bias in [information access] systems, and often provide ungrounded and/or toxic answers that may go unchecked by a typical user." (Shah and Bender, 2024)
  24. Currently - None of this works without the stability of a reliable index – a stable memory
  25. EUROPEAN SUMMER SCHOOL INFORMATION RETRIEVAL
    ESSIR 2024 (Amsterdam)
    ► 42 Things to Know About IR
    ► 16 other lectures covering machine learning, recommender systems and AI, learning to rank
    ESSIR 2017 (Barcelona)
    ► Crawling
    ► Indexing
    ► Music IR
    ► Image IR
    ► Conversational Search
    ► Search results evaluation
    ► Very web focused
  26. Generative Information Retrieval
    ► Closed book: Information comes only from within the model
    ► Open book: Information comes from within the model and can also come from external sources to provide grounding
  27. Grounding and RAG are key areas where classical IR and knowledge graphs folks can contribute to Generative AI
    ► RAG (retrieval-augmented generation) combines the strength and background of traditional information retrieval systems with the capabilities of generative large language models (LLMs).
    ► RAG is the primary grounding technique.
  28. Grounding
    ► Connecting an AI model to verifiable sources of information
    ► Reduces probability of hallucination
    ► Important where accuracy and up-to-date information is critical
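
The open-book, grounded setup described on the last few slides can be sketched as a retrieve-then-prompt step. The example below is a minimal illustration under stated assumptions: the passages, the word-overlap retriever and the function names `retrieve` and `build_grounded_prompt` are hypothetical, and no real LLM API is called. It only shows how retrieved sources are placed in front of the model so that the answer can be tied back to them.

```python
# Hypothetical passages standing in for an indexed, verifiable source collection.
PASSAGES = [
    {"id": "p1", "text": "AIRWeb ran between 2005 and 2009"},
    {"id": "p2", "text": "RAG combines retrieval with generative language models"},
    {"id": "p3", "text": "Delphic costs include time cost and effort cost"},
]


def retrieve(question, passages=PASSAGES, k=2):
    """Rank passages by shared words with the question; keep the top k that overlap."""
    q_terms = set(question.lower().split())
    scored = []
    for passage in passages:
        overlap = len(q_terms & set(passage["text"].lower().split()))
        if overlap:
            scored.append((overlap, passage))
    # Most overlap first; ties broken by passage id for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]["id"]))
    return [passage for _, passage in scored[:k]]


def build_grounded_prompt(question, passages=PASSAGES, k=2):
    """Open-book prompt: retrieved sources first, then an instruction to answer
    only from them. Returns None when nothing relevant was retrieved."""
    sources = retrieve(question, passages, k)
    if not sources:
        return None  # nothing to ground on; the caller decides what to do
    lines = ["Answer using only the sources below and cite their ids."]
    lines += [f"[{p['id']}] {p['text']}" for p in sources]
    lines.append(f"Question: {question}")
    return "\n".join(lines)


print(build_grounded_prompt("When did AIRWeb run"))
```

A closed-book system would skip `retrieve` entirely and rely on what the model memorised in training. The quality of what `retrieve` can find is where a reliable index, and therefore discoverable, well-structured pages, comes in.
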
  29. At least 12 types of RAG
    ► Golden-Retriever RAG
    ► Adaptive RAG
    ► Modular RAG
    ► Speculative RAG
    ► RankRAG
    ► Multi-Head RAG
    ► Original RAG
    ► Graph RAG
    ► LongRAG
    ► Self-RAG
    ► Corrective RAG
    ► Efficient RAG
  30. And Much of This Requires Assistance from SEOs
    ► Assisting with equiprobability disambiguation
    ► Recommending schema
    ► Implementing llms.txt
    ► Monitoring for Bing indexing
    ► Ensuring pages are discoverable
    ► Advising around quality of content
    ► Communicating best practice to developers
    ► Advocating for search engine tools
    ► Jumping through hoops to implement 'new' search engine shiny objects
  31. Via...
    ► Internal links and anchors
    ► Semantically related and connected URLs
    ► Topic clustering
    ► Clear semantic structuring and logical library science approached architectures
    ► Crawlable (technically) and findable (connected) URL paths
    ► Shortened click paths to important pages (to the target audience)
    ► Schema implementation
    ► Extreme consistency
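
Two items on this slide, findable (connected) URL paths and shortened click paths, can be checked with a breadth-first walk over the internal link graph. The sketch below is only an illustration: the `LINKS` graph, the page paths and the function names are hypothetical, and a real audit would build the graph from a crawl.

```python
from collections import deque

# Hypothetical internal-link graph: page -> pages it links to.
LINKS = {
    "/": ["/guides", "/about"],
    "/guides": ["/guides/rag", "/guides/schema"],
    "/guides/rag": ["/guides/schema"],
    "/guides/schema": [],
    "/about": [],
    "/old-landing-page": ["/"],  # links out, but nothing links to it
}


def click_depths(links, start="/"):
    """Breadth-first search: fewest clicks needed to reach each page from start."""
    depths = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


def unreachable_pages(links, start="/"):
    """Pages that exist in the graph but cannot be reached by following links."""
    reachable = click_depths(links, start)
    all_pages = set(links) | {t for targets in links.values() for t in targets}
    return sorted(all_pages - set(reachable))


print(click_depths(LINKS))
print(unreachable_pages(LINKS))
```

Pages with a high click depth are candidates for extra internal links, and anything returned by `unreachable_pages` is technically crawlable only if it is discovered some other way, for example through a sitemap.
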
  32. Imputation, i.e. filling in the gaps in machine learning or statistics where there are missing features or attribute values. Clustering goes a long way to 'adding this extra context' via association in unsupervised machine learning.
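
As a concrete picture of imputation, the sketch below fills missing values with the mean of the observed values in the same column. This is the simplest form of the idea and is chosen only for illustration; the function name `impute_mean` and the sample rows are hypothetical, and the talk does not say which imputation method any search engine uses.

```python
def impute_mean(rows):
    """Replace None in each column with the mean of that column's observed values.

    A column with no observed values at all is left as None.
    """
    n_cols = len(rows[0])
    means = []
    for col in range(n_cols):
        observed = [row[col] for row in rows if row[col] is not None]
        means.append(sum(observed) / len(observed) if observed else None)
    return [
        [means[col] if value is None else value for col, value in enumerate(row)]
        for row in rows
    ]


# Hypothetical feature rows with gaps (None marks a missing attribute value).
rows = [
    [1.0, None],
    [3.0, 4.0],
    [None, 8.0],
]
print(impute_mean(rows))
```

Clustering, as the slide notes, goes a step further: instead of a column-wide mean, a gap can be filled from the values of closely associated items, which is the role topic clusters and consistent internal linking can play for a page with thin signals.
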
  33. Between...
    ✔ Between search and developers
    ✔ Between search and brand C-Suite
    ✔ Between in-house marketing teams and search
    ✔ Between SMEs and search
    ✔ Between generic marketers and search
  34. Sending traffic to us and content creators may be considered a necessary evil
    BUT THEY CANNOT DO IT WITHOUT US... WE IMPACT THE QUALITY OF THE RECALL
  35. "Generative question answering is not the end of classical information retrieval, but rather opens up a new frontier of exciting and important research problems." (Najork - SIGIR 2023)
  36. The Whole GEO vs SEO debate
    In information retrieval there is a new field called Generative Information Retrieval, but it's just another facet of information retrieval as a whole. It makes sense to appreciate that this is also a new facet of SEO. Let's not waste our time on cyclical debates which make no difference to progress.
  37. References
    ► Broder, A.Z. and McAfee, P., 2023. Delphic costs and benefits in web search: A utilitarian and historical analysis. arXiv preprint arXiv:2308.07525.
    ► Davison, B.D., Najork, M. and Converse, T., 2006, December. Adversarial information retrieval on the Web (AIRWeb 2006). In ACM SIGIR Forum (Vol. 40, No. 2, pp. 27-30). New York, NY, USA: ACM.
    ► https://videolectures.net/videos/wsdm09_dean_cblirs
    ► https://static.googleusercontent.com/media/research.google.com/en//people/jeff/WSDM09-keynote.pdf
    ► Goodwin, D. (2025). Ex-Google exec: Giving traffic to publishers ‘a necessary evil’. [online] Search Engine Land. Available at: https://searchengineland.com/google-traffic-publishers-necessary-evil-453562 [Accessed 13 Apr. 2025].
    ► Joren, H., Zhang, J., Ferng, C.S., Juan, D.C., Taly, A. and Rashtchian, C., 2024. Sufficient Context: A New Lens on Retrieval Augmented Generation Systems. arXiv preprint arXiv:2411.06037.
    ► Montti, R. (2025). Google Researchers Improve RAG With ‘Sufficient Context’ Signal. [online] Search Engine Journal. Available at: https://www.searchenginejournal.com/google-researchers-improve-rag-with-sufficient-context-signal/542320/ [Accessed 13 Apr. 2025].
  38. References (cont)
    ► Najork, M. SIGIR 2023 Keynote Generative Information Retrieval - https://docs.google.com/presentation/d/19lAeVzPkh20Ly855tKDkz1uv-1pHV_9GxfntiTJPUug/
    ► Ouyang, L., Wu, J., Jiang, X., Almeida, D., Wainwright, C., Mishkin, P., Zhang, C., Agarwal, S., Slama, K., Ray, A. and Schulman, J., 2022. Training language models to follow instructions with human feedback. Advances in neural information processing systems, 35, pp.27730-27744.
    ► Schultheiß, S. and Lewandowski, D., 2021. “Outside the industry, nobody knows what we do” SEO as seen by search engine optimizers and content providers. Journal of Documentation, 77(2), pp.542-557.
    ► https://searchengineland.com/google-traffic-publishers-necessary-evil-453562
    ► Shen, J., Liu, J., Finnie, D., Rahmati, N., Bendersky, M. and Najork, M., 2023, April. “Why is this misleading?”: Detecting News Headline Hallucinations with Explanations. In Proceedings of the ACM Web Conference 2023 (pp. 1662-1672).
    ► Wang, X., Isazawa, T., Mikaelyan, L. and Hensman, J., 2024. Kblam: Knowledge base augmented language model. arXiv preprint arXiv:2410.10450.