The AI Tech SEO Compendium: Augmenting technical SEO tasks using ML & AI - SMX Munich 2024

A lot is changing right now with the introduction of brand-new AI-powered tools. In this session at SMX Munich 2024, I showed how to combine state-of-the-art AI with, e.g., vector databases and Custom GPTs to reliably automate select parts of your SEO work, such as redirect mapping and internal & external linking, to improve your SEO performance.

Bastian Grimm

March 11, 2024


  1. The AI Tech SEO Compendium: Augmenting technical SEO tasks using ML & AI

    Bastian Grimm, Peak Ace AG | @basgr
  2. Winning with proven AI use cases (in marketing)

    ▪ Retrieval-Augmented Generation: up to 95% accuracy automating answers
    ▪ Summarisation: up to 40% productivity gains in front and back office functions
    ▪ Content Generation: up to 40% cost savings in content creation
    ▪ Named Entity Recognition: up to 90% reduction of text reading and analysis work
    ▪ Insight Extraction: up to 80% faster in processing data
    ▪ Classification: up to 30% cycle time reduction in customer service support
    Source: https://pa.ag/4bLPplv
  3. Simply put:

    Large Language Models (LLMs) are AI systems trained on vast datasets (thus “large”) to understand, predict and generate data using transformer-based neural networks.
  4. A comprehensive overview of Large Language Models

    And these are just some of the "bigger" noteworthy LLMs released up to the end of 2023:
    Source: https://pa.ag/4cdB55B
  5. What are LLMs good at?

    ▪ Information Retrieval and Analysis: LLMs can sift through large volumes of text data to extract relevant information, summarise key points, and answer questions, making them valuable for research, data analysis, and decision-making support.
    ▪ Personalised Recommendations: LLMs can analyse user preferences and behaviour to provide personalised recommendations, such as articles or products, thus enhancing UX and engagement.
    ▪ Natural Language Processing: LLMs excel in understanding language, making them ideal for applications such as chat bots, language translation, sentiment analysis, and text summarisation.
  6. What are LLMs NOT good at?

    ▪ Understanding Context Beyond Training Data: LLMs may not perform well in situations requiring an understanding of context or knowledge beyond their original training data set.
    ▪ Making Ethical or Moral Judgments: LLMs lack the ability to make ethical or moral judgments and should not be used in situations where such considerations are crucial. Most LLMs’ decisions are also biased.
    ▪ Limited Understanding and Reasoning: LLMs can't form a chain of logical conclusions; instead, they follow probability rules. Even if the most common answer to a question is irrational or outright wrong, the model will still provide that answer.
  7. LLMs are also not good at creating original content

    LLMs don’t “write” anything. They generate text based on probabilities learned during training, drawing on content they've encountered before.
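To make the "probabilities, not writing" point concrete, here is a deliberately tiny sketch. It is a toy bigram counter, not how a transformer-based LLM is actually built; the corpus and the function names `train_bigrams` and `predict_next` are made up for illustration. It only shows that a purely statistical predictor returns the most frequent continuation it has seen, even when that continuation is wrong.

```python
from collections import Counter, defaultdict


def train_bigrams(corpus):
    """Count, for every word, which words followed it in the corpus."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for current, following in zip(words, words[1:]):
            counts[current][following] += 1
    return counts


def predict_next(counts, word):
    """Return the most frequently seen follower of `word`, or None if unseen."""
    followers = counts.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Hypothetical training data: the wrong statement is simply more common.
corpus = [
    "the sky is green",
    "the sky is green",
    "the sky is blue",
]
model = train_bigrams(corpus)
print(predict_next(model, "is"))  # the most common continuation wins
```

The predictor has no notion of truth: it reproduces whatever dominated its training data.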
  8. Don't you worry…

    … I'm not going to bother you with more prompts for ChatGPT and how to speed up your SEO. I mean everyone's doing that by now anyway, right?
  9. There are tons of commercial AI solutions available

    From ChatGPT and Azure AI to NVIDIA's AI Platform and IBM watsonx – everyone in big tech is offering "something":
  10. How Google’s February ’24 went – in a nutshell

    Source: https://pa.ag/3uQ9dU1
  11. ICYMI: OpenAI announced Sora

    Sora is currently only accessible for red team members – experts in areas such as misinformation, hateful content, and bias – to examine critical areas for potential problems or risks; however, the preview is quite impressive.
    The excitement from the press has been reminiscent of the buzz surrounding the image creator DALL-E or ChatGPT: Sora is described as “eye-popping,” “world-changing,” and “breath-taking, yet terrifying.”
    Sources: https://pa.ag/3IcBJm3 & https://pa.ag/4a7V2cb & https://pa.ag/3V1V2pw
  12. Back to Google: say goodbye to Bard and hello to Gemini

    Google’s AI chat bot gets a new name
  13. There's more! Gemini is a family of multimodal LLMs developed by Google DeepMind

    ▪ Multimodality: Input/output using multiple formats (e.g., text, audio, video, gestures, etc.)
    ▪ Reinforcement learning: Drastically reduce hallucinations
    ▪ 3rd party integrations: High efficiency when using external tools and API integrations
    ▪ Memory capabilities: Build and expand the knowledge bank while the model learns
  14. Unsurprising to see this after the Hugging Face deal?

    Gemma is a family of lightweight, state-of-the-art open models built from the same research and technology Google used to create the Gemini models.
    Sources: https://pa.ag/3T9q0cK & https://pa.ag/4akEihZ
  15. Also, a variety of (free) open-source models are available

    Hugging Face’s Open LLM Leaderboard aims to track, rank and evaluate open source LLMs on different benchmarks.
    Source: https://pa.ag/3L2qUEV
  16. The beauty of these?

    Despite not being quite as powerful (yet), they are available to download, customise and self-host.
  17. But where to start, and which LLM to use?

    From LLaMa 2, Falcon to Dolly 2.0 and MPT or Bloom – the choice is yours (yep, I know… overwhelming much?)
    ▪ LLaMa 2: A well-performing open source LLM (with license for commercial use) that encompasses pre-trained and fine-tuned generative text models with 7 to 70 billion parameters.
    ▪ Vicuna & Alpaca: Use the LLaMa model as basis and (like Google’s Bard and OpenAI’s ChatGPT) are fine-tuned to follow instructions. Vicuna's authors report that it reaches roughly 90% of ChatGPT's quality in an evaluation judged by GPT-4.
    ▪ Falcon LLM: Can be used with chat bots to generate text, solve complex problems and reduce and automate repetitive tasks. Falcon 7B & 40B are available as raw models for fine-tuning.
    Source: https://pa.ag/3Td5ucz
  18. How to host and run your private LLM?

    Easy… let’s just ask ChatGPT how to do it, shall we?
  19. LM Studio: Discover, download, and run local LLMs

    Run an AI on your desktop using locally installed open-source Large Language Models (LLMs) for free! With LM Studio, you can...
    ▪ Run LLMs on your laptop, even while offline (Win, Mac & Linux)
    ▪ Use models through the in-app UI or an OpenAI-compatible local server
    ▪ Download any compatible model files from Hugging Face repositories
    ▪ Discover new LLMs in the app
    Source: https://pa.ag/3UW0Dh7
  20. My favourite: Ollama – get LLMs up and running, locally

    Command line only. Use the PageAssist Chrome plug-in (a web UI for local LLMs) to control Ollama, including model pulls, configuration, and running LLM dialogues/chats.
    Pro tip: Ollama runs at by default (and offers APIs as well)
    Sources: https://pa.ag/48A07se & https://pa.ag/48xAnNn
  21. Want to try for yourself, but you’re not a developer?

    Solutions such as stack or LLMStack offer no-code DIY approaches by connecting and combining a variety of data sources through APIs and other endpoints including LLMs.
    Sources: https://pa.ag/3wu1UlA & https://pa.ag/3UXzY3K
  22. Peak Ace’s current favourites: balancing speed & scalability

    A small selection of platforms that we feel are most convenient to start with. If you’d like to chat about it – come meet the Peak Ace team outside at our SMX booth!
    More complex, but worth checking out if you’re into this stuff: Hugging Face LLM Inference Container for Amazon SageMaker
  23. Keep in mind: There are risks that need to be managed

    (Obviously, this is true for both commercial and open-source models)
    ▪ Consent: Ensuring training data was gathered with accountability, meaning it follows AI governance processes (compliant w/ laws & regulations)
    ▪ Security: Security problems can include data leaks or cyber criminals using the LLM for a variety of malicious tasks
    ▪ Bias: Happens when the data source is not diverse or representative enough
    ▪ Hallucinations: Can result from the LLM being trained on incomplete, contradictory, or inaccurate data, or from predictions in general
    Source: https://pa.ag/3Td5ucz
  24. Will hallucinations ever disappear?

    "It’s inherent in the mismatch between the technology and the proposed use cases," says Emily Bender, professor in the Department of Linguistics and director of the Computational Linguistics Laboratory at the University of Washington.
    LLMs are designed to predict the next word – of course there will be cases where the model is wrong.
    Source: https://pa.ag/3PqP0Mh
  25. This can REALLY go wrong…

    Life or Death: AI-generated mushroom foraging books are all over Amazon; experts are worried that books produced by ChatGPT […] which target beginner foragers, could end up killing someone.
    Terry Pratchett
    Source: https://pa.ag/3MMah0X
  26. So, what's the deal?

    Simply put, RAG integrates LLMs with external databases or APIs, thus enabling real-time information retrieval for up-to-date and more accurate responses.
  27. The conceptual flow of using RAG with an LLM

    RAG can be used to enhance the accuracy and reliability of gen AI models with facts fetched from external sources. The flow in the diagram:
    ▪ 1: Prompt + Query
    ▪ 2: Query – search relevant information in the knowledge sources
    ▪ 3: Relevant information for enhanced context
    ▪ 4: Prompt + Query + Enhanced context, sent to the Large Language Model endpoint
    ▪ 5: Generated text response
    Source: https://pa.ag/3STryHY
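The five-step flow can be sketched in a few lines. This is a minimal illustration under loud assumptions: the knowledge sources are a hard-coded list, retrieval is plain word overlap (a stand-in for a real vector search), and `echo_llm` is a placeholder for whatever LLM endpoint you would actually call. None of these names come from the deck.

```python
from typing import Callable, List

# Hypothetical knowledge sources; in practice these come from your own documents.
KNOWLEDGE_SOURCES = [
    "Redirects should return HTTP status 301 when a page has moved permanently.",
    "A vector database indexes embeddings for fast similarity search.",
    "Custom GPTs combine instructions, extra knowledge and actions.",
]


def tokenize(text: str) -> set:
    """Lowercase words with surrounding punctuation stripped."""
    words = (w.strip(".,?!:;").lower() for w in text.split())
    return {w for w in words if w}


def retrieve(query: str, sources: List[str], k: int = 1) -> List[str]:
    """Steps 2 and 3: rank sources by word overlap with the query (toy retrieval)."""
    q = tokenize(query)
    ranked = sorted(sources, key=lambda s: len(q & tokenize(s)), reverse=True)
    return ranked[:k]


def build_prompt(prompt: str, query: str, context: List[str]) -> str:
    """Step 4: combine prompt, query and the retrieved context."""
    context_block = "\n".join(f"- {c}" for c in context)
    return f"{prompt}\n\nContext:\n{context_block}\n\nQuestion: {query}"


def answer(query: str, llm: Callable[[str], str],
           sources: List[str] = KNOWLEDGE_SOURCES) -> str:
    """Steps 1 to 5: retrieve context, enhance the prompt, call the model."""
    context = retrieve(query, sources)
    full_prompt = build_prompt("Answer using only the context below.", query, context)
    return llm(full_prompt)


def echo_llm(prompt: str) -> str:
    """Placeholder model that returns the prompt it receives, for inspection."""
    return prompt


print(answer("Which HTTP status should redirects return?", echo_llm))
```

Swapping `echo_llm` for a real model call and `retrieve` for a vector database query keeps the overall shape unchanged.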
  28. Why is RAG better/more efficient than other approaches?

    Because it can handle noisy or irrelevant information, refrain from answering when there is insufficient knowledge and integrate with a variety of different sources simultaneously.
    Source: https://pa.ag/3TdUgoc
  29. The Self-RAG framework enhances LLM quality & factuality

    Self-RAG improves the output quality of LLMs by integrating retrieval, generation, and self-critique mechanisms. Self-RAG’s approach is to selectively retrieve relevant information and critique both the retrieved content and its own outputs, offering a more refined performance across various tasks compared to existing models.
    Source: https://pa.ag/3QP6MZ4
  30. Some real-world RAG use cases we’ve built in recent months

    Some of the most common cases we’ve seen and worked on for our clients over the last months:
    ▪ Chatbot: Use RAG to incorporate LLMs into Q&A chat bots, allowing for more accurate answers based on data from company documents.
    ▪ Knowledge engine: Ask questions based on your data to provide context for LLMs and greatly increase the quality and accuracy of answers.
    ▪ Search augmentation: Incorporate LLMs into onsite search (engines) and augment the results with LLM-generated answers/content, leading to higher quality results.
  31. If you tie it all together (tools, databases, APIs, models, etc.) you can build some REALLY cool stuff!
  32. Collect URL inventory with a crawling tool…

    …or any other crawler of choice.
    ... and then somehow (usually manually) align it with the new target structure (depending on the respective content) and generate the redirect mapping file from this.
    Creating 1-to-1 redirect mappings for old content is often done in Excel (…or Google Sheets). Then attempts are made to manually assign titles, headings or URLs.
  33. Embeddings and vector database = redirect win

    Necessary steps for better automated redirects (and an improved customer journey):
    ▪ Extract main content of every (old) site/URL
    ▪ Generate embeddings
    ▪ Save together with metadata in vector database
    ▪ Semantic search in vector DB based on embeddings of old URLs
  34. I got 99 problems but AI ain’t one…! (at least for now)

    Grab one outside, expo hall, booth #1 (ground floor) – see you there!
  35. What are (word) embeddings?

    Word embeddings are numerical vectors representing words, capturing their meanings and relationships in a multidimensional space.
  36. What are (word) embeddings?

    You can convert any word into a vector and start calculating with them: "king" minus "man" plus "woman" equals "queen". Synonyms and more can also be found this way.
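The vector arithmetic can be tried out with a toy example. The three-dimensional vectors below are hand-picked for illustration (think "royalty", "masculine", "feminine") and are not real embeddings; a trained model would supply vectors with hundreds of dimensions. The helper names are assumptions as well.

```python
import math

# Hand-made toy vectors, NOT output of a real embedding model.
VECTORS = {
    "king":  [0.9, 0.9, 0.1],
    "queen": [0.9, 0.1, 0.9],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "apple": [0.1, 0.2, 0.2],
}


def cosine(a, b):
    """Cosine similarity between two equally long vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def analogy(a, b, c, vectors=VECTORS):
    """Compute a - b + c and return the closest remaining word."""
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [w for w in vectors if w not in {a, b, c}]
    return max(candidates, key=lambda w: cosine(target, vectors[w]))


print(analogy("king", "man", "woman"))
```

The input words are excluded from the candidates, which is the usual convention when evaluating such analogies.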
  37. What about vector databases?

    A vector DB utilises data embeddings as index, facilitating fast and scalable searches among unstructured data points, enhancing efficiency in retrieving similar items or information.
  38. Simply put:

    A vector DB allows you to find matches between anything and anything (e.g., use an image as a query to find similar pieces of text, video, other images, etc.).
  39. Extracting the main content of every old URL

    ▪ Extract: <title> + main content
    ▪ Combine: <title>, <h1>, <h2>s and first & last sentence of each paragraph (<p>)
    Content = Title + h1 + h2s + …
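The extract-and-combine step could look roughly like this. It is a sketch using only Python's standard library; the class and function names are invented for illustration, the sentence splitting is a naive regular expression, and real pages would additionally need boilerplate removal (navigation, footer, etc.) to isolate the main content.

```python
import re
from html.parser import HTMLParser


class MainContentParser(HTMLParser):
    """Collects the text of <title>, <h1>, <h2> and <p> elements."""

    CAPTURED = {"title", "h1", "h2", "p"}

    def __init__(self):
        super().__init__()
        self.current = None   # tag currently being captured
        self.buffer = []      # text chunks of the current element
        self.parts = {"title": [], "h1": [], "h2": [], "p": []}

    def handle_starttag(self, tag, attrs):
        if tag in self.CAPTURED and self.current is None:
            self.current = tag
            self.buffer = []

    def handle_data(self, data):
        if self.current is not None:
            self.buffer.append(data)

    def handle_endtag(self, tag):
        if tag == self.current:
            text = " ".join("".join(self.buffer).split())
            if text:
                self.parts[tag].append(text)
            self.current = None


def first_and_last_sentence(paragraph):
    """Keep only the first and last sentence of a paragraph (naive split)."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", paragraph) if s]
    if len(sentences) <= 1:
        return paragraph
    return f"{sentences[0]} {sentences[-1]}"


def extract_content(html):
    """Content = title + h1s + h2s + first & last sentence of each paragraph."""
    parser = MainContentParser()
    parser.feed(html)
    parts = parser.parts
    pieces = parts["title"] + parts["h1"] + parts["h2"]
    pieces += [first_and_last_sentence(p) for p in parts["p"]]
    return " ".join(pieces)


page = (
    "<html><head><title>Old page</title></head><body>"
    "<h1>Blue widgets</h1><h2>Sizes</h2>"
    "<p>First one. Middle one. Last one.</p></body></html>"
)
print(extract_content(page))
```

The resulting string is what would be passed on to the embedding step.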
  40. Generate embeddings and store in vector database

    For each website URL:
    ▪ Transfer previously generated content to vector DB
    ▪ Generate embeddings (BERT, GloVe, FastText)
    ▪ Save embeddings in a vector DB incl. metadata (URL, title, etc.)
    (Figure: each piece of content is mapped to a numerical vector, e.g. 0.03 … 0.19.)
  41. Search the vector database for the best semantic match

    For every outdated page:
    ▪ Run a vector-based semantic search for the k-nearest neighbours (KNN)
    ▪ Set a 301 redirect to the nearest-neighbour (NN) URL
    ▪ No more weak redirects
    ▪ Play with certainty/temperature settings
    Example query as shown on the slide: {Get { Article ( nearVector: { limit: 1, content: { vector:[embedding], certainty: 0.8 } } ) { url } }}
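The nearest-neighbour lookup with a minimum-match threshold can be sketched without any vector database at all. Everything here is an assumption for illustration: the vectors are made-up three-dimensional values rather than real embeddings, the in-memory list stands in for the vector DB, and `min_similarity` plays the role of the "certainty" setting (a given vector DB may define certainty differently from raw cosine similarity).

```python
import math
from typing import Dict, List, Optional, Tuple

Record = Tuple[str, List[float]]  # (URL, embedding)


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equally long vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def nearest_url(query: List[float], records: List[Record],
                min_similarity: float = 0.8) -> Optional[str]:
    """Return the best-matching URL, or None if the match is too weak."""
    best_url, best_score = None, -1.0
    for url, vector in records:
        score = cosine(query, vector)
        if score > best_score:
            best_url, best_score = url, score
    return best_url if best_score >= min_similarity else None


def build_redirects(old_pages: List[Record], new_pages: List[Record],
                    min_similarity: float = 0.8) -> Dict[str, str]:
    """Map each old URL to its nearest new URL; skip weak matches."""
    redirects = {}
    for old_url, vector in old_pages:
        target = nearest_url(vector, new_pages, min_similarity)
        if target is not None:
            redirects[old_url] = target
    return redirects


# Made-up embeddings for illustration only.
new_pages = [("/shoes/", [0.9, 0.1, 0.0]), ("/hats/", [0.0, 0.2, 0.9])]
old_pages = [("/old-sneakers", [0.8, 0.2, 0.1]), ("/old-careers", [0.1, 0.9, 0.1])]

print(build_redirects(old_pages, new_pages))
```

Old URLs without a sufficiently close match are left out of the map, so they can be reviewed by hand instead of receiving a weak redirect.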
  42. State-of-the-art sentence embeddings are the gold standard

    The Levenshtein distance (basic fuzzy matching) provides an alternative, as we’re mainly dealing with small text snippets and minimal deviations between URL versions. The more substantial the changes between two versions, the higher the likelihood that you’ll reap significant benefits from leveraging sentence transformers.
    h/t Will Nye for the data set
    Source: https://pa.ag/49RHG3y
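For reference, the Levenshtein distance counts the minimum number of single-character insertions, deletions and substitutions needed to turn one string into another. The sketch below uses the standard dynamic-programming formulation; `closest_url` and the example URLs are hypothetical and only show how the distance could be used for basic fuzzy matching of URL versions.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def closest_url(old_url: str, new_urls: list) -> str:
    """Pick the new URL with the smallest edit distance to the old one."""
    return min(new_urls, key=lambda url: levenshtein(old_url, url))


print(levenshtein("kitten", "sitting"))  # 3
print(closest_url("/blog/seo-tips", ["/blog/seo-tips/", "/contact/"]))
```

This works well when only slashes, casing or small slugs changed; it has no notion of meaning, which is where sentence embeddings come in.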
  43. As with most things, it can boost efficiency, but it isn't a complete replacement for a human.
  44. Facebook AI Similarity Search (FAISS)

    Analyse page contents and automatically create redirect maps based on two (old vs new) SF crawls.
  45. Automated redirect matchmaker for site migrations

    Fantastic script by Daniel Emery utilising two SF crawls (origin + destination.csv with titles, metas, URLs and headings) to perform a fast semantic search (using sentence transformers) and create a redirect map.
    FAISS is an outstanding library designed for the fast retrieval of nearest neighbours in high-dimensional spaces. It enables quick semantic nearest neighbour searches even on a large scale.
    Sources: https://pa.ag/4bWAgxy & https://pa.ag/3USteUJ
  46. This doesn’t only work for redirects…

    You can use the same approach e.g., for much better internal linking as well as reverse content gap analysis.
  47. Cloudflare Workers to execute redirects on CDN/edge level

    I already spoke about using CF Workers for a variety of technical SEO tasks, including redirects, at SMX Advanced in Berlin back in 2021. Looking to dive deeper? Make sure to grab a copy of the deck.
    Pro tip: this rarely requires dev resources; either you can do it yourself, or sys ops (who tend to be less busy) can.
    Source: https://pa.ag/4bSxauE
  48. Naturally, Cloudflare is all in on AI as well…

    Build and deploy AI applications to Cloudflare's global network: all it takes is a few lines of code with Workers AI to run an AI task using the Workers framework (or any other stack via API).
    Source: https://pa.ag/3IgVBV6
  49. Workers AI – an AI inference as a service platform

    Empowering developers to run well-known AI models with just a few lines of code on serverless GPUs, all on Cloudflare's trusted global network.
    TL;DR: using the LLM of choice without having to worry about hosting, deployment, scale, …
    Source: https://pa.ag/3Tgqlfh
  50. But it doesn't stop there: meet Vectorize

    Use Vectorize to power, e.g., semantic search directly with Workers, improve accuracy and context of answers from LLMs, and/or bring your own embeddings from other platforms, including OpenAI and Cohere.
    Sources: https://pa.ag/49Rys7u & https://pa.ag/3wq2AIr
  51. What are Custom GPTs (for ChatGPT)?

    Custom GPTs are a way to create tailored, custom versions of ChatGPT that combine instructions, extra knowledge, and any combination of skills.
  52. A Custom GPT in its simplest form

    Using Peak Ace’s Structured Data GPT to debug and fix errors in JSON-LD mark-up
  53. Unveiling Peak Ace’s GPT Suite

    ▪ SEO Writing Assistant: For keyword analysis, SEO content checks, readability assessments, competitor analyses, multilingual support, mobile optimisation, and more: https://pa.ag/seo-writing-assistant
    ▪ Outreach Hero: For crafting unique email templates, engaging subject lines, clear messages and more: https://pa.ag/outreach-hero
    ▪ PPC Performance Analyzer: For data analysis and adaptability, optimisation suggestions and more, all with perfect confidentiality: https://pa.ag/ppc-performance-analyzer
    ▪ Structured Data GPT: For analysing and troubleshooting structured data for SEO, optimisation suggestions, technical implementation support, and more: https://pa.ag/structured-data
    Source: https://pa.ag/peakace-gptsuite
  54. Making GPTs smarter with external data

    A Custom GPT to connect with the DataForSEO API to allow for real-time access to actual search data:
  55. ChatGPT can do this out of the box, can’t it?

    Well, no… the (training) data is insufficient and/or outdated, numbers are either non-existent or completely made up.
  56. So, how can you build this yourself?

    Here's a quick three-step guide on how to DIY it.
  57. #1 Provide basic info to get started (name, description, …)

    Log in to ChatGPT > choose Explore GPTs > Create (you need ChatGPT Plus)
    Well-defined instructions are key; think prompting.
  58. #2 Create an ‘Action’ to call a 3rd party API

    Head to your API provider and grab your credentials. In our case this was the API Dashboard at DataForSEO.com.
    Get the OpenAPI Schema for DataForSEO: https://pa.ag/3Pa7oZ3
    To use with an action, you need to generate a base64-encoded version of your login credentials: btoa('APIemail:APIpass')
    The annoying part: you need a Schema according to the OpenAPI spec. But no one reads docs anymore – we just leverage ChatGPT to do this:
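The `btoa('APIemail:APIpass')` call is browser JavaScript; the same base64 encoding can be produced in Python as sketched below. The function name is made up, and wrapping the token in an `Authorization: Basic …` header is an assumption based on how HTTP Basic authentication generally works, so check your API provider's documentation for the exact format it expects.

```python
import base64


def basic_auth_header(login: str, password: str) -> dict:
    """Base64-encode 'login:password' and wrap it as an HTTP Basic auth header."""
    raw = f"{login}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return {"Authorization": f"Basic {token}"}


# Placeholder credentials, matching the example on the slide.
header = basic_auth_header("APIemail", "APIpass")
print(header["Authorization"])
```

Treat the encoded value like a password: base64 is an encoding, not encryption, and can be decoded by anyone who sees it.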
  59. #3 Test and publish your GPT

    Remember: APIs usually aren't free, so make sure you only publish your new Custom GPT for yourself!
  60. Customisation for other APIs is easy (e.g., Sistrix, etc.)

    Just reauthenticate (base64-encoded version of your login). You also need a new schema (again based on the OpenAPI Specification, OAS).
  61. Did you know? You can link using pre-filled prompts!

    You can also link directly to pre-filled prompts and execute them – which works for both Custom GPTs and GPT-4 models. Simply add the query string (using "q=xxx") to the end of your ChatGPT URL. Use directly in your Chrome browser.
    ▪ For any Custom GPT add: ?q=your+prompt+goes+here
    ▪ For the GPT-4 base model: ?model=gpt-4&q=your+prompt
    Source: https://pa.ag/crsum
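Building such links by hand gets error-prone once prompts contain spaces or special characters. A small helper along these lines can do the encoding; the function name is invented and the base URL in the usage example is a placeholder, so substitute the actual URL of your ChatGPT chat or Custom GPT.

```python
from typing import Optional
from urllib.parse import urlencode


def prefilled_prompt_url(base_url: str, prompt: str,
                         model: Optional[str] = None) -> str:
    """Append the q (and optionally model) query string to a chat URL."""
    params = {}
    if model:
        params["model"] = model
    params["q"] = prompt
    # urlencode turns spaces into '+' and escapes special characters.
    return f"{base_url}?{urlencode(params)}"


# Placeholder base URL for illustration only.
print(prefilled_prompt_url("https://chat.example/", "your prompt goes here"))
print(prefilled_prompt_url("https://chat.example/", "your prompt", model="gpt-4"))
```

The helper assumes the base URL does not already carry a query string.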
  62. When to use a Custom GPT?

    Besides seamless 3rd party data integration, my top-3 reasons why building and using Custom GPTs can help a lot:
    ▪ Long-term context: Custom GPTs are a really powerful tool to ensure instructions remain contextualised over long periods of time.
    ▪ Building workflows: Custom GPTs are best suited for composing workflows aimed at people who don’t know how to properly design context with prompt sequences.
    ▪ Sharing instructions: For sharing the exact same instructions e.g., cross-team, without having to worry about specifying them (and how) at prompt level.
  63. BTW: not compatible & very different… GPTs for MS Copilot

    ChatGPT has almost completely replaced plugins with GPTs. On Copilot, plugins call on external services. However, Copilot GPTs are a conversation with a specific goal.
    Source: https://pa.ag/3wyCrr0
  64. Copilot + Excel: taking data analysis to a whole new level!

    Super pumped for this, as it’ll enable just about anyone to question, analyse, visualise and refine data effortlessly.
    Source: https://pa.ag/49YhhBe
  65. Looking to learn more about AI this year?

    Some new (and free) AI courses and resources to help you boost your AI knowledge.
    Sources: https://pa.ag/48NKLkk / https://pa.ag/4c2sa6U / https://pa.ag/48FldFJ / https://pa.ag/4c2smTG / http://pa.ag/ai
  66. Want to chat about AI & grab a t-shirt?

    Meet Peak Ace in the expo hall, booth #1 (ground floor): https://pa.ag/smx24