The untapped power of vector embeddings

At brightonSEO, Frank explains how to work with vector embeddings as an SEO specialist. He covers what they are and how you can use them to automate your work, and walks through some cases with vector databases.

Frank van Dijk

April 11, 2025

Transcript

  1. 6 @frankvndijk What are embeddings? “Embeddings are numerical representations of data (like words, images, or audio) in a multi-dimensional space.” Diagram: images, audio, and text → embedding model → 0.9 0.7 0.2 0.6
  2. 9 Diagram: cat → embedding model → 0.9 0.7 -0.3 0.6; dog → embedding model → 0.9 0.6 -0.2 0.8; pet → embedding model → 0.9 0.7 -0.2 0.9; lion → embedding model → 0.9 0.2 0.8 (plus a -0.2 whose position is unclear)
  3. 12 What is cosine similarity? “Cosine similarity measures the angle between two embeddings in a multi-dimensional space to determine how similar they are.” Diagram: cat, dog
  4. 13 Cosine similarity: always a score between 0 (or -1) and 1. Scale labels: 0 (no similarity), 1 (identical), “Similarities really arise”
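
To make the definition concrete, here is a minimal sketch of cosine similarity in plain Python. The function name `cosine_similarity` is only illustrative, and the two 4-dimensional vectors are the toy "cat" and "dog" values from the earlier slide; real embeddings have hundreds or thousands of dimensions.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy vectors from the "cat" and "dog" slide (illustrative only).
cat = [0.9, 0.7, -0.3, 0.6]
dog = [0.9, 0.6, -0.2, 0.8]

print(round(cosine_similarity(cat, dog), 2))  # 0.98
```

A vector compared with itself scores 1, perpendicular vectors score 0, and vectors pointing in opposite directions score -1, which is where the "0 (or -1)" lower bound comes from.
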
  5. 19 “We no longer live in a keyword era, but in an era of search intent” since 2013…
  6. 20 What is a vector database? "A vector database stores data as high-dimensional vectors for efficient similarity searches and AI applications." Example values: 0.4 0.8 0.3 0.7 0.3 0.7 0.4 0.6 0.5 0.6 0.5 0.5
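
As a rough illustration of what a vector database does, the sketch below keeps vectors in memory and returns the closest matches for a query. `TinyVectorStore` and the keys `page-a` to `page-c` are hypothetical, and grouping the slide's twelve values into three 4-dimensional vectors is an assumption. Production vector databases typically use approximate nearest-neighbour indexes rather than this brute-force scan.

```python
import math
from typing import Dict, List, Tuple


def _normalize(vector: List[float]) -> List[float]:
    """Scale a vector to length 1 so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vector]


class TinyVectorStore:
    """Hypothetical in-memory stand-in for a vector database."""

    def __init__(self) -> None:
        self._items: Dict[str, List[float]] = {}

    def add(self, key: str, vector: List[float]) -> None:
        self._items[key] = _normalize(vector)

    def search(self, query: List[float], top_k: int = 3) -> List[Tuple[str, float]]:
        """Return the top_k (key, similarity) pairs, most similar first."""
        q = _normalize(query)
        scored = [
            (key, sum(a * b for a, b in zip(q, vec)))
            for key, vec in self._items.items()
        ]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:top_k]


store = TinyVectorStore()
store.add("page-a", [0.4, 0.8, 0.3, 0.7])
store.add("page-b", [0.3, 0.7, 0.4, 0.6])
store.add("page-c", [0.5, 0.6, 0.5, 0.5])

results = store.search([0.4, 0.8, 0.3, 0.7], top_k=2)
print([key for key, _ in results])  # ['page-a', 'page-b']
```
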
  7. 22 Limits of ChatGPT: 128,000-token context window (about 96,000 words); 4,096-token limit (about 3,000 words)
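
The word counts on this slide are consistent with the common rule of thumb of roughly 0.75 English words per token. That ratio is an assumption here, not a figure stated in the deck, and it varies by language and tokenizer. A quick check:

```python
# Assumed rule of thumb: about 0.75 English words per token.
WORDS_PER_TOKEN = 0.75


def approx_words(tokens: int) -> int:
    """Rough word count that fits in a given number of tokens."""
    return round(tokens * WORDS_PER_TOKEN)


print(approx_words(128_000))  # 96000
print(approx_words(4_096))    # 3072
```
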
  8. 25 This has major advantages:
    • It helps to prevent hallucinations
    • Have control over what data you use
    • Use real-time or new data
  9. 30 Embedding models from OpenAI:
    • text-embedding-ada-002: released December 2022, 1536 dimensions
    • text-embedding-3-small: released January 2024, 1536 dimensions
    • text-embedding-3-large: released January 2024, 3072 dimensions
    *Source: benchmark from datacamp.com
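
For readers who want to request embeddings directly rather than through a crawler, the following is a minimal sketch assuming the official `openai` Python package (version 1 or later), whose client exposes `client.embeddings.create(model=..., input=...)`. The helper name `embed_texts` is hypothetical. The client is passed in as an argument so the function itself has no network dependency.

```python
from typing import Any, List


def embed_texts(
    client: Any,
    texts: List[str],
    model: str = "text-embedding-3-small",
) -> List[List[float]]:
    """Return one embedding vector per input text, in input order.

    `client` is expected to behave like `openai.OpenAI()`.
    """
    response = client.embeddings.create(model=model, input=texts)
    return [item.embedding for item in response.data]


# Usage (needs the `openai` package and an OPENAI_API_KEY environment variable):
#   from openai import OpenAI
#   vectors = embed_texts(OpenAI(), ["cat", "dog", "pet", "lion"])
```
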
  10. 32 Correct settings in SF (Screaming Frog). Make sure that your crawl configuration is set up properly:
    • Extraction => Store Rendered HTML
    • Rendering => JavaScript
  11. 33 Connect with OpenAI. Make the connection with OpenAI in your crawl:
    • Add your API key from OpenAI
    • Choose the “Extract embeddings from page content” template
  12. 34 Visible in your crawl. Next, the embeddings will be visible in your crawl:
    • Go to the AI tab
    • Scroll to the embeddings
  13. 36 [Figure: example embedding values 0.4 0.8 0.3 0.7 0.8 0.4 0.3 0.2 0.2 0.7 0.9 0.1 0.1 0.5 0.3 0.9]
  14. 37 Three scripts for embeddings (I will give them away):
    • Internal link opportunities
    • Redirect mapping
    • Duplicate content analysis
  15. 38 Internal link opportunities. Flow: SF crawl with embeddings → checking cosine similarity (webpage embedding against the rest of the embeddings) → checking existing link in HTML → internal link recommendations. Steps: gathering pages, checking similarity, checking relevancy
  16. 39 Gather the pages we want to optimize
  17. 40 Check cosine similarity: checking if the similarity is at least 0.85
  18. 41 Check for in-content link: checking for a potential link in the HTML of the page
  19. 42 Looping through all pages: looping through these steps to find all recommendations
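
The workflow on slides 38 to 42 (gather pages, keep pairs with a cosine similarity of at least 0.85, skip pairs that already link, loop over all pages) could be sketched roughly as below. This is not the speaker's script: the `Page` record, the function `link_recommendations`, the URLs, and the toy embeddings are all hypothetical, and the link check assumes hrefs match the stored URL exactly.

```python
import math
from html.parser import HTMLParser
from typing import List, NamedTuple, Set, Tuple


class Page(NamedTuple):
    """Hypothetical record for one crawled page."""
    url: str
    embedding: List[float]
    html: str


class _LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag in a document."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: Set[str] = set()

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.add(value)


def _cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0


def link_recommendations(
    pages: List[Page], threshold: float = 0.85
) -> List[Tuple[str, str, float]]:
    """Return (source_url, target_url, similarity) for similar, unlinked pages."""
    recommendations = []
    for source in pages:  # loop through all pages
        collector = _LinkCollector()
        collector.feed(source.html)
        for target in pages:
            if target.url == source.url:
                continue
            score = _cosine(source.embedding, target.embedding)
            if score < threshold:  # similarity must be at least the threshold
                continue
            if target.url in collector.hrefs:  # link already exists in the HTML
                continue
            recommendations.append((source.url, target.url, round(score, 3)))
    return sorted(recommendations, key=lambda rec: rec[2], reverse=True)


# Toy data: two closely related pages and one unrelated page.
pages = [
    Page("/cat-food", [0.9, 0.7, -0.3, 0.6], '<p>See <a href="/dog-food">dog food</a>.</p>'),
    Page("/dog-food", [0.9, 0.6, -0.2, 0.8], "<p>No links yet.</p>"),
    Page("/garden-tools", [-0.1, 0.9, 0.8, -0.2], "<p>No links yet.</p>"),
]
for source_url, target_url, score in link_recommendations(pages):
    print(f"{source_url} -> {target_url} ({score})")
```

With this toy data only one recommendation comes back: `/dog-food` should link to `/cat-food`, because the reverse link already exists and the third page falls below the threshold.
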
  20. 45 Use my Google Colabs to run them: give it your input, run the script, download the results
  21. 47 Case: a client was not present in the informational and orientation phases of the customer journey. Solution: creating content based on a semantic search in a vector database
  22. 48 Finding the right blog subjects: searching for our products in combination with ‘best’. Process diagram labels: Blog subject, Writing process, Extract data, Blog content, DB search
  23. 49 Semantic searches for products that match: getting the products that match the blog subject
  24. 50 Getting the product information: getting the products that match the blog subject
  25. 51 Write content: with AI we managed to create a draft of the blog content
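
One way the last two steps of this case could fit together is to pass the product data returned by the semantic search into the writing prompt, so the model drafts from supplied facts only. The sketch below is an assumption about how that might look, not the process used in the case; `build_blog_prompt`, the prompt wording, and the product records are all hypothetical.

```python
from typing import Dict, List


def build_blog_prompt(subject: str, products: List[Dict[str, str]]) -> str:
    """Combine a blog subject and retrieved product data into one prompt."""
    product_lines = [f"- {p['name']}: {p['description']}" for p in products]
    return "\n".join(
        [
            f"Write a draft blog post about: {subject}",
            "Use only the product information listed below.",
            "Products:",
            *product_lines,
        ]
    )


# Hypothetical products, as if returned by a vector database search.
retrieved = [
    {"name": "Trail Runner X", "description": "Lightweight shoe with a grippy sole"},
    {"name": "Road Glide 2", "description": "Cushioned shoe for long road runs"},
]
prompt = build_blog_prompt("best running shoes for beginners", retrieved)
print(prompt)
```
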
  26. 52 Case: another client's website didn’t appear in the AI Overviews for key keywords, while competitors did. Solution: analyze and predict, with embeddings and other data, when content could be displayed
  27. 53 Scrape the AI Overviews: scraping the content shown in the AI Overview. Process diagram labels: Keywords & URLs, Write, Rewrite, Update content, Validation, Optimized content, Prediction
  28. 54 Scraping competitors: scraping the content of the competitors shown
  29. 55 More data for the prediction: gathering other relevant data that is important
  30. 56 Prediction partially based on embeddings: we predict whether the content is capable of being shown
  31. 57 Optimization advice: we (re)write content so that it is capable of being shown
  32. 58 Results. After implementing:
    • Before: 32.63% of AI Overviews contained a link
    • After: 54.48% of AI Overviews contained a link
    • Headline figures: +67%, +49% (labels: Increase in clicks, Increase in display of links)
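
As a sanity check on the before and after percentages, the relative change can be computed directly. The helper `relative_increase` is illustrative only. The result, about 67%, matches one of the two headline figures on the slide.

```python
def relative_increase(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100


before_share = 32.63  # % of AI Overviews containing a link, before
after_share = 54.48   # % of AI Overviews containing a link, after

print(round(after_share - before_share, 2))               # 21.85 (percentage points)
print(round(relative_increase(before_share, after_share)))  # 67
```
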
  33. 59 Join the embeddings movement. Start experimenting with embeddings and discover what’s possible. This is where the future begins.
  34. 60 Key takeaways:
    01. What embeddings are and how they help us as SEO specialists
    02. How to automate an internal link audit and redirect mapping
    03. The practical pointers for building a vector database and linking it to an LLM