Lazarina Stoy - How to incorporate machine learning into your SEO day-to-day

Tech SEO Connect

October 23, 2024

Transcript

  1. Lazarina Stoy, Founder of MLforSEO, Marketing Consultant. How to incorporate ML into your SEO day-to-day: a bit of ML theory, and a quickfire of ideas to implement ML in your daily routine (with templates ✨). October 17-18, 2024, CAM Raleigh, 409 W. Martin Street, Raleigh, NC 27603
  2. How to incorporate ML into your SEO day-to-day. We’ll cover…
     01 - What you really need to start
     02 - Where you can implement ML APIs straight away, instantly
     03 - What you need to grow 🌱
  3. @lazarinastoy | #techseoconnect To know
     • When to search for ML
     • What model to use
     • How to find suitable ML tools
     • What you can achieve in a short time-frame
     • How to drive value via ML
  4. ML
     You have labelled data to validate results:
     • Make predictions
     • Split into groups, based on existing classes
     You don’t have a way to validate results:
     • Find patterns and group based on similarity
     • Simplify or transform your data
  5. Three ways to get a model:
     • Self-Train: A machine learning model you train from scratch, with your own data.
     • Pre-Train: A machine learning model that a third party has trained.
     • Fine-Tune: A machine learning model that a third party has trained and that you retrain, improve, and adapt with your own data. You train the model further on a more specific dataset.
  6. Do you need consistent results, every time? ✅ ❌ Avoid unsupervised ML. Avoid generative AI. Avoid deep learning. (yeah, really)
  7. Do you need results to be easy to understand? Do you need to explain them to stakeholders? ✅ ❌ Skip deep learning. (yes, that includes LLMs)
  8. Is it okay if the output simply outperforms existing methods on average? ✅ Okay, then. Take a look at ML options.
  9. Assess usefulness of ML using multiple factors:
     • insights
     • complexity
     • accuracy
     • scalability
     • assets (data in = data out)
     • resources
     • bottom line
  10. Writing meta descriptions
     Data characteristics: Textual; Page content
     Task characteristics: Unsupervised
     • It is transformational (Page content to Page Summary in less than 160 characters)
     • It can also be generative (write them from scratch)
     Solution characteristics:
     • Mission critical? No
     • Different results OK? Yes
     • Explanation of process needed? Not really.
     • Outperform current methods? Yes, much faster to get to a good enough result and satisfy a hygiene condition.
  11. Title / H1 Optimisations
     Data characteristics: Textual; Page content
     Task characteristics: Unsupervised
     • It is transformational (Page content to Page Summary in less than 60 characters)
     • It can also be generative (write them from scratch)
     Solution characteristics:
     • Mission critical? Could be, depending on the industry
     • Different results OK? Could be critical for certain industries.
     • Explanation of process needed? Sometimes.
     • Outperform current methods? Yes, much faster to get to multiple first drafts, to pass onto an editor.
  12. Image captioning / Alt tag generation
     Data characteristics: Image; Image library
     Task characteristics: Unsupervised
     • Using a pre-trained image recognition model
     • Generative AI / Image recognition
     Solution characteristics:
     • Mission critical? No
     • Different results OK? Yes
     • Explanation of process needed? Not really
     • Outperform current methods? Yes, much faster.
  13. Quick check-in
     • Classification is a supervised machine learning approach.
     • It involves sorting data (documents, pages, keywords) into pre-labeled categories
     • Applications include content audits and competitor audits
  14. With Google’s Natural Language API, you can classify documents into 1,300+ predefined categories
  15. Run the script via a formula to get classification label & confidence
  16. General purpose model
     • Generative AI, trained on a wide variety of general-purpose text but isn't fine-tuned on specific datasets for text classification unless explicitly prompted
     • Not efficient in large datasets
     • Unreliable (unpredictable) results
     • Non-replicable - output different every time, even if data doesn’t change
     • Great for creative outputs
     Purpose-built API
     • Specifically designed for tasks like text classification, sentiment analysis, and entity recognition
     • Pre-trained supervised ML model, trained on vast amounts of labeled data related specifically to text classification tasks
     • Efficient and reliable
     • Output same every time, unless data changes
     • Great for tasks that require precision
  17. Watch the details later. I’ve recorded a step-by-step tutorial on topic modelling with LDA, using a no-code, publicly available, web-based app.
  18. Ways to use the topics:
     • Interlink pages that mention the same subtopic more than 30%
     • Interlink pages mentioning the same or semantically related entities
     • Improve content categories and tag systems
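
One way to read the first interlinking rule is as a threshold on topic-model output: two pages become interlinking candidates when the same subtopic accounts for more than 30% of each page. The Python sketch below works under that assumption only. The `page_topics` weights, the URLs, and the `interlink_candidates` helper are hypothetical illustrations, not output or code from the talk.

```python
from itertools import combinations

# Hypothetical topic weights per page (e.g. exported from an LDA run).
page_topics = {
    "/blog/crawl-budget": {"crawling": 0.55, "indexing": 0.35, "links": 0.10},
    "/blog/log-files": {"crawling": 0.40, "indexing": 0.15, "links": 0.45},
    "/blog/anchor-text": {"crawling": 0.05, "indexing": 0.10, "links": 0.85},
}


def interlink_candidates(topics_by_page, threshold=0.30):
    """Return (page_a, page_b, shared_topics) for pages sharing a topic above the threshold."""
    pairs = []
    for (page_a, topics_a), (page_b, topics_b) in combinations(topics_by_page.items(), 2):
        # A topic counts as shared only if it exceeds the threshold on BOTH pages.
        shared = sorted(
            topic
            for topic, weight in topics_a.items()
            if weight > threshold and topics_b.get(topic, 0.0) > threshold
        )
        if shared:
            pairs.append((page_a, page_b, shared))
    return pairs


candidates = interlink_candidates(page_topics)
for page_a, page_b, shared in candidates:
    print(f"{page_a} <-> {page_b}: {', '.join(shared)}")
```

The threshold is a parameter, so the 30% cut-off can be tightened or loosened depending on how many link suggestions are manageable.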
  19. Possible data points
     • Your content
     • Competitors’ content
     • Your competitors’ YouTube content
     • Your SERP data versus competitors’
     • First-party data
     • UGC
     • Social mentions
     • Your internal link text anchors …
  20. Fuzzy matching is a quick and dirty way of calculating similarity between two strings
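
As a rough illustration of string similarity scoring, here is a minimal Python sketch using the standard library's `difflib.SequenceMatcher`. This is a stand-in chosen for the example; the tutorial referenced in the talk may use a different tool or scoring method. The `similarity` helper and the sample strings are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a 0-1 similarity score, ignoring case and surrounding whitespace."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


# Near-duplicate keywords score close to 1, unrelated ones score much lower.
print(round(similarity("technical seo audit", "Technical SEO audits"), 2))
print(round(similarity("technical seo audit", "link building guide"), 2))
```

Scores like these are only a heuristic: they compare characters, not meaning, which is why the approach is quick but "dirty".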
  21. Watch the details later. I’ve recorded a step-by-step tutorial on using fuzzy matching for things like:
     • Identifying link opportunities
     • String similarity analysis
     • Redirect mapping of URLs
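
For the redirect-mapping use case, a minimal sketch might pair each old URL with its closest new URL and leave a gap when nothing is similar enough. It uses `difflib.get_close_matches` from the Python standard library as an assumed stand-in for whatever the tutorial uses; the URL lists, the `map_redirects` helper, and the 0.6 cutoff are hypothetical.

```python
from difflib import get_close_matches

# Hypothetical URL paths from a site migration.
old_urls = ["/blog/seo-audit-checklist", "/services/link-building", "/about-us/team"]
new_urls = ["/resources/seo-audit-checklist", "/services/link-building-agency", "/contact"]


def map_redirects(old, new, cutoff=0.6):
    """Map each old URL to its closest new URL, or None when nothing clears the cutoff."""
    mapping = {}
    for url in old:
        matches = get_close_matches(url, new, n=1, cutoff=cutoff)
        mapping[url] = matches[0] if matches else None
    return mapping


redirects = map_redirects(old_urls, new_urls)
for source, target in redirects.items():
    print(f"{source} -> {target or 'NO MATCH, review manually'}")
```

Unmatched URLs are returned as `None` rather than forced onto a weak match, so they can be reviewed by hand.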
  22. N-gram analysis: Understanding language used within the high-performing articles for your terms can be beneficial for building content briefs.
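
To make the n-gram idea concrete, the sketch below counts word n-grams across a handful of texts with Python's standard library. The `ngram_counts` helper and the sample `articles` are hypothetical; in practice the input would be the body text of the high-performing pages.

```python
import re
from collections import Counter


def ngram_counts(texts, n=2):
    """Count word n-grams across texts (lowercased, punctuation dropped)."""
    counts = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z0-9']+", text.lower())
        # Slide a window of n tokens over each text.
        counts.update(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return counts


# Hypothetical snippets standing in for high-performing articles.
articles = [
    "Internal links help search engines discover pages.",
    "Use descriptive anchor text for internal links.",
    "Audit internal links to find orphan pages.",
]

bigrams = ngram_counts(articles, n=2)
print(bigrams.most_common(3))
```

Phrases that recur across several top-ranking articles are candidates for headings or talking points in a content brief.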
  23. A more advanced use case of n-grams and language analysis: identifying opportunities for structured data.
  24. How the script works:
     → Script tokenizes the text, discovers the questions, and pulls the answers
     → Script organises these into a schema dictionary, which is saved as a JSON file
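
The two steps above can be sketched in Python roughly as follows. This is not the script from the talk: it assumes a naive regex sentence splitter, treats any sentence ending in "?" as a question and the sentences after it as its answer, and targets the schema.org `FAQPage` type. The `build_faq_schema` helper, the sample text, and the `faq_schema.json` filename are hypothetical.

```python
import json
import re


def build_faq_schema(text):
    """Build a schema.org FAQPage dict from questions and the sentences that follow them."""
    # Naive tokenizer: split after ., ! or ? when followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    pairs = []  # each entry: (question, [answer sentences])
    for sentence in sentences:
        if sentence.endswith("?"):
            pairs.append((sentence, []))
        elif pairs:
            pairs[-1][1].append(sentence)
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": " ".join(answer)},
            }
            for question, answer in pairs
            if answer  # skip questions with no answer text
        ],
    }


page_text = (
    "What is fuzzy matching? It is a quick way to score how similar two strings are. "
    "It works well for messy data. "
    "When should you use it? Use it for redirect mapping."
)
schema = build_faq_schema(page_text)
with open("faq_schema.json", "w", encoding="utf-8") as f:
    json.dump(schema, f, indent=2, ensure_ascii=False)
```

Real page copy usually needs a proper sentence tokenizer and a manual review of the extracted pairs before the markup is published.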
  25. The Content Moderation API module automatically analyzes text for inappropriate or undesirable content, helping you to maintain a clean and professional data set without manually reviewing each entry.
  26. Possible data points
     • Your content
     • Competitors’ content
     • YouTube video transcripts (monetization)
     • Social media mentions
     • Comments on your website
     • Community posts …
  27. Approach: No-code
     • Suitable for: Beginners; Non-technical
     • Limitation: Limited scalability
     Approach: Programmatic
     • Suitable for: Intermediate; A little bit more technical; API-savvy
     • Limitation: Time; Adoption costs - learning, dev resources
     • Tools: Google Cloud Speech-to-Text; Amazon Transcribe; OpenAI Whisper (but bear in mind it sucks for anything over 2-3 minutes, and has no support for smaller languages)
  28. What I’m not saying ❌
     • Spam your blog with auto-transcribed content
     • Fire your content team
     • Scrape competitors’ YouTube videos, transcribe, and traffic to the moon, bro 🥲
     What I am saying ✅
     • Bridge gaps between different teams, if enterprise
     • Make content work harder, especially if you’re already producing webinars, live streams, etc.
     • Use transcription for competitor analysis, not copying
  29. Scrape videos/audio → Transcribe → Identify topics + subtopics → Extract entities
     Context:
     • SERP analysis shows Google ranks more videos, but there is still lots of traffic for web pages.
     • There are some YouTubers that don’t do a good job at blogging, but their channels get a ton of traction
     Then:
     • Compare with website
     • Add to topical map
     • Find and address information gaps
  30. You have a library of high-performing tutorials but no presence on YouTube/TikTok? → Scale production with text to speech.
  31. Approach: No-code
     • Suitable for: Beginners; Non-technical
     • Limitation: Limited scalability
     Approach: Programmatic
     • Suitable for: Intermediate; A little bit more technical; API-savvy
     • Limitation: Time and other adoption costs
     • Tools: Google Cloud Text-to-Speech API; Amazon Polly; OpenAI GPT-4o (soon)
  32. What I’m not saying ❌
     • Spam YouTube with AI generated trash
     • You can replace video production
     What I am saying ✅
     • Certain content formats don’t require video and can be made more accessible via audio & stills, like interviews or tutorials
     • To add automation, but still have at least some personalised look/feel, tutorial videos might have avatars, instead of person-to-camera setups to lower costs
  33. Approach: No-code
     • Suitable for: Beginners; Non-technical
     • Limitation: Limited scalability
     • Tools: ChatGPT; Custom GPTs; Web tools (they’re all wrappers of GPT, so not worth it)
     Approach: Programmatic
     • Suitable for: Intermediate; A little bit more technical; API-savvy
     • Limitation: Time and other adoption costs
     • Tools: GPT-4 / GPT-4o; Any LLMs; BERT
  34. You have a library of high-performing blog posts but no content distribution? → Transform blog posts to insightful posts for social media.
  35. You have a library of high-performing blog posts but no newsletter? → Use an LLM to rewrite these into newsletter edition drafts.
     You have comprehensive guides or reports in PDF format? → You can extract key insights, summaries, or actionable tips from these documents and repurpose them into blogs or social posts/threads.