
Google BERT - SMX London - What SEOs Need to Know

Dawn Anderson

November 21, 2024

Transcript

  1. @dawnieando • A Google algorithmic update • Google announce BERT

    to the organic search world in a VERY geeky way • Mentions the 15% of queries that are brand new every day • Touches on ‘The Vocabulary Problem’ (many ways of querying the same thing) October 2019 - Welcome To Search, BERT
  2. @dawnieando • Probably the biggest improvement in search EVER •

    The biggest change in search in five years, since RankBrain Fundamentally… Google BERT is
  3. @dawnieando !Layman’s Terms: it can be used to help Google

    better understand the context of words in search queries & content So, just what is the Google BERT update?
  4. @dawnieando • Used globally in all languages on featured snippets

    • BERT to impact rankings for 1 in 10 queries • Initially for English language queries in US The bottom line search announcement
  5. @dawnieando Dec 2019 – BERT expands internationally • Over 70

    languages • Still only impacts 10% of queries despite the considerable expansion • Still all featured snippets globally
  6. @dawnieando • BERT deals with ambiguity & ‘nuance’ in queries

    & content • Unlikely to impact short queries • More likely to impact conversational queries • Unlikely to impact branded queries Why just 10% of Google Queries Impacted?
  7. @dawnieando • The SEO community is abuzz • BERT is

    a big deal • Likened to ‘RankBrain’ in some of the ‘interesting’ interpretations • Some confusion around ‘What BERT is and what it means for search’ SEOs React
  8. @dawnieando !A neural network-based technique for natural language processing pre-training

    !An acronym for Bidirectional Encoder Representations from Transformers BERT in Geek Speak
  9. @dawnieando • Search algorithm update • Open source pre-trained model

    / framework for natural language understanding • Academic research paper • Evolving tool for computational linguistics efficiency • Beginning of MANY BERT’ish language models Important: BERT is Many Things
  10. @dawnieando • Academic Paper • Research Project by Devlin et

    al. • Published a year before the update, in October 2018 • BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding BERT started as a research paper in 2018
  11. @dawnieando • Open sourced so anyone can build a BERT

    • BERT created a sea-change leap forward in natural language understanding in information retrieval very quickly • Provided a pre-trained language model which required only fine-tuning BERT Open Sourced in 2018
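
A minimal sketch of what “only fine-tuning” looks like in practice, assuming the open-source Hugging Face `transformers` library and the public `bert-base-uncased` checkpoint (neither is named on the slide):

```python
# Hedged sketch: load a pre-trained "vanilla" BERT and attach a small
# classification head ready for fine-tuning on a downstream task.
# Assumes the Hugging Face `transformers` library and the public
# `bert-base-uncased` checkpoint.
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased",
    num_labels=2,  # e.g. a hypothetical binary relevance task
)

# The heavy pre-training (Wikipedia + BooksCorpus) is already done; only the
# head and a light pass over task-specific data still need training.
inputs = tokenizer("how to open a bank account", return_tensors="pt")
print(model(**inputs).logits.shape)  # torch.Size([1, 2]) - one score per label
```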
  12. @dawnieando The whole of the English Wikipedia & The Books

    Corpus combined. Over 2,500 million words BERT Has Been Pre-Trained On Many Words
  13. @dawnieando Vanilla BERT provides a pre-trained starting point layer

    for neural networks in machine learning & diverse natural language tasks The machine learning community got very excited about BERT
  14. @dawnieando • BERT is fine-tuned on a variety of downstream

    NLP tasks, including question and answer datasets BERT Can Be Fine-Tuned in A Short Space of Time
  15. @dawnieando • Vanilla BERT can be used ‘out of the

    box’ or fine-tuned • Provides a great starting point & saves huge amounts of time & money • Those wishing to can ‘build upon’ and improve BERT BERT Saves Researchers Time AND Money
  16. @dawnieando • Microsoft – MT-DNN • Facebook – RoBERTa •

    XLNet • ERNIE – Baidu • Lots of other contenders Since 2018 Major tech companies extend BERT
  17. @dawnieando You think SEOs are competitive? ML Engineers are more

    so • GLUE • SuperGLUE • MSMARCO • SQuAD …And Leaderboards
  18. @dawnieando Language models like BERT help machines understand the nuance

    in a word’s context and the cohesion of surrounding text What Purpose Does BERT Serve & How?
  19. @dawnieando • Dates back over 60 years, to the

    Turing Test paper • Aims at understanding the way words fit together with structure and meaning • NLU is connected to the field of linguistics (computational linguistics) • Over time, computational linguistics increasingly spills over into a growing online web of content What is Natural Language Understanding?
  20. @dawnieando • Natural language understanding requires: • Word’s context •

    Common sense reasoning Natural Language Recognition is NOT Understanding
  21. @dawnieando Humans mostly understand nuance and jargon from multiple meanings

    in written and spoken word because of ‘context’ Humans ‘Naturally’ Understand Context
  22. @dawnieando • Synonymous • Polysemous • Homonymous But Words Can

    Be VERY Problematic for Machines & Sometimes Even for Humans
  23. @dawnieando “The meaning of a word is its use in

    a language” (Ludwig Wittgenstein, philosopher, 1953) Image attribution: Moritz Nähr (public domain) Single Words Have No Meaning
  24. @dawnieando The word ‘like’ in this sentence is both a:

    !(VBP): verb, non-3rd person singular present !(IN): preposition or subordinating conjunction An Example of Word’s Meaning Changing • I -> PRP • Like -> VBP • That -> IN • He -> PRP • Is -> VBZ • Like -> IN • That -> DT
  25. @dawnieando E.g. Verbs, nouns, adjectives • Penn Treebank tagger -> 36

    different parts of speech • CLAWS7 (C7) -> 146 different parts of speech • Brown Corpus Tagger -> 81 different parts of speech Words Are ‘Part of Speech’ When Combined
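
To make the part-of-speech point concrete, here is a small illustrative sketch using NLTK’s off-the-shelf Penn Treebank-style tagger. NLTK is an assumption (the deck does not name a library), and the exact tags it produces can differ slightly from the slide’s:

```python
# Illustrative sketch: tag the slide's example sentence with a Penn
# Treebank-style part-of-speech tagger. Requires the `nltk` package and its
# tokenizer/tagger data; exact tags can vary by tagger version.
import nltk

nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

sentence = "I like that he is like that"
print(nltk.pos_tag(nltk.word_tokenize(sentence)))
# Roughly: [('I','PRP'), ('like','VBP'), ('that','IN'), ('he','PRP'),
#           ('is','VBZ'), ('like','IN'), ('that','DT')]
```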
  26. @dawnieando • He kicked the bucket • I have yet

    to tick that off my bucket list • The bucket was filled with water The Meaning of The Word ‘Bucket’ Changes
  27. @dawnieando ”Ambiguity is the greatest bottleneck to computational knowledge acquisition,

    the killer problem of all natural language processing.” (Stephen Clark, formerly of Cambridge University & now a full-time research scientist with Google DeepMind) Ambiguity Is Problematic
  28. @dawnieando • Words with a similar meaning to something else

    • Example: humorous, comical, hilarious, hysterical are ALL synonyms of funny Synonymous (Synonyms)
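
As an aside, here is a quick sketch of how a machine can enumerate such synonyms, using NLTK’s WordNet interface (an assumption of this example, not something the deck prescribes):

```python
# Sketch: list words that share a WordNet synset with 'funny'.
# Assumes the `nltk` package and its WordNet data.
import nltk
nltk.download("wordnet", quiet=True)
from nltk.corpus import wordnet as wn

synonyms = {lemma.name() for synset in wn.synsets("funny") for lemma in synset.lemmas()}
print(sorted(synonyms))  # includes e.g. 'amusing', 'comic', 'comical', 'laughable'
```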
  29. @dawnieando Ambiguity & Polysemy • Ambiguity is at a sentence

    level • Polysemous words are arguably the most problematic due to their ‘nuanced’ nature
  30. @dawnieando • Words usually with the same root and multiple

    meanings • Example: “Run” has 396 Oxford English Dictionary definitions Polysemous (Polysemy)
  31. @dawnieando • Words spelt the same but with very different

    ‘root’ meanings • Example: pen (writing implement), pen (pig pen) • Example: rose (stood up / ascended), rose (flower) • Example: bark (dog sound), bark (tree bark) Homonyms
  32. @dawnieando Spelt differently with VERY different meanings but sound exactly

    the same • Draft, draught • Dual, duel • Made, maid • For, fore, four • To, too, two • There, their • Where, wear, were Homophones – Difficult To Disambiguate Verbally
  33. @dawnieando Fork handles Four candles Very difficult to disambiguate in

    spoken word Worse When Words are Joined Together
  34. @dawnieando Did you want four candles or fork handles? Much

    Comedy Comes From ‘Play on Words’
  35. @dawnieando EXAMPLES • Zipfian Distribution • Firthian Linguistics • Treebanks

    • Language can be tied back to mathematical spaces & algorithms Language Has Natural Patterns & Phenomena
  36. @dawnieando Example: Zipfian Distribution (Power Law) • The frequency of

    any word in a collection is inversely proportional to its rank in the frequency table • Applies to any word frequency ANYWHERE • Image is 30 Wikipedias
  37. @dawnieando To illustrate Zipfian Distribution (most used words), each entry below is rank, word, and frequency of use in a corpus relative to the top word: 1 the (1) • 2 be (1/2) • 3 to (1/3) • 4 of (1/4) • 5 and (1/5) • 6 a (1/6) • 7 in (1/7) • 8 that (1/8) • 9 have (1/9) • 10 I (1/10)
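
A small sketch of how Zipf’s law can be checked on any text using only the standard library (the file name below is hypothetical):

```python
# Sketch: count word frequencies in a text and compare each rank's observed
# frequency with the 1/rank prediction of Zipf's law.
from collections import Counter

words = open("corpus.txt", encoding="utf-8").read().lower().split()  # hypothetical file
top_counts = Counter(words).most_common(10)
top_freq = top_counts[0][1]

for rank, (word, freq) in enumerate(top_counts, start=1):
    print(f"{rank:>2}  {word:<10} observed {freq:>8}   Zipf predicts ~{top_freq / rank:,.0f}")
```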
  38. @dawnieando “You shall know a word by the company it

    keeps” (Firth, 1957) Firthian Linguistics One Such Phenomenon is Co-occurrence
  39. @dawnieando Words with similar meaning tend to live near each

    other in a body of text Word’s ‘nearness’ can be measured in mathematical vector spaces – a context vector is ‘word’s company’ Distributional Relatedness & Firthian Linguistics
  40. @dawnieando Co-occurrence, Similarity & Relatedness • Language models are trained

    on large bodies of text to learn ‘distributional similarity’ (co- occurrence)
  41. @dawnieando Context Vectors & Word Embeddings • And build vector

    space models for word embeddings • Models learn the weights of similarity & relatedness distances
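
Below is a toy sketch of that idea, using gensim’s word2vec (an assumption; the deck names no library) trained on a few hand-written sentences. Real embedding models are trained on billions of words, so these toy similarities are illustrative only:

```python
# Toy sketch: co-occurring words end up near each other in a learned vector
# space; 'nearness' is then measured as cosine similarity between vectors.
from gensim.models import Word2Vec

sentences = [
    ["deposit", "money", "at", "the", "bank"],
    ["open", "an", "account", "at", "the", "bank"],
    ["the", "river", "bank", "was", "muddy"],
]

model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, seed=1)

# A word's context vector is learned from 'the company it keeps'.
print(model.wv.similarity("bank", "deposit"))
print(model.wv.similarity("bank", "river"))
```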
  42. @dawnieando Context-Free Word Embeddings • Past models have been context-free

    embeddings • They lacked the ‘text cohesion’ necessary to understand a word in context
  43. @dawnieando • He kicked the bucket • I have yet

    to tick that off my bucket list • The bucket was filled with water Remember ‘bucket’ Without Text Cohesion?
  44. @dawnieando Word’s Context Still Needed Gaps Filling • Past models

    used context-free embeddings • A moving ‘context window’ was used to gain word’s context
  45. @dawnieando But Even Then True Context Needs Both Sides of

    a Word • Past models were ‘uni-directional’ • The context window moved from left to right or right to left
  46. @dawnieando • BERT can see the word’s context on both

    sides of a word in a context window Bi-Directional is The B in BERT
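
A quick sketch of that bi-directionality in action, via the masked-language-model task BERT was pre-trained on. The `transformers` pipeline and the public `bert-base-uncased` checkpoint are assumptions of this example, not specifics from the deck:

```python
# Sketch: BERT's masked-language-model pre-training predicts the hidden word
# using context from BOTH sides of the [MASK] position.
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")
for prediction in fill("I paid the cheque into the [MASK] this morning."):
    print(prediction["token_str"], round(prediction["score"], 3))
# 'bank' should rank highly because of 'cheque' on the left and the sentence
# frame on the right.
```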
  47. @dawnieando !Encoder Representations relates to the input and output process

    of ‘word’s context’ & embeddings What About Encoder Representations?
  48. @dawnieando !Transformer is a big deal !Derived from a 2017

    paper called ‘Attention Is All You Need’ (Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł. and Polosukhin, I., 2017) What About The Transformer Part?
  49. @dawnieando Transformer & Attention Works out how important words are

    to each other in a given context & focuses attention
  50. @dawnieando River Bank or Financial Bank? By identifying ‘cheque’ or

    ‘deposit’ in the company of ‘bank’ BERT can disambiguate from a ‘river’ bank
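
A hedged sketch of that disambiguation, again assuming `transformers`, `torch` and the public `bert-base-uncased` checkpoint: pull the contextual vector BERT assigns to ‘bank’ in each sentence and compare them.

```python
# Sketch: the SAME word gets DIFFERENT contextual embeddings depending on its
# company; cosine similarity separates financial 'bank' from river 'bank'.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def bank_vector(sentence: str) -> torch.Tensor:
    # Return the last-layer hidden state for the token 'bank'.
    enc = tokenizer(sentence, return_tensors="pt")
    tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"][0].tolist())
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]
    return hidden[tokens.index("bank")]

cheque = bank_vector("i paid the cheque into the bank")
deposit = bank_vector("i made a deposit at the bank")
river = bank_vector("we sat on the grassy bank of the river")

cos = torch.nn.functional.cosine_similarity
print(cos(cheque, deposit, dim=0))  # financial vs financial: higher
print(cos(cheque, river, dim=0))    # financial vs river: lower
```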
  51. @dawnieando So Where is BERT’s Value in Google Search •

    Named entity determination • Textual entailment (next sentence prediction) • Coreference resolution • Question answering • Word sense disambiguation • Automatic summarization • Polysemy resolution
  52. @dawnieando BERT recognizes the word ‘to’ makes all the difference

    to the intent of the query BERT and Disambiguating Nuance
  53. @dawnieando BERT recognizes the meaning and importance of the ambiguous word ‘stand’

    in the context of the query BERT and Disambiguating Nuance
  54. @dawnieando !A single word can change the whole intent of

    a query !Conversational queries particularly so !The ‘stop words’ are actually part of text-cohesion !Historically ‘stop-words’ were often ignored !The next sentence matters BERT and Intent Understanding
  55. @dawnieando Example: “I remember what my Grandad said just before

    he kicked the bucket.” Next Sentence Prediction (Textual Entailment) Often the next sentence REALLY matters
  56. @dawnieando “How far do you reckon I can kick this

    bucket?” Not What You Expected?
  57. @dawnieando • There have been lots of improvements by others

    upon BERT • Google have likely improved dramatically on BERT too • There were some issues with next-sentence prediction • Facebook built RoBERTa BERT Probably Doesn’t Resemble The Original BERT Paper
  58. @dawnieando • Named entity determination • Coreference resolution • Question

    answering • Word sense disambiguation • Automatic summarization • Polysemy resolution Featured Snippets Knowledge Graph & Web Page Extraction Together
  59. @dawnieando !BERT has gone from mono-lingual to multilingual !Other language-specific BERTs

    are being built !Transformer was trained on international translations !Language has transferable phenomena BERT and International SEO Expect Big Things
  60. @dawnieando • Deepset – German BERT • CamemBERT – French

    BERT • AlBERTo – Italian BERT • RobBERT - Dutch RoBERTa model BERT & International SEO
  61. @dawnieando !The challenges of Pygmalion !Conversational search can now ‘scale’

    !BERT takes away some of the human labelling effort necessary !Next sentence prediction could impact assistants and clarifying questions BERT and Conversational Search Expect Big Things
  62. @dawnieando Semantic Heterogeneity Issues in Entity Oriented Search (Semantic Search)

    !Helps with anaphora & cataphora resolution (resolving pronouns of entities) !Helps with coreference resolution !Helps with named entity determination !Next sentence prediction could impact assistants and clarifying questions
  63. @dawnieando • It’s supposed to be natural • In the

    same way you can’t optimize for RankBrain, you can’t optimize for BERT • BERT is a tool / learning process in search for disambiguation & contextual understanding of words • BERT is a ‘black-box’ algorithm Why can’t you optimize for BERT?
  64. @dawnieando • Black-box algorithm • Hugging Face coined the phrase

    BERTology • Now a field of study exploring why BERT makes choices • Some concerns over bias & responsible AI Black Box Algorithms & BERTology
  65. @dawnieando !Cluster together content and interlink well on topic &

    nuance !Avoid ‘too-similar’ competing categories - merge them !Consider not just the content in the page but the content in the linked pages & sections !Consider the content of the ‘whole domain’ as everything contributes in co-occurrence !Be extra vigilant when ‘pruning’ Utilising Co-Occurrence Strategically Employ Relatedness
  66. @dawnieando Categorisation & Subcategorisation Are King • Employ strong conceptual

    logic in your site architecture • Be careful with random blogs • If you must ‘tag’, tag thoughtfully
  67. @dawnieando Anyone can build a BERT to train their own

    language processing system for a variety of natural language understanding downstream tasks. Fine-tuning can be carried out in a short time BERT represents a union of data science and SEO Anyone Can Use BERT – BERT is a Tool
  68. @dawnieando • Automatic categorization & subcategorization of content • Automatic

    generation of meta-descriptions • Automatic summarization of extracts & teasers • Categorising user-generated content / posts probably better than humans How Could BERT Be Harnessed For Efficiency in SEO? A Few Examples
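
One hedged sketch of what that efficiency could look like, using off-the-shelf `transformers` pipelines. The checkpoints these pipelines download are public models and an assumption of this example, not anything Google uses in search:

```python
# Sketch: ready-made pipelines for summarisation and zero-shot categorisation,
# the kinds of tasks listed above. Models are downloaded on first run.
from transformers import pipeline

summarizer = pipeline("summarization")
classifier = pipeline("zero-shot-classification")

page_text = (
    "Our handmade leather walking boots are stitched in Yorkshire, use "
    "full-grain uppers, and come with a resoling service so a single pair "
    "can last for decades of hill walking and long-distance trails."
)

print(summarizer(page_text, max_length=40, min_length=10)[0]["summary_text"])
print(classifier(page_text, candidate_labels=["footwear", "electronics", "gardening"]))
```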
  69. @dawnieando • J R Oakes - @jroakes • Hamlet Batista

    - @hamletbatista • Andrea Volpini - @cyberandy • Gefen Hermesh - @ghermesh SEOs Are Getting Busy With BERTishness
  70. @dawnieando • Original BERT was computationally expensive to run •

    ALBERT stands for A Lite BERT • Increased efficiency • ALBERT is BERT’s natural successor • ALBERT is much leaner whilst providing similar results • Joint research between Google & the Toyota Technological Institute at Chicago ALBERT – BERT’s Successor
  71. @dawnieando Reformer (Google) – Transformer’s Successor Understands word’s context from

    the perspective of a ‘whole novel’. https://venturebeat.com/2020/01/16/googles-ai-language-model-reformer-can-process-the-entirety-of-novels/
  72. @dawnieando Growth has been huge in the natural language processing

    community – Current SuperGLUE Leaderboard BERT Was Just The Start • Google T5 is winning • Even more advanced technology • Transfer learning • Expect huge progress