
Google BERT - SMX London - What SEOs Need to Know

Dawn Anderson

November 21, 2024

Transcript

  1. @dawnieando • A Google algorithmic update • Google announce BERT

    to the organic search world in a VERY geeky way • Mentions the 15% of queries that are brand new every day • Touches on ‘The Vocabulary Problem’ (many ways of querying the same thing) October 2019 - Welcome To Search, BERT
  2. @dawnieando • Probably the biggest improvement in search EVER •

    The biggest change in search in five years, since RankBrain Fundamentally… Google BERT is
  3. @dawnieando !Layman’s Terms: it can be used to help Google

    better understand the context of words in search queries & content So, just what is the Google BERT update?
  4. @dawnieando • Used globally in all languages on featured snippets

    • BERT to impact rankings for 1 in 10 queries • Initially for English language queries in US The bottom line search announcement
  5. @dawnieando Dec 2019 – BERT expands internationally • Over 70

    languages • Still only impacts 10% of queries despite the considerable expansion • Still all featured snippets globally
  6. @dawnieando • BERT deals with ambiguity & ‘nuance’ in queries

    & content • Unlikely to impact short queries • More likely to impact conversational queries • Unlikely to impact branded queries Why just 10% of Google Queries Impacted?
  7. @dawnieando • The SEO community is abuzz • BERT is

    a big deal • Likened to ‘RankBrain’ in some of the ‘interesting’ interpretations • Some confusion around ‘What BERT is and what it means for search’ SEOs React
  8. @dawnieando !A neural network-based technique for natural language processing pre-training

    !An acronym for Bidirectional Encoder Representations from Transformers BERT in Geek Speak
  9. @dawnieando • Search algorithm update • Open source pre-trained model

    / framework for natural language understanding • Academic research paper • Evolving tool for computational linguistics efficiency • Beginning of MANY BERT’ish language models Important: BERT is Many Things
  10. @dawnieando • Academic Paper • Research Project by Devlin et

    al. • Published a year before the update, in October 2018 • BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding BERT started as a research paper in 2018
  11. @dawnieando • Open sourced so anyone can build a BERT

    • BERT created a sea-change leap forward in natural language understanding in information retrieval very quickly • Provided a pre-trained language model which required only fine-tuning BERT Open Sourced in 2018
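
A minimal sketch of what “only fine-tuning” looks like in practice, assuming the open-source Hugging Face `transformers` library and the public `bert-base-uncased` checkpoint (neither is named on the slide):

```python
# Hedged sketch: load a pre-trained "vanilla" BERT and attach a small
# classification head ready for fine-tuning on a downstream task.
# Assumes the Hugging Face `transformers` library and the public
# `bert-base-uncased` checkpoint.
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased",
    num_labels=2,  # e.g. a hypothetical binary relevance task
)

# The heavy pre-training (Wikipedia + BooksCorpus) is already done; only the
# head and a light pass over task-specific data still need training.
inputs = tokenizer("how to open a bank account", return_tensors="pt")
print(model(**inputs).logits.shape)  # torch.Size([1, 2]) - one score per label
```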
  12. @dawnieando The whole of the English Wikipedia & The Books

    Corpus combined. Over 2,500 million words BERT Has Been Pre-Trained On Many Words
  13. @dawnieando Vanilla BERT provides a pre-trained starting point layer

    for neural networks in machine learning & diverse natural language tasks The machine learning community got very excited about BERT
  14. @dawnieando • BERT is fine-tuned on a variety of downstream

    NLP tasks, including question and answer datasets BERT Can Be Fine-Tuned in A Short Space of Time
  15. @dawnieando • Vanilla BERT can be used ‘out of the

    box’ or fine-tuned • Provides a great starting point & saves huge amounts of time & money • Those wishing to can ‘build upon’ and improve BERT BERT Saves Researchers Time AND Money
  16. @dawnieando • Microsoft – MT-DNN • Facebook – RoBERTa •

    XLNet • ERNIE – Baidu • Lots of other contenders Since 2018 Major tech companies extend BERT
  17. @dawnieando You think SEOs are competitive? ML Engineers are more

    so • GLUE • SuperGLUE • MSMARCO • SQuAD …And Leaderboards
  18. @dawnieando Language models like BERT help machines understand the nuance

    in a word’s context and the cohesion of surrounding text What Purpose Does BERT Serve & How?
  19. @dawnieando • Dates back over 60 years, to the

    Turing Test paper • Aims at understanding the way words fit together with structure and meaning • NLU is connected to the field of linguistics (computational linguistics) • Over time, computational linguistics increasingly spills over into a growing online web of content What is Natural Language Understanding?
  20. @dawnieando • Natural language understanding requires: • Word’s context •

    Common sense reasoning Natural Language Recognition is NOT Understanding
  21. @dawnieando Humans mostly understand nuance and jargon from multiple meanings

    in written and spoken word because of ‘context’ Humans ‘Naturally’ Understand Context
  22. @dawnieando • Synonymous • Polysemous • Homonymous But Words Can

    Be VERY Problematic for Machines & Sometimes Even for Humans
  23. @dawnieando “The meaning of a word is its use in

    a language” (Ludwig Wittgenstein, philosopher, 1953) Image attribution: Moritz Nähr (public domain) Single Words Have No Meaning
  24. @dawnieando The word ‘like’ in this sentence is both a:

    !(VBP): verb, non-3rd person singular present !(IN): preposition or subordinating conjunction An Example of Word’s Meaning Changing • I -> PRP • Like -> VBP • That -> IN • He -> PRP • Is -> VBZ • Like -> IN • That -> DT
  25. @dawnieando E.g. Verbs, nouns, adjectives • Penn Treebank tagger -> 36

    different parts of speech • CLAWS7 (C7) -> 146 different parts of speech • Brown Corpus Tagger -> 81 different parts of speech Words Are ‘Part of Speech’ When Combined
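
To make the part-of-speech point concrete, here is a small illustrative sketch using NLTK’s off-the-shelf Penn Treebank-style tagger. NLTK is an assumption (the deck does not name a library), and the exact tags it produces can differ slightly from the slide’s:

```python
# Illustrative sketch: tag the slide's example sentence with a Penn
# Treebank-style part-of-speech tagger. Requires the `nltk` package and its
# tokenizer/tagger data; exact tags can vary by tagger version.
import nltk

nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

sentence = "I like that he is like that"
print(nltk.pos_tag(nltk.word_tokenize(sentence)))
# Roughly: [('I','PRP'), ('like','VBP'), ('that','IN'), ('he','PRP'),
#           ('is','VBZ'), ('like','IN'), ('that','DT')]
```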
  26. @dawnieando • He kicked the bucket • I have yet

    to tick that off my bucket list • The bucket was filled with water The Meaning of The Word ‘Bucket’ Changes
  27. @dawnieando ”Ambiguity is the greatest bottleneck to computational knowledge acquisition,

    the killer problem of all natural language processing.” (Stephen Clark, formerly of Cambridge University & now a full-time research scientist with Google DeepMind) Ambiguity Is Problematic
  28. @dawnieando • Words with a similar meaning to something else

    • Example: humorous, comical, hilarious, hysterical are ALL synonyms of funny Synonymous (Synonyms)
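
As an aside, here is a quick sketch of how a machine can enumerate such synonyms, using NLTK’s WordNet interface (an assumption of this example, not something the deck prescribes):

```python
# Sketch: list words that share a WordNet synset with 'funny'.
# Assumes the `nltk` package and its WordNet data.
import nltk
nltk.download("wordnet", quiet=True)
from nltk.corpus import wordnet as wn

synonyms = {lemma.name() for synset in wn.synsets("funny") for lemma in synset.lemmas()}
print(sorted(synonyms))  # includes e.g. 'amusing', 'comic', 'comical', 'laughable'
```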
  29. @dawnieando Ambiguity & Polysemy • Ambiguity is at a sentence

    level • Polysemous words are arguably the most problematic due to their ‘nuanced’ nature
  30. @dawnieando • Words usually with the same root and multiple

    meanings • Example: “Run” has 396 Oxford English Dictionary definitions Polysemous (Polysemy)
  31. @dawnieando • Words spelt the same but with very different

    ‘root’ meanings • Example: pen (writing implement), pen (pig pen) • Example: rose (stood up / ascended), rose (flower) • Example: bark (dog sound), bark (tree bark) Homonyms
  32. @dawnieando Spelt differently with VERY different meanings but sound exactly

    the same • Draft, draught • Dual, duel • Made, maid • For, fore, four • To, too, two • There, their • Where, wear, were Homophones – Difficult To Disambiguate Verbally
  33. @dawnieando Fork handles Four candles Very difficult to disambiguate in

    spoken word Worse When Words are Joined Together
  34. @dawnieando Did you want four candles or fork handles? Much

    Comedy Comes From ‘Play on Words’
  35. @dawnieando EXAMPLES • Zipfian Distribution • Firthian Linguistics • Treebanks

    • Language can be tied back to mathematical spaces & algorithms Language Has Natural Patterns & Phenomena
  36. @dawnieando Example: Zipfian Distribution (Power Law) • The frequency of

    any word in a collection is inversely proportional to its rank in the frequency table • Applies to any word frequency ANYWHERE • Image is 30 Wikipedias
  37. @dawnieando To illustrate Zipfian Distribution (most used words), each entry below is rank, word, and frequency of use in a corpus relative to the top word: 1 the (1) • 2 be (1/2) • 3 to (1/3) • 4 of (1/4) • 5 and (1/5) • 6 a (1/6) • 7 in (1/7) • 8 that (1/8) • 9 have (1/9) • 10 I (1/10)
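
A small sketch of how Zipf’s law can be checked on any text using only the standard library (the file name below is hypothetical):

```python
# Sketch: count word frequencies in a text and compare each rank's observed
# frequency with the 1/rank prediction of Zipf's law.
from collections import Counter

words = open("corpus.txt", encoding="utf-8").read().lower().split()  # hypothetical file
top_counts = Counter(words).most_common(10)
top_freq = top_counts[0][1]

for rank, (word, freq) in enumerate(top_counts, start=1):
    print(f"{rank:>2}  {word:<10} observed {freq:>8}   Zipf predicts ~{top_freq / rank:,.0f}")
```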
  38. @dawnieando “You shall know a word by the company it

    keeps” (Firth, 1957) Firthian Linguistics One Such Phenomenon is Co-occurrence
  39. @dawnieando Words with similar meaning tend to live near each

    other in a body of text Word’s ‘nearness’ can be measured in mathematical vector spaces – a context vector is ‘word’s company’ Distributional Relatedness & Firthian Linguistics
  40. @dawnieando Co-occurrence, Similarity & Relatedness • Language models are trained

    on large bodies of text to learn ‘distributional similarity’ (co- occurrence)
  41. @dawnieando Context Vectors & Word Embeddings • And build vector

    space models for word embeddings • Models learn the weights of similarity & relatedness distances
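
Below is a toy sketch of that idea, using gensim’s word2vec (an assumption; the deck names no library) trained on a few hand-written sentences. Real embedding models are trained on billions of words, so these toy similarities are illustrative only:

```python
# Toy sketch: co-occurring words end up near each other in a learned vector
# space; 'nearness' is then measured as cosine similarity between vectors.
from gensim.models import Word2Vec

sentences = [
    ["deposit", "money", "at", "the", "bank"],
    ["open", "an", "account", "at", "the", "bank"],
    ["the", "river", "bank", "was", "muddy"],
]

model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, seed=1)

# A word's context vector is learned from 'the company it keeps'.
print(model.wv.similarity("bank", "deposit"))
print(model.wv.similarity("bank", "river"))
```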
  42. @dawnieando Context-Free Word Embeddings • Past models have been context-free

    embeddings • They lacked the ‘text cohesion’ necessary to understand a word in context
  43. @dawnieando • He kicked the bucket • I have yet

    to tick that off my bucket list • The bucket was filled with water Remember ‘bucket’ Without Text Cohesion?
  44. @dawnieando Word’s Context Still Needed Gaps Filling • Past models

    used context-free embeddings • A moving ‘context window’ was used to gain word’s context
  45. @dawnieando But Even Then True Context Needs Both Sides of

    a Word • Past models were ‘uni-directional’ • The context window moved from left to right or right to left
  46. @dawnieando • BERT can see the word’s context on both

    sides of a word in a context window Bi-Directional is The B in BERT
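
A quick sketch of that bi-directionality in action, via the masked-language-model task BERT was pre-trained on. The `transformers` pipeline and the public `bert-base-uncased` checkpoint are assumptions of this example, not specifics from the deck:

```python
# Sketch: BERT's masked-language-model pre-training predicts the hidden word
# using context from BOTH sides of the [MASK] position.
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")
for prediction in fill("I paid the cheque into the [MASK] this morning."):
    print(prediction["token_str"], round(prediction["score"], 3))
# 'bank' should rank highly because of 'cheque' on the left and the sentence
# frame on the right.
```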
  47. @dawnieando !Encoder Representations relates to the input and output process

    of ‘word’s context’ & embeddings What About Encoder Representations?
  48. @dawnieando !Transformer is a big deal !Derived from a 2017

    paper called ‘Attention Is All You Need’ (Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł. and Polosukhin, I., 2017) What About The Transformer Part?
  49. @dawnieando Transformer & Attention Works out how important words are

    to each other in a given context & focuses attention
  50. @dawnieando River Bank or Financial Bank? By identifying ‘cheque’ or

    ‘deposit’ in the company of ‘bank’ BERT can disambiguate from a ‘river’ bank
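
A hedged sketch of that disambiguation, again assuming `transformers`, `torch` and the public `bert-base-uncased` checkpoint: pull the contextual vector BERT assigns to ‘bank’ in each sentence and compare them.

```python
# Sketch: the SAME word gets DIFFERENT contextual embeddings depending on its
# company; cosine similarity separates financial 'bank' from river 'bank'.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def bank_vector(sentence: str) -> torch.Tensor:
    # Return the last-layer hidden state for the token 'bank'.
    enc = tokenizer(sentence, return_tensors="pt")
    tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"][0].tolist())
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]
    return hidden[tokens.index("bank")]

cheque = bank_vector("i paid the cheque into the bank")
deposit = bank_vector("i made a deposit at the bank")
river = bank_vector("we sat on the grassy bank of the river")

cos = torch.nn.functional.cosine_similarity
print(cos(cheque, deposit, dim=0))  # financial vs financial: higher
print(cos(cheque, river, dim=0))    # financial vs river: lower
```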
  51. @dawnieando So Where is BERT’s Value in Google Search •

    Named entity determination • Textual entailment (next sentence prediction) • Coreference resolution • Question answering • Word sense disambiguation • Automatic summarization • Polysemy resolution
  52. @dawnieando BERT recognizes the word ‘to’ makes all the difference

    to the intent of the query BERT and Disambiguating Nuance
  53. @dawnieando BERT recognizes the meaning and importance of the ambiguous word ‘stand’

    in the context of the query BERT and Disambiguating Nuance
  54. @dawnieando !A single word can change the whole intent of

    a query !Conversational queries particularly so !The ‘stop words’ are actually part of text-cohesion !Historically ‘stop-words’ were often ignored !The next sentence matters BERT and Intent Understanding
  55. @dawnieando Example: “I remember what my Grandad said just before

    he kicked the bucket.” Next Sentence Prediction (Textual Entailment) Often the next sentence REALLY matters
  56. @dawnieando “How far do you reckon I can kick this

    bucket?” Not What You Expected?
  57. @dawnieando • There have been lots of improvements by others

    upon BERT • Google have likely improved dramatically on BERT too • There were some issues with next-sentence prediction • Facebook built RoBERTa BERT Probably Doesn’t Resemble The Original BERT Paper
  58. @dawnieando • Named entity determination • Coreference resolution • Question

    answering • Word sense disambiguation • Automatic summarization • Polysemy resolution Featured Snippets Knowledge Graph & Web Page Extraction Together
  59. @dawnieando !BERT has gone from mono-lingual to multilingual !Other language-specific BERTs

    are being built !Transformer was trained on international translations !Language has transferable phenomena BERT and International SEO Expect Big Things
  60. @dawnieando • Deepset – German BERT • CamemBERT – French

    BERT • AlBERTo – Italian BERT • RobBERT - Dutch RoBERTa model BERT & International SEO
  61. @dawnieando !The challenges of Pygmalion !Conversational search can now ‘scale’

    !BERT takes away some of the human labelling effort necessary !Next sentence prediction could impact assistants and clarifying questions BERT and Conversational Search Expect Big Things
  62. @dawnieando Semantic Heterogeneity Issues in Entity Oriented Search (Semantic Search)

    !Helps with anaphora & cataphora resolution (resolving pronouns of entities) !Helps with coreference resolution !Helps with named entity determination !Next sentence prediction could impact assistants and clarifying questions
  63. @dawnieando • It’s supposed to be natural • In the

    same way you can’t optimize for RankBrain, you can’t optimize for BERT • BERT is a tool / learning process in search for disambiguation & contextual understanding of words • BERT is a ‘black-box’ algorithm Why can’t you optimize for BERT?
  64. @dawnieando • Black-box algorithm • Hugging Face coined the phrase

    BERTology • Now a field of study exploring why BERT makes choices • Some concerns over bias & responsible AI Black Box Algorithms & BERTology
  65. @dawnieando !Cluster together content and interlink well on topic &

    nuance !Avoid ‘too-similar’ competing categories - merge them !Consider not just the content in the page but the content in the linked pages & sections !Consider the content of the ‘whole domain’ as everything contributes in co-occurrence !Be extra vigilant when ‘pruning’ Utilising Co-Occurrence Strategically Employ Relatedness
  66. @dawnieando Categorisation & Subcategorisation Are King • Employ strong conceptual

    logic in your site architecture • Be careful with random blogs • If you must ‘tag’, tag thoughtfully
  67. @dawnieando Anyone can build a BERT to train their own

    language processing system for a variety of natural language understanding downstream tasks. Fine-tuning can be carried out in a short time BERT represents a union of data science and SEO Anyone Can Use BERT – BERT is a Tool
  68. @dawnieando • Automatic categorization & subcategorization of content • Automatic

    generation of meta-descriptions • Automatic summarization of extracts & teasers • Categorising user-generated content / posts probably better than humans How Could BERT Be Harnessed For Efficiency in SEO? A Few Examples
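
One hedged sketch of what that efficiency could look like, using off-the-shelf `transformers` pipelines. The checkpoints these pipelines download are public models and an assumption of this example, not anything Google uses in search:

```python
# Sketch: ready-made pipelines for summarisation and zero-shot categorisation,
# the kinds of tasks listed above. Models are downloaded on first run.
from transformers import pipeline

summarizer = pipeline("summarization")
classifier = pipeline("zero-shot-classification")

page_text = (
    "Our handmade leather walking boots are stitched in Yorkshire, use "
    "full-grain uppers, and come with a resoling service so a single pair "
    "can last for decades of hill walking and long-distance trails."
)

print(summarizer(page_text, max_length=40, min_length=10)[0]["summary_text"])
print(classifier(page_text, candidate_labels=["footwear", "electronics", "gardening"]))
```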
  69. @dawnieando • J R Oakes - @jroakes • Hamlet Batista

    - @hamletbatista • Andrea Volpini - @cyberandy • Gefen Hermesh - @ghermesh SEOs Are Getting Busy With BERTishness
  70. @dawnieando • Original BERT was computationally expensive to run •

    ALBERT stands for A Lite BERT • Increased efficiency • ALBERT is BERT’s natural successor • ALBERT is much leaner whilst providing similar results • Joint research between Google & the Toyota Technological Institute at Chicago ALBERT – BERT’s Successor
  71. @dawnieando Reformer (Google) – Transformer’s Successor Understands word’s context from

    the perspective of a ‘whole novel’. https://venturebeat.com/2020/01/16/googles-ai-language-model-reformer-can-process-the-entirety-of-novels/
  72. @dawnieando Growth has been huge in the natural language processing

    community – Current SuperGLUE Leaderboard BERT Was Just The Start • Google T5 is winning • Even more advanced technology • Transfer learning • Expect huge progress