Jamie Indigo - Trashcat’s Guide to Black Boxes: Technical SEO Tactics for LLMs

trashcats_guide_to_black_boxes Technical SEO Strategies for LLMs

audience_guide
• Not a robot; speaks bot
• Director of Technical SEO at Cox Automotive
• Author of Rich Snippets
• Excitable nerd from New York (phones up is my cue to pause)
• /in/jamie-indigo
• not-a-robot.com
• Proud pet parent

meet_keyleth

assumptions
• Kitten
• Domesticated
• Lost pet (chipped)
• Innocent and affectionate

expert_eval
• Adult
• Stray
• Breed: shorthair domestic terrorist
did do that. will do it again

new friends aren't always what they seem

geo?? aeo?? aio?? llmo?? don't care what you call it.
bots are made of code. docs pls

readme.lol shaping the battlefield is a PR and psyops tactic

How AI Search Is Changing the Way Conversions are Measured
All about robots | Google Search Central Blog

the better to see your assumptions, my dear what_big_eyes

bot_covenant
let me crawl you. I'll:
1. crawl politely
2. declare who i am
3. send you traffic

assumptions (Fool me once, Kiki)

1. promise(polite)
RFC 9309 - Robots Exclusion Protocol; All about robots | Google Search Central Blog

check(polite) AI Insights, Cloudflare Radar, 21 Aug - 18 Sep
2025

fwd: fwd: fwd: plausible deniability
Common Crawl's massive dataset is more than 9.5 petabytes large and makes up a significant portion of the training data for many Large Language Models (LLMs). 80% of GPT-3 tokens (a representation unit of text data) stemmed from Common Crawl.
Mozilla Report: How Common Crawl’s Data Infrastructure Shaped the Battle Royale over Generative AI

"65% of our most expensive traffic comes from bots" wikimedia
foundation How crawlers impact the operations of the Wikimedia projects meta name = "ravenous"

Skype is retiring in May 2025, Microsoft declare_name May 2025


let me love you
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.2; +https://openai.com/gptbot)
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.3; +https://openai.com/gptbot)
ChatGPT%20Atlas/20251021184832000 CFNetwork/3860.100.1 Darwin/25.0.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/141.0.0.0 Safari/537.36
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; Gemini-Deep-Research; +https://gemini[dot]google/overview/deep-research/) Chrome/135.0.0.0 Safari/537.36
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko; compatible; GoogleAgent-Mariner;+https://developers.google[dot]com/search/docs/crawling-indexing/google-agent-mariner) Chrome/135.0.0.0 Safari/537.36
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GoogleAgent-Search; +https://developers.google[dot]com/search/docs/crawling-indexing/google-agent-search) Chrome/114.0.0.0 Safari/537.36
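Some of the strings above declare a bot token and some look like plain browsers. A minimal Python sketch for sorting one from the other, assuming only the tokens visible on this slide (the function name `declared_bot` is hypothetical, and a declared name can be spoofed, so treat it as a claim rather than proof):

```python
import re

# Bot tokens copied from the user-agent strings on the slide; extend as needed.
DECLARED_BOT = re.compile(
    r"(GPTBot|Gemini-Deep-Research|GoogleAgent-Mariner|GoogleAgent-Search)"
    r"(?:/([\d.]+))?"
)


def declared_bot(user_agent: str):
    """Return (name, version) if the UA declares a known token, else None.

    version is None when the token carries no "/x.y" suffix.
    """
    match = DECLARED_BOT.search(user_agent)
    if match is None:
        return None
    return match.group(1), match.group(2)


if __name__ == "__main__":
    gptbot = (
        "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; "
        "GPTBot/1.3; +https://openai.com/gptbot)"
    )
    browser_like = (
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/141.0.0.0 Safari/537.36"
    )
    print(declared_bot(gptbot))
    print(declared_bot(browser_like))
```

The browser-like strings return None, which is the point of the slide: not every request declares who it is.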

send_traffic AI in Search is driving more queries and higher
quality clicks, Google Keyword Blog

Does LLM Traffic Convert Better Than Organic? A New Data-Backed Study, Will Guevara
but does it, liz?

"This data is in contrast to third-party reports that inaccurately
suggest dramatic declines in aggregate traffic — often based on flawed methodologies, isolated examples, or traffic changes that occurred prior to the roll out of AI features in Search."

"This data is in contrast to third-party reports that inaccurately
suggest dramatic declines in aggregate traffic — often based on flawed methodologies, isolated examples, or traffic changes that occurred prior to the roll out of AI features in Search." THEN HELP US LEARN BETTER

do it whether you like it or not

ai_search
assumptions: ask ai for things, it
1. is search engine
2. does the thing I ask
3. uses search engine mechanics

search engine = information retrieval system
ai search = model trained on corpus + retrieval augmentation (sometimes)

AI "ranking" is probabilistic. Your brand has a tendency to
exist

user embeddings
User embedding models for personalization of sequence processing models; Google Patents

Carl
• Ex-coastguard
• loves cats
• 1 cat, pedigree Persian

Zev
• Level 17 copywriter
• loves cats
• 1 cat, feral trash goblin

user embeddings (Patents)
what gift should I get my cat?
what gift should I get my cat?

user embeddings (Patents)
I found an artisanal wicker cat bed
Flame thrower!

you are a layer in generation
How AI Mode Works and How SEO Can Prepare for the Future of Search, iPullRank
• Query interpretation: what you said vs what you meant
• Synthetic query generation: AKA Googling your next question before you do
• Passage retrieval: re-ranking results because you two are like, totally in sync
• Response synthesis: do you want your answers in text or could we interest you in a cat gif?

AI rank tracking
• uses synthetic personas as part of the prompt
• these are not the same as user embedding vectors
• intentionally resets persistent memory

AI search
• is intent aware
• has ambient persistent memory
• is personalized using the same user embedding vectors

but ai rank tracking…

AI rank tracking is the Lighthouse of AI but trashcats
don't turn down data.

I asked ChatGPT how we should optimize the page! do_the_thing

Simple experiment*
1. Ask for information on the page
2. Check logs for requests
(*turned into a 3-part drama)
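Step 2 of the experiment can be sketched in Python roughly like this, assuming the server writes the common "combined" access-log format. The path `/test-page`, the IP addresses, and the sample lines are hypothetical placeholders, not data from the talk:

```python
import re

# Combined Log Format:
# ip - - [time] "METHOD path PROTO" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<ua>[^"]*)"'
)


def requests_for(path, log_lines):
    """Return (time, status, user_agent) for every request to `path`."""
    hits = []
    for line in log_lines:
        match = LOG_LINE.match(line)
        if match and match.group("path") == path:
            hits.append(
                (match.group("time"), match.group("status"), match.group("ua"))
            )
    return hits


# Hypothetical log lines for illustration only.
SAMPLE_LOG = [
    '203.0.113.7 - - [21/Oct/2025:18:48:32 +0000] "GET /test-page HTTP/1.1" '
    '200 5120 "-" "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; '
    'compatible; ChatGPT-User/1.0; +https://openai.com/bot)"',
    '198.51.100.4 - - [21/Oct/2025:18:49:01 +0000] "GET /other HTTP/1.1" '
    '200 900 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"',
]

if __name__ == "__main__":
    for hit in requests_for("/test-page", SAMPLE_LOG):
        print(hit)
```

Ask the assistant about the page, note the time, then look for matching hits within the next few seconds. If nothing shows up, the answer did not come from a fresh fetch of your page.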

just_let_ai_make_your_strategy_bro

I asked ChatGPT how we should optimize the page! Flame
thrower!

nothing AI says is true, it's just probable

AI Crawler                                                       Requests   Renders
Google (ecosystem)                                               ✅         ✅
Claude (Claude-SearchBot, Claude-User, Claude-Web, ClaudeBot)    ✅         🤔
OpenAI (OAI-SearchBot, ChatGPT-User, GPTBot, ChatGPT Agent)      ✅         ❌
Meta (Meta-ExternalAgent)                                        ✅         ❌
Perplexity (PerplexityBot)                                       ✅         ❌
ByteDance (Bytespider)                                           ✅         ❌
Common Crawl (ccbot)                                             ✅         ❌

all powerful revolutionary tech can't render js

fancy mechanics?

search console?

documentation?

trashcat isn't an insult heremeout.mp3

resourceful.dmg
1. smart_people
2. log_files
3. bot_manager
4. analytics
5. interface
6. beg_borrow_build

Step 1: A little bit louder for the back of the class
Own the fact that no one out there speaks bot like you do.
AI != magic
JS != devil

AI Volatility tracker, Dejan AI rank volatility != panic

this is still a bot. am still tech seo hunt.exe

Step 2: Make time for bot-watching chirp ❤ ❤

1. Referring (utm_source=chatgpt.com, referring domain, etc.)
2. User-initiated crawls (ChatGPT-User, Claude-User, URL Content)
3. Text fragments in landing pages (#:~:text=)
leverage analytics and logs as a proxy for AI rank
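The three signals can be checked together with something like the Python sketch below. It assumes your analytics or log pipeline hands you the full landing URL (fragment included), the referrer, and the user agent. The function name `ai_signals`, the `utm_source` parameter name, and the example URLs are assumptions for illustration:

```python
from urllib.parse import parse_qs, urlsplit

# Sources and agent tokens named on the slide; extend with your own findings.
AI_SOURCES = {"chatgpt.com"}
USER_INITIATED_AGENTS = ("ChatGPT-User", "Claude-User")


def ai_signals(landing_url: str, referrer: str = "", user_agent: str = ""):
    """Return the list of AI-visibility signals present on one hit."""
    signals = []
    parts = urlsplit(landing_url)

    # 1. Referral: utm_source parameter or referring domain.
    utm_source = parse_qs(parts.query).get("utm_source", [""])[0]
    if utm_source in AI_SOURCES or urlsplit(referrer).hostname in AI_SOURCES:
        signals.append("referral")

    # 2. User-initiated crawl: the assistant fetched the page for a user.
    if any(token in user_agent for token in USER_INITIATED_AGENTS):
        signals.append("user_initiated_crawl")

    # 3. Text fragment: the link pointed at a specific passage.
    if ":~:text=" in parts.fragment:
        signals.append("text_fragment")

    return signals


if __name__ == "__main__":
    print(
        ai_signals(
            "https://example.com/cat-beds?utm_source=chatgpt.com#:~:text=wicker",
            referrer="https://chatgpt.com/",
        )
    )
```

Counting these per URL over time gives a rough, first-party proxy for visibility. It measures what reached your site, not what the model said.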

But should it be crawled? Should it be trained on?
These bots are lazy and dumb. If that API endpoint has everything they need, then that's where they'll send the user to. (Plus, it's expensive to keep feeding these greedy little monsters.)

Effective Resource restriction
Robots.txt Introduction and Guide | Google Search Central | Documentation
1. If you're disallowing it for an AI crawler, repeat the statement for CCBot
2. Use X-Robots-Tag directives to block non-HTML resources from being indexed independently while still allowing them to contribute to rendered page content.
3. Seriously, block your API endpoints for non-rendering bots
4. Be sure to block only non-critical resources, that is, resources that aren't important to understanding the meaning of the page.
5. Block personalization resources used for returning users to conserve crawl budget
6. Hide auth-required links from the crawl path
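Items 1 and 3 can be sanity-checked before deploying. The Python sketch below feeds a draft robots.txt to the standard library's `urllib.robotparser` and asks who may fetch an API path. The `/api/` path and the `example.com` URL are hypothetical, and the check only tells you what a compliant crawler should do, not what any given bot actually does:

```python
from urllib.robotparser import RobotFileParser

# Draft rules: the AI-crawler disallow is repeated for CCBot (item 1),
# and the API endpoints are closed to both (item 3). Paths are hypothetical.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /api/

User-agent: CCBot
Disallow: /api/

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def may_fetch(agent: str, url: str) -> bool:
    """True if the draft rules allow `agent` to fetch `url`."""
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    api_url = "https://example.com/api/inventory"
    for agent in ("GPTBot", "CCBot", "Googlebot"):
        print(agent, may_fetch(agent, api_url))
```

Run it against every agent you care about after each robots.txt change. A forgotten CCBot group is easy to miss by eye.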

Step 3: Know what you're optimizing
And what you're optimizing for
Answer Engine Citation Overlap Strategy: How to Win at AI Visibility, Josh Blyskal

AI Chatbot Market Share Worldwide
am one squishy mortal. would read dev docs tho

[Flowchart]
Decision nodes: Trending topic or recent event? · Is Perplexity? · Is Google? (answers shown: no, no, yes, yes)
Paths: RAG · Training
Levers: Informational meta descriptions · TTFB · Homepage links · Tech SEO (crawlable links, semantic html, consistent urls; not rendered, not discovered) · topical authority · uniqueness · CWV

want new pages crawled? link from homepage
“So for the most part, for example, we would refresh crawl the homepage, I don’t know, once a day, or every couple of hours, or something like that. And if we find new links on their home page then we’ll go off and crawl those with the discovery crawl as well. And because of that you will always see a mix of discover and refresh happening with regard to crawling. And you’ll see some baseline of crawling happening every day. But if we recognize that individual pages change very rarely, then we realize we don’t have to crawl them all the time.”
English Google SEO office-hours from January 7, 2022

“The decision on which pages to crawl is primarily influenced
by the relevance of the title, the content within the snippet, the freshness of the information, and the credibility of the domain.” ChatGPT support team How does ChatGPT Search select the sources to crawl? Jérôme Salomon

snippet_peekaboo

1. URL
2. Title
3. Snippet (usually meta description)
4. Ranking position
5. Metadata

event: delta
data: {"v": [
  {
    "type": "search_result_group",
    "domain": "www.kbb.com",
    "entries": [
      {
        "type": "search_result",
        "url": "https://www.kbb.com/cars-for-sale/all/2025/nissan/frontier/pro-4x",
        "title": "2025 Nissan Frontier PRO-4X for Sale",
        "snippet": "... 3572 Nissan Frontier cars for sale, including a New 2025 Nissan Frontier PRO-4X and a Used 2025 Nissan Frontier PRO-4X ranging in price from $13795 to $68619.",
        "ref_id": null,
        "pub_date": null,
        "attribution": "www.kbb.com"
      }
    ]
  }
]}
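To see which URL, title, and snippet the model was handed, an event shaped like the one above can be unpacked with a few lines of Python. This is a sketch that assumes the JSON arrives on a single `data:` line and has exactly the fields shown on the slide. The stream format is undocumented and may change, and `search_results` is a hypothetical helper name:

```python
import json


def search_results(sse_event: str):
    """Yield (url, title, snippet) from 'data:' lines shaped like the slide's event."""
    for line in sse_event.splitlines():
        if not line.startswith("data:"):
            continue
        payload = json.loads(line[len("data:"):].strip())
        groups = payload.get("v") if isinstance(payload, dict) else None
        if not isinstance(groups, list):
            continue  # some other kind of delta; not a result list
        for group in groups:
            if not isinstance(group, dict):
                continue
            if group.get("type") != "search_result_group":
                continue
            for entry in group.get("entries", []):
                yield entry["url"], entry["title"], entry["snippet"]


# The event from the slide, rebuilt as a single data: line.
SAMPLE = "event: delta\ndata: " + json.dumps({"v": [{
    "type": "search_result_group",
    "domain": "www.kbb.com",
    "entries": [{
        "type": "search_result",
        "url": "https://www.kbb.com/cars-for-sale/all/2025/nissan/frontier/pro-4x",
        "title": "2025 Nissan Frontier PRO-4X for Sale",
        "snippet": "... 3572 Nissan Frontier cars for sale",
        "ref_id": None,
        "pub_date": None,
        "attribution": "www.kbb.com",
    }],
}]})

if __name__ == "__main__":
    for url, title, snippet in search_results(SAMPLE):
        print(title, "|", url)
        print(snippet)
```

Comparing the extracted snippet with your meta description shows whether the description is what the model actually reads before deciding to fetch the page.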

John Mueller, LinkedIn meta descriptions are hot again

performance still matters
Technical SEO for AI Search - SALT.agency®
• Sites with CLS ≤ 0.1 recorded a 29.8% higher inclusion rate in generative summaries compared with sites above this threshold.
• Pages delivering LCP ≤ 2.5 seconds were 1.47 times more likely to appear in AI outputs than slower pages.
• Crawlers abandoned requests for 18% of pages larger than 1 MB of HTML, highlighting the need for lean markup.
• TTFB under 200 ms correlated with a 22% increase in citation density, particularly when paired with robust caching strategies.
• This study shows that performance improvements do more than enhance user experience. They directly increase the probability of being cited or surfaced by AI systems.
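The four thresholds quoted above fit in a small checklist. A Python sketch, assuming you already collect these metrics per page: `PageMetrics` and `missed_thresholds` are hypothetical names, 1 MB is treated as 1,000,000 bytes, and the figures are correlations from the cited study, not guarantees:

```python
from dataclasses import dataclass


@dataclass
class PageMetrics:
    cls: float          # Cumulative Layout Shift (unitless)
    lcp_seconds: float  # Largest Contentful Paint
    ttfb_ms: float      # Time to First Byte
    html_bytes: int     # size of the HTML response body


def missed_thresholds(metrics: PageMetrics) -> list:
    """Return the thresholds from the study that this page misses."""
    misses = []
    if metrics.cls > 0.1:
        misses.append("CLS above 0.1")
    if metrics.lcp_seconds > 2.5:
        misses.append("LCP above 2.5 s")
    if metrics.ttfb_ms >= 200:
        misses.append("TTFB not under 200 ms")
    if metrics.html_bytes > 1_000_000:  # assumption: 1 MB = 1,000,000 bytes
        misses.append("HTML larger than 1 MB")
    return misses


if __name__ == "__main__":
    slow_page = PageMetrics(cls=0.25, lcp_seconds=3.1, ttfb_ms=450, html_bytes=1_500_000)
    for miss in missed_thresholds(slow_page):
        print(miss)
```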

499 = i give up
HTTP 499 status code indicates that the client closed the connection before the server could respond. In this context, the client is a bot sent by ChatGPT, terminating the request due to delayed server responses.
Real-time genAI search can't afford slow pages
ChatGPT has no time for your slow pages ⏱ | Jérôme Salomon
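To find out how often the bots give up on you, count 499s per agent. A Python sketch, assuming your server logs 499 for client-closed requests (nginx does; 499 is not a standard HTTP status) and that you have already parsed the log into `(status, user_agent)` pairs. The agent tokens come from the crawler table earlier in the deck, and the sample records are hypothetical:

```python
from collections import Counter

AI_AGENTS = ("ChatGPT-User", "OAI-SearchBot", "GPTBot")


def abandonment_rates(records, agents=AI_AGENTS):
    """Return {agent: share of that agent's requests that ended in 499}."""
    totals, abandoned = Counter(), Counter()
    for status, user_agent in records:
        for agent in agents:
            if agent in user_agent:
                totals[agent] += 1
                if status == 499:
                    abandoned[agent] += 1
                break  # count each request once
    return {agent: abandoned[agent] / totals[agent] for agent in totals}


if __name__ == "__main__":
    # Hypothetical parsed log records.
    records = [
        (499, "Mozilla/5.0 (compatible; ChatGPT-User/1.0)"),
        (200, "Mozilla/5.0 (compatible; ChatGPT-User/1.0)"),
        (200, "Mozilla/5.0 (compatible; GPTBot/1.3)"),
        (499, "Mozilla/5.0 (Windows NT 10.0) Chrome/141.0.0.0"),
    ]
    print(abandonment_rates(records))
```

A rising rate for the user-initiated agents is a hint to look at TTFB on the affected URLs first.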

Step 4: Be f%^ing feral
Learn in public. Share what you find. Use reproducible methodologies. The way out is through and the way through is together.
Answer Engine Citation Overlap Strategy: How to Win at AI Visibility, Josh Blyskal

.you_made_me_do_this turning to hackers for ai transparency

https://not-a-robot.com/ai-transparency
Each doc has:
1. Assertions (Rules, Constraints, and Stated Facts)
2. Functionalities (The AI's Capabilities)
based on leaked system prompts.
System prompts are the hidden instructions that serve as the baseline for AI interactions and dictate the model's tone, expertise, and constraints. They define the AI's overall behavior and role. These are different from user prompts, which provide specific instructions or questions for a particular task or interaction. System prompts are designed to take priority over user prompts to ensure the AI's core behavior remains consistent.

i really just want tech doc pls

thank you • /in/jamie-indigo • not-a-robot.com
