Jamie Indigo - Trashchat’s Guide to Black Boxes: Technical SEO Tactics for LLMs

Tech SEO Connect

December 12, 2025

Transcript

  1. audience_guide • Not a robot; speaks bot • Director of

    Technical SEO at Cox Automotive • Author of Rich Snippets • Excitable nerd from New York (phones up is my cue to pause) • /in/jamie-indigo • not-a-robot.com • Proud pet parent
  2. How AI Search Is Changing the Way Conversions are Measured

    All about robots | Google Search Central Blog
  3. bot_covenant let me crawl you. I'll: 1. crawl politely 2.

    declare who i am 3. send you traffic
  4. bot_covenant let me crawl you. I'll: 1. crawl politely 2.

    declare who i am 3. send you traffic assumptions (Fool me once, Kiki)
  5. 1. promise(polite)

    RFC 9309 - Robots Exclusion Protocol; All about robots | Google Search Central Blog
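A polite crawler checks robots.txt before it fetches anything. Here is a minimal sketch of that check using Python's standard `urllib.robotparser`. The robots.txt text and the example.com URLs are hypothetical, invented only for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration; not taken from any real site.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /api/

User-agent: *
Disallow: /private/
"""


def can_fetch(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt allows user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

Note that a crawler follows only the group that matches it: in this sketch GPTBot is kept out of /api/ but is not bound by the /private/ rule in the wildcard group.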
  6. fwd: fwd: fwd: plausible deniability

    Common Crawl's massive dataset is more than 9.5 petabytes large and makes up a significant portion of the training data for many Large Language Models (LLMs). 80% of GPT-3 tokens (a representation unit of text data) stemmed from Common Crawl. Mozilla Report: How Common Crawl’s Data Infrastructure Shaped the Battle Royale over Generative AI
  7. "65% of our most expensive traffic comes from bots" wikimedia

    foundation How crawlers impact the operations of the Wikimedia projects meta name = "ravenous"
  8. let me love you

    Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.2; +https://openai.com/gptbot)
    Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.3; +https://openai.com/gptbot)
    ChatGPT%20Atlas/20251021184832000 CFNetwork/3860.100.1 Darwin/25.0.0
    Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/141.0.0.0 Safari/537.36
    Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36
    Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36
    Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; Gemini-Deep-Research; +https://gemini[dot]google/overview/deep-research/) Chrome/135.0.0.0 Safari/537.36
    Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko; compatible; GoogleAgent-Mariner;+https://developers.google[dot]com/search/docs/crawling-indexing/google-agent-mariner) Chrome/135.0.0.0 Safari/537.36
    Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GoogleAgent-Search; +https://developers.google[dot]com/search/docs/crawling-indexing/google-agent-search) Chrome/114.0.0.0 Safari/537.36
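Only some of the user-agent strings on this slide declare who they are. The sketch below looks for a declared token and returns nothing when a request looks like an ordinary browser. The token list is read off the slide; treating these tokens as the full set of "declared" agents is an assumption of the sketch, not an official taxonomy.

```python
from typing import Optional

# Tokens read off the slide's user-agent strings. The list is an assumption
# made for this sketch and is certainly incomplete.
DECLARED_TOKENS = (
    "GPTBot",
    "ChatGPT%20Atlas",
    "Gemini-Deep-Research",
    "GoogleAgent-Mariner",
    "GoogleAgent-Search",
)


def declared_token(user_agent: str) -> Optional[str]:
    """Return the first declared token found in user_agent, else None."""
    lowered = user_agent.lower()
    for token in DECLARED_TOKENS:
        if token.lower() in lowered:
            return token
    return None
```

A plain Chrome string such as the `Chrome/141.0.0.0` one on the slide returns `None`, which is the point: user-agent matching cannot identify a visitor that does not declare itself.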
  9. "This data is in contrast to third-party reports that inaccurately

    suggest dramatic declines in aggregate traffic — often based on flawed methodologies, isolated examples, or traffic changes that occurred prior to the roll out of AI features in Search."
  10. "This data is in contrast to third-party reports that inaccurately

    suggest dramatic declines in aggregate traffic — often based on flawed methodologies, isolated examples, or traffic changes that occurred prior to the roll out of AI features in Search." THEN HELP US LEARN BETTER
  11. bot_covenant let me crawl you. I'll: 1. crawl politely 2.

    declare who i am 3. send you traffic do it whether you like it or not
  12. ai_search ask ai for things, it 1. is search engine

    2. does the thing I ask 3. uses search engine mechanics assumptions
  13. search engine = information retrieval system ai search = model

    trained on corpus + retrieval augmentation (sometimes)
  14. User embedding models for personalization of sequence processing models; Google

    Patents
    user embeddings
    Carl • Ex-coastguard • loves cats • 1 cat, pedigree Persian
    Zev • Level 17 copywriter • loves cats • 1 cat, feral trash goblin
  15. User embedding models for personalization of sequence processing models; Google

    Patents what gift should I get my cat? what gift should I get my cat? user embeddings
  16. User embedding models for personalization of sequence processing models; Google

    Patents I found an artisanal wicker cat bed Flame thrower! user embeddings
  17. User embedding models for personalization of sequence processing models; Google

    Patents How AI Mode Works and How SEO Can Prepare for the Future of Search, iPullRank you are a layer in generation • Query interpretation: what you said vs what you meant • Synthetic query generation: AKA Googling your next question before you do • Passage retrieval: re-ranking results because you two are like, totally in sync • Response synthesis: do you want your answers in text or could we interest you in a cat gif?
  18. AI rank tracking • uses synthetic personas as part of

    the prompt • these are not the same as user embedding vectors • intentionally resets persistent memory AI search • is intent aware • has ambient persistent memory • is personalized using the same user embedding vectors but ai rank tracking…
  19. Simple experiment* 1. Ask for information on the page 2.

    Check logs for requests (*turned 3 part drama)
  20. AI Crawler | Requests | Renders

    Google (ecosystem) | ✅ | ✅
    Claude (Claude-SearchBot, Claude-User, Claude-Web, ClaudeBot) | ✅ | 🤔
    OpenAI (OAI-SearchBot, ChatGPT-User, GPTBot, ChatGPT Agent) | ✅ | ❌
    Meta (Meta-ExternalAgent) | ✅ | ❌
    Perplexity (PerplexityBot) | ✅ | ❌
    ByteDance (Bytespider) | ✅ | ❌
    Common Crawl (ccbot) | ✅ | ❌
    all powerful revolutionary tech can't render js fancy mechanics?
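The experiment on slide 19 (ask, then check the logs) can be approximated by counting, per bot, how many page requests and how many script requests show up. This is a rough sketch: the sample hits are hypothetical, the token list is an assumption, and a bot that fetches a .js file has not necessarily rendered it. A bot that never fetches scripts, though, almost certainly did not render.

```python
from collections import defaultdict

# Assumed user-agent tokens for this sketch; extend to match your own logs.
BOT_TOKENS = ("Googlebot", "GPTBot", "ChatGPT-User", "ClaudeBot", "PerplexityBot", "CCBot")

# Hypothetical (user_agent, path) pairs standing in for parsed log lines.
SAMPLE_HITS = [
    ("Mozilla/5.0 (compatible; Googlebot/2.1)", "/page"),
    ("Mozilla/5.0 (compatible; Googlebot/2.1)", "/static/app.js?v=2"),
    ("Mozilla/5.0 (compatible; GPTBot/1.3)", "/page"),
    ("Mozilla/5.0 Chrome/141.0.0.0 Safari/537.36", "/page"),
]


def summarize(hits):
    """Count page and script requests per known bot token."""
    summary = defaultdict(lambda: {"pages": 0, "scripts": 0})
    for user_agent, path in hits:
        lowered = user_agent.lower()
        token = next((t for t in BOT_TOKENS if t.lower() in lowered), None)
        if token is None:
            continue  # not a bot we track
        is_script = path.split("?", 1)[0].endswith(".js")
        summary[token]["scripts" if is_script else "pages"] += 1
    return dict(summary)
```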
  21. Step 1: A little bit louder for the back of

    the class Own the fact that no one out there speaks bot like you do. AI != magic JS != devil
  22. 1. Referring (utm=chatgpt.com, referring domain, etc) 2. User-initiated crawls (ChatGPT-User,

    Claude-User, URL Content) 3. Text fragments in landing pages (#:~:text=) leverage analytics and logs as a proxy for AI rank
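The three proxies above can be checked per hit. The sketch below is hedged in several ways: it assumes the UTM parameter is named `utm_source`, it assumes your analytics keeps the full landing URL including the fragment (many setups do not), and the function name and signal labels are made up for illustration.

```python
from urllib.parse import parse_qs, urlsplit

AI_REFERRER_HOSTS = ("chatgpt.com",)  # from the slide; extend as needed
USER_INITIATED_TOKENS = ("ChatGPT-User", "Claude-User")  # from the slide


def ai_signals(landing_url, referrer="", user_agent=""):
    """Return which of the three AI-visibility proxies a single hit shows."""
    signals = []
    parts = urlsplit(landing_url)
    utm_source = parse_qs(parts.query).get("utm_source", [""])[0].lower()
    referrer_host = (urlsplit(referrer).hostname or "").lower()
    if utm_source in AI_REFERRER_HOSTS or referrer_host in AI_REFERRER_HOSTS:
        signals.append("referral")
    if any(token.lower() in user_agent.lower() for token in USER_INITIATED_TOKENS):
        signals.append("user_initiated_crawl")
    if parts.fragment.startswith(":~:text="):
        signals.append("text_fragment")
    return signals
```

Fragments are never sent to the server, so the text-fragment check only works on client-side data, while the user-agent check only works on server logs.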
  23. But should it be crawled? Should it be trained on?

    These bots are lazy and dumb. If that API endpoint has everything they need, then that's where they'll send the user to. (Plus, it's expensive to keep feeding these greedy little monsters.)
  24. Effective Resource restriction Robots.txt Introduction and Guide | Google Search

    Central | Documentation
    1. If you're disallowing it for an AI crawler, repeat the statement for CCBot
    2. Use X-Robots-Tag directives to block non-HTML resources from being indexed independently while still allowing them to contribute to rendered page content.
    3. Seriously, block your API endpoints for non-rendering bots
    4. Be sure to block only non-critical resources, that is, resources that aren't important to understanding the meaning of the page.
    5. Block personalization resources used for returning users to conserve crawl budget
    6. Hide auth-required links from the crawl path
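Point 1 is easy to forget, so here is a small sketch that writes the same Disallow rules for every listed AI crawler and always adds a CCBot group. The function names and the /api/ path are hypothetical; the check at the end uses the standard `urllib.robotparser`.

```python
from urllib.robotparser import RobotFileParser


def build_robots_txt(ai_agents, disallowed_paths):
    """Build one robots.txt group per agent, always including CCBot."""
    agents = list(ai_agents)
    if "ccbot" not in (agent.lower() for agent in agents):
        agents.append("CCBot")  # repeat the statement for Common Crawl
    groups = []
    for agent in agents:
        rules = [f"Disallow: {path}" for path in disallowed_paths]
        groups.append("\n".join([f"User-agent: {agent}", *rules]))
    return "\n\n".join(groups) + "\n"


def is_allowed(robots_txt, user_agent, url):
    """Check a URL against robots.txt text for the given user agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

For example, `build_robots_txt(["GPTBot"], ["/api/"])` produces two groups, one for GPTBot and one for CCBot, and leaves every other crawler unaffected.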
  25. Step 3: Know what you're optimizing And what you're optimizing

    for Answer Engine Citation Overlap Strategy: How to Win at AI Visibility, Josh Blyskal
  26. Trending topic or recent event? RAG Informational meta descriptions Training

    TTFB Homepage links Is Perplexity? no no yes yes Tech SEO crawlable links semantic html consistent urls not rendered, not discovered topical authority uniqueness CWV is Google?
  27. want new pages crawled? link from homepage “So for the

    most part, for example, we would refresh crawl the homepage, I don’t know, once a day, or every couple of hours, or something like that. And if we find new links on their home page then we’ll go off and crawl those with the discovery crawl as well. And because of that you will always see a mix of discover and refresh happening with regard to crawling. And you’ll see some baseline of crawling happening every day. But if we recognize that individual pages change very rarely, then we realize we don’t have to crawl them all the time.” English Google SEO office-hours from January 7, 2022
  28. “The decision on which pages to crawl is primarily influenced

    by the relevance of the title, the content within the snippet, the freshness of the information, and the credibility of the domain.” ChatGPT support team How does ChatGPT Search select the sources to crawl? Jérôme Salomon
  29. 1. URL 2. Title 3. Snippet (usually meta description) 4. Ranking position 5. Metadata

    event: delta
    data: {"v": [
      {
        "type": "search_result_group",
        "domain": "www.kbb.com",
        "entries": [
          {
            "type": "search_result",
            "url": "https://www.kbb.com/cars-for-sale/all/2025/nissan/frontier/pro-4x",
            "title": "2025 Nissan Frontier PRO-4X for Sale",
            "snippet": "... 3572 Nissan Frontier cars for sale, including a New 2025 Nissan Frontier PRO-4X and a Used 2025 Nissan Frontier PRO-4X ranging in price from $13795 to $68619.",
            "ref_id": null,
            "pub_date": null,
            "attribution": "www.kbb.com"
          }
        ]
      }
    ]}
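To see which fields the model is actually handed, the `data:` payload from the slide can be parsed directly. A minimal sketch: the payload is copied from the slide (with the stray space in the URL removed, which is an assumption), and treating list order as ranking position is also an assumption rather than something the payload states.

```python
import json

# Payload copied from the slide; the stray space inside the URL is assumed
# to be a line-wrap artifact and has been removed.
RAW_DATA = """
{"v": [{"type": "search_result_group", "domain": "www.kbb.com", "entries": [
  {"type": "search_result",
   "url": "https://www.kbb.com/cars-for-sale/all/2025/nissan/frontier/pro-4x",
   "title": "2025 Nissan Frontier PRO-4X for Sale",
   "snippet": "... 3572 Nissan Frontier cars for sale, including a New 2025 Nissan Frontier PRO-4X and a Used 2025 Nissan Frontier PRO-4X ranging in price from $13795 to $68619.",
   "ref_id": null, "pub_date": null, "attribution": "www.kbb.com"}]}]}
"""


def extract_results(raw):
    """Flatten search_result entries into rows; position = order seen (assumed)."""
    rows = []
    for group in json.loads(raw)["v"]:
        if group.get("type") != "search_result_group":
            continue
        for entry in group.get("entries", []):
            rows.append({
                "position": len(rows) + 1,
                "domain": group.get("domain"),
                "url": entry.get("url"),
                "title": entry.get("title"),
                "snippet": entry.get("snippet"),
            })
    return rows
```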
  30. Technical SEO for AI Search - SALT.agency® performance still matters

    • Sites with CLS ≤ 0.1 recorded a 29.8% higher inclusion rate in generative summaries compared with sites above this threshold.
    • Pages delivering LCP ≤ 2.5 seconds were 1.47 times more likely to appear in AI outputs than slower pages.
    • Crawlers abandoned requests for 18% of pages larger than 1 MB of HTML, highlighting the need for lean markup.
    • TTFB under 200 ms correlated with a 22% increase in citation density, particularly when paired with robust caching strategies.
    • His study shows that performance improvements do more than enhance user experience. They directly increase the probability of being cited or surfaced by AI systems.
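The four thresholds quoted above can be turned into a quick per-page check. This is a sketch only: the class and function names are hypothetical, 1 MB is taken as 1,000,000 bytes (an assumption), and passing these checks does not guarantee a citation.

```python
from dataclasses import dataclass


@dataclass
class PageMetrics:
    cls: float          # Cumulative Layout Shift
    lcp_seconds: float  # Largest Contentful Paint, in seconds
    html_bytes: int     # size of the HTML response
    ttfb_ms: float      # Time to First Byte, in milliseconds


def performance_flags(metrics):
    """Return the quoted thresholds that this page misses."""
    flags = []
    if metrics.cls > 0.1:
        flags.append("CLS above 0.1")
    if metrics.lcp_seconds > 2.5:
        flags.append("LCP above 2.5 s")
    if metrics.html_bytes > 1_000_000:  # assumed meaning of "1 MB"
        flags.append("HTML larger than 1 MB")
    if metrics.ttfb_ms >= 200:
        flags.append("TTFB not under 200 ms")
    return flags
```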
  31. HTTP 499 status code indicates that the client closed the

    connection before the server could respond. In this context, the client is a bot sent by ChatGPT, terminating the request due to delayed server responses. Real-time genAI search can't afford slow pages ChatGPT has no time for your slow pages ⏱ | Jérôme Salomon 499 = i give up
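To find the pages a bot gave up on, count 499 responses per path. The sketch below assumes an nginx-style combined access log and uses `ChatGPT-User` as the default token; both are assumptions, and the sample lines are hypothetical.

```python
import re
from collections import Counter

# Matches the request, status, and user-agent parts of a combined-format line.
LOG_LINE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<user_agent>[^"]*)"'
)

# Hypothetical log lines for illustration only.
SAMPLE_LOG = [
    '203.0.113.5 - - [12/Dec/2025:10:00:00 +0000] "GET /inventory HTTP/1.1" 499 0 "-" "Mozilla/5.0; compatible; ChatGPT-User/1.0"',
    '203.0.113.5 - - [12/Dec/2025:10:00:05 +0000] "GET /inventory HTTP/1.1" 499 0 "-" "Mozilla/5.0; compatible; ChatGPT-User/1.0"',
    '203.0.113.5 - - [12/Dec/2025:10:00:09 +0000] "GET /about HTTP/1.1" 200 5120 "-" "Mozilla/5.0; compatible; ChatGPT-User/1.0"',
    '198.51.100.7 - - [12/Dec/2025:10:01:00 +0000] "GET /inventory HTTP/1.1" 499 0 "-" "Mozilla/5.0 Chrome/141.0.0.0 Safari/537.36"',
]


def abandoned_paths(log_lines, bot_token="ChatGPT-User"):
    """Count 499 responses per path for requests whose UA contains bot_token."""
    counts = Counter()
    for line in log_lines:
        match = LOG_LINE.search(line)
        if not match:
            continue  # line is not in the expected format
        is_bot = bot_token.lower() in match["user_agent"].lower()
        if match["status"] == "499" and is_bot:
            counts[match["path"]] += 1
    return counts
```

Paths that appear here repeatedly are good candidates for caching or a TTFB investigation.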
  32. Step 4: Be f%^ing feral Learn in public. Share what

    you find. Use reproducible methodologies. The way out is through and the way through is together. Answer Engine Citation Overlap Strategy: How to Win at AI Visibility, Josh Blyskal
  33. https://not-a-robot.com/ai-transparency Each doc has: 1. Assertions (Rules, Constraints, and Stated

    Facts) 2. Functionalities (The AI's Capabilities) based on leaked system prompts. System prompts are the hidden instructions that serve as the baseline for AI interactions and dictate the model's tone, expertise, and constraints. They define the AI's overall behavior and role. These are different from user prompts, which provide specific instructions or questions for a particular task or interaction. System prompts are designed to take priority over user prompts to ensure the AI's core behavior remains consistent.