Image search

Talk presented by Alex Salgado at the 76th Python Floripa: https://www.youtube.com/watch?v=vug0ZQ1kj2o&t=1s

Python Floripa

June 17, 2024

Transcript

  1. Alex Salgado, Senior Developer Advocate LATAM • Master's in Computer Science from UFF (Games) • MBA, UFF • PhD candidate, UFF: Robotics/Computer Vision
    - 25+ years of experience in software development
    - I have held various positions at startups and at small and large companies such as Oracle, CSN, BRQ/IBM, and Chemtech/Siemens (9 years)
    - 8 years as a university professor
    @alexsalgadoprof salgado @alexsalgadoprof /in/alex-salgado/
  2. 80% of the world's data is unstructured. Concerns around generative AI.
    KPMG Generative AI Survey; The Prompt: Generative AI survey | Google Cloud Blog
  3. Three solutions powered by one stack
    3 solutions: Enterprise Search, Security, Observability
    Powered by the Elastic Stack: Kibana, Elasticsearch, Logstash, Beats, Agent
    Deployed anywhere (SaaS, Orchestration): Elastic Cloud, Elastic Cloud on Kubernetes, Elastic Cloud Enterprise
  4. Basic concepts of ML, generative AI, and LLMs
    ML / AI. What: algorithms programmed to learn the behavior of data and make predictions. Use cases: anomaly detection, forecasting, image recognition, NLP.
    Generative AI. What: algorithms programmed to create new data. Use cases: chatbots; text, image, and music generators.
    Large Language Model. What: (deep learning) algorithms trained on large volumes of data and programmed to create new data. Use cases: chatbots, text generators, translators, code generators, question-answering applications.
  5. What is vector similarity? Convert data into vector representations where distances represent similarity.
    Text → Natural Language Processing model → embeddings [a1, a2, …, an]
    Image → Convolutional Neural Network → feature vectors [a1, a2, …, an]
    Example values: 0.0167327…, 0.3458967…, 0.0547893…, 0.0324981…, 0.0135497…, 0.0216549…
  6. Similar data is grouped together
    Plot labels: CARTOON, REALISTIC, HUMAN, MACHINE
    Character vectors: [ -1.0, 1.0 ], [ 1.0, -0.1 ], [ -1.0, 0.8 ]
  7. Vector search ranks objects by similarity (relevance) to the query
    Plot labels: CARTOON, REALISTIC, HUMAN, MACHINE, Query
    Results ranked by relevance: 1, 2, 3, 4, 5
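
The ranking idea on slides 5 to 7 can be sketched in a few lines of Python. This is a minimal illustration, not the speaker's code: it assumes cosine similarity as the metric (the slides do not name one), reuses the three vectors from slide 6, and invents the item names and the query vector purely for demonstration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_similarity(query, items):
    """Return (name, score) pairs sorted from most to least similar."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in items.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Vectors taken from slide 6; the names are hypothetical placeholders.
characters = {
    "character_a": [-1.0, 1.0],
    "character_b": [1.0, -0.1],
    "character_c": [-1.0, 0.8],
}

# Hypothetical query vector, chosen only for illustration.
query_vector = [-0.8, 1.0]

ranking = rank_by_similarity(query_vector, characters)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

A real image search would use embeddings with hundreds of dimensions and an approximate kNN index instead of scoring every item, but the ordering principle is the same.
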
  8. Image Search Architecture: generate embeddings outside Elasticsearch
    Diagram labels: Your Image To Search; Application; DogService; Util; DogRepository; Elastic Platform; Elasticsearch Index; kNN Search; Inference API
    Flows: Dog Embedding; Insert embeddings; Search Query; Search results
  9. Search-powered application
    Diagram labels: Source data; Transformer model; POST /_doc; Documents stored in Elasticsearch; GET /_search with kNN clause
    Example document: { "_id":"product-1234", "product_name":"Summer Dress", "description":"Our best-selling…", "Price": 118, "color":"blue", "fabric":"cotton", "desc_embedding":[0.452,0.3242,…] }
  10. Step 1: Setting up the machine learning model
    $ eland_import_hub_model --url https://cluster_URL --hub-model-id BERT-MiniLM-L6 --task-type text_embedding --start
    Diagram labels: BERT-MiniLM-L6; Select the appropriate model; Load the model to the cluster; Manage models
  11. Step 2: Data ingestion and embedding generation
    Source data: { "_id":"product-1234", "product_name":"Summer Dress", "description":"Our best-selling…", "Price": 118, "color":"blue", "fabric":"cotton" }
    POST /_doc: encoding via Inference Processor
    POST /_doc: standard field indexing for non-vector types
    Indexed document: { "_id":"product-1234", "product_name":"Summer Dress", "description":"Our best-selling…", "Price": 118, "color":"blue", "fabric":"cotton", "desc_embedding":[0.452,0.3242,…] }
  12. Step 3: Issuing a vector query
    a. Query is submitted to the search-powered application.
    b. Query embedding is generated by the Transformer model:
       POST /_ml/trained_models/my-model/_infer { "docs": [ { "description": "summer clothes" } ] }
    c. Issue query using the _search endpoint, with a kNN clause, using the previously generated embedding:
       GET product-catalog/_search { "query": { "match": { "description": { "query": "summer clothes", "boost": 0.9 } } }, "knn": { "field": "desc_embedding", "query_vector": [0.123, 0.244,...], "k": 5, "num_candidates": 50, "boost": 0.1, "filter": { "term": { "department": "women" } } }, "size": 10 }
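
The hybrid request on slide 12 (a `match` clause plus a `knn` clause, each with its own boost) can be assembled programmatically. The sketch below only builds the request body as a plain Python dictionary; `build_hybrid_query` is a hypothetical helper, not part of any Elastic library, the field name `desc_embedding` follows the indexed document on slides 9 and 11, and the two-element query vector is a placeholder for a real embedding whose length must match the index mapping.

```python
import json


def build_hybrid_query(text, query_vector, department, k=5, num_candidates=50):
    """Build a _search body combining a text match with a kNN clause.

    Hypothetical helper for illustration; boosts and size mirror slide 12.
    """
    if num_candidates < k:
        # Elasticsearch rejects kNN searches where num_candidates is below k.
        raise ValueError("num_candidates must be greater than or equal to k")
    return {
        # Lexical part: scores documents by text relevance on `description`.
        "query": {"match": {"description": {"query": text, "boost": 0.9}}},
        # Vector part: nearest neighbours of the query embedding.
        "knn": {
            "field": "desc_embedding",
            "query_vector": list(query_vector),
            "k": k,
            "num_candidates": num_candidates,
            "boost": 0.1,
            # Restricts the kNN candidates to one department.
            "filter": {"term": {"department": department}},
        },
        "size": 10,
    }


# Placeholder embedding; a real one would come from the deployed model.
body = build_hybrid_query("summer clothes", [0.123, 0.244], "women")
print(json.dumps(body, indent=2))
```

The resulting dictionary would be sent as the body of `GET product-catalog/_search`, for example through an HTTP client or the official Python client.
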