

[CVPR 2020 Tutorial] A Large-Scale Visual Search System in the C2C Marketplace App Mercari

CVPR 2020 Tutorial - Image Retrieval in the Wild

This presentation introduces how we integrated visual search into Mercari, a C2C marketplace app with 1.5+B listings and 16+M monthly active users.

Takuma Yamaguchi

June 19, 2020

Transcript

  1. A Large-Scale Visual Search System in the C2C Marketplace App Mercari

    Takuma Yamaguchi (Mercari, Inc.), CVPR 2020 Tutorial - Image Retrieval in the Wild, 2020.06.19
  2. Contents

    (c) 2020 Mercari, Inc.
    • Visual Search Applications / Services
    • Scalable Visual Search System
    • Image Representation for a Consumer-to-Consumer Marketplace
    • Other Applications
  3. Image Search or Visual Search?

    • Image Retrieval / Image Search: search by metadata, keywords, tags, and images
    • Visual Search / Reverse Image Search / Content-based Image Retrieval (CBIR): search by images
  4. Visual Search Applications / Services

    Visual Search Applications: Google Images, Bing Images, Pinterest Visual Search, Yandex Images, TinEye, e-commerce apps, retailers' apps, etc.
    Visual Search Services: Google Cloud Vision Product Search, Bing Image Search, TinEye API, Alibaba Cloud Image Search, ViSenze Visual Search, Syte Camera Search, etc.
    Visual search may bring a better user experience and better discoverability of content and items.
  5. Visual Search for eCommerce

    Amazon, eBay, ASOS, Alibaba, Taobao, AliExpress, Mercari, etc.
    The visual search feature on the Mercari Japan app [Promotional Video]
  6. What is Mercari?

    The Mercari app is a C2C marketplace where individuals can easily sell used items.
    • 16+ million monthly active users
    • 1.5+ billion items listed in total
    The system should be highly scalable.
  7. Why Visual Search?

    How would you query when you look for clothes like this?
    • “Plaid” and “Shirt”?
    • “Plaid” and “Blouse”?
    • “Checks” and “Blouse”?
    • “Checks” and “Blouse” and “Frill”?
    • “Gingham Plaid” and “Blouse” and “Frill”?
    • “Gingham Plaid” and “Blouse” and “Frill” and “Collar”?
    • ...
    If you find what you want, you are lucky. If not, you just get tired...
  8. Why Visual Search?

    In a consumer-to-consumer (C2C) marketplace, sellers do not always provide enough information for each item. Even if buyers know how to describe what they want properly, items without enough information may not be found by text-based search. For example, an item listed as “black and white plaid shirt” cannot be discovered with the query “gingham plaid blouse”, even though the query terms are correct.
  9. Visual Search vs. Text-based Search on the Mercari App

    Text-based search with the queries “black white plaid shirt” and “Gingham Plaid Frill Blouse”, compared with visual search using a query image.
    (*) Visual search is available only on the Mercari Japan iOS app at the time of this talk.
  10. Visual Search Surpasses Text-based Search?

    Text-based search with the query “iphone xs max 256gb” (iPhone XS Max 256GB), compared with visual search using a query image. Result labels: iPhone X, iPhone XS, iPhone 6s / 7, iPhone XS Max 256GB.
    (*) Visual search is available only on the Mercari Japan iOS app at the time of this talk.
    Text-based search is better when:
    • sellers and buyers describe products in the same way
    • visual information is not enough to distinguish products
  11. Visual Search Statistics

    • By 2021, retailers that support visual and voice search will increase their e-commerce revenue by 30% (Gartner prediction)
      ◦ https://www.gartner.com/smarterwithgartner/gartner-top-strategic-predictions-for-2018-and-beyond/
    • 62% of millennials want visual search to discover products on their mobile devices (ViSenze)
      ◦ https://www.fastgrowthbrands.com/2018/08/retailers-must-optimise-omnichannel-mobile-for-millennials/
    • 70+% of online shoppers in the UK under 25 years old have used or plan to use a visual search tool (GlobalData)
      ◦ https://www.retail-insight-network.com/comment/rise-of-visual-search-2019/
    • 36% of consumers have performed or used visual search (Intent Lab)
      ◦ https://www.businesswire.com/news/home/20190204005613/en/Visual-Search-Wins-Text-Consumers'-Trusted-Information
  12. Visual Search Processing Flow

    Indexing: Image Database → Image Feature Extraction → Feature Vector Storage
    Search: Query Image → Image Feature Extraction → Similarity Search against Feature Vector Storage
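The two flows above can be sketched as exact (brute-force) search. This is a minimal illustration, assuming features have already been extracted: the toy 3-D vectors and image IDs are hypothetical stand-ins for a real feature extractor's output, and cosine similarity is assumed as the similarity measure.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit L2 norm; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def build_index(features: Dict[str, Sequence[float]]) -> Dict[str, List[float]]:
    """Indexing flow: store one normalized feature vector per image ID."""
    return {image_id: l2_normalize(vec) for image_id, vec in features.items()}


def search(index: Dict[str, List[float]], query: Sequence[float], k: int) -> List[Tuple[str, float]]:
    """Search flow: score every stored vector against the query (exhaustive scan)
    and return the k most similar (image_id, cosine similarity) pairs."""
    q = l2_normalize(query)
    scored = ((image_id, sum(a * b for a, b in zip(q, vec))) for image_id, vec in index.items())
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


# Hypothetical feature vectors standing in for a real extractor's output.
index = build_index({"shirt": [1.0, 0.0, 0.0], "blouse": [0.9, 0.1, 0.0], "shoes": [0.0, 0.0, 1.0]})
print(search(index, [1.0, 0.0, 0.0], k=2))
```

The scan touches every stored vector, so its cost grows linearly with the number of images.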
  13. Visual Search Processing Flow

    Indexing: Image Database → Image Feature Extraction → Search Index Builder → Search Index Storage
    Search: Query Image → Image Feature Extraction → Similarity Search against the search index
    In practice, approximate nearest neighbor (ANN) search should be used, e.g. Faiss, nmslib, Annoy, etc.
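As a rough illustration of the inverted-file (IVF) idea behind Faiss's IVF indexes, here is a minimal pure-Python sketch: k-means centroids split the vectors into cells, and a query scans only its `nprobe` nearest cells instead of the whole database. It is not Faiss's implementation: it skips product quantization, stores raw vectors, and uses squared Euclidean distance. `ToyIVFIndex` and the helper functions are hypothetical names.

```python
import random
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_centroids(vectors: List[Vector], n_cells: int, n_iter: int = 10, seed: int = 0) -> List[List[float]]:
    """Plain k-means: the centroids act as the coarse quantizer that defines the cells."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, n_cells)]
    for _ in range(n_iter):
        members: List[List[Vector]] = [[] for _ in range(n_cells)]
        for v in vectors:
            nearest = min(range(n_cells), key=lambda c: sq_dist(v, centroids[c]))
            members[nearest].append(v)
        for c, cell in enumerate(members):
            if cell:  # keep the previous centroid if a cell ends up empty
                centroids[c] = [sum(col) / len(cell) for col in zip(*cell)]
    return centroids


class ToyIVFIndex:
    """Inverted-file index: one list of (item_id, vector) pairs per cell."""

    def __init__(self, centroids: List[List[float]]):
        self.centroids = centroids
        self.lists: List[List[Tuple[str, Vector]]] = [[] for _ in centroids]

    def _nearest_cells(self, v: Vector, n: int) -> List[int]:
        return sorted(range(len(self.centroids)), key=lambda c: sq_dist(v, self.centroids[c]))[:n]

    def add(self, item_id: str, v: Vector) -> None:
        self.lists[self._nearest_cells(v, 1)[0]].append((item_id, v))

    def search(self, query: Vector, k: int, nprobe: int) -> List[str]:
        """Scan only the nprobe cells whose centroids are closest to the query."""
        candidates = [
            (sq_dist(query, v), item_id)
            for cell in self._nearest_cells(query, nprobe)
            for item_id, v in self.lists[cell]
        ]
        candidates.sort()
        return [item_id for _, item_id in candidates[:k]]


data = {"a0": (0.0, 0.0), "a1": (0.5, 0.2), "a2": (0.1, 0.4),
        "b0": (10.0, 10.0), "b1": (9.5, 10.2), "b2": (10.3, 9.8)}
index = ToyIVFIndex(train_centroids(list(data.values()), n_cells=2))
for item_id, vec in data.items():
    index.add(item_id, vec)
print(index.search((0.2, 0.1), k=3, nprobe=1))  # only the cell near the origin is scanned
```

Raising `nprobe` visits more cells, trading speed for recall; with `nprobe=1` and `k=6` this toy index can return at most the three items in the query's cell.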
  14. Visual Search System

    Query Image → Visual Search API → Image Feature Extraction (returns the image feature) → Similarity Search (searches the search index from Search Index Storage and returns image IDs) → Search Result
    Each component needs multiple servers to keep the system running.
    Requirements:
    • All the servers are monitored
    • Unhealthy servers are replaced with new servers
    • When the system load becomes high, the number of servers is increased and/or more computation resources are allocated
  15. Dockerize the System

    Components: Visual Search API, Image Feature Extraction, Similarity Search, and Search Index Storage.
    Docker makes development, testing, and deployment safe and efficient.
    Requirements (unchanged): all the servers are monitored; unhealthy servers are replaced with new servers; when the system load becomes high, the number of servers is increased and/or more computation resources are allocated.
  16. Run the System on Kubernetes

    The same components (Visual Search API, Image Feature Extraction, Similarity Search, Search Index Storage) run on Kubernetes, which provides self-healing, auto-scaling, and resource management. Managed Kubernetes services include:
    • Google Kubernetes Engine
    • Amazon Elastic Container Service for Kubernetes
    • Azure Kubernetes Service
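Kubernetes' Horizontal Pod Autoscaler is one way to get the auto-scaling mentioned above. Its documented core rule is desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), with no change when the ratio is within a tolerance (0.1 by default). The sketch below reproduces only that rule plus min/max clamping; the real controller also applies stabilization windows and handles unready pods and missing metrics, so treat it as a back-of-the-envelope helper. The function name and the default replica bounds are hypothetical.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float, target_metric: float,
                     min_replicas: int = 1, max_replicas: int = 10, tolerance: float = 0.1) -> int:
    """Simplified HPA scaling rule (no stabilization windows or pod readiness handling).

    min_replicas / max_replicas defaults are hypothetical example bounds.
    """
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas  # close enough to the target: do not scale
    else:
        desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


# e.g. 4 similarity search pods averaging 90% CPU against a 60% target
print(desired_replicas(4, current_metric=90, target_metric=60))  # 6
```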
  17. How to Update Search Index

    The system runs as three services: Visual Search API Service, Image Feature Extraction Service, and Similarity Search Service.
    • A Pod is the smallest deployable unit and consists of one or more containers, e.g. a feature extraction container and a logging agent container
    • A Service is an abstraction that defines a logical group of Pods
  18. How to Update Search Index

    Batch processing: Image Database → Image Feature Extractor → image feature vectors → Search Index Builder → search index → Similarity Search Docker Image Builder → Docker Image Registry (e.g. Google Container Registry, Amazon Elastic Container Registry, Azure Container Registry) → automated deployment to the Similarity Search Service
    If the search index doesn't have to be updated in real time, this system is practical enough to handle a few million images.
  19. How to Scale the System

    The search index is split across Monthly, Daily, and Hourly Similarity Search Services, which are rebuilt every month, every day, and every hour, respectively, and deployed from the Docker Image Registry. A similarity search is executed in every similarity search service, and the results are merged by similarity score.
    At Mercari, at least hundreds of millions of images have to be handled by the system.
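Merging "by similarity scores" can be done without re-sorting everything, because each service already returns a ranked list. A minimal sketch, assuming each service returns its top results sorted by descending score, that scores are comparable across services (same feature extractor and similarity measure), and that an item lives in only one service; `merge_results` and the sample results are hypothetical.

```python
import heapq
from itertools import islice
from typing import List, Tuple

Result = Tuple[str, float]  # (item_id, similarity score), higher score = more similar


def merge_results(per_service: List[List[Result]], k: int) -> List[Result]:
    """Merge per-service ranked lists (each sorted by descending score)
    into one global top-k list using a lazy k-way merge."""
    merged = heapq.merge(*per_service, key=lambda r: r[1], reverse=True)
    return list(islice(merged, k))


monthly = [("m1", 0.91), ("m2", 0.80)]
daily = [("d1", 0.95), ("d2", 0.70)]
hourly = [("h1", 0.85)]
print(merge_results([monthly, daily, hourly], k=3))  # [('d1', 0.95), ('m1', 0.91), ('h1', 0.85)]
```

The global top-k is exact only if every service returns at least k results of its own.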
  20. Latency Reduction by Edge Computing

    If the feature extraction model (e.g. MobileNet V2) is small enough to run on mobile devices, the query image's feature vector can be extracted on the device and sent to the Visual Search API Service, which could reduce latency and network traffic. The Monthly, Daily, and Hourly Similarity Search Services are unchanged.
  21. Visual Search System of Mercari Japan

    Index Building Kubernetes Cluster: Monthly, Daily, and Hourly Index Builders
    • Components: Item Explorer, Item Image Downloader, Feature Extractor, Feature Vec Downloader, ANN Index Builder, Docker Image Builder
    • Data stores: Item DB, Item Image Storage, Feature Vec Storage, ANN Index Storage, Docker Registry
    Serving Kubernetes Cluster:
    • Visual Search Service, Object Detection Service, Feature Extraction Service
    • Monthly, Daily, and Hourly Similarity Search Services, each backed by multiple ANN services
    Platforms: Google Kubernetes Engine and Amazon Elastic Kubernetes Service. Since item images are stored in AWS (S3) and our services run on GCP, we use both cloud services.
    (*) The actual system architecture is slightly different from this.
  22. Processing Time

    Serving Kubernetes Cluster (Google Kubernetes Engine): Visual Search Service, Object Detection Service, Feature Extraction Service, and the Monthly, Daily, and Hourly Similarity Search Services, whose ANN services are processed in parallel.
    Assuming that items from the past year are searchable, the system will have 11 monthly ANN services (30M items each), 30 daily ANN services (1M items each), and 24 hourly ANN services (100K items each), i.e. 362.4M image feature vectors in total.
    (*) The number of items in each ANN service differs from the actual numbers.
    Processing times shown for the serving services: 168ms, 62ms, 255ms
    Similarity search latency: Monthly 20ms, Daily 13ms, Hourly 12ms
    ANN (similarity search) settings:
    • Library: Faiss
    • Index type: IVFADC (IndexIVFPQ)
    • Code length per vector: 64B
    • Cells visited per query: 32 (out of 8,192 cells)
    In this experiment, 4 CPU cores are allocated to each service. In practice, the resource allocation and ANN parameters should be optimized for each ANN service based on the number of items/images it holds. Docker allows flexible resource allocation, such as 1.5 CPU cores.
    (*) The actual system architecture is slightly different from this.
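The slide's numbers can be cross-checked with a little arithmetic: 11 × 30M + 30 × 1M + 24 × 100K = 362.4M vectors across 65 ANN services. The sketch also estimates memory, assuming raw features are float32 vectors of the 1,792 dimensions given on the feature extraction model slide, and counting only the 64-byte PQ codes (it ignores per-vector IDs, centroids, and other index overhead). The constant and function names are hypothetical.

```python
# Per-tier (number of ANN services, items per service) as stated on the slide;
# the slide notes these counts differ from the production numbers.
TIERS = {
    "monthly": (11, 30_000_000),
    "daily": (30, 1_000_000),
    "hourly": (24, 100_000),
}
PQ_CODE_BYTES = 64   # IVFADC code length per vector, from the slide
FEATURE_DIM = 1792   # MobileNet V2 (x1.4) feature size, from the model slide
FLOAT32_BYTES = 4    # assumption: raw features stored as float32


def footprint() -> dict:
    """Total services, total vectors, and rough storage for codes vs. raw vectors."""
    services = sum(n for n, _ in TIERS.values())
    vectors = sum(n * per_service for n, per_service in TIERS.values())
    return {
        "services": services,
        "vectors": vectors,
        "pq_code_bytes": vectors * PQ_CODE_BYTES,
        "raw_float32_bytes": vectors * FEATURE_DIM * FLOAT32_BYTES,
    }


stats = footprint()
print(stats["services"], stats["vectors"])   # 65 362400000
print(stats["pq_code_bytes"] / 1e9)          # about 23.2 (GB of PQ codes)
print(stats["raw_float32_bytes"] / 1e12)     # about 2.6 (TB of raw float32 vectors)
```

The roughly 112× difference (7,168 bytes vs. 64 bytes per vector) is why a compressed index such as IVFADC makes hundreds of millions of vectors fit in memory.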
  23. (Architecture of the Mercari Japan visual search system, shown again)

    Can we simplify the system architecture?
  24. Visual Search with Elasticsearch

    Batch processing: Image Database → Image Feature Extractor → image feature vectors → Elasticsearch index, whose shards are Lucene indexes made up of segments
    Search: Visual Search API Service → query feature vector → Elasticsearch → search result
    Fewer segments bring better performance, so merging segments before rolling out the index is recommended. The Open Distro k-NN plugin uses nmslib (HNSW). If the memory consumption is acceptable and you are familiar with Elasticsearch, it may be an option for realizing a simpler visual search system.
    https://medium.com/@kumon/similarity-search-and-similar-image-search-in-elasticsearch-14552a8a8dea
    https://medium.com/@kumon/how-to-realize-similarity-search-with-elasticsearch-3dd5641b9adb
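For orientation, here is a sketch of the request bodies involved, based on the Open Distro for Elasticsearch k-NN plugin's documented `knn_vector` mapping and `knn` query, plus Elasticsearch's `_forcemerge` API for merging segments. It only builds JSON with the standard library and sends nothing; the index name `items`, the field name `image_vector`, and the helper functions are hypothetical, and the exact settings should be checked against the plugin version you run.

```python
import json
from typing import List

INDEX = "items"         # hypothetical index name
FIELD = "image_vector"  # hypothetical field name
DIM = 1792              # feature size of the model described in this deck


def index_body(dim: int = DIM) -> dict:
    """Index settings/mappings enabling the k-NN plugin for one vector field."""
    return {
        "settings": {"index": {"knn": True}},
        "mappings": {"properties": {FIELD: {"type": "knn_vector", "dimension": dim}}},
    }


def knn_query(vector: List[float], k: int) -> dict:
    """Search body asking the k-NN plugin for the k nearest neighbours of `vector`."""
    return {"size": k, "query": {"knn": {FIELD: {"vector": vector, "k": k}}}}


# Requests you would send (e.g. with curl or an Elasticsearch client):
#   PUT  /items                                  body: index_body()
#   POST /items/_forcemerge?max_num_segments=1   merge segments before rollout
#   POST /items/_search                          body: knn_query(...)
print(json.dumps(knn_query([0.1, 0.2, 0.3], k=2)))
```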
  25. Image Feature Extraction Model

    • Backbone: MobileNet V2 (width multiplier: 1.4), input 224 x 224 x 3, followed by global average pooling
    • Feature vector: 1,792D (1,280D x 1.4)
    • Training data: 9M images from Mercari, 14k classes
    • Class labels: category x brand x texture, e.g. Nike striped men’s golf polos, LuLaRoe floral girl’s dresses, Louis Vuitton women’s long wallets
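Global average pooling is the step that turns the backbone's last feature map into the fixed-length vector used for search. A minimal pure-Python sketch of that step on an H × W × C nested list (a real model does this inside the deep learning framework), plus the shape arithmetic behind the 1,792-D vector; the 7 × 7 spatial size assumes MobileNet V2's standard output stride of 32.

```python
from typing import List

FeatureMap = List[List[List[float]]]  # height x width x channels


def global_average_pool(feature_map: FeatureMap) -> List[float]:
    """Average each channel over all spatial positions (H x W x C -> C)."""
    height, width = len(feature_map), len(feature_map[0])
    channels = len(feature_map[0][0])
    totals = [0.0] * channels
    for row in feature_map:
        for pixel in row:
            for c, value in enumerate(pixel):
                totals[c] += value
    return [t / (height * width) for t in totals]


# MobileNet V2 downsamples by 32, so a 224 x 224 input gives a 7 x 7 map;
# with width multiplier 1.4 its last layer has 1280 * 1.4 = 1792 channels.
spatial = 224 // 32
channels = round(1280 * 1.4)
print(spatial, channels)  # 7 1792
```

Pooling a 7 × 7 × 1792 map therefore yields one 1,792-D vector per image, regardless of where the item appears in the frame.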
  26. Visual Search from 100M Items in Mercari

    Query images and their results. These results may be acceptable in general, but not for Mercari.
  27. Wording

    • Fitted apparel: apparel worn by models
    • Flat apparel: apparel laid flat on a surface or hung on hangers
  28. What Is the Problem?

    (a) Query image, (b) bad results, (c) better results.
    Mercari is a C2C marketplace where most sellers and buyers are not professionals:
    • Most apparel items are not worn by models
    • Apparel is laid flat on a surface or hung on hangers
    When a query image shows fitted apparel, fitted apparel tends to be retrieved more often, even though such items make up a very small proportion of the marketplace. Returning many items listed by professional sellers can cause problems for a C2C marketplace, for example by hurting the buyer experience and discouraging nonprofessional sellers from listing items.
  29. Removing Human Features from the Image Features

    [Figure: a fitted-apparel image and flat-apparel images]
    Subtracting a human feature vector $h$ from the feature of a fitted-apparel image gives a vector close to the features of flat-apparel images: $f(x_{\text{fitted}}) - h \simeq f(x_{\text{flat}})$.
    The feature extractor $f$ extracts a feature vector that has no negative elements and whose L2 norm is 1.
    T. Yamaguchi et al., Closing the Gap Between Query and Database through Query Feature Transformation in C2C e-Commerce Visual Search, SIGIR 2019 Workshop on eCommerce
  30. Removing Human Features from the Image Features

    [Figure: a fitted-apparel image and flat-apparel images]
    The feature extractor $f$ extracts a feature vector that has no negative elements and whose L2 norm is 1. After subtracting the human feature vector $h$, negative elements are removed and the result is L2-normalized, so the transformed query $\hat{q}$ has the same properties:
    $\hat{q} = \max(f(x_{\text{fitted}}) - h, 0) \,/\, \lVert \max(f(x_{\text{fitted}}) - h, 0) \rVert_2$ (max taken element-wise)
    (Yamaguchi et al., SIGIR 2019 Workshop on eCommerce)
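A minimal sketch of this query-side transformation: subtract the human feature vector, remove negative elements (ReLU), then L2-normalize so the result again looks like an extracted feature. It assumes a plain subtraction with no weighting and that `human_vec` is already available; `transform_query` is a hypothetical name, and the paper is the reference for the exact formulation.

```python
import math
from typing import List, Sequence


def transform_query(query_vec: Sequence[float], human_vec: Sequence[float]) -> List[float]:
    """Remove the human component from a query feature and map the result back
    to the extractor's output space (non-negative, unit L2 norm)."""
    diff = [max(0.0, q - h) for q, h in zip(query_vec, human_vec)]  # subtract, then drop negatives
    norm = math.sqrt(sum(x * x for x in diff))
    return [x / norm for x in diff] if norm > 0 else diff  # L2 normalization


# Toy 3-D features: the first dimension plays the role of the "human" component.
print(transform_query([0.6, 0.8, 0.0], [0.6, 0.0, 0.0]))
```

Because the output stays non-negative with unit norm, it can be searched against the unchanged database index; only the query is transformed.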
  31. Removing Human Features from the Image Features

    [Figure: a fitted-apparel image and flat-apparel images]
    The feature extractor extracts a feature vector that has no negative elements and whose L2 norm is 1. Thanks to this characteristic, the feature transformation can be applied to any kind of image.
    (Yamaguchi et al., SIGIR 2019 Workshop on eCommerce)
  32. Image Feature Extraction Model

    • Backbone: MobileNet V2 (width multiplier: 1.4), input 224 x 224 x 3, followed by global average pooling
    • Feature vector: 1,792D (1,280D x 1.4)
    • Training data: 9M images from Mercari, 14k classes (category x brand x texture, e.g. Nike striped men’s golf polos, LuLaRoe floral girl’s dresses, Louis Vuitton women’s long wallets)
    • Added: ReLU and L2 norm normalization, so the feature vector has no negative elements and unit L2 norm
    (Yamaguchi et al., SIGIR 2019 Workshop on eCommerce)
  33. How to Generate the Human Feature Vector

    Human feature vectors are generated per category from fitted and flat images:
    • Fitted tops and flat tops → human vector of tops
    • Fitted bottoms and flat bottoms → human vector of bottoms
    (Yamaguchi et al., SIGIR 2019 Workshop on eCommerce)
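One plausible reading of this slide, assumed here rather than taken from the paper, is that a category's human vector is the difference between the mean feature of its fitted images and the mean feature of its flat images. Under that assumption the computation is only a few lines; the function names and toy 2-D features are hypothetical.

```python
from typing import List, Sequence


def mean_vector(vectors: List[Sequence[float]]) -> List[float]:
    """Element-wise mean of equally sized feature vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def human_vector(fitted: List[Sequence[float]], flat: List[Sequence[float]]) -> List[float]:
    """Assumed definition: mean(fitted features) - mean(flat features) for one category."""
    fitted_mean, flat_mean = mean_vector(fitted), mean_vector(flat)
    return [a - b for a, b in zip(fitted_mean, flat_mean)]


# One human vector per category, e.g. tops and bottoms (toy 2-D features).
human_vectors = {
    "tops": human_vector(fitted=[[1.0, 0.5], [0.5, 0.5]], flat=[[0.25, 0.5], [0.25, 0.5]]),
}
print(human_vectors["tops"])  # [0.5, 0.0]
```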
  34. Experiment Results (mAP@100)

    The test set had 20,000 images (flat apparel: 10,000 / fitted apparel: 10,000), 2,000 of which were used as query images. A retrieved item was counted as correct only when it was an image of flat apparel in the same category as the query.
    • Significant improvement for fitted apparel queries in every category
    • Also a positive effect on flat apparel queries
    (Yamaguchi et al., SIGIR 2019 Workshop on eCommerce)
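A minimal sketch of the metric with the slide's correctness criterion. AP@k has several normalization conventions (dividing by the number of correct items retrieved, by min(k, total relevant), and so on) and the slide does not say which was used; this sketch divides by the number of correct items found in the top k. The `kind`/`category` field names and the helper functions are hypothetical.

```python
from typing import List, Sequence


def is_correct(query_category: str, item: dict) -> bool:
    """Slide's criterion: flat apparel in the same category as the query."""
    return item["kind"] == "flat" and item["category"] == query_category


def average_precision_at_k(relevance: Sequence[bool], k: int) -> float:
    """AP@k, normalized by the number of correct items found in the top k."""
    hits, precision_sum = 0, 0.0
    for rank, relevant in enumerate(relevance[:k], start=1):
        if relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision_at_k(relevance_lists: List[Sequence[bool]], k: int) -> float:
    """mAP@k: mean of AP@k over all queries."""
    return sum(average_precision_at_k(r, k) for r in relevance_lists) / len(relevance_lists)


results = [{"kind": "flat", "category": "tops"}, {"kind": "fitted", "category": "tops"},
           {"kind": "flat", "category": "tops"}]
relevance = [is_correct("tops", item) for item in results]
print(average_precision_at_k(relevance, k=100))  # (1/1 + 2/3) / 2
```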
  35. Visual Search Applications

    • Search Items by Image
    • Find Similar Items to a Sold Item
    • Item Information Prediction for Sellers
    • Price Estimation by Image
    • Item Monitoring / Prohibited Item Detection
  36. Find Similar Items to a Sold Item

    Describing textures is neither simple nor consistent among users, so visual search may bring a better user experience.
  37. Item Information Prediction

    Item category, brand, and title are predicted from the information of existing, visually similar listings. Example: title “Amazon Echo Dot 3rd Generation”, category “Smart Speakers & Assistants”, brand “Amazon”.
    (*) This feature is available in both the Mercari Japan and Mercari US apps.
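The slide does not say how the neighbours' information is combined. A minimal sketch, assuming a majority vote over the top visually similar listings for category and brand and reusing the closest listing's title; `predict_item_info`, the field names, and the sample listings are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional


def predict_item_info(neighbors: List[Dict[str, str]]) -> Dict[str, Optional[str]]:
    """Predict listing fields from visually similar listings.

    `neighbors` is assumed to be sorted from most to least similar.
    Category and brand: majority vote; title: copied from the closest listing.
    """
    prediction: Dict[str, Optional[str]] = {}
    for field in ("category", "brand"):
        votes = Counter(n[field] for n in neighbors if n.get(field))
        prediction[field] = votes.most_common(1)[0][0] if votes else None
    prediction["title"] = neighbors[0].get("title") if neighbors else None
    return prediction


neighbors = [
    {"title": "Amazon Echo Dot 3rd Generation", "category": "Smart Speakers & Assistants", "brand": "Amazon"},
    {"title": "Echo Dot (3rd Gen) smart speaker", "category": "Smart Speakers & Assistants", "brand": "Amazon"},
    {"title": "Google Home Mini", "category": "Smart Speakers & Assistants", "brand": "Google"},
]
print(predict_item_info(neighbors))
```

Voting makes category and brand robust to a single odd neighbour, while the title is taken from the closest match because titles rarely repeat exactly.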
  38. What We Learned Through the Project

    Are you considering integrating visual search features into your services?
    1. Try some visual search services first.
    2. If you find a service that meets your expectations, use it.
    3. Even if pricing is an issue, use such a service first:
       a. Confirm that the feature is useful for your service.
       b. If necessary, consider developing your own visual search system to improve performance and reduce cost.
    4. If you need a highly flexible and scalable system, build it on your own.
    5. The cost of developing and operating a visual search system can be high.