[CVPR 2020 Tutorial] A Large-Scale Visual Search System in the C2C Marketplace App Mercari
CVPR 2020 Tutorial - Image Retrieval in the Wild
This presentation introduces how we integrated visual search into the C2C marketplace app Mercari, which has 1.5+B listings and 16+M monthly active users.
Monthly active users
The Mercari app is a C2C marketplace where individuals can easily sell used items.
Total number of items: 1.5+ billion
The system should be highly scalable.
you query when you look for clothes like this?
• “Plaid” and “Shirt”?
• “Plaid” and “Blouse”?
• “Checks” and “Blouse”?
• “Checks” and “Blouse” and “Frill”?
• “Gingham Plaid” and “Blouse” and “Frill”?
• “Gingham Plaid” and “Blouse” and “Frill” and “Collar”?
• ...
If you find what you want, you are lucky. If not, you just get tired...
consumer-to-consumer (C2C) marketplace, enough information for each item is not always provided. Even if buyers know how to describe what they want properly, items without enough information may not be found by text-based search.
Example listing text: “black and white plaid shirt”. This item cannot be discovered by “gingham plaid blouse”, even though the query terms are correct.
the Mercari App
Text-based Search vs. Visual Search
Text queries shown: “black white plaid shirt”, “Gingham Plaid Frill Blouse”
(*) The visual search is available only on the Mercari Japan iOS app as of now
Text-based Search vs. Visual Search
Text query shown: “iphone xs max 256gb” (iPhone XS Max 256GB)
Items shown: iPhone X, iPhone XS, iPhone 6s / 7, iPhone XS Max 256GB
Text-based search is better when:
• sellers and buyers describe products in the same way
• visual information is not enough to explain products
2021, retailers that support visual and voice search will increase their e-commerce revenue by 30% (Gartner)
  ◦ https://www.gartner.com/smarterwithgartner/gartner-top-strategic-predictions-for-2018-and-beyond/
• 62% of millennials want a visual search to discover products on their mobile devices (ViSenze)
  ◦ https://www.fastgrowthbrands.com/2018/08/retailers-must-optimise-omnichannel-mobile-for-millennials/
• 70+% of online shoppers in the UK in the under 25-year-old group have used or plan to use a visual search tool (GlobalData)
  ◦ https://www.retail-insight-network.com/comment/rise-of-visual-search-2019/
• 36% of consumers have performed or used visual search (Intent Lab)
  ◦ https://www.businesswire.com/news/home/20190204005613/en/Visual-Search-Wins-Text-Consumers'-Trusted-Information
• etc...
Visual Search Processing Flow
[Diagram: two paths, Indexing and Search. Components: Image Database, Image Feature Extraction, Search Index Builder, Search Index Storage, Similarity Search, Query Image.]
In practice, approximate nearest neighbor (ANN) search should be used.
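The indexing and search paths above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `extract_feature` is a hypothetical stand-in for a real CNN feature extractor (here an "image" is already a short list of numbers), and the search is an exact brute-force scan over all features, which is the part an ANN index would replace at scale.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit L2 norm (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def extract_feature(image):
    # Hypothetical stand-in for a CNN: the "image" is already a numeric list.
    return l2_normalize(image)


def build_index(image_db):
    """Indexing path: map every image ID to its feature vector."""
    return {image_id: extract_feature(img) for image_id, img in image_db.items()}


def search(index, query_image, k=2):
    """Search path: exact cosine-similarity scan; returns the top-k image IDs."""
    q = extract_feature(query_image)
    scored = [
        (sum(a * b for a, b in zip(q, feat)), image_id)
        for image_id, feat in index.items()
    ]
    scored.sort(reverse=True)
    return [image_id for _, image_id in scored[:k]]


# Toy data, invented for illustration only.
image_db = {
    "shirt": [1.0, 0.1, 0.0],
    "blouse": [0.9, 0.3, 0.0],
    "phone": [0.0, 0.1, 1.0],
}
index = build_index(image_db)
result = search(index, [1.0, 0.2, 0.0], k=2)
```

The brute-force scan costs time proportional to the number of indexed images per query, which is why it stops being practical long before the hundreds of millions of images discussed later.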
[Diagram components: Visual Search API, Image Feature Extraction, Similarity Search, Search Index Storage. Data exchanged: Query Image, Image Feature, Image IDs, Search Index, Search Result.]
Multiple servers are needed for each component to keep the system running.
Requirements
• All the servers are monitored
• Unhealthy servers are replaced with new servers
• When the system load becomes high, the number of servers is increased and/or more computation resources are allocated
[Diagram: same components and requirements as on the previous slide.]
Docker makes your development, test, and deployment safe and efficient.
System on Kubernetes
[Diagram components: Visual Search API, Image Feature Extraction, Similarity Search, Search Index Storage. Data exchanged: Query Image, Image Feature, Image IDs, Search Index.]
Kubernetes provides self-healing, auto-scaling, and resource management.
• Google Kubernetes Engine
• Amazon Elastic Container Service for Kubernetes
• Azure Kubernetes Service
[Diagram: Visual Search API Service, Image Feature Extraction Service, Similarity Search Service; Query Image in, Search Result out.]
A Pod is the smallest unit and consists of one or more containers, such as a feature extraction container and a logging agent container.
A Service is an abstraction that defines a group of Pods.
[Diagram. Serving: Visual Search API Service, Image Feature Extraction Service, Similarity Search Service. Batch Processing: Image Feature Extractor, Image Database, Search Index Builder, Similarity Search Docker Image Builder, Docker Image Registry. Data exchanged: Images, Image Feature Vectors, Search Index, Docker Image. Deployment from the registry is labeled Automated Deployment.]
Docker image registries: Google Container Registry, Amazon Elastic Container Registry, Azure Container Registry.
If the search index doesn’t have to be updated in real time, this system would be practical enough to handle a few million images.
[Diagram: Visual Search API Service, Image Feature Extraction Service, and Monthly, Daily, and Hourly Similarity Search Services, deployed from the Docker Image Registry. Update frequency: every month, every day, every hour.]
Similarity searches are executed in all the similarity search services, and the results are merged by similarity score.
At Mercari, at least hundreds of millions of images have to be handled in the system.
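Merging by similarity score can be sketched as follows. This is a minimal illustration, not the production logic: it assumes each service returns `(score, item_id)` pairs, that a higher score means more similar, and that scores from different services are directly comparable. The item IDs and scores are invented.

```python
import heapq


def merge_results(per_service_results, k):
    """Merge (score, item_id) hits from several services and keep the top-k."""
    all_hits = (hit for hits in per_service_results for hit in hits)
    # nlargest returns the k best hits, sorted from highest to lowest score.
    return heapq.nlargest(k, all_hits, key=lambda hit: hit[0])


# Hypothetical per-service results for one query.
monthly = [(0.91, "m-001"), (0.75, "m-002")]
daily = [(0.95, "d-001"), (0.60, "d-002")]
hourly = [(0.88, "h-001")]

top3 = merge_results([monthly, daily, hourly], k=3)
```

If the services used distance rather than similarity, `heapq.nsmallest` would be the matching choice.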
by Edge Computing
[Diagram: Feature Extraction (MobileNet V2) turns the Query Image into a Feature Vector, which is sent to the Visual Search API Service; Monthly, Daily, and Hourly Similarity Search Services are deployed from the Docker Image Registry every month, every day, and every hour.]
If the feature extraction model is small enough to run on mobile devices, you could reduce the latency and network traffic.
[Diagram. Clusters: Index Building Kubernetes Cluster and Serving Kubernetes Cluster (Kubernetes Engine, Elastic Kubernetes Service). Index building: Monthly, Daily, and Hourly Index Builders; Item Explorer, Item Image Downloader, Feature Extractor, Feature Vec Downloader, ANN Index Builder, Docker Image Builder. Storage: Item DB, Item Image Storage, Feature Vec Storage, ANN Index Storage, Docker Registry. Serving: Visual Search Service, Object Detection Service, Feature Extraction Service, and Monthly, Daily, and Hourly Similarity Search Services backed by ANN Services.]
Since item images are in AWS (S3) and our services are running on GCP, we use both cloud services.
(*) The actual system architecture is slightly different from this.
[Diagram: Visual Search Service, Object Detection Service, Feature Extraction Service, and Monthly, Daily, and Hourly Similarity Search Services backed by ANN Services, on Kubernetes Engine. Latencies shown: 168ms, 62ms, 255ms.]
Assuming that items in the past 1 year are searchable by the system, the system will have 11 monthly ANN services, 30 daily ANN services, and 24 hourly ANN services. They are processed in parallel.
Monthly Similarity Search Service (30M items): 11 services, 20ms
Daily Similarity Search Service (1M items): 30 services, 13ms
Hourly Similarity Search Service (100K items): 24 services, 12ms
Total: 362.4M image feature vectors
(*) The number of items in each ANN service differs from the actual one.
ANN (Similarity Search):
Library: Faiss
Index type: IVFADC (IndexIVFPQ)
Code length per vector: 64B
#cells: 8,192; #cells visited for each query: 32
In this experiment, 4 CPU cores are allocated for each service. In practice, resource allocation and the ANN parameters should be optimized for each ANN service based on the number of items/images it holds. Docker allows us to allocate resources flexibly, like 1.5 CPU cores.
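The totals on this slide can be checked with a little arithmetic. The sketch below uses only the service counts, per-service item counts, and the 64B code length stated above; the memory figure covers the compressed codes alone and ignores IDs, the coarse quantizer, and any runtime overhead, so it is a lower bound rather than a measured number.

```python
# Service counts and per-service item counts as stated on the slide.
services = {
    "monthly": {"count": 11, "items": 30_000_000},
    "daily": {"count": 30, "items": 1_000_000},
    "hourly": {"count": 24, "items": 100_000},
}

total_services = sum(s["count"] for s in services.values())
total_vectors = sum(s["count"] * s["items"] for s in services.values())

# 64 bytes of product-quantization code per vector (codes only, no overhead).
CODE_BYTES_PER_VECTOR = 64
total_code_gb = total_vectors * CODE_BYTES_PER_VECTOR / 1e9
```

This gives 65 services holding 362.4M vectors in total, matching the figure on the slide, and roughly 23.2 GB of codes spread across them.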
[Diagram: the index-building and serving architecture shown earlier, spanning Kubernetes Engine and Elastic Kubernetes Service, with Monthly, Daily, and Hourly Index Builders and Similarity Search Services.]
Since item images are in AWS (S3) and our services are running on GCP, we use both cloud services.
Can we simplify the system architecture?
[Diagram: Batch Processing extracts Image Feature Vectors from Images in the Image Database (Image Feature Extractor) and indexes them in Elasticsearch; the Search API Service sends a Feature Vector and receives a Search Result. An Elasticsearch index consists of shards; each shard is a Lucene index made of segments.]
Fewer segments bring better performance. Merging segments before rolling the index out is recommended.
In the Open Distro kNN, nmslib (HNSW) is used.
If the memory consumption is acceptable and you are familiar with Elasticsearch, it may be an option for realizing a simpler visual search.
https://medium.com/@kumon/similarity-search-and-similar-image-search-in-elasticsearch-14552a8a8dea
https://medium.com/@kumon/how-to-realize-similarity-search-with-elasticsearch-3dd5641b9adb
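As a rough sketch of what the Elasticsearch option involves, the request bodies below follow the Open Distro k-NN plugin's documented shape (an index setting `index.knn`, a `knn_vector` field, and a `knn` query). The index name `items` and field name `image_feature` are hypothetical, and the bodies are only built as JSON here, not sent to a cluster; check the plugin documentation for the version you run before relying on them.

```python
import json

DIM = 1792                # feature dimensionality reported in this deck
INDEX_NAME = "items"      # hypothetical index name
FIELD = "image_feature"   # hypothetical field name

# Body for creating the index (PUT /items): enable k-NN and declare the vector field.
create_index_body = {
    "settings": {"index": {"knn": True}},
    "mappings": {
        "properties": {FIELD: {"type": "knn_vector", "dimension": DIM}}
    },
}


def knn_query(vector, k):
    """Body for POST /items/_search: the k nearest neighbors of `vector`."""
    return {"size": k, "query": {"knn": {FIELD: {"vector": vector, "k": k}}}}


# Elasticsearch's force-merge endpoint, used to reduce the segment count
# before the index is rolled out.
force_merge_path = f"/{INDEX_NAME}/_forcemerge?max_num_segments=1"

example_query_json = json.dumps(knn_query([0.0] * DIM, k=10))
```

Merging down to a single segment is the extreme case of the "fewer segments" advice; whether one segment per shard is appropriate depends on index size and update pattern.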
[Model diagram]
Input (224 x 224 x 3)
MobileNet V2 (width multiplier: 1.4)
Global Average Pooling
Feature vector (1792D: 1,280D x 1.4)
Classes
9M images from Mercari
Class labels: category x brand x texture, e.g.
• Nike striped men’s golf polos
• LuLaRoe floral girl’s dresses
• Louis Vuitton women’s long wallets
[Figure panels: Query Image, (b) Bad Results, (c) Better Results]
Mercari is a C2C marketplace, where most sellers and buyers are not professionals.
• Most apparel items are not worn by models
• Apparel is laid flat on a surface or hung on hangers
When a query image shows fitted apparel, fitted apparel tends to be retrieved more, even though such items make up a very small proportion of the marketplace.
Returning many items listed by professional sellers can cause problems for a C2C marketplace, for example, by hurting buyer experience and discouraging nonprofessional sellers from listing items.
Features
[Figure: feature arithmetic over apparel images labeled Fitted apparel, Flat apparel, Flat apparel, Flat apparel; the operators shown are − and ≃]
extracts a feature vector which has no negative value elements and whose L2 norm is 1
• Negative value elements removal
• L2 norm normalization
Thanks to this characteristic, the feature transformation can be applied to any kind of image.
T. Yamaguchi et al., Closing the Gap Between Query and Database through Query Feature Transformation in C2C e-Commerce Visual Search, SIGIR 2019 Workshop on eCommerce
[Model diagram]
Input (224 x 224 x 3)
MobileNet V2 (width multiplier: 1.4)
Global Average Pooling
Feature vector (1792D: 1,280D x 1.4)
ReLU
L2 norm Normalization
Classes
9M images from Mercari
Class labels: category x brand x texture, e.g.
• Nike striped men’s golf polos
• LuLaRoe floral girl’s dresses
• Louis Vuitton women’s long wallets
T. Yamaguchi et al., Closing the Gap Between Query and Database through Query Feature Transformation in C2C e-Commerce Visual Search, SIGIR 2019 Workshop on eCommerce
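The two post-processing steps named on this slide, ReLU and L2 norm normalization, can be sketched on a plain list of numbers. This is a toy illustration, not the model code: the input stands in for a pooled feature vector, and applying ReLU before normalization is an assumption about the order.

```python
import math


def relu(vec):
    """Remove negative-valued elements by clamping them to zero."""
    return [max(0.0, x) for x in vec]


def l2_normalize(vec):
    """Scale a vector to unit L2 norm (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def postprocess(pooled):
    # Assumed order: ReLU first, then L2 normalization.
    return l2_normalize(relu(pooled))


feature = postprocess([3.0, -1.0, 4.0])  # toy 3-D stand-in for a 1792-D vector
```

The result has no negative elements and unit L2 norm, so the inner product of two such features is their cosine similarity and always lies between 0 and 1.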
Vector
[Figure: Fitted tops and Flat tops, with the Human vector of tops; Fitted bottoms and Flat bottoms, with the Human vector of bottoms]
T. Yamaguchi et al., Closing the Gap Between Query and Database through Query Feature Transformation in C2C e-Commerce Visual Search, SIGIR 2019 Workshop on eCommerce
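One plausible reading of the human vector, sketched below, is the difference between the mean feature of fitted apparel and the mean feature of flat apparel in a category, which is subtracted from a query feature before ReLU and L2 normalization are reapplied. This is an assumption drawn from the slides, not a statement of the exact procedure in the cited paper, and the 3-D vectors are invented toy data.

```python
import math


def l2_normalize(vec):
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def mean_vector(vectors):
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def human_vector(fitted_feats, flat_feats):
    """Assumed definition: mean(fitted features) - mean(flat features)."""
    mean_fitted = mean_vector(fitted_feats)
    mean_flat = mean_vector(flat_feats)
    return [a - b for a, b in zip(mean_fitted, mean_flat)]


def transform_query(query_feat, hvec):
    """Subtract the human vector, then restore non-negativity and unit norm."""
    shifted = [q - h for q, h in zip(query_feat, hvec)]
    return l2_normalize([max(0.0, x) for x in shifted])


# Toy features: dimension 0 plays the role of "worn by a person".
fitted_tops = [[0.8, 0.6, 0.0], [0.8, 0.0, 0.6]]
flat_tops = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

hvec = human_vector(fitted_tops, flat_tops)
query = [0.8, 0.6, 0.0]                     # a fitted-apparel query feature
transformed = transform_query(query, hvec)

flat_item = [0.0, 1.0, 0.0]
sim_before = sum(a * b for a, b in zip(query, flat_item))
sim_after = sum(a * b for a, b in zip(transformed, flat_item))
```

In this toy case the transformed query moves closer to the matching flat item, which is the effect the slides describe for fitted-apparel queries.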
had 20,000 images (flat apparel: 10,000 / fitted apparel: 10,000). 2,000 of them were used as query images. A retrieved item was evaluated as correctly selected only when it was an image of flat apparel in the same category as the query.
Significant improvement for the fitted apparel queries in every category. Also positively influenced flat apparel queries.
T. Yamaguchi et al., Closing the Gap Between Query and Database through Query Feature Transformation in C2C e-Commerce Visual Search, SIGIR 2019 Workshop on eCommerce
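The correctness rule above can be turned into a simple per-query score. The sketch below computes the fraction of the top-k retrieved items that are flat apparel in the query's category; calling this precision at k is an assumption, since the slide states only the rule and not the metric, and the item records and field names are hypothetical.

```python
def is_correct(item, query_category):
    """Correct only if the item is flat apparel in the query's category."""
    return item["style"] == "flat" and item["category"] == query_category


def precision_at_k(retrieved, query_category, k):
    """Fraction of the top-k retrieved items judged correct."""
    if k <= 0:
        raise ValueError("k must be positive")
    top = retrieved[:k]
    return sum(1 for item in top if is_correct(item, query_category)) / k


# Hypothetical retrieval result for a query in the "tops" category.
retrieved = [
    {"style": "flat", "category": "tops"},
    {"style": "fitted", "category": "tops"},
    {"style": "flat", "category": "bottoms"},
    {"style": "flat", "category": "tops"},
]
score = precision_at_k(retrieved, "tops", k=4)
```

Note that a fitted item in the right category and a flat item in the wrong category both count as misses under this rule.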
Items by Image
• Find Similar Items to Sold Item
• Item Information Prediction for Sellers
• Price Estimation by Image
• Item Monitoring / Prohibited Item Detection
brand and title are predicted using item information of existing visually similar listings.
Example shown: Amazon Echo Dot 3rd Generation; Smart Speakers & Assistants; Amazon
(*) This feature is available in both the Mercari Japan and Mercari US apps
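One simple way to predict a field from visually similar listings is a majority vote over the nearest neighbors. The slide does not say how the predictions are aggregated, so the voting rule below is an assumption, and the listings and field names are invented for illustration.

```python
from collections import Counter


def predict_field(similar_listings, field):
    """Return the most common non-empty value of `field`, or None if absent."""
    values = [listing[field] for listing in similar_listings if listing.get(field)]
    if not values:
        return None
    return Counter(values).most_common(1)[0][0]


# Hypothetical listings returned by visual search for a seller's photo.
similar_listings = [
    {"brand": "Amazon", "category": "Smart Speakers & Assistants"},
    {"brand": "Amazon", "category": "Smart Speakers & Assistants"},
    {"brand": "Google", "category": "Smart Speakers & Assistants"},
    {"brand": "", "category": "Other"},
]

predicted_brand = predict_field(similar_listings, "brand")
predicted_category = predict_field(similar_listings, "category")
```

A production system would likely weight votes by similarity score and withhold a suggestion when agreement is weak, but that goes beyond what the slide states.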
Are you considering integrating visual search features into your services?
1. Try some visual search services first
2. If you find a service which meets your expectations, use it
3. Even if the pricing is an issue, use such a service first
   a. Confirm if the feature is useful for your service
   b. If necessary, consider developing your visual search system on your own for performance improvement and cost reduction
4. If you need a highly flexible and scalable system, build it on your own
5. The cost for developing and operating a visual search system would be high.