Yuki Ishikawa
April 22, 2024

Behind the Scenes of Generative AI-Driven Products and Productivity Enhancements

Presentation slides from the "Learning from Pioneer Engineers: Current Thoughts Online Conference," held on April 16, 2024.
Event link: https://findy.connpass.com/event/313119/

I hope these slides are a useful reference for anyone using generative AI in product development or working to improve productivity within their company.

Transcript

  1. Behind the Scenes of Generative AI-Driven Products and Productivity Enhancements

    Confidential
    Yuki Ishikawa, April 16, 2024
  2. Yuki Ishikawa: VP of Generative AI / LLM, Mercari, Inc.

    After graduating from the University of Tokyo, he joined Nintendo Co., Ltd. in 2012. In 2014, he joined Moi Corporation (TwitCasting), where he worked on various development projects and new business launches. In June 2017, he joined the former Souzoh, Inc., part of the Mercari Group. He later transferred to Mercari, Inc., and from July 2020 served as Executive Officer and VP of Product at Merpay, Inc. In January 2021, he became Representative Director and CEO of Souzoh, Inc. From July 2022, he concurrently served as Executive Officer and VP at Mercari, Inc. He assumed his current position in May 2023.
  3. About

    • Established: February 1st, 2013
    • Offices: Tokyo, Fukuoka, Palo Alto, Bangalore
    • Headcount: 2,101 (including subsidiaries)
  4. What Is Mercari?

    • Service launch: July 2013
    • Operating systems: Android, iOS (can also be accessed through web browsers)
    • Usage fee: Free (sales fee for sold items: 10% of the sales price)
    • Regions/languages supported: Base specs for Japan/Japanese
    • Total number of listings to date: More than 3 billion (as of November 2021)

    Many sellers enjoy having the items they no longer need purchased and used by buyers who need them, and buyers enjoy the feeling of hunting for treasure as they search through unique and diverse items for lucky finds. In addition to buying and selling, users actively communicate through the buyer/seller chat and the “Like” feature.

    The Mercari app is a C2C marketplace where individuals can easily sell used items. We want to provide both buyers and sellers with a service where they can enjoy safe and secure transactions. Mercari offers a unique customer experience, with a transaction environment that uses an escrow system, where Mercari temporarily holds payments, and simple and affordable shipping options.
  5. Marketplace: GMV/MAU

    [Chart: GMV (billion JPY) and MAU (million users), annotated with YoY +11% and YoY +14%]
    Note 1 (GMV): Starting from FY2022.6, the graph reflects a retroactive adjustment to combine C2C and B2C.
    Note 2 (MAU): Quarterly average number of users who browsed our service (app or web) at least once during a given month.
  6. Mercari's initiatives related to generative AI and LLM

    Timeline: 2023 (May, June, July, August, September, October, November, December), 2024 (January, February, March)
    • Establishment of Dedicated Generative AI/LLM Team
    • Utilizing LLMs for SEO
    • Conducting Merpay LLM Hackathon
    • Mercari ChatGPT Plugin Release
    • Formulation of Guidelines for LLM Usage
    • Utilizing Generative AI for Videos in OOH Advertising
    • Conducting Company-Wide Hackathon "Mercari AI Builders Fest"
    • Utilizing Generative AI Visuals in Recruitment
    • Utilizing Generative AI Visuals in Advertising
    • Release of "Mercari Listing Battle" on GPT Store
    • Release of "Mercari Product Search" on GPT Store
    • Release of Mercari AI Assist (Listing Support Feature)
    • Release of Mercari AI Assist (Purchase Support Feature)
    • Category Re-Mapping Using LLMs
  7. Mission of the Dedicated Generative AI/LLM Team

    • Creating new customer experiences and maximizing business impact by utilizing generative AI/LLM technology
    • Dramatically improving company-wide productivity
    Execution
  8. Application and Adaptation to Existing Products

    • Individual Function Teams (team names are for illustrative purposes): Seller UX, Buyer UX, CS, Fintech, B2C, XB
    • LLM team: planning, model selection, prompt engineering, product implementation, etc.
  9. Application and Adaptation to Existing Products

    (1) Leadership: cases where the dedicated LLM team takes ownership and carries out everything from planning to implementation
    (2) Co-creation: cases where the Function team takes the lead and the LLM team works alongside them, reviewing LLM-related aspects as needed
  10. SEO Improvement (Already Released)

    Using an LLM to generate the SEO-related title information for Mercari's search result pages. Issues addressed:
    • Keyword overlap between "parasol" and "umbrella"
    • Notation of brand names
    Titles are generated with an LLM for Category × Brand Name combinations.
  11. How to Use the "Improvement Suggestion for Listed Items" Feature

    STEP.1 Suggestions from the AI assistant are delivered for items that can be improved
    STEP.2 Open the chat and choose from the AI assistant's suggestions
  12. How to Use the "Improvement Suggestion for Listed Items" Feature

    STEP.3 Follow the AI assistant's instructions to proceed with the selection
    STEP.4 After updating the content and completing the process, the information for the listed item will be updated
  13. "Mercari AI Assist" Future Plans

    Various features are scheduled for release:
    • Purchase Support Features
    • Listing Support Features
    • Troubleshooting Features
  14. Publishing Official Mercari GPTs on the OpenAI GPT Store

    Actively creating GPTs to explore new experiences:
    • Mercari Item search
    • Mercari listed item battle
  15. Company-wide Initiatives for LLM Readiness

    ① Guideline Formulation
    • Enabling not only the ML team but also general SWE teams to implement products
    • Formulating guidelines in collaboration with Mercari's R&D organization "R4D"
    • Publicly releasing developer guidelines (link)
    ② Study Sessions and Hackathons
    • Conducting internal study sessions on an irregular basis
    • Holding hackathons for all job types, not limited to engineers
    • Conducted 3 times in half a year: April at Mercari, June at Merpay, and September at Mercari. Continuing to be held at various locations.
  16. Mercari Employee-Only Tool

    To promote internal usage, we created a Mercari employee-only "ChatGPT" into which work-related information can be entered. It also supports GPT-4, Google Gemini, and Anthropic Claude 3.
  17. Mercari Employee-Only Tool

    In addition to the Code Interpreter and image generation features, it also includes a translation mode, an SQL generation function compatible with Mercari's data, and a document search function for engineers.
  18. Fast

    Confidential - Do Not Share
    Feasibility check: LLMs can perform a wide range of tasks with high accuracy.
    • With model training: What you want to do → Data creation → Model training → Release → Effect verification. It takes time and cost to get verification results.
    • LLM + few-shot learning as an alternative: What you want to do → Release → Effect verification. Enables PoCs for many tasks with reasonable accuracy = dramatic reduction in PoC costs.
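
The "LLM + few-shot learning" path on this slide replaces data creation and model training with a handful of labeled examples placed directly in the prompt. A minimal sketch of that idea follows. The function name, the example listings, and the category labels are all hypothetical, and the resulting string would still need to be sent to whichever model is being evaluated.

```python
def build_few_shot_prompt(instruction, examples, query):
    """Assemble a few-shot prompt: instruction, labeled examples, then the open query."""
    lines = [instruction, ""]
    for title, category in examples:
        lines += [f"Title: {title}", f"Category: {category}", ""]
    # The final item is left unlabeled so the model completes the category.
    lines += [f"Title: {query}", "Category:"]
    return "\n".join(lines)


# Hypothetical labeled examples; a real PoC would use a few real listings.
EXAMPLES = [
    ("Nintendo Switch console, used", "Video games"),
    ("Folding umbrella, navy", "Fashion accessories"),
]

prompt = build_few_shot_prompt(
    "Classify each listing title into a category.",
    EXAMPLES,
    "Charizard trading card",
)
print(prompt)
```

Changing the task then means editing the instruction and examples rather than retraining a model, which is where the PoC cost reduction comes from.
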
  19. Fast?

    An example initially tested on the item list screen of "Mercari AI Assist" (2023.09):
    • GPT-3.5: 1.5~2.5 seconds/item
    • GPT-4: 3~5 seconds/item
    LLM responses can be slow depending on the use case and model, so consumer-facing services that require quick responses need some ingenuity.
    Reference: Artificial Analysis
  20. Cheap

    • Being able to use a model equivalent to GPT-3.5 at this price is cheap
      ◦ Until January 25, 2024 (GPT-3.5 Turbo)
        ▪ Input: $1.0 / 1M tokens
        ▪ Output: $6.0 / 1M tokens
      ◦ After January 25, 2024 (GPT-3.5 Turbo)
        ▪ Input: $0.5 / 1M tokens
        ▪ Output: $1.5 / 1M tokens
    • Claude 3 Haiku, which recently came out, is even cheaper.
    Reference: Artificial Analysis
    See also: New embedding models and API updates
  21. Cheap?

    • Smaller models are cheaper, but they are not always adequate, and cost rises rapidly as model size increases.
    • For large-scale consumer-facing services like Mercari in particular, using LLMs without careful consideration can become very expensive (although prices have dropped considerably compared to last year).
    Ex. Assuming Mercari has 2 million listings per day and some kind of LLM processing is applied to every item (with only 1 LLM call per item), the annual cost would be 550 million yen with GPT-4 Turbo (versus 27 million yen with GPT-3.5 Turbo).
    Reference: Artificial Analysis
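
The estimate on this slide can be approximated with simple per-token arithmetic. The sketch below is not the calculation behind the slide's figures. The GPT-3.5 Turbo prices are the post-January 25, 2024 prices from the "Cheap" slide, while the GPT-4 Turbo prices ($10 / $30 per 1M tokens), the token counts per call, and the exchange rate are assumptions chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPrice:
    input_usd_per_1m: float   # USD per 1M input tokens
    output_usd_per_1m: float  # USD per 1M output tokens


GPT_35_TURBO = ModelPrice(0.5, 1.5)    # from the "Cheap" slide
GPT_4_TURBO = ModelPrice(10.0, 30.0)   # assumption: not stated in the slides


def annual_cost_jpy(calls_per_day, input_tokens, output_tokens,
                    price, jpy_per_usd, days_per_year=365):
    """Annual cost in JPY for one LLM call per item."""
    usd_per_call = (input_tokens * price.input_usd_per_1m
                    + output_tokens * price.output_usd_per_1m) / 1_000_000
    return calls_per_day * days_per_year * usd_per_call * jpy_per_usd


LISTINGS_PER_DAY = 2_000_000  # from the slide
INPUT_TOKENS = 400            # hypothetical prompt size per item
OUTPUT_TOKENS = 100           # hypothetical response size per item
JPY_PER_USD = 150.0           # hypothetical exchange rate

cost_35 = annual_cost_jpy(LISTINGS_PER_DAY, INPUT_TOKENS, OUTPUT_TOKENS,
                          GPT_35_TURBO, JPY_PER_USD)
cost_4 = annual_cost_jpy(LISTINGS_PER_DAY, INPUT_TOKENS, OUTPUT_TOKENS,
                         GPT_4_TURBO, JPY_PER_USD)
print(f"GPT-3.5 Turbo: {cost_35 / 1e6:.1f} million JPY/year")
print(f"GPT-4 Turbo:   {cost_4 / 1e6:.1f} million JPY/year")
```

Under these assumed prices the two models differ by a factor of 20 regardless of the token counts, which is close to the ratio between the slide's 550 million and 27 million yen. The absolute figures depend entirely on the assumed tokens per call and exchange rate.
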
  22. Tasty (=good)

    Although I can't write everything here, as you all know, it's really tasty (amazing) in so many ways!
    • It translates unique words appropriately without any additional training
      ◦ Pokémon character "リザードン": English: Charizard, French: Dracaufeu, Korean: 리자몽
    • If you design appropriate flows, it can make decisions and take actions on its own
      ◦ Mercari AI Assist Purchase Assistance Feature: "What would be a good birthday present for my 5-year-old daughter?" → GPT → additional questions, hit the API and display search results, etc.
    • It can understand and generate vision and audio as well
      ◦ Customer Support: "How can I assist you today?" "Let me take a look at the photo you provided"
      ◦ It can understand the other person's speech and respond verbally
      ◦ It can also read image information
  23. Tasty (=good)?

    There are many great things about it! On the other hand, controlling the output content is particularly difficult.
    • It sometimes states incorrect things as if they were correct (hallucination)
    • Even with the same input and prompts, and even after extensive correction, it may occasionally (e.g., 0.01% of the time) produce different outputs
    • It still struggles with tasks like providing confidence scores or calculating meaningful scores
    • QA is extremely challenging 🤮
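
A rough way to see why a rare deviation rate makes QA hard is to ask how often it would show up in production versus in a test run. The sketch below takes the 0.01% figure from this slide as the per-call deviation probability and assumes calls are independent, which is a simplification. The 2 million calls per day borrows the listing volume from the "Cheap?" slide. All function names are illustrative.

```python
import math


def expected_deviations(calls, p):
    """Expected number of deviating outputs among `calls` independent calls."""
    return calls * p


def prob_at_least_one(calls, p):
    """Probability that a run of `calls` calls contains at least one deviation."""
    return 1.0 - (1.0 - p) ** calls


def samples_to_observe(p, confidence=0.95):
    """Smallest test-run size that sees at least one deviation with the given confidence."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p))


P_DEVIATION = 0.0001           # 0.01%, the example rate on the slide
CALLS_PER_DAY = 2_000_000      # one call per listing, as in the "Cheap?" slide

per_day = expected_deviations(CALLS_PER_DAY, P_DEVIATION)
seen_in_small_run = prob_at_least_one(100, P_DEVIATION)
needed = samples_to_observe(P_DEVIATION)
print(per_day, seen_in_small_run, needed)
```

With these inputs, a deviation that would occur about 200 times a day in production has only about a 1% chance of appearing at all in a 100-sample QA run, and a run of roughly 30,000 samples is needed to see it once with 95% confidence.
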
  24. Ingenuity in Product Development Using LLM

    It's the most exciting and enjoyable technology, but using it in products provided to customers generally means facing the difficulty of control.
    Control Challenges🔥
    • Output content (hallucination)
    • Output speed (latency)
    • Cost
  25. Ingenuity in Product Development Using LLM at Mercari

    1. Using LLM as the underlying logic in a way that is not directly visible to customers
       ex. "Using in web SEO"
    2. Interaction between customers and LLM (without free input)
       Initially adopting a selection-based approach with no free input from customers
       ex. "Mercari AI Assist (Listing Improvement Suggestion Feature)"
    3. Interaction between customers and LLM (with free input)
       Allowing free input while controlling the input scope and output to a certain extent
       ex. "Mercari AI Assist (Purchase Support Feature)"
    4. Moving towards initiatives with even greater freedom
    For product initiatives with a large scope of impact, we are taking the approach of gradually expanding the allowable range while learning from the initiative in the production environment.
  26. Ingenuity in Mercari AI Assist (Listing Improvement Suggestion Feature)

    We were using GPT-3.5 for the underlying logic (used for extracting product information), but we wondered whether we could find a model that has:
    • Higher accuracy and more stable output than GPT-3.5
    • Lower latency
    • Lower cost
    => With this motivation, we fine-tuned a small-scale OSS model.
  27. About the adopted method (Fine-tuning with QLoRA)

    Fine-tuning is a technique for tuning a pre-trained model to optimize it for a specific task. Instead of updating all parameters, we adopted QLoRA, a type of PEFT (parameter-efficient fine-tuning) that efficiently updates only a small set of parameters. We chose it because it can be executed quickly without compromising quality.
    Reference: [2305.14314] QLoRA: Efficient Finetuning of Quantized LLMs
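
For readers unfamiliar with the method, the low-rank update that LoRA-style fine-tuning trains can be written out as follows. This is the standard formulation from the LoRA and QLoRA papers, not a description of Mercari's specific configuration. The rank $r$ and scaling factor $\alpha$ are hyperparameters whose values are not given in the slides.

```latex
% Frozen pre-trained weight W_0, trainable low-rank factors B and A
W = W_0 + \frac{\alpha}{r}\, B A,
\qquad W_0 \in \mathbb{R}^{d \times k},\;
B \in \mathbb{R}^{d \times r},\;
A \in \mathbb{R}^{r \times k},\;
r \ll \min(d, k)

% Forward pass for an input x: only A and B receive gradients
h = W_0 x + \frac{\alpha}{r}\, B A x

% Trainable parameters per adapted matrix, compared with full fine-tuning
r\,(d + k) \quad \text{versus} \quad d\,k
```

In QLoRA, $W_0$ is additionally stored in a 4-bit quantized form and dequantized on the fly for the forward pass, while $A$ and $B$ stay in higher precision. This is what lets a 7B-parameter model be fine-tuned on a single GPU.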
  28. Consideration of models to adopt (Japanese LLMs)

    We considered the top 5 models of 7B size in LLM JP Eval (Multi-Choice QA, NLI, QA, Reading Comprehension) as candidates (as of December 2023):
    • tokyotech-llm/Swallow-7b-instruct-hf (Llama-based)
    • rinna/nekomata-7b-instruction (Qwen-based)
    • stabilityai/StableBeluga-7B (Llama-based)
    • rinna/youri-7b-instruction (Llama-based)
    • mistralai/Mistral-7B-Instruct-v0.2 (Mistral-based)
    Reference: Nejumi LLM Leaderboard
  29. Fine-tuning with QLoRA

    1. Prepare the fine-tuning dataset
      ◦ Output: JSON/CSV files
    2. Fine-tuning (one A100 GPU (48GB))
      ◦ Output: LoRA adapters + base model (merged), ~29GB
    3. Post-training Quantization
      ◦ Using llama.cpp
      ◦ Output: GGUF model file, ~4GB
    4. Evaluate
      ◦ BLEU Score (4-gram)
        ▪ 1.3x better than GPT-3.5 Turbo
      ◦ Cost (10 pods NVIDIA Tesla T4)
        ▪ 14x cheaper than GPT-3.5 Turbo (OpenAI)
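
The evaluation step reports a 4-gram BLEU score. As a reminder of what that metric measures, here is a minimal corpus-level BLEU sketch with a single reference per output, clipped n-gram counts, and a brevity penalty. It tokenizes on whitespace, which is an assumption made for this sketch. The slides do not say which tokenizer or BLEU implementation was used, and Japanese text would need a proper tokenizer. The sample sentences are made up.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(candidates, references, max_n=4):
    """Corpus-level BLEU with uniform weights over 1..max_n-grams."""
    matches = [0] * max_n   # clipped n-gram matches per order
    totals = [0] * max_n    # candidate n-grams per order
    cand_len = ref_len = 0
    for cand, ref in zip(candidates, references):
        c_tokens, r_tokens = cand.split(), ref.split()
        cand_len += len(c_tokens)
        ref_len += len(r_tokens)
        for n in range(1, max_n + 1):
            c_counts = ngram_counts(c_tokens, n)
            r_counts = ngram_counts(r_tokens, n)
            # Clip each candidate n-gram count by its count in the reference.
            matches[n - 1] += sum(min(cnt, r_counts[g]) for g, cnt in c_counts.items())
            totals[n - 1] += sum(c_counts.values())
    if cand_len == 0 or min(matches) == 0:
        return 0.0  # no smoothing: any order with zero matches gives BLEU 0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    brevity_penalty = 1.0 if cand_len > ref_len else math.exp(1.0 - ref_len / cand_len)
    return brevity_penalty * math.exp(log_precision)


# Made-up model output and reference, for illustration only.
score = corpus_bleu(["the cat sat on the mat"], ["the cat sat on a mat"])
print(f"BLEU-4: {score:.3f}")
```

A comparison like "1.3x better than GPT-3.5 Turbo" would come from computing such a score for each model's outputs against the same references and taking the ratio.
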
  30. Behind the Scenes of Productivity Improvement with Generative AI
  31. Company-wide use of generative AI services is increasing

    • Usage of the "ChatGPT" service exclusively for Mercari employees grew by 230% in one year (changes in DAUs)
    • GitHub Copilot usage also skyrocketed in about half a year, from July 2023 to February 2024
      ◦ Total Shown: 8x increase
      ◦ Total Accepts: 6.5x increase
      ◦ Acceptance Rate: Around 30%
  32. Providing specific features for specific use cases

    In order to further promote deeper usage, it is important to provide specialized features for specific use cases in addition to the general features of the "ChatGPT" service exclusively for Mercari employees.
    • mercari Dev Assist: LLM provides answers based on Mercari's internal technical documents
    • mercari Analytics Assist: A tool that assists with data analysis based on knowledge from Mercari's databases
  33. Providing specific features for specific use cases

    • Indexing: Microservice Wiki, In-House Dev Services, GitHub Issue, Slack Message → Generate Summary & Create embeddings by LLM (GPT-4 & text-embedding-3) → Vector DB by FAISS
    • Answering: "How do I create MS from scratch?" → GPT-4 → Generate Query: "implement a new microservice" → Vector DB responds with related documents → Answer based on documents
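
The retrieval step in this flow (embed the documents, embed the generated query, return the nearest documents) can be illustrated without any external services. The sketch below deliberately swaps in toy components: a bag-of-words counter stands in for text-embedding-3, and a brute-force cosine-similarity scan stands in for the FAISS vector DB. The documents and function names are hypothetical. It shows the shape of the pipeline, not Mercari's implementation.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: bag-of-words term counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def search(query, index, k=2):
    """Brute-force nearest-neighbour search; a vector DB would do this at scale."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]


# Hypothetical internal documents (summaries would be LLM-generated in the real flow).
DOCS = [
    "wiki: how to implement a new microservice with the starter kit",
    "slack: lunch menu for friday",
    "issue: deploy pipeline fails on staging",
]
INDEX = [(doc, embed(doc)) for doc in DOCS]

# The user's question is first rewritten into a search query (by GPT-4 on the slide).
related = search("implement a new microservice", INDEX, k=1)
print(related)
```

The retrieved documents would then be passed to the LLM together with the original question so that the answer is grounded in them.
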
  34. Formation of Cross-Organizational Virtual Teams

    The dedicated generative AI team forms virtual teams with various internal teams to execute advancement projects lasting 3 to 6 months.
    • July~September: Generative AI x Marketing & Creative Team
    • October~December: Generative AI x Analytics Team, Generative AI x CS Ops Team
    • January~March: Generative AI x HR & Corporate Team
  35. Recent Success Cases of Virtual Teams

    The Generative AI Team and the Marketing & Creative Team formed a virtual team, set OKRs, and progressed through the following milestones.
    • July: Recruitment Visuals by Generative AI
    • August: Campaign Visuals by Generative AI
    • September: Video Ads by Generative AI
  36. Current Evaluation

    • Production Man-hours: Rating ◯ (Significantly reduced)
      ◦ Before implementation: 12 business days. Planning (3 days) → Illustration drawing (5 days) → Design creation (4 days)
      ◦ After implementation: 4 business days. Planning (1 day) → AI-generated illustration (1 day) → Design creation (2 days)
      ◦ → Succeeded in reducing production man-hours while creating novel illustrations that are not found in stock materials
    • Reproducibility: Rating ◯
      ◦ With recent updates, the breadth of quality and reproducibility has expanded, not only for hobby and merchandise goods but also for fashion items
    • Contribution to Results: Rating ◯ (Particularly advantageous in terms of CVR)
  37. CTVR Mapping

    Visuals and videos produced using generative AI tend to have higher CVRs. Campaigns created with generative AI, such as "One in seven people use Mercari" and "Halloween," have achieved high acquisition efficiency overall.
    [Chart labels: Creative A, Creative B]
  38. Ample Room for Further Utilization

    For Leadership
    • => Incorporation into company policies and goal setting, such as OKRs, through a top-down approach can further promote adoption
    For Members (General Use)
    • Currently, there is still a significant variation in the degree of utilization among individuals
    • => Continuously implement overall improvement measures (e.g., hackathons) and provide services for specific use cases (it is also important to measure and visualize results with numbers)
    Utilization in Each Project
    • There are cases where side support works well and cases where it is challenging
    • => Assign members who are committed to AI in some form (fully assign dedicated generative AI members, hire dedicated members on the project side)
  39. "Current State of Thinking" on Generative AI Utilization within the Organization

    • Interacting with services
    • Creating by oneself
    • Increasing the number of creators
  40. We are hiring!

    • Senior Technical Product Manager
    • Engineering Manager
    • Mobile Engineer, Full Stack