Upgrade to Pro — share decks privately, control downloads, hide ads and more …

AI Frontiers Revealed: Transforming LINE Shoppi...

AI Frontiers Revealed: Transforming LINE Shopping TW with LLM-Driven Product Attribute Extraction

LINE Shopping TW では、2,000万件以上の商品情報から属性を抽出するために、大規模言語モデル(LLM)を活用しています。本セッションでは、自動カタログ生成、コンボ検索、検索意図の判定などを支える実践的なプロンプト設計戦略と、大規模データ環境におけるスケーラブルかつコスト効率の高いLLM活用の知見を共有します。プロンプト設計、Few-shot学習、精度・コスト・性能のバランスについて解説します。

More Decks by LINEヤフーTech (LY Corporation Tech)

Other Decks in Technology

Transcript

  1. AI Frontiers Revealed: Transforming LINE SHOPPING TW with LLM-Driven Product

    Attribute Extraction LINE Taiwan & EC Data Vila Lin
  2. Traditional Approach Comparison ⎯ High maintenance cost ⎯ Cannot handle

    complex semantics ⎯ Difficult to expand to new categories LLM-Based ⎯ Powerful semantic understanding ⎯ Quick adaptation to new categories ⎯ Unified processing framework
  3. OpenAI with Webpage URL { "attributes": { "Brand": "ROCCAT", "Model":

    "Cross", "Type": "Headphones", "Color": "Black", "Sound Quality": "High quality", "Battery Life": "Long-lasting", "Weight": "300g", "Wearing Style": "Over-ear", "Country of Origin": "Germany", "Noise Cancellation": "Active noise cancellation", "Bluetooth": "Bluetooth supported", "Water Resistance": "Splash-proof", "USB": "USB connection” } } Since I cannot browse the internet, I will simulate the extraction based on the provided product description. - "Weight": "300g" (assumed based on similar products) - "Country of Origin": "Germany" (assumed based on brand origin) As an AI language model, I do not have the capability to browse the internet or access external websites in real-time.
  4. Abnormal Product Name Data Cleansing ⎯ Special characters/encoding ⎯ Abnormally

    long/short name ⎯ Repeated/meaningless string Blocklist Sentence ⎯ Promotional phrase - sale in limited time - free shipping - pre-order ⎯ SEO keyword abuse
  5. Structure & Priority & Constraint Instruction ⎯ Assign appropriate role

    ⎯ Define clear processing sequence ⎯ Tell model what must exclude ⎯ Mandatory format and output ⎯ Exception handling guidelines Advance Features ⎯ Target attribute pattern ⎯ Semantic-based attribute mapping ⎯ Attribute standardization
  6. Model Number ### Extraction Rules: - **Model Identification**: - Extract

    only complete model numbers explicitly stated in: - Product name, Product description, Web Content - Include alphanumeric and special characters (e.g., "05C4210-161+C2"). - Do not infer or create model numbers - **Return Format**: - If no model number is explicitly stated, return an empty `string ('')`. ### Strict Exclusions: - Storage/Memory: 2TB, 1TB, 256GB, 8G - Network: WiFi, 5G - Connection types: Wireless, Bluetooth - Features: Gaming, Portable, RGB - Size/Dimension: XL, S/M/L, 46mm, 13.3 - Serial-like or SKU-only numbers: 1527
  7. Model Number ### Brand-Specific Patterns: - **Apple**: - Format: [Prefix][Numeric

    Identifier][Optional Suffix] - Example: MVY03TA/A, MRXV3TA/A, A3047 - Prefix: 1–3 letters (e.g., MVY, MRX, A) - Numeric Identifier: 1-4 digits (e.g., 03, 3049) - Suffix: Optional characters (e.g., TA, /A) - Exclude: Format M + 1-2 digits (e.g., M2, M4) - **Samsung**: - Format: [Prefix][Full Model Code] - Example: SM-R820NZSABRI, QA65Q80TAWXZW - Prefix: SM-, EF-, QA-, MU-, UA-, RS, WW, NP- - Full Model Code: variant/region identifiers - **Sony**: - Format: [Prefix]-[Model Identifier][Optional Suffix] - Example: CFI-Y101801, WH-1000XM5/P - Prefix: >= 2 English letters (e.g., CFI, WH, SRS, ILCE) - Model Identifier: Alphanumeric (e.g., 1000XM5, G500) - Optional Suffix: Slash ("/") followed by a letter/number (e.g., /P, /S2, M90) - Length: > 4 characters
  8. Example |**Product Name** |**Brand** |**Model Number**|**Series Name** | |-------------------------------------------|----------|----------------|------------------------| |iPhone

    16 Pro 256GB MYNL3ZP/A |Apple |MYNL3ZP/A |iPhone 16 Pro 256GB | |SAMSUNG T7 Shield 1TB MU-PE1T0S/WW |Samsung |MU-PE1T0S/WW |T7 Shield 1TB | |Logic Logitech M185 wireless mouse |Logitech |M185 |wireless mouse | |LG 27UL500-W 27-inch 4K IPS |LG |27UL500-W |27-inch 4K IPS | |Dyson V12 Detect Slim Submarine |Dyson |V12 Detect Slim |Detect Slim Submarine | |XBOX ONE Resident Evil 8:Village |Microsoft | |Resident Evil 8:Village| |Monster Hunter Wild Special Version Switch |Nintendo | |Switch | - Quality over quantity - Depend on category - Markdown is more effective than JSON
  9. Query ⎯ Category ID ⎯ Attributes ⎯ Product Info ⎯

    Web Content ⎯ Instructions (Control Quality)
  10. Cost-Efficiency Strategy Benefits ⎯ Cost Optimization ⎯ Scalable ⎯ Quality

    Assured ⎯ Smaller model (e.g. GPT 4o-mini) ⎯ Batch processing API ⎯ Collect error job and retry
  11. Empty Post-Processing ⎯ unknown ⎯ n/a ⎯ not provided ⎯

    no data Multi-value ⎯ Color (e.g. red/yellow) ⎯ Size (e.g. 39/41/43) ⎯ Volume (e.g. 84 x 40 x 169) Unit ⎯ Normalization ⎯ Synonym mapping
  12. Automated Metrics Evaluation ⎯ Precision ⎯ Recall ⎯ Empty ratio

    ⎯ Anomaly detection Manual ⎯ Sample popular products ⎯ Random sampling as fallback ⎯ Gather verification metric - incorrect attributes - missed attributes - mistake from category
  13. Common Mistake Pitfalls to Avoid ⎯ Over-reliance on LLM inference

    ⎯ Ignoring data cleansing ⎯ Single solution thinking Best Practice ⎯ Strictly limit inference ⎯ Invest in preprocessing ⎯ Multi-layer evaluations