LINE MANGA Uses ClickHouse for Real-Time Analysis Solving Data Integration Challenges with ClickHouse (Another way to introduce ClickHouse / ClickHouse を導入するもう一つの方法) Kazuki Matsuda @ LINE Digital Frontier.
LINE MANGA relies on numerous MySQL servers, but faced challenges with real-time analysis. Before introducing ClickHouse, we relied on custom scripts for each analysis. This approach was difficult to develop and review, and execution was slow. In theory, almost all such tasks can be done with simple SQL, which could be naturally parallelized by the query engine. However, due to our vertical and horizontal sharding, this method became impossible. ClickHouse’s integration engine resolves this issue. It allows data stored in different MySQL locations to be joined and aggregated with simple SQL. We believe this will serve as a helpful reference for improving the developer experience, as well as a good first step towards implementing ClickHouse.
• Introduction – Who we are? • LINE Digital Frontier (LINE MANGA) • The problem – Too many MySQL servers • Approach by ClickHouse • ClickHouse feature: MySQL Table Engine, Merge Table. • Architecture for integration search and extraction. • Conclusion & Closing
• Introduction – Who we are? • LINE Digital Frontier (LINE MANGA) • The problem – Too many MySQL servers • Approach by ClickHouse • ClickHouse feature: MySQL Table Engine, Merge Table. • Architecture for integration search and extraction. • Conclusion & Closing
WE ARE? • We are the team behind Manga app (LINE MANGA) and Web (ebookjapan), • Occasionally ranks at the top of app store (Apple / Google) sales in Japan. • Part of WEBTOON Entertainment Inc.
• Introduction – Who we are? • LINE Digital Frontier (LINE MANGA) • The problem – Too many MySQL servers • Approach by ClickHouse • ClickHouse feature: MySQL Table Engine, Merge Table. • Architecture for integration search and extraction. • Conclusion & Closing
problem: Too many MySQL servers • Horizontally and vertically sharded. Master (Product) Data User 0x00~0x06 User 0x07~0x0f User 0xf7~0xff (Primary and Replica) User 0x00~0x06 User 0x00~0x06 (Primary and Replica) User 0x07~0x0f User 0x07~0x0f (Primary and Replica) User 0xf7~0xff (Primary and Replica) User 0xf7~0xff (Primary and Replica) Master (Product) Data Master (Product) Data Master (Product) Data
– Difficulty in ad-hoc analysis • It is impossible to… • Join master data WITH user data. • Aggregate for ALL user data. • Existing analytical platform solve some point of above issue. But • ETL based – not up-to-date analysis. • Security (privacy and auditing) – can’t upload some data due to policy, or does not support financial audits.
– Difficulty in ad-hoc analysis • It is impossible to… • Join master data WITH user data. • Aggregate for ALL user data. • Existing analytical platform solve some point of above issue. But • ETL based – not up-to-date analysis. • Security (privacy and auditing) – can’t upload some data due to policy, or does not support financial audits.
– Difficulty in ad-hoc analysis • It is impossible to… • Join master data WITH user data. • Aggregate for ALL user data. • Existing analytical platform solve some point of above issue. But • ETL based – not up-to-date analysis. • Security (privacy and auditing) – can’t upload some data due to policy, or does not support financial audits.
– Difficulty in ad-hoc analysis • It is impossible to… • Join master data with user data. • Aggregate for all user data. • Existing analytical platform solve some point of above issue. But • ETL based – not up-to-date analysis. • Security (privacy and auditing) – can’t upload some data due to policy, or does not support financial audits. i.e.) Hard to maintain persistent state with security.
• Introduction – Who we are? • LINE Digital Frontier (LINE MANGA) • The problem – Too many MySQL servers • Approach by ClickHouse • ClickHouse feature: MySQL Table Engine, Merge Table. • Architecture for integration search and extraction. • Conclusion & Closing
- ClickHouse Features (1/2) • ClickHouse has some (virtual) engine for integration. • e.g.) “MySQL Table Engine”, “MySQL Database Engine” • Query to MySQL virtual table on ClickHouse triggers ClickHouse to MySQL query on the fly. • ClickHouse can apply arbitrary operation on retrieved data. Including join and aggregation function. https://clickhouse.com/docs/en/engines/table-engines/integrations/mysql
* FROM user_00.transaction INNER JOIN master.manga_book USING (book_id) WHERE transaction.user_id = ‘0x01234’ Approach - ClickHouse Features (1/2) User 0x00~0x06 (Primary and Replica) Master (Product) Data SELECT * FROM transaction WHERE transaction.user_id = ‘0x01234’ -- Condition pushdown. SELECT * FROM manga_book
* FROM user_00.transaction INNER JOIN master.manga_book USING (book_id) WHERE transaction.user_id = ‘0x01234’ Approach - ClickHouse Features (1/2) User 0x00~0x06 (Primary and Replica) Master (Product) Data SELECT * FROM transaction WHERE transaction.user_id = ‘0x01234’ -- Condition pushdown. SELECT * FROM manga_book Vertical sharding issue solved!
- ClickHouse Features (2/2) • MergeTable Engine • Can aggregate data from multiple horizontally sharded tables. • Like a UNION View. https://clickhouse.com/docs/en/engines/table-engines/special/merge
- ClickHouse Features (2/2) MergeTable users MySQL DB Engine user_00 MySQL DB Engine user_08 MySQL DB Engine MySQL User 0x00~0x07 MySQL User 0x08~0x0f … SELECT book_id, SUM(sales) FROM users.transaction WHERE sales_at = ‘2025-01-01’ GROUP BY book_id
- ClickHouse Features (2/2) MergeTable users MySQL DB Engine user_00 MySQL DB Engine user_08 MySQL DB Engine MySQL User 0x00~0x07 MySQL User 0x08~0x0f … Horizontal sharding issue solved! SELECT book_id, SUM(sales) FROM users.transaction WHERE sales_at = ‘2025-01-01’ GROUP BY book_id
JOIN/Aggregate distributed My SQL instances. • Vertical Sharding • Master data MySQL instance • User data MySQL instances • 2 horizontally shard. User 0x00~0x06 (Primary and Replica) User 0x07~0x0f (Primary and Replica) Master (Product) Data Just a simple SQL.
• In closing, the Integration Engine can be very useful even if you don't store data natively in ClickHouse. • Can join multi MySQL servers, other DBs, many formats in local disk and over http. • Can try with single binary called clickhouse-local. Let’s try. https://clickhouse.com/docs/en/operations/utilities/clickhouse-local
you! • For your attention and time today. • And to my colleagues who contributed ideas and feedback that formed the basis of today’s presentation. • Special thanks to Okada-san, who introduced me to ClickHouse. Unfortunately, he couldn’t join us today, but he’s one of the key people driving large-scale adoption of ClickHouse. • And to everyone at LINE Digital Frontier who collaborated with me to take on this idea together.
• Comparison with other solution. • Trino, Spark etc… (Query Engine which has integration connector) • Relatively hard to setup locally. • Lack of “Export to LocalFile”, “Query file as Table”