Takahiro Yoshinaga, Ph.D. (Science)
- Joined LINE Corporation as a Data Scientist
- Responsible for data analysis and development of services for corporations
LINE Corporation
- Web application in a notebook format
  - Create / execute queries
  - Visualize and share easily
  - Access to R and Python (SparkR, PySpark)
- Extension
  - Create UDFs to make data analysis convenient
  - Add a stand-alone JAR in spark-submit options
Normal
- Write Scala code
- Build on local machine
- sbt test
- Upload JAR to HDFS
- Check it on OASIS
- Review & Merge on GitHub
- Re-build and versioning
- Upload to OASIS

- Write Scala code
- <deleted>
- <deleted>
- <deleted>
- git push & check it on OASIS
- Review & Merge on GitHub
- <deleted>
- <deleted>
and Automate
Group by
- Before: ad hoc aggregation for each requirement
- After: automatic calculation of frequently used metrics
Mapping
- Before: long, long CASE WHEN …
- After: only one UDF
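The two before/after pairs above can be illustrated with a minimal sketch in plain Python. This is not the code from the talk: the UDFs described here are written in Scala and run on Spark, and the names `REGION_BY_COUNTRY`, `map_region`, and `summarize_by`, as well as the sample categories and rows, are hypothetical. The sketch only shows the idea of replacing a long CASE WHEN chain with one lookup function, and of computing a fixed set of commonly used metrics per group instead of writing each aggregation by hand.

```python
from collections import defaultdict

# Hypothetical lookup table; the real categories are not given in the talk.
REGION_BY_COUNTRY = {
    "JP": "East Asia",
    "TW": "East Asia",
    "TH": "Southeast Asia",
    "ID": "Southeast Asia",
}


def map_region(country_code, default="Other"):
    """Map a country code to a region, standing in for a long CASE WHEN chain."""
    if country_code is None:
        return default
    return REGION_BY_COUNTRY.get(country_code.strip().upper(), default)


def summarize_by(rows, key, value):
    """Group dict rows by `key` and compute count, sum and mean of `value`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[value])
    return {
        k: {"count": len(v), "sum": sum(v), "mean": sum(v) / len(v)}
        for k, v in groups.items()
    }


# Hypothetical sample data, for illustration only.
rows = [
    {"country": "jp", "sales": 10},
    {"country": "TW", "sales": 20},
    {"country": "TH", "sales": 30},
    {"country": "US", "sales": 5},
]

# Mapping: one function call per row instead of a CASE WHEN expression.
for row in rows:
    row["region"] = map_region(row["country"])

# Group by: the same standard metrics for any grouping column.
summary = summarize_by(rows, "region", "sales")
```

Keeping the mapping in one table means a new category is a one-line change, and every query that uses the function picks it up. In a Spark setting the same logic would be registered as a UDF rather than called row by row.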
CI / CD in our development. LINE has an environment in which data scientists can develop in a modern way. We achieve high performance in data analysis thanks to this development workflow.