Punsiri Boonyakiat
December 07, 2024

[Chaiyo GCP - Online Code Along Session] - Gemini for Data Scientist [Running Codelab]

The lab session at the Chaiyo GCP event in Thailand focused on using Gemini in BigQuery as an assistant to build a K-means clustering model. As a speaker, I guided participants through segmenting customers by their order behavior, writing prompts that fit a data scientist’s workflow.


Transcript

  1. Code Along session - members
    • Kamolphan Liwprasert (Fon), Cloud GDE and WTM Ambassador
    • Burasakorn Sabyeying (Mils), Cloud GDE and WTM Ambassador
    • Kasin Sumalyaporn (Up), Customer Solutions Consultant @ Google
    • Punsiri Boonyakiat (Beat), Senior Data Engineer @ CJ Express
  2. Google I/O Extended Bangkok - June 22
    Google I/O Cloud Extended Bangkok - Soon!
    Facebook Page: GDG Bangkok
    Facebook Group: GDG Cloud Bangkok
  3. Burasakorn Sabyeying (Mils), Senior Data Engineer @ CJ Express, Google Developer Experts in Cloud
    goo.gle/ChaiyoGCPS4
  4. ChaiyoGCP: Online upskilling campaign for the developer communities to connect and learn about Google's latest technology.
    1. Self-paced learning. Learn in your own time and schedule.
    2. Gamification. Get the knowledge and score the swags!
    3. From community to community. Learning together with the community.
    goo.gle/ChaiyoGCPS4
  5. Start Event, Code Along session, Wrap up, Follow up
    Timeline dates shown on the slide: March, April, May 17, May 28, June 15
    • Welcome Email: “[#ChaiyoGCP] Welcome to #ChaiyoGCP! Start your learning adventure now!”
    • Redeem 1 month access on Cloud Skills Boost
    • Join Discord Community
    • Quest: Generative AI Explorer - Vertex AI
    • Quest: Gemini for Data Scientists and Analysts
    • Code Along sessions (Online event)
    • Recording on YouTube: ‘GDG Cloud Bangkok’ channel
    • Social Sharing Hashtag: #ChaiyoGCP
    • Completion Form
    • Make your profile public
    • Result email will be sent within 1 week
    • Swags (6-7 weeks after result email)
    • Google I/O Extended (June 22)
    • Stay tuned!
    *Only available for developers and participants who live in Thailand.
  6. Lab vs Quest vs Badge
    Tracks: AI/ML, Infrastructure & Security
    [Diagram: each Quest/Badge (get badge here!) is made up of several Labs]
    goo.gle/ChaiyoGCPS4
  7. FAQ
    Q: Where is the referral code, and what are the referral criteria?
    A: You will get it in the welcome email. You can use that code to invite your friends to the ChaiyoGCP program and ask them to submit the Completion form.
    Successful referral criteria:
    • Your friends should put your referral code on the registration form or on their completion form (case sensitive)
    • Your friends should complete at least 7 quests
    • We will select the top 3 ChaiyoGCP participants with the most successful referrals to win a secret special prize.
    • The winners will be contacted by email two weeks after the program ends.
  8. FAQ
    Q: Can I mix the quests between the two tracks (AI/ML and Infra + Security)?
    A: Yes, you can!
    Q: If I have done a quest before, how are completed missions counted?
    A: Only quests completed between 17 May and 15 June 2024 will be considered in the redemption process.
  9. FAQ
    Q: Can I use the same email I used last year?
    A: Yes, you can. But quests already completed last year won’t count this year.
  10. Reasons a participant does not get swag
    • You are not based in Thailand
    • You completed fewer than the minimum of 7 required quests (incl. at least 3 skill badges) (List of quests)
    • You did not submit your Google Cloud Skills Boost public profile link
    • You submitted an invalid Google Cloud Skills Boost public profile link
    • Your completion form was submitted after the deadline
  11. Kickoff Event (Build with AI day)
    How to redeem Cloud Skills Boost - 1 month access & get 9 credits
    Labels on the slide: 8 credits, Monthly subscription, Any 1 lab in this quest
    ** Open the link in incognito mode, and make sure to sign out of Cloud Skills Boost before clicking the link **
    Join & Sign in
  12. BigQuery’s serverless architecture enables sophisticated compute capabilities
    • Decoupled storage and compute
    • Fully managed and serverless for maximum agility and scale
    • Gigabyte to petabyte scale storage and SQL queries
    • Built-in ML for out-of-the-box predictive insights
    [Architecture diagram: High-Available Cluster Compute (Dremel); Distributed Memory Shuffle Tier; Replicated, Distributed Storage (99.9999999999%)]
  13. Gemini in BigQuery: Always-on Intelligence spanning the Data & Analytics journey
    • Explore, Discover & Analyze
    • Engineer & Transform
    • Optimize Workloads
    • Secure Data
    • Fundamentals
  14. Gemini in BigQuery
    Engineer & Transform
    • Data preparation (Preview)
    Explore, Discover & Analyze
    • Data Canvas (Preview)
    • Semantic metadata search (Preview)
    • Data insights (Preview)
    Fundamentals
    • SQL code assist (Preview)
    • Python code assist (Preview)
    • Accelerated migrations (Preview)
    Secure Data
    • Assisted data masking (Roadmap)
    • Privilege anomaly detection (Roadmap)
    • Sensitive data monitoring (Roadmap)
    Optimize Workloads
    • Partitioning & clustering recommendations (Roadmap)
    • Materialized views recommendations (Roadmap)
    • Autotuning for Spark (Preview)
    • Assisted Spark troubleshooting (Preview)
  15. Typical ML workflow poses many challenges
    Multiple products & roles can lead to unnecessary complexity & costs
    BigQuery (data warehouse)
    Issues:
    • Streaming/batch data
    • Data processing
    • Export data
    • Manage resources?
    • Data Governance?
    • Deploy ML model
    • Train ML model (e.g. Python/R)
    • Where do I host?
  16. BigQuery ML greatly simplifies the ML workflow
    Simplify with BigQuery ML
    BigQuery (data warehouse)
    Issues:
    • Streaming/batch data
    • Data processing
    • Export data
    • Manage resources?
    • Data Governance?
    • Deploy ML model
    • Train ML model (e.g. Python/R)
    • Where do I host?
  17. ML with just a few SQL statements
    • Train and deploy ML models in SQL
    • Execute ML workflows without moving data from BigQuery
    • Built-in infrastructure management, security & compliance
    1. Train
        -- Train a linear regression model that predicts penguin body mass.
        CREATE OR REPLACE MODEL `bqml_tutorial.penguins_model`
        OPTIONS (model_type='linear_reg', input_label_cols=['body_mass_g']) AS
        SELECT *
        FROM `bigquery-public-data.ml_datasets.penguins`
        WHERE body_mass_g IS NOT NULL
    2. Predict
        -- Score the penguins from Biscoe island with the model trained above.
        SELECT *
        FROM ML.PREDICT(
          MODEL `bqml_tutorial.penguins_model`,
          (SELECT *
           FROM `bigquery-public-data.ml_datasets.penguins`
           WHERE body_mass_g IS NOT NULL AND island = "Biscoe"))
  18. Enterprise benefits of BigQuery ML
    • Cost & Complexity: BigQuery’s fully managed and serverless infrastructure significantly reduces cost and complexity
    • Security: IAM roles and fine-grained access controls for all users on one platform
    • People: Business/Data Analysts can build and deploy powerful ML models using SQL
  19. Lab Objectives
    In today's lab we will work through the following tasks:
    • Use Gemini to ask and answer questions, to understand the solutions and services on Google Cloud and how to apply them
    • Use Gemini to explain existing SQL queries, and write new SQL queries with Gemini as an assistant
    • Build and train a machine learning model with ARIMA_PLUS (time series forecast) to predict future sales, by writing SQL in BigQuery (BigQuery ML)
  20. Task 1. Configure your environment and account
    • Enable the Cloud AI Companion API for Gemini
    • Grant the necessary IAM roles to your Student Account and Quicklab Service Account
    Reference: https://console.cloud.google.com/apis/library/cloudaicompanion.googleapis.com
        gcloud services enable cloudaicompanion.googleapis.com --project ${PROJECT_ID}
        gcloud projects add-iam-policy-binding ${PROJECT_ID} --member user:${USER} --role=roles/cloudaicompanion.user
        gcloud projects add-iam-policy-binding ${PROJECT_ID} --member user:${USER} --role=roles/serviceusage.serviceUsageViewer
  21. Task 5. Generate a SQL query that groups sales by day and product
    Prompt:
        # select the sum of sale_price by Date(created_at) and product_id casted to day from bigquery-public-data.thelook_ecommerce.order_items as t1 joined with the products table in the same dataset as t2
    Generated query:
        SELECT
          SUM(sale_price),
          DATE(created_at) AS created_at_day,
          CAST(product_id AS INT64)
        FROM `bigquery-public-data.thelook_ecommerce.order_items` AS t1
        JOIN `bigquery-public-data.thelook_ecommerce.products` AS t2
          ON t1.product_id = t2.id
        GROUP BY created_at_day, product_id
  22. Task 6. Build a forecasting model and view results
        CREATE MODEL bqml_tutorial.sales_forecasting_model
        OPTIONS(
          MODEL_TYPE='ARIMA_PLUS',
          time_series_timestamp_col='date_col',
          time_series_data_col='total_sales',
          time_series_id_col='product_id') AS
        SELECT
          sum(sale_price) as total_sales,
          DATE(created_at) as date_col,
          product_id
        FROM `bigquery-public-data.thelook_ecommerce.order_items` AS t1
        INNER JOIN `bigquery-public-data.thelook_ecommerce.products` AS t2
          ON t1.product_id = t2.id
        GROUP BY 2, 3;
  23. Bring AI to your data with the simplicity and scale of BigQuery
    BigQuery ML: Democratize machine learning
    • Data to AI: Inference Engine for remotely hosted models and Google’s pretrained models. Register your BigQuery ML models in the Vertex AI Model Registry
    • Modeling capabilities: ARIMA+, Explainable AI, Advanced Feature Engineering, Holiday modeling
    • Model Ops: Use Colab notebooks to perform ML workflows in BigQuery. Import TensorFlow models for batch prediction or export models from BigQuery ML for online prediction
    [Decision diagram: What do you want to do? Legend: Start, Tasks, Models, Examples]
    Tasks: Predict values; Predict between categories; Generate recommendations; Reduce data dimensionality; Find anomalies; Group data into clusters; Time series forecasting
    Models: Linear regression; Boosted Trees regressor; AutoML Table regressor; DNN regressor; Wide & deep regressor; Logistic regression; Boosted trees classifier; AutoML table classifier; DNN Classifier; Wide & deep classifier; Matrix Factorization; PCA; Autoencoder; K-Means; ARIMA-PLUS
    Examples: Predict sales figures; Predict stock prices; Identify spam emails; Identify tumor types; Product recommendation; Create personalized content; Analysis of written text; Analysis of DNA data; Detect fraud; Predict credit risk; Perform customer segmentation; Predict housing prices based on historical data
  24. Gemini in BigQuery: BigQuery Data Canvas
    Visual & AI driven data discovery, modeling, and analysis
    • GenAI-centric experience for data exploration and visualization
    • Iterative and guided user experience
    • Embedded inside BigQuery Studio
    • Semantic data discovery supported by Dataplex catalog
    • Automated Python notebook generation
    • Built-in collaboration for data analysts
  25. Lab Objectives - Gemini for Data Scientists
    In today's lab we will work through the following tasks:
    • Use Colab Enterprise Python notebooks in BigQuery Studio.
    • Use BigQuery DataFrames inside of BigQuery Studio.
    • Use Gemini Code Assistance to help generate code for building the model and analyzing the data
    • Build a K-means clustering model to group the data
    • Use the Vertex AI text-bison model to help analyze the e-commerce customer segments, as input for creating a marketing campaign
  26. Task 1. Configure your environment and account
    • Enable the Cloud AI Companion API for Gemini
    • Grant the necessary IAM roles to your Student Account and Quicklab Service Account
    Reference: https://console.cloud.google.com/apis/library/cloudaicompanion.googleapis.com
  27. Task 1. Configure your environment and account
    • Enable the Cloud AI Companion API for Gemini
    • Grant the necessary IAM roles to your Student Account and Quicklab Service Account
  28. Task 5. Build the Python Notebook
    • Import Python libraries
    • Define variables
    • Create and import a base table as a BigQuery DataFrame from a public dataset
    • Generate the K-means clustering model and visualization
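
    As a rough illustration of what the K-means step in Task 5 does, the standalone sketch below clusters a handful of made-up customer rows in plain Python. The lab itself builds the model with BigQuery DataFrames inside a notebook; the two features (number of orders, average order value), the sample values, and the `kmeans` helper are assumptions for illustration only, not the lab's code.

```python
import math
import random


def kmeans(points, k, iterations=20, seed=0):
    """Minimal Lloyd's-algorithm K-means over tuples of numbers (illustrative helper)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k rows picked from the input
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster went empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, labels


# Hypothetical customers: (number of orders, average order value)
customers = [(1, 20.0), (2, 25.0), (1, 22.0), (12, 80.0), (15, 95.0), (14, 90.0)]
centroids, labels = kmeans(customers, k=2)
for customer, label in zip(customers, labels):
    print(customer, "-> cluster", label)
```

    With data this well separated, the occasional buyers and the frequent buyers should land in different clusters; real order data is noisier, so the number of clusters and the features usually need some experimentation.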
  29. Task 6. Generate insights from the results of the model
    Example:
    Cluster 1:
    Title: "The Occasional Shoppers"
    Persona: These customers are likely to be sporadic shoppers who make infrequent purchases. They may be attracted to discounts or promotions, and they may be more likely to purchase items that are on sale.
    Next marketing step: Offer discounts or promotions to entice these customers to make more frequent purchases.
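
    One way to picture Task 6: a summary of each cluster is turned into a text prompt that asks the model for a title, a persona, and a next marketing step. The sketch below only builds such a prompt string. The cluster statistics, the `build_prompt` helper, and the prompt wording are all hypothetical, and the call to the text model itself is left out.

```python
def build_prompt(cluster_stats):
    """Format per-cluster averages into a prompt for a text model (illustrative helper)."""
    lines = [
        "You are a marketing analyst. For each customer cluster below,",
        "give a title, a persona, and a next marketing step.",
        "",
    ]
    for cluster_id, stats in sorted(cluster_stats.items()):
        summary = ", ".join(f"{name}={value}" for name, value in stats.items())
        lines.append(f"Cluster {cluster_id}: {summary}")
    return "\n".join(lines)


# Hypothetical per-cluster averages; in the lab these come from the model output.
cluster_stats = {
    1: {"avg_orders": 1.3, "avg_order_value": 22.3},
    2: {"avg_orders": 13.7, "avg_order_value": 88.3},
}
prompt = build_prompt(cluster_stats)
print(prompt)
```

    Keeping the prompt assembly in a small function makes it easy to rerun the same question after retraining the model with a different number of clusters.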
  30. Congratulations! In today's lab we worked through all of this together!! ✌
    • ✅ Used Colab Enterprise Python notebooks in BigQuery Studio.
    • ✅ Used BigQuery DataFrames inside of BigQuery Studio.
    • ✅ Used Gemini Code Assistance to help generate code for building the model and analyzing the data
    • ✅ Built a K-means model with BigFrames ML to group the data
    • ✅ Used the Vertex AI text-bison model to help analyze the e-commerce customer segments, as input for creating a marketing campaign