Julien Salvi
September 06, 2024

Leverage our skills and apps with new AI/ML tools for Android

2023 has been a pivotal year in terms of AI, introducing groundbreaking tools and libraries that bring AI/ML concepts to the Android ecosystem, like Gemini, Studio Bot or MediaPipe. Not to forget what was already in place, like ML Kit or TensorFlow.

This talk will introduce these new tools by showing how they can help you produce better code (with Studio Bot) or build AI-powered applications (with Gemini & co). Each tool has its own learning curve, usability and potential blockers. We'll deep dive into their capabilities and see how they can empower you to build great projects.

By the end of the session, you'll have a clearer picture of which tool to use in which situation. AI is here to amplify our capabilities, so let's take advantage of it! 🚀

Julien Salvi

September 06, 2024

Transcript

  1. Leverage your skills & apps with new AI/ML tools for Android
     Julien Salvi - Android GDE | Android @ Aircall
     droidcon Lisbon 2024 | @JulienSalvi
     Let's use AI/ML wisely 🤓
  2. The age of AI/ML
     A little bit of context…
     It's there, we cannot escape 🫣
  3. The Age of AI/ML: AI context in 2024
     • AI/ML has been booming for two years now, with many tools that are more and more accessible to everyone
     • We've seen the rise of Generative AI (ChatGPT, Gemini, MistralAI…) and a push in other ML tools (MediaPipe, TensorFlow, ML Kit…)
     • Android isn't escaping the AI trend!
     • We now have a large set of tools to leverage our skills and apps 🚀
  4. The Age of AI/ML: AI/ML on Android
     • We can identify 2 categories of tools:
       ◦ AI for developers
       ◦ AI for apps
     • The first helps developers build great apps, acting as a daily assistant that leverages their skills
     • The second provides a set of tools to build better apps
     • Each tool has its own learning curve, cost and set of features 🚀
  5. AI/ML tools to leverage your skills
     Let AI assist you 🤖
  6. AI Assistants: Gemini, Copilot, JetBrains AI…
     • AI assistants should be seen as pair programmers 🤓 Use them to boost productivity, explore new ideas, and learn new techniques.
     • DON'T blindly accept every suggestion! 🫣
     • Security & privacy matter a lot! 🔐 Check your company policy before using a new AI companion. Be mindful of the code you share with AI assistants.
  7. AI chats: The art of prompting
     • When using LLM-based chats, you must provide the best context possible to get the best answers 📝
     • You're not chatting with a human! 🤖 Be clear, specific, and avoid ambiguity.
     • Context is key: provide relevant information about your code, goals, and desired outcome.
       ◦ Instead of: "Make this better"
       ◦ Try: "Simplify this Kotlin function and explain the changes"
  8. AI chats: The art of prompting
     • Keywords are your best friends: use relevant technical terms ("Jetpack Compose", "coroutines", "Room database")
     • Structure for success (a composed example follows this list):
       ◦ State the task clearly: "Write a…", "Explain how…", "Find errors in…"
       ◦ Provide code snippets or context (use the "Insert code" button in Gemini)
       ◦ Specify the desired format: "Kotlin code", "Bulleted list", "Slides"…
     • If the first response isn't perfect, rephrase or refine your prompt!
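     Putting the structure together, a composed prompt might look like this (purely illustrative):

       "Simplify this Kotlin function using coroutines and explain the changes.
        Return the answer as Kotlin code followed by a bulleted list of the changes.
        [code snippet added via the 'Insert code' button]"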
  9. AI chats: Gemini in Android Studio
     • Gemini is directly built into Android Studio 🛠
     • It can answer coding questions, generate code or help you debug parts of your code 🤓
     • You can control the data/code shared with Gemini 🧐
     • Fine-grained control over the files you share with Gemini using an .aiexclude file 🔐 (sketched below)
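     A minimal .aiexclude sketch, assuming gitignore-style patterns; the file names below are illustrative:

       # Keep secrets and sensitive config away from Gemini
       apikeys.properties
       secrets/
       *.pem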
  10. AI chats: Gemini in Android Studio
     • Get quick answers: ask about Android APIs, libraries, best practices, or even general coding concepts.
       ◦ "How do I use Room to store data?"
       ◦ "How can I make my app more accessible?"
       ◦ "What's the difference between ViewModel and SavedStateHandle?"
  11. AI chats: Gemini in Android Studio
     • Generate different code options: describe the functionality you need, and Gemini will suggest code snippets.
       ◦ "Create a function to fetch data from this API endpoint using Retrofit"
       ◦ "Write a composable function that displays a list of items in a lazy grid"
  12. AI chats: Gemini in Android Studio
     • Improve existing code: ask Gemini to review your code for potential issues, optimizations, or improvements.
       ◦ "Can you help me simplify this code?"
       ◦ "Is there a more efficient way to implement this feature?"
  13. Gemini in Android Studio
     Main Usage: Offer suggestions and automate repetitive tasks
     Cost: $0*
     Learning Curve: Fast
     Pros: • Integrated within Android Studio • Trained for Android dev • No additional cost • Privacy controls
     Cons: • Still in development • No offline support • Privacy concerns
     *depending on how you value your code 😅
  14. GitHub Copilot in Android Studio
     Main Usage: Offer suggestions and automate repetitive tasks
     Cost: From $10/month to $39/seat per month
     Learning Curve: Fast
     Pros: • Official plugin for Android Studio • Easy to connect with your GitHub account • Lots of features if fully integrated with GitHub • Privacy controls • Context awareness
     Cons: • Non-negligible cost • No offline support • Privacy concerns
  15. JetBrains AI
     Main Usage: Offer suggestions and automate repetitive tasks in IntelliJ
     Cost: €8.33 per month
     Learning Curve: Fast
     Pros: • Plugin for IntelliJ • Efficient code completion and generation • Can generate commit messages, explain errors and generate documentation • Privacy controls • Customer data not used to train the models
     Cons: • Still in development • No offline support • Privacy concerns
     ℹ JetBrains AI uses LLMs from OpenAI and Google
  16. AI/ML tools for your apps
     Build smarter & richer apps
  17. Gemini for Android: In a nutshell
     • Gemini easily enables generative AI capabilities in your apps to build enhanced features like sentiment analysis, smart bots, text summarization and more
     • ⚠ Only use the Google AI Client SDK for prototyping, as you can leak your API key if it's embedded in your app
     • Prefer using Gemini in Firebase with Vertex AI, or your own gateway, for safe usage
  18. Gemini for Android: In a nutshell
     • Gemini on-device with Nano is still a private preview 🥲
     • The more context (text and images) you give, the more accurate your response will be!
     • Experiment with the parameters to get the desired output
  19. dependencies {
         // Google AI Client SDK for Android (🧪 prototyping only)
         implementation 'com.google.ai.client.generativeai:generativeai:0.9.0'
         // Vertex AI in Firebase
         implementation 'com.google.firebase:firebase-vertexai:16.0.0-beta04'
     }
  20. // With Google AI SDK on Android
     val model = GenerativeModel(
         model = "gemini-1.5-flash",
         apiKey = "<MY-API-KEY>",
         generationConfig = generationConfig {
             temperature = 0.15f
             topK = 32
             topP = 1f
             maxOutputTokens = 4096
         },
         safetySettings = listOf(
             SafetySetting(HarmCategory.HARASSMENT, BlockThreshold.MEDIUM_AND_ABOVE),
             SafetySetting(HarmCategory.HATE_SPEECH, BlockThreshold.MEDIUM_AND_ABOVE),
             SafetySetting(HarmCategory.SEXUALLY_EXPLICIT, BlockThreshold.MEDIUM_AND_ABOVE),
             SafetySetting(HarmCategory.DANGEROUS_CONTENT, BlockThreshold.MEDIUM_AND_ABOVE),
         )
     )
  21. // With Vertex AI in Firebase
     val model = Firebase.vertexAI.generativeModel(
         model = "gemini-1.5-flash",
         generationConfig = generationConfig {
             temperature = 0.15f
             topK = 32
             topP = 1f
             maxOutputTokens = 4096
         },
         safetySettings = listOf(
             SafetySetting(HarmCategory.HARASSMENT, BlockThreshold.MEDIUM_AND_ABOVE),
             SafetySetting(HarmCategory.HATE_SPEECH, BlockThreshold.MEDIUM_AND_ABOVE),
             SafetySetting(HarmCategory.SEXUALLY_EXPLICIT, BlockThreshold.MEDIUM_AND_ABOVE),
             SafetySetting(HarmCategory.DANGEROUS_CONTENT, BlockThreshold.MEDIUM_AND_ABOVE),
         )
     )
  22. // Text generation with a simple prompt
     scope.launch {
         val response = model.generateContent("Give a recipe with the best Portuguese ingredients")
     }

     // Use an image and a prompt
     scope.launch {
         val response = model.generateContent(
             content {
                 image(bitmap)
                 text("Is there some carrot in this picture?")
             }
         )
     }

     // Text generation as a stream thanks to Flow
     scope.launch {
         var outputContent = ""
         model.generateContentStream("My awesome prompt").collect { response ->
             outputContent += response.text
         }
     }
  23. Gemini for Android
     Main Usage: Enhance your apps with GenAI-based features
     Cost: from $0 to $21/1 million tokens (output)*
     Learning Curve: Fast
     Pros: • Fast integration with Android • Proxy with Firebase • Fast learning curve • Text and image as input • On-device capabilities with Gemini Nano
     Cons: • Lots of things still in preview • Heavy processing can be costly • High risk of leaking your API key if you embed the SDK in your app
     *depending on the Gemini model and/or Firebase cost
  24. ML Kit on Android: In a nutshell
     • ML Kit brings powerful and easy-to-use ML features, optimized for Android and iOS, with minimal coding and resources.
     • It provides pre-built and customizable models for common use cases such as image and text recognition, face detection, barcode scanning…
     • ML Kit also allows developers to train custom models using their own data.
  25. ML Kit on Android: Model installation
     • Models in ML Kit APIs can be installed in 3 different ways (see the Gradle sketch below):
       ◦ Unbundled: models are downloaded and managed via Google Play services.
       ◦ Bundled: models are statically linked to your app at build time.
       ◦ Dynamically downloaded: models are downloaded on demand.
     • Using ML Kit will increase your app size (2 to 10 MB per model).
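     A minimal sketch of how the bundled vs unbundled choice shows up in Gradle, using barcode scanning as an example (versions are illustrative; you would pick one of the two artifacts, not both):

       dependencies {
           // Bundled: the model ships inside your APK (bigger app, available immediately)
           implementation 'com.google.mlkit:barcode-scanning:17.3.0'

           // Unbundled: the model is downloaded and managed by Google Play services
           implementation 'com.google.android.gms:play-services-mlkit-barcode-scanning:18.3.1'
       }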
  26. ML Kit on Android: Vision libraries
     Text Recognition v2 • Face Detection • Face Mesh Detection (beta) • Object Detection • Image Labeling • Document Scanning (beta) • Pose Detection (beta) • Barcode Scanning • Digital Ink Recognition (beta) • Selfie and subject segmentation (beta)
  27. ML Kit on Android: Text Recognition v2
     • Text Recognition v2 allows us to extract text from images (camera or static images)
     • Trained to recognize text in over 100 languages, including Latin-based scripts and non-Latin scripts such as Japanese or Chinese.
     • The Text Recognizer segments text into blocks, lines, elements and symbols.
     Bundled model: +4 MB per architecture
  28. ML Kit on Android: Text Recognition v2
     [Diagram: a recognized sentence annotated with its hierarchy of Blocks, Lines and Elements]
  29. dependencies {
         // To recognize Latin script
         implementation 'com.google.mlkit:text-recognition:16.0.0'
         // To recognize Chinese script
         implementation 'com.google.mlkit:text-recognition-chinese:16.0.0'
         // To recognize Devanagari script
         implementation 'com.google.mlkit:text-recognition-devanagari:16.0.0'
         // To recognize Japanese script
         implementation 'com.google.mlkit:text-recognition-japanese:16.0.0'
         // To recognize Korean script
         implementation 'com.google.mlkit:text-recognition-korean:16.0.0'
     }
  30. // Init the TextRecognition client (here for Latin languages)
     val recognizer = TextRecognition.getClient(TextRecognizerOptions.DEFAULT_OPTIONS)

     // Load a bitmap image, for instance
     val image = InputImage.fromBitmap(bitmap, 0)

     // Use the client to process the image
     val result = recognizer.process(image)
         .addOnSuccessListener { visionText ->
             // Get the text from the image and info about where it is located
             val allText = visionText.text
             val blocks = visionText.textBlocks
             // ...
         }
         .addOnFailureListener { e ->
             // Task failed with an exception
         }
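     A small sketch of drilling down the recognized hierarchy (blocks, lines, elements) inside the success listener above; the log tag is illustrative:

       visionText.textBlocks.forEach { block ->
           block.lines.forEach { line ->
               line.elements.forEach { element ->
                   // element.text is roughly a word; boundingBox locates it in the image
                   Log.d("TextRecognition", "${element.text} at ${element.boundingBox}")
               }
           }
       }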
  31. ML Kit for Android
     Main Usage: Build computer vision and NLP features with pre-built models
     Cost: $0
     Learning Curve: Quite fast
     Pros: • Fast integration with Android and on-device • Free to use • Pre-built models for various use cases • Custom model deployment • Optimized for mobile usage
     Cons: • Black box with pre-built models • Limited model customization • Some features require Google Play services • Performance for computer vision
  32. TensorFlow Lite or LiteRT: In a nutshell
     • LiteRT (formerly TensorFlow Lite) is a high-performance cross-platform runtime for on-device AI
     • Convert or use existing models that suit your use cases, or build your own!
     • LiteRT is optimized for mobile with a focus on privacy, size and performance
     • The learning curve for building your own TFLite models can be quite steep and can require Python knowledge
     https://ai.google.dev/edge/litert
  33. TensorFlow Lite or LiteRT: In a nutshell
     • You can take advantage of Google Play services to ship a lighter app and use a high-level API in Java/Kotlin (the recommended way)
     • The high-level API lets you run inferences through the Interpreter API exposed by the library (sketched below)
     • You keep control over the input, output and learning parts
     • Otherwise, you'll have to deal with a C/C++ API!
     https://ai.google.dev/edge/litert
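     A minimal sketch of the Play services flow, assuming a .tflite model already loaded into a ByteBuffer (modelBuffer, inputBuffer and outputBuffer are placeholders):

       import com.google.android.gms.tflite.java.TfLite
       import org.tensorflow.lite.InterpreterApi
       import org.tensorflow.lite.InterpreterApi.Options.TfLiteRuntime

       // 1. Initialize the TFLite runtime shipped with Google Play services
       TfLite.initialize(context).addOnSuccessListener {
           // 2. Create an interpreter backed by the system runtime
           val interpreter = InterpreterApi.create(
               modelBuffer, // ByteBuffer holding your .tflite model
               InterpreterApi.Options().setRuntime(TfLiteRuntime.FROM_SYSTEM_ONLY)
           )
           // 3. Run the inference: you control the input and output buffers
           interpreter.run(inputBuffer, outputBuffer)
       }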
  34. LiteRT / TensorFlow Lite
     Main Usage: Build and deploy ML models on Android to bring on-device ML features
     Cost: $0*
     Learning Curve: High
     Pros: • Build your own models or use existing ones • Optimized for on-device ML • Offline support • Low latency and real-time performance • Full control of the flow
     Cons: • Steepest learning curve for Android devs • Python knowledge mandatory • Model conversion to the .tflite format • Requires strong ML knowledge
     *building and hosting your models can be non-negligible
  35. MediaPipe Framework: In a nutshell
     • MediaPipe Framework is a low-level tool to build on-device ML pipelines
     • It requires NDK/C++ to run the pipelines on Android, and you must be familiar with several Framework concepts (Packets, Graphs, Calculators)
     • The learning curve is steep and it can take some time to master the entire flow
     https://ai.google.dev/edge/mediapipe/framework
  36. MediaPipe Solutions: In a nutshell
     • MediaPipe brings cross-platform and easy-to-use ML solutions, optimized for mobile, with minimal coding and resources.
     • It provides pre-built models for multiple fields such as vision, text, audio or GenAI… or you can build and evaluate your own models with the Model Maker & Studio tools
     • You must add the models to the app resources before using the MediaPipe libraries
     https://ai.google.dev/edge/mediapipe/solutions/guide
  37. MediaPipe Solutions
     Vision: Gesture recognition • Image classification • Face stylization • Object detection • Interactive segmentation • Hand detection • Face detection (some tasks experimental)
     Text: Text classification • Text embedding • Language identification
     Audio: Audio classification
     GenAI (experimental): Image generation • LLM inference
  38. MediaPipe Solutions: Hand landmarks detection
     • Hand landmarks detection identifies the key points of the hand
     • The input can be a static image, a decoded video frame or a live stream
     • The library offers many configurable options
     • ⚠ Embedding models in your app will have an impact on your app size
  39. dependencies {
         // To recognize hand landmarks
         implementation 'com.google.mediapipe:tasks-vision:0.10.15'
     }

     // Download the pre-built model
     // Add it to your app assets: <your-project-root>/src/main/assets
  40. // Pass the model to the library
     val baseOptions = BaseOptions.builder().setModelAssetPath("path_to_model").build()

     // Configure the hand landmarks detection here
     val optionsBuilder = HandLandmarker.HandLandmarkerOptions.builder()
         .setBaseOptions(baseOptions)
         .setMinHandDetectionConfidence(minHandDetectionConfidence)
         .setMinTrackingConfidence(minHandTrackingConfidence)
         .setMinHandPresenceConfidence(minHandPresenceConfidence)
         .setNumHands(maxNumHands)
         .setRunningMode(RunningMode.IMAGE)

     // Start detecting!
     val handLandmarker = HandLandmarker.createFromOptions(context, optionsBuilder.build())
     val mediaPipeImage = BitmapImageBuilder(image).build() // image is a Bitmap
     val result = handLandmarker?.detect(mediaPipeImage)
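     As a follow-up sketch, reading the detected key points out of the result above (the log tag is illustrative):

       // Each inner list holds the normalized landmarks of one detected hand
       result?.landmarks()?.forEachIndexed { handIndex, handLandmarks ->
           handLandmarks.forEach { landmark ->
               // x/y are normalized to [0, 1] relative to the image dimensions
               Log.d("HandLandmarker", "hand $handIndex -> (${landmark.x()}, ${landmark.y()})")
           }
       }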
  42. MediaPipe (Solutions & Framework)
     Main Usage: Build ML-based features (vision, text, audio) with turnkey models or your own
     Cost: $0*
     Learning Curve: Medium-high
     Pros: • Built-in models for MediaPipe Tasks • Lots of use cases covered • On-device ML • Customize your own models • Cross-platform
     Cons: • Steeper learning curve • MediaPipe Framework requires NDK knowledge • Python recommended to build ML models
     *building and hosting your models can be non-negligible
  43. AI/ML tools
     ML Kit: https://developers.google.com/ml-kit
     MediaPipe: https://ai.google.dev/edge/mediapipe/solutions/guide
     Gemini on Android: https://developer.android.com/ai/generativeai
     ML/AI Codelabs: https://codelabs.developers.google.com/?category=aiandmachinelearning&product=android