Machine Intelligence at Google Scale: Vision/Speech API, TensorFlow and Cloud Machine Learning

The biggest challenge in deep learning is scalability. As long as you train on a single GPU server, you have to wait hours or days for the results of your work. That does not scale to a production service, so you eventually need distributed training on the cloud. Google has spent years building infrastructure for training large-scale neural networks on the cloud, and has now started to share that technology with external developers. In this session, we introduce new pre-trained ML services such as Cloud Vision API and Speech API, which work without any training. We also look at how TensorFlow and Cloud Machine Learning accelerate custom model training by 10x to 40x on Google's distributed training infrastructure.

Kazunori Sato

April 04, 2016


  1. Kaz Sato (+Kazunori Sato, @kazunori_279): Staff Developer Advocate and Tech Lead for Data & Analytics, Cloud Platform, Google Inc.
  2. What we'll cover
     • Deep learning and distributed training
     • Large-scale neural networks on Google Cloud
     • Cloud Vision API and Speech API
     • TensorFlow and Cloud Machine Learning
  3. DNN = large matrix operations
     • A few GPUs >> CPU (but it still takes days to train)
     • A supercomputer >> a few GPUs (but you don't have a supercomputer)
     • You need distributed training on the cloud
  4. Jupiter network
     • 10 GbE x 100 K = 1 Pbps
     • Consolidates servers with microsecond latency
  5. Borg
     • No VMs, pure containers
     • 10K - 20K nodes per Cell
     • DC-scale job scheduling
     • CPUs, memory, disks and I/O
  6. What's the scalability of Google Brain?
     • "Large Scale Distributed Systems for Training Neural Networks", NIPS 2015
       ◦ Inception / ImageNet: 40x with 50 GPUs
       ◦ RankBrain: 300x with 500 nodes
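
One way to read the speedups quoted on this slide is as parallel efficiency, that is, the speedup divided by the number of workers. The following is a minimal sketch of that arithmetic only; the helper name `parallel_efficiency` is hypothetical, and the inputs are simply the slide's figures.

```python
def parallel_efficiency(speedup, workers):
    """Fraction of ideal linear scaling achieved: speedup / workers."""
    if workers <= 0:
        raise ValueError("workers must be positive")
    return speedup / workers


# Figures quoted on the slide.
inception = parallel_efficiency(40, 50)    # Inception / ImageNet: 40x with 50 GPUs
rankbrain = parallel_efficiency(300, 500)  # RankBrain: 300x with 500 nodes

print(f"Inception: {inception:.0%} of linear scaling")  # prints 80%
print(f"RankBrain: {rankbrain:.0%} of linear scaling")  # prints 60%
```
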
  7. Cloud Vision API
     • Pre-trained models. No ML skill required
     • REST API: receives images and returns JSON
     • $2.5 or $5 / 1,000 units (free to try)
     • Public Beta - cloud.google.com/vision
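
The slide does not show a request, so the following is only a sketch of what "receives images and returns JSON" can look like. It builds the JSON body for a label-detection call in the shape used by the v1 REST endpoint (`images:annotate`); the helper name `build_label_request` and the placeholder image bytes are assumptions, and no network call is made.

```python
import base64
import json

# v1 REST endpoint; a real call also needs an API key or OAuth token.
VISION_ENDPOINT = "https://vision.googleapis.com/v1/images:annotate"


def build_label_request(image_bytes, max_results=5):
    """Build the JSON body for a LABEL_DETECTION request (hypothetical helper)."""
    return {
        "requests": [{
            # Image bytes are sent base64-encoded inside the JSON body.
            "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
            "features": [{"type": "LABEL_DETECTION", "maxResults": max_results}],
        }]
    }


# Placeholder bytes stand in for a real image file.
body = build_label_request(b"\x89PNG placeholder bytes")
print(json.dumps(body, indent=2))
```

The body would be sent with an HTTP POST to `VISION_ENDPOINT`; the response is a JSON document with one entry per request.
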
  8. Cloud Speech API
     • Pre-trained models. No ML skill required
     • REST API: receives audio and returns text
     • Supports 80+ languages
     • Streaming or non-streaming
     • Limited Preview - cloud.google.com/speech
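
As a rough illustration of the non-streaming mode, the sketch below builds a JSON request body in the shape of the later v1 REST method (`speech:recognize`). This is an assumption: the Limited Preview API available at the time of the talk may have used different field names. The helper name `build_recognize_request` and the dummy audio bytes are hypothetical, and no network call is made.

```python
import base64
import json

# v1 REST endpoint (assumed; the preview API in the talk may have differed).
SPEECH_ENDPOINT = "https://speech.googleapis.com/v1/speech:recognize"


def build_recognize_request(audio_bytes, language_code="en-US", sample_rate_hertz=16000):
    """Build the JSON body for a non-streaming recognition request (hypothetical helper)."""
    return {
        "config": {
            "encoding": "LINEAR16",           # 16-bit linear PCM audio
            "sampleRateHertz": sample_rate_hertz,
            "languageCode": language_code,    # BCP-47 tag, e.g. "ja-JP"
        },
        # Audio bytes are sent base64-encoded inside the JSON body.
        "audio": {"content": base64.b64encode(audio_bytes).decode("ascii")},
    }


# Dummy bytes stand in for real PCM audio.
request_body = build_recognize_request(b"\x00\x01" * 4, language_code="ja-JP")
print(json.dumps(request_body, indent=2))
```
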
  9. What is TensorFlow?
     • Google's open source library for machine intelligence
     • tensorflow.org, launched in Nov 2015
     • The second generation
     • Used by many production ML projects
  10. What is TensorFlow?
      • Tensor: N-dimensional array
      • Flow: data flow computation framework (like MapReduce)
      • For Machine Learning and Deep Learning
      • Or any HPC (High Performance Computing) applications
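
The "tensor" plus "flow" idea can be sketched without TensorFlow itself: first describe a graph of operations over arrays, then evaluate it. The toy evaluator below is an illustration of the concept only, not TensorFlow's implementation; `Node`, `constant`, `add`, `matmul` and `run` are hypothetical names, and nested lists stand in for 2-D tensors.

```python
class Node:
    """A node in a dataflow graph: an operation plus the nodes it reads from."""

    def __init__(self, op, inputs=()):
        self.op = op
        self.inputs = tuple(inputs)


def constant(value):
    """A node with no inputs that always yields the given tensor."""
    return Node(lambda: value)


def add(a, b):
    """Element-wise addition of two 2-D tensors of the same shape."""
    def op(x, y):
        return [[p + q for p, q in zip(r, s)] for r, s in zip(x, y)]
    return Node(op, (a, b))


def matmul(a, b):
    """Matrix product of two 2-D tensors."""
    def op(x, y):
        cols = list(zip(*y))
        return [[sum(p * q for p, q in zip(row, col)) for col in cols] for row in x]
    return Node(op, (a, b))


def run(node):
    """Evaluate a node by first evaluating the nodes it depends on."""
    return node.op(*(run(i) for i in node.inputs))


x = constant([[1.0, 2.0]])                # 1x2 tensor
w = constant([[1.0, 0.0], [0.0, 1.0]])    # 2x2 identity matrix
b = constant([[0.5, 0.5]])                # 1x2 tensor
y = add(matmul(x, w), b)                  # builds the graph; nothing is computed yet
result = run(y)                           # data flows through the graph here
print(result)                             # [[1.5, 2.5]]
```

Separating graph construction from execution is what lets a framework decide later where each operation runs.
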
  11. Define the network and a training step

          # define the network
          import tensorflow as tf
          x = tf.placeholder(tf.float32, [None, 784])
          W = tf.Variable(tf.zeros([784, 10]))
          b = tf.Variable(tf.zeros([10]))
          y = tf.nn.softmax(tf.matmul(x, W) + b)

          # define a training step
          y_ = tf.placeholder(tf.float32, [None, 10])
          xent = -tf.reduce_sum(y_ * tf.log(y))
          step = tf.train.GradientDescentOptimizer(0.01).minimize(xent)

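  12. Initialize a session and train (continues from slide 11, which defines `tf`, `x`, `y_` and `step`)

          # load MNIST; the slide never defines `mnist`, so this loader, taken
          # from the TensorFlow 0.x MNIST tutorial, is an assumption
          from tensorflow.examples.tutorials.mnist import input_data
          mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

          # initialize session
          init = tf.initialize_all_variables()
          sess = tf.Session()
          sess.run(init)

          # training: 1,000 steps of gradient descent on mini-batches of 100
          for i in range(1000):
              batch_xs, batch_ys = mnist.train.next_batch(100)
              sess.run(step, feed_dict={x: batch_xs, y_: batch_ys})
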
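
To make the two formulas in the slides concrete, `y = softmax(xW + b)` and `xent = -sum(y_ * log(y))`, here is a small pure-Python sketch for a single example. It is not TensorFlow code; `softmax` and `cross_entropy` are hypothetical helpers. It also shows one consequence of initializing `W` and `b` to zeros: every class starts with the same probability.

```python
import math


def softmax(logits):
    """Turn a list of scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(y_true, y_pred):
    """-sum(y_true * log(y_pred)) for one example with a one-hot label."""
    return -sum(t * math.log(p) for t, p in zip(y_true, y_pred) if t > 0)


# With W and b all zeros, xW + b is a vector of ten zeros for any input x.
logits = [0.0] * 10
y = softmax(logits)               # uniform: 0.1 for each of the 10 digits

label = [0.0] * 10
label[3] = 1.0                    # one-hot label for the digit 3
loss = cross_entropy(label, y)    # equals ln(10), about 2.3026

print(y[0], loss)
```
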
  13. Portable
      • Training on:
        ◦ Data center
        ◦ CPUs, GPUs, etc.
      • Running on:
        ◦ Mobile phones
        ◦ IoT devices
  14. Cloud Machine Learning (Cloud ML)
      • Fully managed, distributed training and prediction for custom TensorFlow graphs
      • Supports regression and classification initially
      • Integrated with Cloud Dataflow and Cloud Datalab
      • Limited Preview - cloud.google.com/ml
  15. Distributed Training with TensorFlow
      • CPU/GPU scheduling
      • Communications
        ◦ Local, RPC, RDMA
        ◦ 32/16/8 bit quantization
      • Cost-based optimization
      • Fault tolerance
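
The slide lists quantization under communications but does not say which scheme is used. As a generic illustration of why sending 8-bit values instead of 32-bit floats trades a little precision for less traffic, here is a sketch of simple min/max linear quantization; `quantize` and `dequantize` are hypothetical helpers, not TensorFlow APIs.

```python
def quantize(values, bits=8):
    """Map floats onto integer codes in [0, 2**bits - 1] using min/max scaling."""
    lo, hi = min(values), max(values)
    levels = (1 << bits) - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((v - lo) / scale) for v in values]
    return codes, lo, scale


def dequantize(codes, lo, scale):
    """Recover approximate floats from integer codes."""
    return [lo + c * scale for c in codes]


weights = [-1.0, -0.25, 0.0, 0.5, 1.0]
codes, lo, scale = quantize(weights, bits=8)    # each code fits in one byte
restored = dequantize(codes, lo, scale)

# The round trip is lossy, but the error is bounded by half a quantization step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(codes, max_error)
```
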
  16. Data parallelism = split data, share model (but an ordinary network is 1,000x slower than a GPU and doesn't scale)
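
"Split data, share model" can be sketched in a few lines: each worker computes a gradient on its own shard of the batch using the same parameters, and the gradients are averaged before the shared model is updated. The toy below fits `y = w * x` and runs the workers sequentially in one process; all names are hypothetical. In a real cluster the gradient exchange crosses the network, which is the bottleneck the slide points at.

```python
def shard(data, num_workers):
    """Split the data into one shard per worker (round-robin)."""
    return [data[i::num_workers] for i in range(num_workers)]


def gradient(w, examples):
    """Gradient of mean squared error for the model y = w * x on one shard."""
    return sum(2 * (w * x - y) * x for x, y in examples) / len(examples)


def data_parallel_step(w, data, num_workers, lr):
    """One synchronous step: per-shard gradients, averaged, applied to the shared w."""
    grads = [gradient(w, s) for s in shard(data, num_workers)]  # one per worker
    avg_grad = sum(grads) / len(grads)  # equals the full-batch gradient for equal-size shards
    return w - lr * avg_grad


# Toy data generated from y = 3x, so training should recover w = 3.
data = [(x, 3.0 * x) for x in (1.0, 2.0, 3.0, 4.0)]
w = 0.0
for _ in range(200):
    w = data_parallel_step(w, data, num_workers=2, lr=0.01)

print(w)
```
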
  17. Cloud ML demo
      • Jeff Dean's keynote: YouTube video
      • Define a custom TensorFlow graph
      • Training locally: 8.3 hours w/ 1 node
      • Training on the cloud: 32 min w/ 20 nodes (15x faster)
      • Prediction on the cloud at 300 reqs / sec
  18. Ready-to-use Machine Learning models / Use your own data to train models
      • Products shown: Cloud Vision API, Cloud Speech API, Cloud Translate API, Cloud Machine Learning, Google BigQuery, Cloud Storage, Cloud Datalab
      • Develop - Model - Test
      • Stay tuned….
      • Release-stage labels on the slide: NEW, Alpha, Beta, GA
  19. Links & Resources
      • Large Scale Distributed Systems for Training Neural Networks, Jeff Dean and Oriol Vinyals
      • Cloud Vision API: cloud.google.com/vision
      • Cloud Speech API: cloud.google.com/speech
      • TensorFlow: tensorflow.org
      • Cloud Machine Learning: cloud.google.com/ml
      • Cloud Machine Learning: demo video