has an out-of-the-box solution to parallelize this:
grid_search = GridSearchCV(model, parameters, n_jobs=4)
But it is limited to the one computer! Can we run this on multiple computers?
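A minimal runnable sketch of the single-machine case above; the dataset and parameter grid here are illustrative placeholders, not values from the talk:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV

# Small synthetic regression problem, just for illustration.
X, y = make_regression(n_samples=100, n_features=5, random_state=0)

parameters = {"n_estimators": [10, 20], "max_depth": [2, 3]}
model = GradientBoostingRegressor(random_state=0)

# n_jobs=4 fits up to four parameter candidates in parallel --
# but only across the cores of this one machine.
grid_search = GridSearchCV(model, parameters, n_jobs=4, cv=3)
grid_search.fit(X, y)
print(grid_search.best_params_)
```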
• Non-trivial task for a data scientist to manage
• How to start on demand and shut down when unused
Is it possible to have a simple interface that a data scientist can manage on their own?
cloud (or starts a new one)
• Runs a Docker container with the appropriate image
• Exposes the required ports and sets up a URL endpoint to access it
• Manages a shared disk across all the jobs
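A rough sketch of what launching one such job could look like; the image name, port, job name, and mount path below are placeholders, not the platform's actual values:

```shell
# Launch the job's container; -p exposes the service port and
# -v mounts the shared disk so every job sees the same data.
docker run -d --name job-42 \
  -p 8888:8888 \
  -v /mnt/shared:/data \
  example/compute-image:latest

# A URL endpoint (e.g. via a reverse proxy) would then route a
# per-job address such as https://jobs.example.com/job-42 to port 8888.
```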
on top of our compute platform.
pool = DistributedPool(n=5)
results = pool.map(square, range(100))
pool.close()
Starts 5 distributed jobs to share the work.
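The DistributedPool class is the platform's own code; the sketch below only mimics its interface on top of a local thread pool, purely to show the intended usage. The real implementation would submit jobs to the compute platform instead:

```python
from multiprocessing.dummy import Pool  # local thread-pool stand-in


def square(x):
    return x * x


class DistributedPool:
    """Illustrative stand-in for the platform's DistributedPool API."""

    def __init__(self, n):
        # The real class would start n distributed jobs here.
        self._pool = Pool(n)

    def map(self, func, iterable):
        # Fan the work out across the n workers and gather results.
        return self._pool.map(func, iterable)

    def close(self):
        self._pool.close()
        self._pool.join()


pool = DistributedPool(n=5)
results = pool.map(square, range(100))
pool.close()
print(results[:5])  # [0, 1, 4, 9, 16]
```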
distributed_scikit import GridSearchCV
grid_search = GridSearchCV(
    GradientBoostingRegressor(), parameters, n_jobs=16)
A distributed pool with n_jobs workers will be created to distribute the tasks.
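The distributed_scikit module above is the talk's own wrapper. scikit-learn's standard hook for swapping out where the parallel work runs is joblib's pluggable backend, which is how distributed backends typically plug in; a sketch using the built-in threading backend in place of a distributed one:

```python
from joblib import parallel_backend
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV

# Illustrative data and grid, not from the talk.
X, y = make_regression(n_samples=100, n_features=5, random_state=0)
parameters = {"n_estimators": [10, 20]}

grid_search = GridSearchCV(GradientBoostingRegressor(random_state=0),
                           parameters, n_jobs=2, cv=3)

# A distributed joblib backend would be selected here instead of
# "threading"; GridSearchCV itself is unchanged.
with parallel_backend("threading"):
    grid_search.fit(X, y)
print(grid_search.best_params_)
```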