Efficient and Scalable Framework for Activity Prediction with kMol, Elix, CBI 2023

Efficient and Scalable Framework for Activity Prediction with kMol
Elix, Inc.
Jun Jin Choong, Research Engineer
October 25th, 2023

2 Table of Contents
• Introduction
  ◦ Why kMol?
• Use Cases:
  ◦ Federated Learning
  ◦ Usage beyond federated learning
• Conclusion

3 Introduction: Why kMol?
In this work, we present kMol, a machine learning library built to address three challenges in drug discovery:
• Complexity: It is difficult to find the best model to solve a given problem.
• Speed: Drug discovery today demands much faster and more efficient computational methods.
• Scalability: Scalability is a problem for many pharmaceutical companies.

4 Problem Statement
Case 1: Data, Security and Privacy Concerns
• For data security and privacy reasons, data cannot be shared externally.
• Some data are confidential and cannot be shared.
Pharmaceutical companies would like to use state-of-the-art models, but:
• Training deep models requires a lot of data to perform well.
• Not everyone is able to train on large amounts of data with big machines.

5 Problem Statement
Case 2: Domain Expertise
Using deep learning models can be complicated and requires expert knowledge. It is not easy to implement existing models from white papers.
• Cost of implementation
• Expert knowledge and the scalability of models are constraints for most pharmaceutical companies.

6 kMol for Federated Learning
kMol is an open-source machine learning library for drug discovery and life sciences, with federated learning capabilities. It is scalable and highly customizable, with batteries included. It can be found at https://github.com/elix-tech/kmol
kMol was developed in collaboration with researchers from Kyoto University. Its main goal was to establish a federated learning framework, but continued development has extended its capabilities beyond federated learning.

7 Our Solution
Case 1: kMol addresses security and privacy concerns through its federated learning capabilities. Model architectures compatible with kMol have this capability enabled by default. The source code is open source and can be scrutinized, and kMol is designed to be easily maintained.
Case 2: Domain Expertise. kMol is developed by Elix and actively supported by a group of talented AI researchers. Questions can be directed to our GitHub repository, and further customization can be provided through Elix's consultation services.

8 kMol and Federated Learning

9 Preliminaries: Federated Learning
Federated learning is an alternative to the conventional way of training machine learning models: the model is trained collectively by several parties, each on its own data. Ultimately, we are interested in the final state of the model: a fully trained model with state-of-the-art performance.
kMol's approach to federated learning in practice:
• Global model: The master node, which aggregates the model weights trained on the distributed worker nodes.
• Local model: Identical copies of the global model, each trained on a different set of data.
For every epoch, the trained model is sent to the global node for aggregation. Data security is preserved, since only model weights, not the training data, leave each worker node.
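The aggregation step can be illustrated with a small sketch. The slides do not say which aggregation rule kMol applies, so the example below assumes one common choice, a sample-weighted average of the local weights (in the style of federated averaging). The function name `aggregate`, the weight names, and the client sizes are hypothetical and are not part of kMol's API.

```python
from typing import Dict, List

# A model is represented here as a mapping from parameter name to a flat
# list of floats. This is an illustrative stand-in, not kMol's checkpoint format.
Weights = Dict[str, List[float]]


def aggregate(local_weights: List[Weights], num_samples: List[int]) -> Weights:
    """Return the sample-weighted average of the clients' weights.

    Hypothetical helper: assumes every client reports the same parameter
    names and shapes, plus the number of samples it trained on.
    """
    if not local_weights or len(local_weights) != len(num_samples):
        raise ValueError("need one sample count per client")
    total = sum(num_samples)
    averaged: Weights = {}
    for name, reference in local_weights[0].items():
        averaged[name] = [
            sum(w[name][i] * n for w, n in zip(local_weights, num_samples)) / total
            for i in range(len(reference))
        ]
    return averaged


# Two local models trained on different data (made-up numbers).
client_1 = {"layer.weight": [1.0, 2.0], "layer.bias": [0.0]}
client_2 = {"layer.weight": [3.0, 6.0], "layer.bias": [1.0]}

# The global node only ever sees weights and sample counts, never the data.
global_weights = aggregate([client_1, client_2], num_samples=[80, 20])
print(global_weights)
```

Weighting by sample count lets a client with more data contribute more to the global model; an unweighted mean is the special case where all counts are equal.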

10 kMol
kMol is a library meant to be run on the command line.
Prerequisites
- Some Linux command-line knowledge is required
Installation
- Comes with batteries included (i.e. example configuration scripts)
- Installation is straightforward: two lines to perform the installation, or run with Docker

11 Configuration in kMol (1)
Sample configurations are available in the /data directory. Configurations are available for:
- Federated Learning (MILA)
- ADME
- AMES
- Ligand-Protein Activity Prediction

12 Configuration in kMol (2)
• kMol settings can be shared easily between users. They are written in JSON; as of version 1.1.4, YAML is also supported.
• Configuration covers:
  ◦ Model configuration
  ◦ Data configuration
  ◦ Featurization/Preprocessing
• Configurations are also extensible: a child configuration can load a parent configuration file and override only the parameters that need to change.
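The parent/child override idea can be sketched as a recursive dictionary merge. This is only an illustration of the concept: the slides do not show how kMol resolves inheritance internally, and the function `merge_config` and every configuration key below are hypothetical, not taken from kMol's configuration schema.

```python
import json
from copy import deepcopy


def merge_config(parent: dict, child: dict) -> dict:
    """Return a copy of `parent` with values from `child` layered on top.

    Nested dictionaries are merged key by key; any other value in the
    child replaces the parent's value outright.
    """
    merged = deepcopy(parent)
    for key, value in child.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Hypothetical parent configuration with made-up keys.
parent = json.loads(
    '{"model": {"type": "example_gcn", "hidden": 128}, "epochs": 100}'
)
# The child only states what differs from the parent.
child = json.loads('{"model": {"hidden": 256}}')

config = merge_config(parent, child)
print(json.dumps(config, indent=2))
```

Because the merge works on copies, the parent configuration stays unchanged and can be reused as the base for several child configurations.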

13 Running kMol
kMol is launched with kmol <command> <configuration_file>. Additional commands can be found in the documentation.
kMol can perform hyperparameter optimization and other related subtasks. Trained models can be evaluated; the checkpoints have to be specified in the configuration file. A fully trained model can also be used for prediction.

14 Federated Learning with kMol
kMol can be run in a federated learning scenario by launching a server and multiple clients. The client-server model works by sharing a configuration file among all nodes of the federated learning network.
Example: 80-20 Tox21 configuration (panels: Server, Client 1, Client 2)
Server: By default, grpc_configuration can be left empty, in which case federated learning is performed on a local machine.
Client 1, Client 2: The client configurations have a similar setup. The target localhost:8024 in this case is the aggregating server.

15 Federated Learning with kMol Example
The two clients start training, and the aggregator (server) waits for each client to complete the specified number of epochs, then aggregates according to the chosen aggregator. After aggregation, checkpoints are shared with all clients.

16 Federated Learning with kMol - Transparency
In cases where there are concerns about sharing checkpoints, kMol supports uploading checkpoints to Box.

17 Beyond Federated Learning: Scalability and Extensibility of kMol

18 Recent Developments
Over the past few years, a lot of development has gone into making kMol better. So far, we have added the following features:
- State-of-the-art graph models
- State-of-the-art protein-ligand activity prediction architectures (developed by Elix)
- Distributed computation with kMol (compatible with Fugaku)
- Visualization tools such as Integrated Gradients
[Figure panels: ClusterGCN; Explainability with Integrated Gradients]

19 Recent Developments
More recently, the following are to be supported:
- Activity prediction with 3D information (i.e. from docking simulation results)
- MSA feature extraction from AlphaFold/OpenFold's dataset
[Figure: Pipeline Integration with 3D Information. Labels: Protein sequence (GPHDK...); Compound structure; Docking structure; MSA features; Token, bag-of-words or AF2 feature; Graph or 3D-graph; 3D graph descriptors; Interaction descriptor; kMol Featurizer; Model; Activity value]

20 Conclusion
• kMol is a machine learning library for drug discovery and life sciences, with federated learning capabilities. It is scalable and highly customizable, with batteries included.
• It is actively being developed by Elix in collaboration with researchers from Kyoto University.
• There is a lot of room for improvement, and kMol is mainly presented as a library for research purposes. Its federated learning capabilities are also suited to enterprise environments.
• The source code is open source, and any form of contribution is welcome.

21 Questions?

Elix, Inc. http://ja.elix-inc.com/
