An Elix Discovery™ Case Study: Rediscovering Donepezil with an In-house Generative Model

Elix

June 29, 2022

Transcript

  1. Goals
    • Provide a use case of the Elix Discovery™ Platform
    • Focus on the generative models
    • Focus on novel scaffolds:
      ◦ Rediscover the scaffold of a known drug that is novel with respect to the data used
  2. Target Molecule Selection Criteria
    Problem design:
    • Identify a known drug (“target molecule”) with a detailed description of its discovery process
    • Collect publicly available data on the protein target of the target molecule
    • Filter the training set to exclude molecules similar to the target molecule
    • Train predictive and generative models
    • Observe whether scaffold rediscovery is successful
    Target molecule selection criteria:
    • NOT a kinase inhibitor
    • Well-documented drug design process
    • Diverse dataset (not focused on derivatives of a single moiety)
  3. Study Workflow
    Dataset Filtering
    • Exclusion of compounds containing the donepezil scaffold from the pre-training set
    • Exclusion of the donepezil scaffold and related molecules from the training set
    Dataset Curation
    • Pre-training set:
      ◦ ChEMBL data
      ◦ Objective: learn the SMILES vocabulary
    • Training set:
      ◦ AChE inhibitors from the ChEMBL database
    Predictive Model Training
    • Single model for activity prediction in the generation step
    • 10-model ensemble for activity prediction in post-processing
    • Blood-brain barrier (BBB) permeability model: ensemble of 5 models
      ◦ Trained on a curated dataset of 9059 samples
    Generative Model Training
    • In-house developed SmilesFormer model
    • Pre-trained on the cleaned ChEMBL data
    • Fine-tuned on the cleaned activity data
    Generation and Data Analysis
    • Generate 30K molecules per run
    • Phys-chem filters
    • Medicinal chemistry filters (MCFs)
    • Novelty filters
    • BBB permeability prediction confidence filter
    • Activity prediction confidence filter
    • Scaffold grouping and ranking
  4. Donepezil (Aricept)
    • Used for Alzheimer’s disease treatment
    • Centrally acting reversible acetylcholinesterase (AChE) inhibitor
    [Figure: labeled structures: physostigmine, galantamine, tacrine, donepezil, rivastigmine]
    [Figure: labeled structures: Compound 1 (Seed), Compound 8 (Backbone), Donepezil; fragment labels: N-benzylpiperazine, 1-indanone, N-benzylpiperidine]
  5. Training set: Most abundant scaffolds
    • Extract unique scaffolds from the training set
    • Search the training set for substructure matches to each scaffold
    [Figure: the most abundant scaffolds. Legend: number of molecules containing the structure: 3572, 807, 787, 615, 524, 265, 220, 207, 156, 150, 120, 120, 120, 117, 112]
  6. Training set: 10 most similar molecules to donepezil
    [Figure: the 10 most similar training-set molecules. Legend: Tanimoto similarity to donepezil: 0.443, 0.443, 0.441, 0.435, 0.433, 0.432, 0.431, 0.429, 0.425, 0.424]
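
The Tanimoto similarity used to rank training-set molecules against donepezil can be illustrated with a minimal sketch. It assumes each molecule is already represented as a set of fingerprint on-bit indices; the slides do not say which fingerprint was used, and the function names `tanimoto` and `most_similar` are hypothetical.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto (Jaccard) similarity between two sets of fingerprint on-bits."""
    if not bits_a and not bits_b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(bits_a & bits_b)
    return shared / (len(bits_a) + len(bits_b) - shared)


def most_similar(query_bits, library, k=10):
    """Return the k library entries most similar to the query.

    `library` maps a molecule name to its set of on-bits (toy data layout,
    assumed for this sketch).
    """
    scored = [(name, tanimoto(query_bits, bits)) for name, bits in library.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Two fingerprints sharing 2 of 4 distinct bits.
print(tanimoto({1, 2, 3}, {2, 3, 4}))  # 0.5
```

With a maximum similarity of 0.443, none of the remaining training-set molecules would exceed a 0.5 similarity cutoff relative to donepezil.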
  7. Generation Procedure
    Multi-objective optimization problem:
    • SA score
    • QED score
    • Favorable physicochemical properties
    • Novelty (distance from the training set)
    • Activity
    Generative score:
    • The average of the normalized single scores (SA, QED, phys-chem, novelty, activity) was computed for each generated molecule
    • Molecules with the highest “generative score” were prioritized during the generation process
    • Up to 30K molecules with the highest scores were generated in each sampling run
    • 10 sampling runs were performed in total
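
The generative score described above (an average of normalized single scores) can be sketched as follows. The slides do not specify how each score was normalized, so this sketch assumes clipped min-max scaling with illustrative ranges: SA score from 1 (easy) to 10 (hard) and inverted so that higher is better, pIC50 from 4 to 10, and the remaining scores already in [0, 1]. The function names and ranges are assumptions, not the platform's actual implementation.

```python
from statistics import mean


def min_max(value, low, high, invert=False):
    """Clip value to [low, high] and scale it to [0, 1]; optionally flip it."""
    clipped = min(max(value, low), high)
    scaled = (clipped - low) / (high - low)
    return 1.0 - scaled if invert else scaled


def generative_score(sa, qed, phys_chem, novelty, pic50):
    """Average of the five normalized single scores (ranges are assumed)."""
    parts = [
        min_max(sa, 1.0, 10.0, invert=True),  # lower SA score = easier synthesis
        min_max(qed, 0.0, 1.0),
        min_max(phys_chem, 0.0, 1.0),
        min_max(novelty, 0.0, 1.0),
        min_max(pic50, 4.0, 10.0),            # assumed activity range
    ]
    return mean(parts)


# A molecule at the midpoint of every assumed range scores 0.5.
print(generative_score(sa=5.5, qed=0.5, phys_chem=0.5, novelty=0.5, pic50=7.0))
```

Equal weighting means no single objective dominates; a weighted mean would be the natural variation if some objectives mattered more.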
  8. Post-Processing Analysis Summary
    [Diagram: 10 sampling runs (Run 1 to Run 10), 30K molecules each → numbered filtering steps → top 20 scaffolds from each run → combined scaffolds from all runs → 20 most frequent scaffolds]
  9. Post-Processing Analysis
    1) Phys-Chem & MCFs
    • Lipinski’s Rule of Five (RO5)
    • Allowed common atoms
    • Ring size (up to 8)
    • Medicinal chemistry filters (189 filters)
    2) Novelty
    • Avoid building upon known scaffolds (tacrine and physostigmine)
    • Remove molecules with an exact scaffold match to the training set
    • Remove molecules with a Tanimoto similarity score > 0.5
    3) BBB Permeability
    • Choose molecules based on a BBB permeability prediction probability threshold
    • Value used:
      ◦ 0.99
    4) Activity Prediction Confidence
    • Choose the top n percent of molecules based on pIC50 prediction confidence
    • Values used:
      ◦ 50%
      ◦ 40%
      ◦ 30%
      ◦ 20%
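
Filtering steps 2 to 4 can be sketched as a single pass over per-molecule records. The record keys (`max_train_similarity`, `bbb_probability`, `activity_confidence`) are hypothetical, and the sketch assumes the Tanimoto cutoff is applied to the highest similarity against the training set, which the slide does not state explicitly. The phys-chem, MCF, and exact-scaffold checks are omitted.

```python
import math


def post_filter(molecules, max_similarity=0.5, bbb_threshold=0.99, top_fraction=0.5):
    """Apply the similarity, BBB, and confidence filters in sequence.

    Each molecule is a dict with the (assumed) keys
    'max_train_similarity', 'bbb_probability', and 'activity_confidence'.
    """
    kept = [
        m for m in molecules
        if m["max_train_similarity"] <= max_similarity  # drop molecules above 0.5
        and m["bbb_probability"] >= bbb_threshold       # keep confident BBB permeants
    ]
    # Keep the top fraction by activity prediction confidence (rounded up).
    kept.sort(key=lambda m: m["activity_confidence"], reverse=True)
    n_keep = math.ceil(len(kept) * top_fraction)
    return kept[:n_keep]
```

Calling `post_filter(molecules, top_fraction=0.2)` would correspond to the strictest of the four confidence settings listed on the slide.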
  10. Grouping and Ranking Analysis
    5) Scaffold Grouping & Ranking
    • Group molecules sharing the same scaffold
    • Rank scaffolds by a “desirability score”:
      ◦ (QED + pIC50)/2
    6) Combine Multiple Runs
    • Combine the top 20 scaffolds from each of the 10 sampling runs
    7) Most Consistent Suggestions
    • Rank the final list by number of occurrences
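
Steps 5 to 7 can be sketched with the standard library. Each molecule is given as a `(scaffold, qed, pic50)` tuple, a layout assumed for illustration. The slide does not say how per-molecule desirability scores are aggregated into a scaffold-level score, so the mean over the group is used here as an assumption.

```python
from collections import Counter, defaultdict
from statistics import mean


def top_scaffolds(molecules, k=20):
    """Group molecules by scaffold and return the k best scaffolds.

    A molecule's desirability is (QED + pIC50) / 2; a scaffold's score is the
    mean over its molecules (the aggregation is an assumption of this sketch).
    """
    groups = defaultdict(list)
    for scaffold, qed, pic50 in molecules:
        groups[scaffold].append((qed + pic50) / 2)
    ranked = sorted(groups, key=lambda s: mean(groups[s]), reverse=True)
    return ranked[:k]


def most_consistent(runs, k=20):
    """Combine the top-k scaffolds of every run and rank by occurrence count."""
    counts = Counter()
    for run in runs:
        counts.update(top_scaffolds(run, k))
    return counts.most_common()
```

A scaffold that appears in the top 20 of all 10 runs would reach the maximum count of 10, which is the sense in which a suggestion is "consistent".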
  11. Generated Results with Donepezil Scaffold
    [Figure labels: donepezil scaffold; generated molecule; donepezil; Compound 14 from the original donepezil paper [1]]
    [1] Sugimoto H. et al. Jpn. J. Pharmacol. 89, 7–20 (2002)
  12. Summary & Discussion (1/2)
    • The Elix Discovery™ Platform was used to discover novel scaffolds (distant from the training set)
    • 10 runs were performed, with ~30K molecules generated in each run
    • Molecules in each run were filtered down to a short list of 20 scaffolds
    • The donepezil scaffold consistently ranked among the top 20 scaffolds
    • The donepezil scaffold was represented by a molecule originally described as one of the intermediate molecules (Compound 14) that led to the discovery of donepezil [1]
    [1] Sugimoto H. et al. Jpn. J. Pharmacol. 89, 7–20 (2002)
  13. Summary & Discussion (2/2)
    • Observations:
      ◦ Diversity in scaffolds: many scaffolds were represented by very few molecules
      ◦ Generated molecules were mostly predicted to be BBB permeable, without explicit optimization for this property
      ◦ Activity prediction models struggled when predicting in chemical space too distant from the training set
      ◦ Filtering by prediction confidence helped focus on molecules with higher confidence in their predicted IC50 values