Building Hypothesis-driven Virtual Screening Pipelines for Millions of Molecules at ODSC West 2017

Talk on virtual screening presented at ODSC West 2017 by Sebastian Raschka

Sebastian Raschka

November 03, 2017

Transcript

  1. Sebastian Raschka. Building Hypothesis-driven Virtual Screening Pipelines for Millions of Molecules. PSA Lab. ODSC West, San Francisco, November 3, 2017. sebastianraschka.com, @rasbt
  2. "By the beginning of the twentieth century, the Great Lakes were the richest freshwater fishery in the world […] But those good years were soon gone." Dennis, Jerry. The Living Great Lakes: Searching for the Heart of the Inland Seas. Macmillan, 2003. Image: https://en.wikipedia.org/wiki/Great_Lakes#/media/File:Great_Lakes_from_space_crop_labeled.jpg
  3. "The opening of the Welland Canal […] allowed ships from all over the world to come to the upper lakes […] But nobody could have foreseen that the canal would also allow entry to a most unwelcome visitor, the sea lamprey." Dennis, Jerry. The Living Great Lakes: Searching for the Heart of the Inland Seas. Macmillan, 2003. Image: https://en.wikipedia.org/wiki/Welland_Canal#/media/File:Welland_Canal_aerial.png
  4. 6

  5. 8

  6. 9

  7. 10

  8. 11

  9. [Figure: protein structure (PDB ID 2RH1) with example signals: beta blockers, rose scent, pheromone, light photons, smoke, rotten fish, adrenaline]
  10. 15

  11. [Figure: molecule with labeled groups: sulfate oxygens, 3-keto, sulfate group, 12-hydroxy, 7-hydroxy, 18-methyl, 19-methyl, sulfate ester, sulfur, carbon tail, steroid substructure]
  12. MOL2 file excerpt:

        @<TRIPOS>MOLECULE
        DCM Pose 1
        32 33 0 0 0
        SMALL
        USER_CHARGES
        @<TRIPOS>ATOM
        1 C1 18.8934 5.5819 24.1747 C.2 1 <0> -0.1356
        2 C2 18.1301 4.7642 24.8969 C.2 1 <0> -0.0410
        3 C3 18.2645 6.8544 23.7342 C.2 1 <0> 0.4856
        4 C4 16.2520 6.2866 24.7933 C.2 1 <0> 0.8410
        5 C5 15.3820 3.0682 25.1622 C.3 1 <0> 0.0000
        …
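The MOL2 excerpt on slide 12 follows a simple line-oriented layout, so its atom records can be read with a few lines of Python. The following is a minimal sketch, not the parser used in the talk: the names `Mol2Atom` and `parse_mol2_atoms` are hypothetical, and it assumes each atom line carries the nine whitespace-separated fields visible in the excerpt (ID, name, x, y, z, atom type, substructure ID, substructure name, charge).

```python
from typing import List, NamedTuple


class Mol2Atom(NamedTuple):
    """One record of a @<TRIPOS>ATOM section (hypothetical helper type)."""
    atom_id: int
    name: str
    x: float
    y: float
    z: float
    atom_type: str
    subst_id: int
    subst_name: str
    charge: float


def parse_mol2_atoms(text: str) -> List[Mol2Atom]:
    """Return the atom records of the first @<TRIPOS>ATOM section in `text`."""
    atoms: List[Mol2Atom] = []
    in_atom_section = False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("@<TRIPOS>"):
            if in_atom_section:
                break  # the next section header ends the atom block
            in_atom_section = stripped == "@<TRIPOS>ATOM"
            continue
        if not in_atom_section:
            continue
        fields = stripped.split()
        if len(fields) < 9:
            continue  # skip blank or truncated lines
        atoms.append(Mol2Atom(
            int(fields[0]), fields[1],
            float(fields[2]), float(fields[3]), float(fields[4]),
            fields[5], int(fields[6]), fields[7], float(fields[8]),
        ))
    return atoms


# The five atom records shown on the slide; the header announces 32 atoms,
# but the excerpt is truncated, so only these five are available here.
EXAMPLE = """\
@<TRIPOS>MOLECULE
DCM Pose 1
32 33 0 0 0
SMALL
USER_CHARGES
@<TRIPOS>ATOM
1 C1 18.8934 5.5819 24.1747 C.2 1 <0> -0.1356
2 C2 18.1301 4.7642 24.8969 C.2 1 <0> -0.0410
3 C3 18.2645 6.8544 23.7342 C.2 1 <0> 0.4856
4 C4 16.2520 6.2866 24.7933 C.2 1 <0> 0.8410
5 C5 15.3820 3.0682 25.1622 C.3 1 <0> 0.0000
"""

atoms = parse_mol2_atoms(EXAMPLE)
for atom in atoms:
    print(atom.name, atom.atom_type, atom.charge)
```

Keeping the records as named tuples makes later steps, such as counting atom types or measuring distances between atoms, straightforward list comprehensions.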
  13. 26

  14. 28

  15. 29

  16. [Figure: two-step screening pipeline. Step 1, Hypothesis-based Filtering: applying a customizable molecule filter to the database annotation (general properties, atom type counts, functional group distances); example table columns: ID (12, 363), Weight (423 g/mol), Purchasable. Step 2, Conformer Overlays and Pharmacophore Matching: conformer sampling, volumetric & chemical overlays, functional group matching.]
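The hypothesis-based filtering step on slide 16 can be pictured as a list of independent criteria that every molecule must pass. The sketch below only illustrates that idea and is not the screenlamp implementation: the `Molecule` record, `apply_filters`, the thresholds, and the atom counts are invented for illustration. Only the IDs 12 and 363 and the 423 g/mol weight appear on the slide.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Molecule:
    """Hypothetical database annotation for one molecule."""
    mol_id: int
    weight: float                      # molecular weight in g/mol
    purchasable: bool
    atom_counts: Dict[str, int] = field(default_factory=dict)


# A criterion is any function that accepts or rejects a molecule.
Criterion = Callable[[Molecule], bool]


def apply_filters(molecules: List[Molecule],
                  criteria: List[Criterion]) -> List[Molecule]:
    """Keep only the molecules that satisfy every criterion."""
    return [m for m in molecules if all(c(m) for c in criteria)]


# Placeholder records: atom counts and the third entry are made up.
database = [
    Molecule(12, 423.0, True, {"S": 1, "O": 5}),
    Molecule(363, 423.0, True, {"O": 3}),
    Molecule(7, 812.5, False, {"S": 1, "O": 4}),
]

# Example hypothesis: purchasable, not too heavy, contains sulfur.
criteria: List[Criterion] = [
    lambda m: m.purchasable,
    lambda m: m.weight <= 500.0,
    lambda m: m.atom_counts.get("S", 0) >= 1,
]

hits = apply_filters(database, criteria)
print([m.mol_id for m in hits])
```

Expressing each hypothesis as a separate predicate keeps the filter customizable: criteria can be added, removed, or reordered without touching the filtering loop.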
  17. [Figure: Selection for Experimental Assays. Inputs: functional group matching patterns (table with IDs 12 and 363 and columns 3-keto, 12-hydroxy), overall similarity thresholds, and additional selection criteria (domain knowledge, PAINS, docking scores, price, purity, chemical scaffold).]
  18. 37

  19. screenlamp. Enabling the hypothesis-driven prioritization of ligand candidates in big databases: Screenlamp and its application to GPCR inhibitor discovery for invasive species control (2017). Raschka S., A. M. Scott, N. Liu, S. Gunturu, M. Huertas, W. Li, and L. A. Kuhn. JCAM (manuscript under revision). https://psa-lab.github.io/screenlamp
  20. 40

  21. [Figure: KNeighborsClassifier + SequentialFeatureSelector. x-axis: number of features in the selected subset (1 to 11); y-axis: accuracy (0.5 to 1.0).]
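Slide 21 pairs a k-nearest-neighbors classifier with sequential feature selection and plots accuracy against the size of the selected subset. The following from-scratch sketch shows the general idea of greedy forward selection. It is not the `SequentialFeatureSelector` used in the talk, and the function names, the choice of k = 3, the leave-one-out scoring, and the toy data are all assumptions made for illustration.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def loo_knn_accuracy(X: Sequence[Sequence[float]], y: Sequence[int],
                     features: Sequence[int], k: int = 3) -> float:
    """Leave-one-out accuracy of k-NN using only the given feature columns."""
    correct = 0
    for i, xi in enumerate(X):
        # Squared Euclidean distance to every other sample; ties are
        # broken by sample index because the tuples sort lexicographically.
        neighbors = sorted(
            (sum((xi[f] - xj[f]) ** 2 for f in features), j)
            for j, xj in enumerate(X) if j != i
        )[:k]
        votes = Counter(y[j] for _, j in neighbors)
        if votes.most_common(1)[0][0] == y[i]:
            correct += 1
    return correct / len(X)


def sequential_forward_selection(
        X: Sequence[Sequence[float]], y: Sequence[int],
        n_select: int, k: int = 3) -> List[Tuple[Tuple[int, ...], float]]:
    """Greedily add the feature that most improves accuracy at each step."""
    selected: List[int] = []
    remaining = list(range(len(X[0])))
    history = []
    while remaining and len(selected) < n_select:
        scores = [(loo_knn_accuracy(X, y, selected + [f], k), f)
                  for f in remaining]
        # Highest score wins; on ties, prefer the lowest feature index.
        best_score, best_f = max(scores, key=lambda s: (s[0], -s[1]))
        selected.append(best_f)
        remaining.remove(best_f)
        history.append((tuple(selected), best_score))
    return history


# Toy data (invented): feature 0 equals the label, features 1 and 2 are noise.
X = [[1, 0, 1], [1, 1, 0], [1, 0, 0], [1, 1, 1],
     [0, 0, 1], [0, 1, 0], [0, 0, 0], [0, 1, 1]]
y = [1, 1, 1, 1, 0, 0, 0, 0]

history = sequential_forward_selection(X, y, n_select=2)
for subset, accuracy in history:
    print(subset, accuracy)
```

Recording the score after every step yields exactly the kind of curve the slide shows: one accuracy value per subset size.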
  22. [Figure: accuracy (0.5 to 1.0) versus number of features in the selected subset (1 to 11), annotated with feature labels, in order: ‘Sulfur-Oxygens’, ‘Sulfur-Oxygens’, ‘Sulfur’, ‘Sulfur-Oxygens’, ‘Sulfur’, ‘19-Methyl’.]
  23. [Figure: accuracy (0.5 to 1.0) versus number of features in the selected subset (1 to 11), annotated with feature labels, in order: ‘Sulfur-Oxygens’, ‘Sulfur-Oxygens’, ‘Sulfur’, ‘Sulfur-Oxygens’, ‘Sulfur’, ‘19-Methyl’. Shown next to the molecule with labeled groups: sulfate oxygens, 3-keto, sulfate group, 12-hydroxy, 7-hydroxy, 18-methyl, 19-methyl, sulfate ester, sulfur, carbon tail, steroid substructure.]
  24. [Figure: DecisionTreeClassifier, with branches annotated "Mostly non-active" and "Mostly active". Child nodes are listed in the order they appear on the slide.]

        Sulfur <= 0.5: gini = 0.478, samples = 38, value = [23, 15], class = non-active
          True: 12-Hydroxy <= 0.5: gini = 0.219, samples = 24, value = [21, 3], class = non-active
            Sulfate-Ester <= 0.5: gini = 0.1, samples = 19, value = [18, 1], class = non-active
              leaf: gini = 0.0, samples = 16, value = [16, 0], class = non-active
              leaf: gini = 0.444, samples = 3, value = [2, 1], class = non-active
            3-Keto <= 0.5: gini = 0.48, samples = 5, value = [3, 2], class = non-active
              leaf: gini = 0.5, samples = 4, value = [2, 2], class = non-active
              leaf: gini = 0.0, samples = 1, value = [1, 0], class = non-active
          False: Sulfate-Ester <= 0.5: gini = 0.245, samples = 14, value = [2, 12], class = active
            12-Hydroxy <= 0.5: gini = 0.444, samples = 6, value = [2, 4], class = active
              leaf: gini = 0.48, samples = 5, value = [2, 3], class = active
              leaf: gini = 0.0, samples = 1, value = [0, 1], class = active
            leaf: gini = 0.0, samples = 8, value = [0, 8], class = active
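The `gini` values on slide 24 can be checked by hand from the `value` counts of each node: the Gini impurity is one minus the sum of squared class fractions. The short sketch below recomputes the figures for the root and its two children; the function names are hypothetical helpers, not part of any library.

```python
from typing import Sequence


def gini(counts: Sequence[int]) -> float:
    """Gini impurity of a node given its per-class sample counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts)


def weighted_child_impurity(children: Sequence[Sequence[int]]) -> float:
    """Average impurity of the child nodes, weighted by their sample counts."""
    total = sum(sum(c) for c in children)
    return sum(sum(c) / total * gini(c) for c in children)


# Class counts [non-active, active] taken from the slide.
root = [23, 15]
left = [21, 3]    # Sulfur <= 0.5 is True
right = [2, 12]   # Sulfur <= 0.5 is False

print(round(gini(root), 3))   # 0.478
print(round(gini(left), 3))   # 0.219
print(round(gini(right), 3))  # 0.245

# Impurity decrease achieved by splitting the root on "Sulfur".
decrease = gini(root) - weighted_child_impurity([left, right])
print(round(decrease, 3))     # 0.249
```

The split on "Sulfur" roughly halves the impurity, which is consistent with the tree placing that feature at the root.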
  25. [Figure: RandomForestClassifier relative feature importance (axis 0.00 to 0.35) for the features Sulfur, Sulfate-Ester, Sulfate-Oxygens, 12-Hydroxy, 3-Hydroxy, 3-Keto, 18-Methyl, 19-Methyl, C4-C5-DB, 12-Keto, C6-C7-DB. Shown next to the molecule with labeled groups: sulfate oxygens, 3-keto, sulfate group, 12-hydroxy, 7-hydroxy, 18-methyl, 19-methyl, sulfate ester, sulfur, carbon tail, steroid substructure.]
  26. 49

  27. [Figure labels: ~50% and ~50%, each with Pheromone (@ 10^-13 M).] Concentration of gold in the ocean: 4 x 10^-11 M (https://web.stanford.edu/group/Urchin/mineral.html)
  28. [Figure labels: 0% and 100%, each with Pheromone (@ 10^-12 M); + Antagonist Discovered (@ 5 x 10^-13 M).]
  29. Acknowledgements
    Kuhn Lab: Leslie A. Kuhn, Nan Liu, Santosh Gunturu, Jiaxing Chen
    Weiming Li Lab: Weiming Li, Anne M. Scott, Mar Huertas
    Software & Developers: Python (https://www.python.org), Matplotlib (https://matplotlib.org), Scikit-learn (http://scikit-learn.org), IPython (https://ipython.org), Jupyter Notebook (http://jupyter.org), Pandas (https://pandas.pydata.org), OpenEye (https://www.eyesopen.com), OpenBabel (http://openbabel.org)
    Great Lakes Fishery Commission