Triaging Scholarly Information Overload: Using LLMs and Computational Chemistry to find promising candidates for X
by Ziyang Zhang, Yixi Ding, Thushari Pahalage, Jiaying Wu, Giorgia Pastorin, Raye Yeow, Min-Yen Kan.
Presented by Min-Yen Kan at the TUMCreate DRAGON Symposium.
you? Where do we experience high volumes of tangentially relevant signals for the problems of interest? AI is inherently interdisciplinary, but we need to ask the right questions.
• Predict ROS Selectivity
• Extrapolating efficacy over K or M candidates
• Designing Divergent and Convergent SCC
• Selecting SCC metallacages
• Characterising host–guest chemistry
• EPR effect prediction
• Forecasting Translation Scalability
• Synthesis Cost Estimation
• Discerning Trends in PFAS Removal
• Creativity in System Design
to efficient and robust mechanisms that serve to archive, curate, and make primary data available. But very few parallel systems exist for derived data. Because most, if not all, scientific articles in Astronomy are based on derived data, making such data visible, intelligible and available to the public is of fundamental importance.”
(Pepe et al. (2014), “How Do Astronomers Share Data?”, PLOS One)
Large Language Models on Scientific Discovery: a Preliminary Study using GPT-4. (Microsoft Research AI4Science, arXiv: 2311.07361)
Benefits, Limits, and Risks of GPT-4 as an AI Chatbot for Medicine. (Lee et al., N Engl J Med 2023)
case study of DFU
• Diabetic Foot Ulcer
• Qualify & Quantify: AI + CC Workflow
o Qualify: AI LLM Screening
o Quantify: CC in silico Simulation
o Validate: Assay in vitro validation
o Towards the Future
15 Oct 2024 TUM-NUS DRAGON Symposium (Singapore)
diabetes.
Prevalence & Health Impact
o Globally, every 20 seconds, someone loses a leg due to DFUs. Regardless of the level of amputation, the five-year survival rate is less than 50%.
o Diabetic population (14.9% in SG) → hospitalization due to DFU (6%)
Challenges in Current Therapies
o DFU healthcare is highly complex and hard to customize because of patient-specific characteristics (gait, wound shape, and shifts in the wound micro-environment).
o Conventional drug discovery is time-consuming and costly.
Guidelines on interventions to enhance healing of foot ulcers in people with diabetes (Chen et al., IWGDF 2023 update)
to protein dysregulation. In the case of DFUs, such dysregulation hinders the body’s ability to heal wounds effectively.
o Hypothesis 1: When diabetic wounds induce damaging changes in the concentration of specific proteins, medications that inhibit such changes may prove beneficial.
o Hypothesis 2: When an efficacious therapeutic intervention induces positive changes in the concentration of proteins within the diabetic wound, medications that stimulate such changes may be helpful.
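The two hypotheses can be read as matching rules between the direction in which a protein changes and the direction in which a drug moves that protein. The Python sketch below illustrates this under stated assumptions and is not the authors' implementation: the `Direction` enum and both function names are hypothetical, and Hypothesis 1 is simplified by treating every disease-associated change as damaging.

```python
from enum import Enum


class Direction(Enum):
    """Direction of a change in protein concentration (hypothetical encoding)."""
    UP = 1
    DOWN = -1
    UNKNOWN = 0


def supports_hypothesis_1(disease_change: Direction, drug_effect: Direction) -> bool:
    """Hypothesis 1: the drug counteracts a change seen in DFU vs. healthy tissue.

    Simplification (assumption): every disease-associated change is treated as damaging.
    """
    if Direction.UNKNOWN in (disease_change, drug_effect):
        return False
    # Opposite directions: the drug pushes the protein back toward healthy levels.
    return drug_effect.value == -disease_change.value


def supports_hypothesis_2(treatment_change: Direction, drug_effect: Direction) -> bool:
    """Hypothesis 2: the drug mimics a change induced by an efficacious treatment."""
    if Direction.UNKNOWN in (treatment_change, drug_effect):
        return False
    # Same direction: the drug reproduces the beneficial change.
    return drug_effect == treatment_change


# Hypothetical protein elevated in DFU tissue; hypothetical drug that lowers it.
print(supports_hypothesis_1(Direction.UP, Direction.DOWN))  # True
# Hypothetical protein raised by a successful therapy; drug lowers it instead.
print(supports_hypothesis_2(Direction.UP, Direction.DOWN))  # False
```

Unknown directions never count as support, so a protein without a clear signal cannot promote a drug under either rule.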
might be present through an LLM search of the literature
Quantify effect in silico through quantum chemistry
Validate using in vitro assays
[Funnel: Larger Number of Candidates (top) → More expense per candidate (bottom)]
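A staged funnel keeps total cost down by letting cheap stages discard most candidates before the expensive ones run. The sketch below illustrates only that arithmetic; the stage costs, the pass/fail predicates, the candidate names, and the `Stage`/`run_funnel` helpers are hypothetical placeholders, not figures from the study.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Stage:
    name: str
    cost_per_candidate: float       # hypothetical relative cost units
    keep: Callable[[str], bool]     # hypothetical pass/fail test for a candidate


def run_funnel(candidates: List[str], stages: List[Stage]) -> Tuple[List[str], float]:
    """Run stages in order; each stage pays for every candidate it receives."""
    total_cost = 0.0
    for stage in stages:
        total_cost += stage.cost_per_candidate * len(candidates)
        candidates = [c for c in candidates if stage.keep(c)]
    return candidates, total_cost


def index(candidate: str) -> int:
    """Extract the numeric suffix from a placeholder name like 'drug_42'."""
    return int(candidate.split("_")[1])


# Placeholder candidates and pass rates, chosen only to make the arithmetic easy.
pool = [f"drug_{i}" for i in range(1000)]
stages = [
    Stage("qualify (LLM literature screening)", 1.0, lambda c: index(c) % 10 == 0),
    Stage("quantify (in silico simulation)", 50.0, lambda c: index(c) % 100 == 0),
    Stage("validate (in vitro assay)", 1000.0, lambda c: index(c) % 500 == 0),
]
survivors, cost = run_funnel(pool, stages)
print(survivors, cost)  # ['drug_0', 'drug_500'] 16000.0
```

With these made-up numbers the funnel costs 1,000 + 5,000 + 10,000 = 16,000 units, whereas sending all 1,000 candidates straight to the in vitro stage would cost 1,000,000.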
case study of DFU
o Qualify: AI LLM Screening
o Quantify: CC in silico Simulation
o Validate: Assay in vitro validation
o Towards the Future
Diseased and Healthy Tissues
Change in concentration levels of a protein in DFU patients when compared to those in the healthy control group.
2. Signal treatment: Comparison Before and After Treatment
Change in concentration levels of a protein after the treatment of DFU, compared to pre-treatment levels.
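Both signals compare a protein's concentration against a reference: healthy controls for the disease signal, pre-treatment levels for the treatment signal. In this workflow the direction is read from text by an LLM; the numeric sketch below only illustrates what the two signals mean. The `signal_direction` helper, its 10% threshold, and the concentrations are hypothetical.

```python
from enum import Enum


class Direction(Enum):
    UP = "up"
    DOWN = "down"
    NO_CHANGE = "no change"


def signal_direction(reference: float, comparison: float, threshold: float = 0.1) -> Direction:
    """Classify the relative change from `reference` to `comparison`.

    Disease signal:   reference = healthy controls, comparison = DFU patients.
    Treatment signal: reference = pre-treatment,    comparison = post-treatment.
    `threshold` (10% by default) is a hypothetical cut-off for "no change".
    """
    if reference <= 0:
        raise ValueError("reference concentration must be positive")
    relative_change = (comparison - reference) / reference
    if relative_change > threshold:
        return Direction.UP
    if relative_change < -threshold:
        return Direction.DOWN
    return Direction.NO_CHANGE


# Made-up concentrations for a single protein, in arbitrary units.
disease = signal_direction(reference=2.0, comparison=3.0)    # +50% in DFU patients
treatment = signal_direction(reference=3.0, comparison=2.4)  # -20% after treatment
print(disease.value, treatment.value)  # up down
```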
setting
Test Set:
• 97 (protein, document) pairs
• 136 manually annotated sentences (“Evidence”)
• For proteins with multiple pieces of detected evidence, the result is decided by voting.
Metrics: Precision
• Fraction of predictions that are correct with respect to the gold standard
• Predictions of “unknown” are ignored
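The evaluation can be sketched as a majority vote per protein followed by precision over the decided predictions. This is a minimal sketch assuming string labels; the function names, the toy data, and the choice to return “unknown” on a tied vote are assumptions, not details from the slides.

```python
from collections import Counter
from typing import Dict, List

UNKNOWN = "unknown"


def vote(labels: List[str]) -> str:
    """Majority vote over the evidence labels collected for one protein.

    A tie (or no evidence) falls back to UNKNOWN; this tie rule is an assumption.
    """
    ranked = Counter(labels).most_common()
    if not ranked:
        return UNKNOWN
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return UNKNOWN
    return ranked[0][0]


def precision(predictions: Dict[str, str], gold: Dict[str, str]) -> float:
    """Share of non-UNKNOWN predictions that agree with the gold standard."""
    decided = {key: label for key, label in predictions.items() if label != UNKNOWN}
    if not decided:
        return 0.0
    correct = sum(1 for key, label in decided.items() if gold.get(key) == label)
    return correct / len(decided)


# Toy evidence: sentence-level labels per protein (placeholder names).
evidence = {
    "p1": ["up", "up", "down"],
    "p2": ["down"],
    "p3": ["up", "down"],
    "p4": ["up"],
}
gold = {"p1": "up", "p2": "down", "p3": "up", "p4": "down"}

predictions = {protein: vote(labels) for protein, labels in evidence.items()}
print(predictions["p3"], precision(predictions, gold))  # unknown 0.6666666666666666
```

Here p3 ends in a tie and is excluded, so precision is computed over the three decided proteins, of which two match the gold labels.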
candidates, our process shortlists 35 drug candidates from 756 protein–drug pairs for the correlation study. Folic acid was newly identified as the most promising candidate.
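One way to picture the shortlisting step is to collapse protein–drug pairs into per-drug support counts. The sketch below assumes each pair already carries a flag saying whether the drug's effect is consistent with Hypothesis 1 or 2; the ranking by number of supporting proteins, the `shortlist` name, and the toy data are hypothetical and are not the authors' scoring.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def shortlist(pairs: List[Tuple[str, str, bool]], min_support: int = 1) -> List[Tuple[str, int]]:
    """Collapse (protein, drug, supports) pairs into a ranked list of drugs.

    `supports` says whether the drug's effect on the protein fits Hypothesis 1 or 2
    (assumed to be computed upstream). Drugs are ranked by how many distinct
    proteins support them, with ties broken alphabetically.
    """
    support: Dict[str, Set[str]] = defaultdict(set)
    for protein, drug, ok in pairs:
        if ok:
            support[drug].add(protein)
    ranked = [(drug, len(proteins)) for drug, proteins in support.items()
              if len(proteins) >= min_support]
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked


# Toy data with placeholder names; not the study's 756 pairs.
pairs = [
    ("protein_A", "drug_X", True),
    ("protein_B", "drug_X", True),
    ("protein_A", "drug_Y", True),
    ("protein_C", "drug_Y", False),
    ("protein_C", "drug_Z", False),
]
print(shortlist(pairs))  # [('drug_X', 2), ('drug_Y', 1)]
```

Using a set per drug means the same protein reported in several pairs counts only once toward a drug's support.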
increased after 5 days of Hyperbaric Oxygen Therapy (HBOT) and decreased progressively until the end of the treatment, when the lowest plasma levels were observed."
Imbalance in the number of instances for each protein:
o Instance: a (paper, protein) pair
o Instances tend to be marked as “unknown”, e.g., when the two signals conflict
• Result: Proteins with less evidence are more prone to being overlooked
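The failure mode above can be reproduced with a toy labelling rule: an instance whose reported directions conflict becomes “unknown”, and a protein with only one such instance has no usable evidence left. The rule and the names `instance_label` and `overlooked` are assumptions for illustration, a simplified stand-in for the LLM's behaviour rather than a description of it.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

UNKNOWN = "unknown"


def instance_label(signals: List[str]) -> str:
    """Label one (paper, protein) instance from the directions it reports.

    Conflicting directions (e.g. 'up' early in a treatment, 'down' later)
    collapse to UNKNOWN; so does an instance with no reported direction.
    """
    distinct = set(signals)
    if len(distinct) == 1:
        return distinct.pop()
    return UNKNOWN


def overlooked(instances: List[Tuple[str, List[str]]]) -> Dict[str, bool]:
    """A protein is overlooked when none of its instances gets a definite label."""
    labels: Dict[str, List[str]] = defaultdict(list)
    for protein, signals in instances:
        labels[protein].append(instance_label(signals))
    return {protein: all(label == UNKNOWN for label in found)
            for protein, found in labels.items()}


# Toy data: one rarely studied protein, one well-studied protein (placeholder names).
instances = [
    ("protein_rare", ["up", "down"]),    # its only instance conflicts
    ("protein_common", ["up", "down"]),
    ("protein_common", ["up"]),
    ("protein_common", ["up", "up"]),
]
print(overlooked(instances))  # {'protein_rare': True, 'protein_common': False}
```

The well-studied protein loses one instance to the same conflict but still has definite evidence, which is the imbalance the slide describes.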
case study of DFU
o Qualify: AI LLM Screening
o Quantify: CC in silico Simulation
o Validate: Assay in vitro validation
o Towards the Future
• at ground level: on Folic Acid
• at 30,000 feet: Literature Whispers
Acid breaks down into:
o Pyrrolidone Carboxylic Acid (PCA)
o Pteroylglutamic Acid (PGA)
o Xanthopterin (XA)
o Pterin
How do these byproducts cause or perturb efficacy? How well can we attribute the efficacy of folic acid to its byproducts?
potentially help you? Where do we experience high volumes of tangentially relevant signals for the problems of interest? AI is inherently interdisciplinary, but we need to ask the right questions.
• Predict ROS Selectivity
• Extrapolating efficacy over K or M candidates
• Functionalising Divergent and Convergent SCC
• Synthesis Cost Estimation
• Characterising host–guest chemistry
• EPR effect prediction
• Forecasting Translation Scalability
• Forecasting Lifetimes of Supramolecular Materials
• Discerning Trends in PFAS Removal
• Creativity in System Design