Primer on Artificial Intelligence for Gastroenterology • Peter D.R. Higgins, Director, IBD Program, University of Michigan • @ibddoctor • Slides: https://speakerdeck.com/higgi13425
What is Artificial Intelligence? • AI allows computer algorithms, typically trained on labeled examples, to do very specific tasks that previously required a human brain. • Many related terms: • Machine Learning • Deep Learning • Neural Networks • Random Forests • Convolutional Neural Networks (ConvNets) • Automated Feature Extraction • Transfer Learning
Input Data and Tasks for Artificial Intelligence in Medicine • Image Recognition • Barrett’s Esophagus with Dysplasia • Gastric Cancer • Pancreatic Cancer • Text Recognition – Natural Language Processing (NLP) • Classification – is this cancer or not? • Assessment – is this quiescent, mild, moderate, or severe? • Prognosis – how likely is colectomy?
How Does the Human Brain Do This? • Layers of neurons in the visual cortex • Take input from the right & left retinas • Lateral comparisons to detect edges • Edges feed into recognition of shapes • Shapes feed into recognition of complex objects
How Do Neural Networks Do This? • Data are fed into nodes, and nodes are arranged in layers • Each layer’s output is the next layer’s input • For images, the input is the pixels, with red, green, and blue (RGB) channels
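To make "each layer’s output is the next layer’s input" concrete, here is a minimal sketch of a forward pass in plain Python. The two-pixel RGB input, the sigmoid activation, and all the weights are hypothetical, hand-picked values for illustration; a real network learns its weights from labeled data and has far more nodes.

```python
import math

def sigmoid(x: float) -> float:
    """Squash any number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def dense_layer(inputs, weights, biases):
    """One layer of nodes: each node takes a weighted sum of all inputs
    plus its bias, then applies the sigmoid nonlinearity."""
    return [
        sigmoid(sum(w * x for w, x in zip(node_weights, inputs)) + b)
        for node_weights, b in zip(weights, biases)
    ]

def forward(inputs, layers):
    """Feed data through the layers: each layer's output is the next layer's input."""
    activations = inputs
    for weights, biases in layers:
        activations = dense_layer(activations, weights, biases)
    return activations

# Hypothetical toy input: two pixels, each with R, G, B values scaled to [0, 1].
pixels = [0.9, 0.1, 0.1,   # pixel 1: R, G, B
          0.8, 0.2, 0.1]   # pixel 2: R, G, B

# Hypothetical weights (learned from data in practice): 2 hidden nodes, 1 output node.
layers = [
    ([[0.5, -0.2, 0.1, 0.4, -0.3, 0.2],
      [-0.1, 0.3, 0.2, -0.2, 0.5, 0.1]],
     [0.0, 0.1]),
    ([[1.2, -0.8]], [-0.1]),
]

probability = forward(pixels, layers)[0]
print(f"Predicted probability: {probability:.3f}")
```

The output node's value is a probability between 0 and 1, not a yes/no answer; turning it into a decision requires a cut point.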
Neural Networks Are Good at Classification, but Not at Feature Extraction • For years, you needed experts to tell you which features were important • Often difficult – “I just know dysplasia when I see it” • Had to hand-code algorithms to detect these features, then feed them into a classifier algorithm • Convolution allows automated feature extraction from images • Can find important features that experts might not have known about • Then feed the features into a classifier – Convolutional Neural Nets
How do Convolutional Neural Networks (CNNs) Do This? • Slide a small filter (the kernel) across the image, one manageable patch at a time = convolution • Combine (convolve) adjacent pixels to detect features – edges, color, gradient orientation • Pool to reduce noise and data size • Feed the features into a neural network classifier
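The convolve, then pool, steps above can be sketched in plain Python. This is an illustrative assumption, not how any specific model was built: the 6x6 image is made up, and the kernel is a hand-chosen Sobel-like vertical-edge detector, whereas a CNN learns its kernels from training data. As in CNN libraries, the "convolution" here is technically cross-correlation (the kernel is not flipped).

```python
def convolve2d(image, kernel):
    """Slide the kernel over every valid position (stride 1) and sum the
    elementwise products of kernel and underlying pixels."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

def relu(feature_map):
    """Keep positive responses, zero out negative ones."""
    return [[max(0, v) for v in row] for row in feature_map]

def max_pool(feature_map, size=2):
    """Keep the largest value in each non-overlapping size x size block,
    shrinking the feature map (leftover edge rows/columns are dropped)."""
    return [
        [
            max(feature_map[i + di][j + dj]
                for di in range(size) for dj in range(size))
            for j in range(0, len(feature_map[0]) - size + 1, size)
        ]
        for i in range(0, len(feature_map) - size + 1, size)
    ]

# Hypothetical 6x6 grayscale patch: dark left half, bright right half (a vertical edge).
image = [[0, 0, 0, 9, 9, 9] for _ in range(6)]

# Hand-chosen vertical-edge kernel; a CNN would learn kernels like this itself.
vertical_edge = [[-1, 0, 1],
                 [-2, 0, 2],
                 [-1, 0, 1]]

features = max_pool(relu(convolve2d(image, vertical_edge)))
print(features)  # [[36, 36], [36, 36]]
```

The kernel responds strongly only where the dark-to-bright edge is, and pooling shrinks the 4x4 response map to 2x2 while keeping that signal. In a real CNN, many such feature maps are stacked and passed to the classifier layers.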
Why AI/Deep Learning now? • Deep learning was developed in the 1980s, but it is exploding now because of: • Labeled Digital Data • Enormous classified/labeled datasets are required • More digital data, including digital video, is available now. • Computing power • New high-performance Graphics Processing Units (GPUs) have a parallel architecture that is very efficient for deep learning (& video games). • Clusters with many GPUs reduce model training time from weeks to hours or less.
Probabilities • Models provide probabilities, not answers • We choose the cut points to classify predicted outcomes • The cut points should reflect medical goals, not statistical ones • The right cut point depends on the downstream implications, e.g., an esophageal biopsy vs. Whipple surgery for pancreatic cancer
Cut Points • Screening for esophageal cancer (downstream: esophageal biopsy) • Risk if positive – low • A false positive just leads to a negative biopsy • Choose high sensitivity, high NPV • Diagnosing pancreatic cancer (downstream: Whipple surgery) • Risk if positive – high • A false positive leads to an unnecessary Whipple • Choose high specificity, high PPV
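The trade-off between the two scenarios can be checked with a few lines of Python. The ten probabilities and labels below are hypothetical toy values, not results from any model, and the cut points 0.25 and 0.80 are arbitrary illustrations of a "low" and a "high" threshold.

```python
def confusion_metrics(probabilities, labels, cut_point):
    """Call a case positive if its predicted probability >= cut_point,
    compare with the true labels (1 = disease, 0 = no disease), and
    return sensitivity, specificity, PPV, and NPV."""
    tp = fp = tn = fn = 0
    for p, y in zip(probabilities, labels):
        predicted_positive = p >= cut_point
        if predicted_positive and y == 1:
            tp += 1
        elif predicted_positive and y == 0:
            fp += 1
        elif not predicted_positive and y == 0:
            tn += 1
        else:
            fn += 1

    def ratio(num, den):
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }

# Hypothetical model outputs for 10 cases (illustrative, not real data).
probs = [0.05, 0.10, 0.20, 0.30, 0.40, 0.55, 0.60, 0.75, 0.85, 0.95]
labels = [0, 0, 0, 1, 0, 0, 0, 1, 1, 1]

# Low cut point: screening-style (a false positive costs a negative biopsy).
# High cut point: surgery-style (a false positive costs an unnecessary Whipple).
for cut in (0.25, 0.80):
    print(cut, confusion_metrics(probs, labels, cut))
```

With these toy numbers, the 0.25 cut point gives a sensitivity and NPV of 1.0 but a specificity of only 0.5, while the 0.80 cut point gives a specificity and PPV of 1.0 at the cost of a sensitivity of 0.5. The model is the same in both cases; only the medical goal changed.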
Edge Cases in Medicine • Genetic risk models in IBD • NOD2 variants predict complicated Crohn’s disease • TPMT variants predict slow metabolism of thiopurines • Both models were developed in Europeans / European-Americans • Applied in Asia, they completely fail • The entire population of Asia = “edge cases” = 59% of the world population • You need diversity in your dataset to make models generalizable.
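One practical way to catch this kind of failure is to report performance separately for each population rather than only as a single average. The sketch below is a hypothetical illustration: the predictions, labels, and cohort names "A" and "B" are made up, with cohort B standing in for an under-represented population.

```python
from collections import defaultdict

def overall_accuracy(predictions, labels):
    """Fraction of all cases classified correctly."""
    return sum(int(p == y) for p, y in zip(predictions, labels)) / len(labels)

def accuracy_by_group(predictions, labels, groups):
    """Accuracy computed separately within each subgroup, so that failure in a
    small subgroup is not hidden by a good overall average."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for pred, y, g in zip(predictions, labels, groups):
        total[g] += 1
        correct[g] += int(pred == y)
    return {g: correct[g] / total[g] for g in total}

# Hypothetical predictions (not real data): cohort A dominates the dataset.
predictions = [1, 0, 1, 1, 0, 0, 1, 0, 0, 1]
labels      = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
groups      = ["A"] * 8 + ["B"] * 2

print(overall_accuracy(predictions, labels))             # 0.8
print(accuracy_by_group(predictions, labels, groups))    # {'A': 1.0, 'B': 0.0}
```

An overall accuracy of 80% looks respectable, yet the model is wrong for every case in cohort B. Stratified reporting only helps if the under-represented group is present in the testing data at all.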
Danger of Trivial Features • You train a model on a dataset of histopathology images to detect dysplasia • The model is 100% accurate. Amazing! • What is the most important feature?
Pneumonia on CXR • Top predictors: • Presence of a central line • Presence of an NG tube • Lower resolution (portable film) • The word “Portable” • These mark inpatients, and inpatients are more likely to have pneumonia • Models use all of the information provided, even if it is trivial.
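A simple sanity check before trusting a model is to ask how well each non-clinical feature predicts the label on its own. The sketch below is a hypothetical audit, not a method from the talk: the metadata fields, the 10 labels, and the 0.8 warning threshold are all made-up assumptions.

```python
def single_feature_accuracy(feature_values, labels):
    """Accuracy of the best trivial rule built from one binary feature:
    'predict positive when the feature is present' or its reverse."""
    n = len(labels)
    agree = sum(int(f == y) for f, y in zip(feature_values, labels))
    return max(agree, n - agree) / n

def audit_trivial_features(metadata, labels, warn_at=0.8):
    """Return the features that, by themselves, predict the label at least
    as well as warn_at: candidates for a shortcut the model may exploit."""
    flagged = {}
    for name, values in metadata.items():
        acc = single_feature_accuracy(values, labels)
        if acc >= warn_at:
            flagged[name] = acc
    return flagged

# Hypothetical chest X-ray metadata (1 = present) and labels (1 = pneumonia).
labels = [1, 1, 1, 0, 0, 0, 1, 0, 0, 0]
metadata = {
    "portable_film": [1, 1, 1, 0, 0, 0, 1, 1, 0, 0],
    "central_line":  [1, 0, 1, 0, 0, 0, 1, 0, 0, 0],
    "scanner_b":     [0, 1, 0, 1, 0, 1, 1, 0, 0, 1],
}

print(audit_trivial_features(metadata, labels))
# {'portable_film': 0.9, 'central_line': 0.9}
```

Here "portable film" and "central line" alone reach 90% accuracy, which is a warning that a model could score well without looking at the lungs. This audit only catches single-feature shortcuts present in the metadata; shortcuts hidden in the pixels themselves need an explainer.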
What Makes a Good Model? • Lots and Lots of Data • Not over-fitted to one dataset • Rule of thumb for logistic regression – 10 cases per predictor • CNNs have thousands of nodes, so they need far more data • Data are often randomly split into training/testing sets: 70/30 or 80/20 • The model is tested on a (large) testing set • Even better with an independent, multicenter testing set, or multiple testing sets • Data from a very diverse range of sources – generalizable? • Is the testing set representative of real-world practice? • Of the patients in your practice?
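The random train/test split and the logistic regression rule of thumb can both be sketched in a few lines. The case IDs, the 80/20 split, the seed, and the event count of 120 are hypothetical; note that the rule of thumb is usually stated as about 10 outcome events (cases with the less frequent outcome) per predictor.

```python
import random

def train_test_split(cases, test_fraction=0.2, seed=42):
    """Shuffle the cases reproducibly and hold out test_fraction of them for testing."""
    shuffled = list(cases)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)

def max_predictors(n_events, events_per_predictor=10):
    """Rule of thumb for logistic regression: about 10 outcome events per predictor."""
    return n_events // events_per_predictor

cases = list(range(1000))  # hypothetical case IDs
train, test = train_test_split(cases, test_fraction=0.2)
print(len(train), len(test))          # 800 200
print(max_predictors(n_events=120))   # 12
```

If one patient contributes several images or video frames, splitting by patient rather than by image keeps near-duplicate images of the same patient from landing in both the training and testing sets, which would otherwise inflate the test results.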
How does the AI/CNN/DL model work? • Important Features • Do the features make sense? • Do the features at least correlate with something important? • Are the features trivial? • Explainers like LIME (Local Interpretable Model-agnostic Explanations)
Model Explainer • Can explain, for each case, which features support or contradict the classification • Example: breast cancer biopsies classified as benign or malignant • Features from H&E-stained images
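The core idea behind LIME can be sketched in plain Python: switch interpretable features (e.g., image regions) on and off around one case, ask the black-box model for its probability on each variant, weight each variant by how close it is to the original, and fit a weighted linear model whose coefficients are the explanation. This is a simplified re-implementation of the idea under stated assumptions, not the `lime` package: the kernel width, the sample count, the small ridge penalty, and the feature names are all hypothetical choices, and the "black box" is a stand-in function.

```python
import math
import random

def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting (small systems)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def lime_style_explanation(predict, n_features, n_samples=500,
                           kernel_width=0.75, ridge=1e-3, seed=0):
    """Explain one case whose interpretable features are all present (all ones)."""
    rng = random.Random(seed)
    # 1. Perturb: keep the original case, plus random on/off versions of it.
    samples = [[1] * n_features]
    samples += [[rng.randint(0, 1) for _ in range(n_features)]
                for _ in range(n_samples - 1)]
    # 2. Query the black-box model on every perturbed version.
    targets = [predict(z) for z in samples]
    # 3. Weight samples by closeness to the original (exponential kernel).
    weights = []
    for z in samples:
        distance = 1 - sum(z) / n_features  # fraction of features switched off
        weights.append(math.exp(-(distance ** 2) / kernel_width ** 2))
    # 4. Weighted ridge regression via normal equations: (X^T W X + ridge*I) b = X^T W y.
    design = [[1.0] + [float(v) for v in z] for z in samples]  # intercept column first
    k = n_features + 1
    xtwx = [[sum(w * row[i] * row[j] for w, row in zip(weights, design))
             for j in range(k)] for i in range(k)]
    for i in range(1, k):  # do not penalize the intercept
        xtwx[i][i] += ridge
    xtwy = [sum(w * row[i] * y for w, row, y in zip(weights, design, targets))
            for i in range(k)]
    beta = solve_linear_system(xtwx, xtwy)
    return beta[0], beta[1:]  # intercept, per-feature contributions

# Hypothetical stand-in for a black-box classifier over 3 image regions.
FEATURES = ["atypical_nuclei", "mitotic_figures", "pen_mark"]

def black_box(z):
    return 0.1 + 0.6 * z[0] + 0.2 * z[1]  # ignores the pen mark entirely

intercept, contributions = lime_style_explanation(black_box, n_features=3)
for name, c in zip(FEATURES, contributions):
    print(f"{name:>16}: {c:+.3f}")
```

Positive contributions support the classification and negative ones contradict it. If a trivial feature such as a pen mark received a large contribution, that would be the warning sign described on the trivial-features slides.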
Legal Ramifications of AI • GDPR (2018) • EU General Data Protection Regulation • Right to Privacy, Right to be Forgotten, and Right to Explanation • Right to obtain an explanation of a decision reached after an assessment based solely on automated processing • France Digital Republic Act (2016) • After a decision taken on the basis of algorithmic processing, citizens have the right to be informed of: • the degree and the mode of contribution of the algorithmic processing to the decision-making; • the data processed and its source; • the model parameters and, where appropriate, their weighting, as applied to the situation of the person concerned; • the operations carried out by the processing.
Challenges in Endoscopy • Models that work on still images may not work on video • Many video frames will be blurry or out of focus • Many colonoscopy frames will be obscured by stool • Need nearly real-time image recognition • Fast computing • Timely and useful feedback to the endoscopist
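One common way to cope with blurry frames is to score each frame's sharpness and skip the unusable ones before classification, which also saves time in the per-frame budget. The sketch below uses the variance of the Laplacian, a widely used focus measure that is swapped in here for illustration and is not from the talk; the sharpness threshold must be tuned on real video, and the 30 frames-per-second rate is an assumption.

```python
def laplacian_variance(gray):
    """Sharpness score: variance of the 4-neighbor Laplacian over interior pixels.
    Blurry frames have weak edges, so their Laplacian values cluster near zero."""
    values = []
    for i in range(1, len(gray) - 1):
        for j in range(1, len(gray[0]) - 1):
            lap = (gray[i - 1][j] + gray[i + 1][j] + gray[i][j - 1]
                   + gray[i][j + 1] - 4 * gray[i][j])
            values.append(lap)
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def usable_frames(frames, min_sharpness):
    """Yield (index, frame) for frames sharp enough to send to the classifier."""
    for index, frame in enumerate(frames):
        if laplacian_variance(frame) >= min_sharpness:
            yield index, frame

# Assumed video rate; quality check + classification must fit in this budget.
FPS = 30
budget_ms = 1000 / FPS  # about 33 ms per frame

# Hypothetical 8x8 grayscale frames: a featureless (blurred) frame and a sharp checkerboard.
flat = [[128] * 8 for _ in range(8)]
sharp = [[255 if (i + j) % 2 else 0 for j in range(8)] for i in range(8)]

kept = [index for index, _ in usable_frames([flat, sharp], min_sharpness=100.0)]
print(kept, f"{budget_ms:.1f} ms per frame")
```

Only the sharp frame (index 1) is kept. A simple filter like this cannot tell a stool-obscured but sharp frame from a clean one, so in practice a separate quality classifier may also be needed.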
Artificial Intelligence in GI • AI is coming fast. • Big digital data and GPU technology have converged. • Data sets need to be diverse, and model generalizability must be tested. • Edge cases must be identified & incorporated into models. • Multiple large, independent testing/validation data sets are needed. • Models need to be explained • To identify & avoid trivial features in your predictive model • For legal reasons in Europe
Thank You • Deep Learning for Automated Scoring of Video Endoscopy in Ulcerative Colitis – Ryan Stidham, Sunday 10:45, Room 30 • Slides: https://speakerdeck.com/higgi13425