Data
4. Crystal Representations
5. Classical Learning
6. Deep Learning
7. Building a Model from Scratch
8. Accelerated Discovery
9. Generative Artificial Intelligence
10. Future Directions
Tetragonal, Orthorhombic, Hexagonal, Trigonal, Monoclinic, Triclinic}
Crystal structure → ML model (black box of trained parameters) → Crystal system
“Black box” is a common criticism of ML – with the right tools you can open it up!
problems face the “curse of dimensionality”
An exponential increase in the data required to cover the parameter space effectively, $O(e^D)$
Sampling 10 points per dimension: $10^D$ samples in $D$ dimensions
M. M. Bronstein et al, arXiv:2104.13478 (2021)
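The scaling quoted above can be checked with a short calculation. This is a minimal sketch; the function name `grid_samples` and the chosen values of D are illustrative assumptions, not part of the lecture.

```python
# Number of samples in a full grid with a fixed number of points per dimension.
POINTS_PER_DIMENSION = 10  # value quoted on the slide


def grid_samples(n_dimensions, points=POINTS_PER_DIMENSION):
    """Return points ** n_dimensions, the size of a full grid in D dimensions."""
    return points ** n_dimensions


# Illustrative dimensionalities (assumed, not from the slide)
for d in (1, 2, 3, 6, 10):
    print(f"D = {d:2d}: {grid_samples(d):,} samples")
```

Each extra dimension multiplies the required number of samples by ten, which is why dense sampling quickly becomes impractical.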
feature values and output a single discrete value (the class)
• Regression model – input a tensor of feature values and output a continuous (predicted) value
• Feature – an input variable
• Labelled example – a feature with its corresponding label (the “answer” or “result”)
• Ground truth – reliable reference value(s)
• Hyperparameter – a model variable that can be tuned to optimise performance, e.g. learning rate
See https://developers.google.com/machine-learning/glossary
& high bias
High validation error & high variance
• Underfitting – model too simple to describe patterns
• Overfitting – model too complex and fits noise
Data cleaning and feature engineering
Model training and validation
• Train (80%): $x_{train}$, $y_{train}$
• Test (20%): $x_{test}$, $y_{test}$
Final model: $x_{new}$ → $y_{predict}$
The exact workflow depends on the type of problem and available data
[Figure labels: Human time intensive; Computer time intensive; Production]
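The 80/20 train/test split above can be sketched in a few lines. This is an illustration only: the helper `train_test_split`, the fixed seed, and the toy data are assumptions, and libraries such as scikit-learn provide their own tested equivalents.

```python
import random


def train_test_split(x, y, test_fraction=0.2, seed=0):
    """Shuffle paired data and split it into train and test subsets."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    indices = list(range(len(x)))
    random.Random(seed).shuffle(indices)  # reproducible shuffle
    n_test = round(len(x) * test_fraction)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    x_train = [x[i] for i in train_idx]
    y_train = [y[i] for i in train_idx]
    x_test = [x[i] for i in test_idx]
    y_test = [y[i] for i in test_idx]
    return x_train, y_train, x_test, y_test


# Toy data (hypothetical): ten feature values with labels y = 2x
x = list(range(10))
y = [2 * value for value in x]
x_train, y_train, x_test, y_test = train_test_split(x, y)
print(len(x_train), len(x_test))  # prints "8 2"
```

Shuffling before splitting avoids a test set that only covers one end of an ordered dataset; the test data should not be used during training.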
– Mean Absolute Error = $\frac{1}{n}\sum_{i=1}^{n} |e_i|$
• RMSE – Root Mean Square Error = $\sqrt{\frac{1}{n}\sum_{i=1}^{n} (e_i)^2}$
where $e_i = y_i - y_i^{predicted}$
• Standard Deviation – a measure of the amount of dispersion in a set of values. Small = close to the mean. Expressed in the same units as the data
e.g. lattice parameters a = 4 Å, 5 Å, 6 Å
mean = (4+5+6)/3 = 5 Å
deviation = -1, 0, 1; deviation squared = 1, 0, 1
sample variance $\sigma^2$ = (1+0+1)/2 = 1
standard deviation $\sigma$ = 1 Å
Model Assessment
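The error metrics and the lattice parameter example above can be reproduced with the Python standard library. A minimal sketch: the function names and the predicted values `a_pred` are assumptions made for illustration.

```python
import math
import statistics


def mean_absolute_error(y_true, y_pred):
    """MAE: mean of |e_i|, with e_i = y_i - y_i(predicted)."""
    errors = [t - p for t, p in zip(y_true, y_pred)]
    return sum(abs(e) for e in errors) / len(errors)


def root_mean_square_error(y_true, y_pred):
    """RMSE: square root of the mean of e_i squared."""
    errors = [t - p for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(e ** 2 for e in errors) / len(errors))


# Lattice parameters from the slide (in Å)
a = [4.0, 5.0, 6.0]
mean_a = statistics.mean(a)          # 5.0
variance_a = statistics.variance(a)  # sample variance (divides by n - 1): 1.0
std_a = statistics.stdev(a)          # 1.0

# Hypothetical predictions, for illustration only
a_pred = [4.5, 5.0, 5.0]
mae = mean_absolute_error(a, a_pred)
rmse = root_mean_square_error(a, a_pred)
print(f"MAE = {mae:.3f} Å, RMSE = {rmse:.3f} Å")
```

RMSE is never smaller than MAE because squaring weights large errors more heavily.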
until a cost function (e.g. RMSE) is minimised
Gradient descent is a popular choice: $w_i \rightarrow w_i - \alpha \frac{d\,\mathrm{Error}}{dw_i}$, where $\alpha$ is the learning rate
Warning: local optimisation algorithms often miss global minima
[Figure: Error against parameter set w, with descent steps 0 to 3]
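The update rule above can be written as a short loop. This is a sketch under stated assumptions: the one-parameter error surface Error(w) = (w - 3)^2 is hypothetical, as are the function names and default settings.

```python
def gradient_descent(gradient, w0, learning_rate=0.1, n_steps=100):
    """Repeatedly apply w -> w - learning_rate * dError/dw."""
    w = w0
    history = [w]
    for _ in range(n_steps):
        w = w - learning_rate * gradient(w)
        history.append(w)
    return w, history


# Hypothetical error surface: Error(w) = (w - 3)^2, with its minimum at w = 3
def error_gradient(w):
    return 2.0 * (w - 3.0)


w_final, history = gradient_descent(error_gradient, w0=0.0)
print(f"w after {len(history) - 1} steps: {w_final:.4f}")
```

This surface has a single minimum, so the starting point does not matter; on a surface with several minima the same loop would settle into whichever one is downhill from `w0`.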
until a cost function (e.g. RMSE) is minimised
[Animation: gradient descent on RMSE against w, from https://github.com/jermwatt/machine_learning_refined]
parameters, e.g. step size and no. of iterations
Animation from https://github.com/jermwatt/machine_learning_refined
[Axes: RMSE against w; RMSE against step number. Label: learning rate (step size)]
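The sensitivity to step size and number of iterations can be seen on a toy problem. A minimal sketch, assuming a hypothetical error surface Error(w) = (w - 3)^2 and three illustrative learning rates.

```python
def distance_from_minimum(learning_rate, n_steps, w0=0.0, w_min=3.0):
    """Run gradient descent on Error(w) = (w - w_min)^2 and return |w - w_min|."""
    w = w0
    for _ in range(n_steps):
        w -= learning_rate * 2.0 * (w - w_min)  # gradient of (w - w_min)^2
    return abs(w - w_min)


# Illustrative learning rates (assumed): too small, reasonable, too large
for rate in (0.01, 0.1, 1.1):
    print(f"learning rate {rate}: distance after 20 steps = "
          f"{distance_from_minimum(rate, n_steps=20):.4g}")
```

On this surface a small step size converges slowly, a moderate one converges quickly, and a step size above 1 overshoots the minimum by more each step and diverges.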
two variables (e.g. “ground truth” vs predicted values)
Pearson correlation*: $r_{xy} = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^2}}$
$r \in [-1, 1]$
• Positive: variables change in the same direction
• Zero: no linear relationship between the variables
• Negative: variables change in opposite directions
Reminder: correlation does not imply causation
*Outlined by Auguste Bravais (1844); https://bit.ly/3Kv75GJ
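The Pearson formula above translates directly into code. A minimal sketch; the function name `pearson_r` and the toy data are assumptions for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    numerator = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    spread_x = math.sqrt(sum((xi - mean_x) ** 2 for xi in x))
    spread_y = math.sqrt(sum((yi - mean_y) ** 2 for yi in y))
    return numerator / (spread_x * spread_y)


# Toy data (hypothetical)
x = [1.0, 2.0, 3.0, 4.0]
r_positive = pearson_r(x, [2.0, 4.0, 6.0, 8.0])   # same direction
r_negative = pearson_r(x, [8.0, 6.0, 4.0, 2.0])   # opposite directions
r_zero = pearson_r(x, [1.0, -1.0, -1.0, 1.0])     # no linear trend
print(r_positive, r_negative, r_zero)
```

The third dataset is symmetric about the middle of the x range, so it has no linear trend and r is zero even though y clearly depends on x.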
for a model. Describes how well that known data is approximated
$r^2 \in [0, 1]$
• Zero: baseline model with no variability that predicts $\bar{y}$
• 0.5: 50% of the variability in y is accounted for
• One: model matches observed values of y exactly
Three equivalent definitions:
$r^2 = 1 - \frac{SS_{res}}{SS_{tot}}$
$r^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - y_i^{predicted})^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$
$r^2 = 1 - \frac{\sum_{i=1}^{n} e_i^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$
Note: a unitless metric. Alternative definitions are sometimes used
S. Wright “Correlation and Causation”, J. Agri. Res. 20, 557 (1921)
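The definition $r^2 = 1 - SS_{res}/SS_{tot}$ can be checked against the limiting cases listed above. A minimal sketch; the function name and the toy values are assumptions.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Toy observations (hypothetical)
y = [1.0, 2.0, 3.0, 4.0]
r2_perfect = r_squared(y, y)                      # model matches y exactly
r2_baseline = r_squared(y, [2.5, 2.5, 2.5, 2.5])  # model always predicts the mean
r2_partial = r_squared(y, [1.5, 1.5, 3.5, 3.5])   # hypothetical imperfect model
print(r2_perfect, r2_baseline, r2_partial)
```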
classification model performance
K. Pearson “Mathematical Contributions to the Theory of Evolution” (1904)
Layout (actual class in rows, predicted class in columns):
• Actual +: True positive (TP), False negative (FN)
• Actual −: False positive (FP), True negative (TN)
Perfect model to classify metals and insulators (N = 100): TP = 70, FN = 0, FP = 0, TN = 30
My best model: TP = 66, FN = 4, FP = 8, TN = 22
Accuracy = Correct/Total
• Perfect model: (70+30)/100 = 100 %
• My best model: (66+22)/100 = 88 %
classification model performance
Sensitivity = TP/(TP+FN)
• Perfect model (TP = 70, FN = 0): 70/(70+0) = 100 %
• My best model (TP = 66, FN = 4): 66/(66+4) = 94 %
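Both metrics follow directly from the four confusion matrix counts. A minimal sketch using the counts quoted on the slides; the function names and dictionary layout are assumptions.

```python
def accuracy(tp, fn, fp, tn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + fn + fp + tn)


def sensitivity(tp, fn):
    """Fraction of actual positives that are predicted positive."""
    return tp / (tp + fn)


# Counts from the slides (N = 100)
perfect = {"tp": 70, "fn": 0, "fp": 0, "tn": 30}
best = {"tp": 66, "fn": 4, "fp": 8, "tn": 22}

for name, counts in (("perfect", perfect), ("best", best)):
    acc = accuracy(**counts)
    sens = sensitivity(counts["tp"], counts["fn"])
    print(f"{name}: accuracy = {acc:.0%}, sensitivity = {sens:.0%}")
```

The two metrics can disagree: sensitivity ignores false positives, so a model that labels everything as positive reaches 100 % sensitivity with poor accuracy.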
based on example input-output pairs (labelled data)
Regression predicts a continuous value, e.g. a reaction rate
$y = f(\mathbf{x}) + \varepsilon$
where $y$ is the target variable, $f$ is the learned function, and $\varepsilon$ is the error
We’ll start to cover the model (function) details in Lecture 5
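As a concrete instance of $y = f(\mathbf{x}) + \varepsilon$, the sketch below fits a straight line by ordinary least squares. The choice of a linear $f$, the helper `fit_line`, and the toy data are all assumptions; the lecture does not specify a model at this point.

```python
def fit_line(x, y):
    """Least-squares fit of y = slope * x + intercept for one feature."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    slope = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
             / sum((xi - mean_x) ** 2 for xi in x))
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical labelled data: y = 2x + 1 plus small errors
x = [0.0, 1.0, 2.0, 3.0, 4.0]
errors = [0.1, -0.1, 0.0, 0.1, -0.1]
y = [2.0 * xi + 1.0 + e for xi, e in zip(x, errors)]

slope, intercept = fit_line(x, y)
residuals = [yi - (slope * xi + intercept) for xi, yi in zip(x, y)]
print(f"f(x) = {slope:.2f} x + {intercept:.2f}")
```

The residuals play the role of $\varepsilon$: the part of each label that the learned function does not reproduce.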
Morita et al, J. Chem. Phys. 153, 024503 (2020)
Support vector regression ($r^2$ = 0.92)
Note: outliers are often interesting cases (poor or exceptional data)
Morita et al, J. Chem. Phys. 153, 024503 (2020); https://github.com/slundberg/shap
SHAP (SHapley Additive exPlanations) analysis is a method for interpreting ML models
• Relative importance of input features in making predictions
• A positive SHAP value indicates that a feature contributes to an increase in the prediction
Morita et al, J. Chem. Phys. 153, 024503 (2020); https://github.com/slundberg/shap
CdCN$_2$
Transparent breakdown of a predicted value
Note: a physical connection is not implied (only a correlation)
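The idea of breaking a prediction into per-feature contributions can be illustrated with exact Shapley values on a tiny model. This sketch does not use the shap package and is not the procedure of the cited paper: it averages marginal contributions over all feature orderings by brute force, which is only feasible for a handful of features. The model, input, and all-zero baseline are hypothetical.

```python
from itertools import permutations


def shapley_values(model, x, baseline):
    """Exact Shapley values: average marginal contribution over all feature orderings."""
    n = len(x)
    contributions = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        previous_output = model(current)
        for i in order:
            current[i] = x[i]  # switch feature i from its baseline to its actual value
            output = model(current)
            contributions[i] += output - previous_output
            previous_output = output
    return [c / len(orderings) for c in contributions]


# Hypothetical model with an interaction between the first and third features
def model(features):
    a, b, c = features
    return 2.0 * a - 1.0 * b + 0.5 * a * c


x = [1.0, 2.0, 4.0]
baseline = [0.0, 0.0, 0.0]  # assumed reference input
phi = shapley_values(model, x, baseline)
print(phi, sum(phi), model(x) - model(baseline))
```

The contributions sum to the difference between the prediction and the baseline prediction, which is what makes the breakdown additive. The interaction term is shared equally between the two features involved.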
based on example input-output pairs (labelled data)
Classification predicts a category, e.g. decision trees for reaction outcomes
$y = f(\mathbf{x})$
where $y$ is the class label and $f$ is the classifier
Assignment can be absolute or probabilistic (e.g. 90% apple, 10% pear)
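The difference between absolute and probabilistic assignment can be shown with a softmax over class scores. This is one common way to obtain class probabilities, swapped in here for illustration rather than taken from the slide (which mentions decision trees); the scores are hypothetical and chosen to reproduce the 90%/10% example.

```python
import math


def softmax(scores):
    """Convert raw class scores into probabilities that sum to one."""
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]  # shift for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


classes = ["apple", "pear"]
scores = [math.log(9.0), 0.0]  # hypothetical classifier scores

probabilities = softmax(scores)                                   # probabilistic assignment
predicted = classes[probabilities.index(max(probabilities))]      # absolute assignment
print(dict(zip(classes, probabilities)), predicted)
```

The absolute assignment keeps only the most probable class and discards how confident the classifier was.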
a dataset (unlabelled data)
Clustering groups data by similarity, e.g. high-throughput crystallography
$\mathbf{x} \rightarrow f(\mathbf{x})$
where $\mathbf{x}$ is the input data and $f(\mathbf{x})$ is the transformation to a new representation
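Grouping by similarity can be sketched with k-means, one common clustering algorithm. The slide does not name a specific method, so this choice is an assumption, as are the toy points and the initial centroids.

```python
def kmeans(points, centroids, n_iterations=10):
    """Basic k-means: alternate nearest-centroid assignment and centroid update."""
    labels = []
    for _ in range(n_iterations):
        # Assignment step: label each point with its nearest centroid
        clusters = [[] for _ in centroids]
        labels = []
        for p in points:
            distances = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            label = distances.index(min(distances))
            labels.append(label)
            clusters[label].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid)
        centroids = [
            tuple(sum(coords) / len(cluster) for coords in zip(*cluster)) if cluster else c
            for cluster, c in zip(clusters, centroids)
        ]
    return labels, centroids


# Hypothetical 2D data with two well-separated groups
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.2), (5.0, 5.0), (5.2, 4.8), (4.8, 5.2)]
labels, centroids = kmeans(points, centroids=[points[0], points[-1]])
print(labels, centroids)
```

No labels are supplied: the cluster index assigned to each point is the new representation learned from the data alone.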
trial and error to achieve an objective
Maximise reward, e.g. reaction conditions to optimise yield
Agent ⇌ Environment
[Figure labels: Automated experiments; Actions; Samples, conditions, etc.]
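Trial-and-error reward maximisation can be sketched with an epsilon-greedy agent choosing between a few reaction conditions. This is a deliberately simple stand-in for reinforcement learning (a multi-armed bandit with no state); the yields, noise level, and all names are hypothetical.

```python
import random


def run_epsilon_greedy(true_yields, n_trials=500, epsilon=0.1, noise=0.02, seed=0):
    """Agent repeatedly picks a condition, observes a noisy yield, and updates its estimates."""
    rng = random.Random(seed)
    n_actions = len(true_yields)
    counts = [0] * n_actions
    estimates = [0.0] * n_actions
    for trial in range(n_trials):
        if trial < n_actions:
            action = trial                     # try every condition once first
        elif rng.random() < epsilon:
            action = rng.randrange(n_actions)  # explore: random condition
        else:
            action = max(range(n_actions), key=estimates.__getitem__)  # exploit best estimate
        # Environment: a simulated experiment returning a noisy yield (the reward)
        reward = true_yields[action] + rng.gauss(0.0, noise)
        counts[action] += 1
        estimates[action] += (reward - estimates[action]) / counts[action]  # running mean
    return estimates, counts


# Hypothetical mean yields for three reaction conditions
true_yields = [0.3, 0.5, 0.8]
estimates, counts = run_epsilon_greedy(true_yields)
best_action = max(range(len(estimates)), key=estimates.__getitem__)
print(best_action, counts)
```

The agent spends most trials on the condition it currently believes is best, while the occasional random choice keeps its estimates of the alternatives from going stale.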