Learning Basics
4. Materials Data and Representations
5. Classical Learning
6. Artificial Neural Networks
7. Building a Model from Scratch
8. Recent Advances in AI
9. and 10. Research Challenge
brain. Artificial neurons mimic this behaviour using mathematical functions. Image: BioMed Research International
Biological neuron → Artificial neuron: cell nucleus → node; dendrites → input; synapse → weights (interconnects); axon → output
The human brain has ~10^11 neurons and ~10^15 synapses (~10^15 FLOPS)
Artificial Neuron
The perceptron is a binary neural network classifier: weighted inputs produce an output of 0 or 1. F. Rosenblatt, Cornell Aeronautical Laboratory, Report 85-460-1 (1957)
y = f(w·x + b), where y is the output, f the activation function, w·x the weighted input and b a bias (constant)
if ∑ᵢ xᵢwᵢ + b > threshold: output = 1, else: output = 0
Weights are adjusted to minimise the model error
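A minimal sketch of the perceptron rule above in NumPy; the learning rate, number of epochs and the AND-gate data are illustrative assumptions, not from the original report:

import numpy as np

def perceptron_train(X, y, lr=0.1, epochs=20):
    """Train a perceptron: adjust weights to reduce classification error."""
    w = np.zeros(X.shape[1])  # weights
    b = 0.0                   # bias (constant)
    for _ in range(epochs):
        for xi, target in zip(X, y):
            output = 1 if np.dot(w, xi) + b > 0 else 0  # threshold activation
            error = target - output
            w += lr * error * xi   # adjust weights to reduce the error
            b += lr * error
    return w, b

# Illustrative data: learn a logical AND gate
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])
w, b = perceptron_train(X, y)
print([1 if np.dot(w, xi) + b > 0 else 0 for xi in X])  # expected [0, 0, 0, 1]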
f(w·x+b) introduces non-linearity. Image from https://towardsdatascience.com
Figure: common activation functions and their derivatives (perceptron model; popular in early models; common for deep learning)
model training. Image from https://towardsdatascience.com
Figure: activation functions and their derivatives, as above
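For reference, a short NumPy sketch of a few activation functions and their derivatives of the kind plotted above; the particular choice of functions is illustrative:

import numpy as np

def step(z):            # perceptron model (not differentiable at 0)
    return np.where(z > 0, 1.0, 0.0)

def sigmoid(z):         # popular in early models
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):   # derivative used during model training
    s = sigmoid(z)
    return s * (1.0 - s)

def relu(z):            # common for deep learning
    return np.maximum(0.0, z)

def relu_prime(z):      # derivative is a simple step
    return np.where(z > 0, 1.0, 0.0)

z = np.linspace(-5, 5, 11)
print(sigmoid(z).round(3))
print(relu(z))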
function to any desired accuracy. K. Hornik, M. Stinchcombe and H. White, Neural Networks 2, 359 (1989)
Practical performance will depend on the number of hidden layers, choice of activation function, and training data available
function to any desired accuracy. S. J. D. Prince, “Understanding Deep Learning”
Figure: approximation of a 1D function (dashed line) by a piecewise linear model
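To illustrate this piecewise linear behaviour, a small sketch using scikit-learn; the target function, hidden-layer size and other settings are arbitrary choices for demonstration:

import numpy as np
from sklearn.neural_network import MLPRegressor

# Target 1D function to approximate
x = np.linspace(0, 2 * np.pi, 200).reshape(-1, 1)
y = np.sin(x).ravel()

# A single hidden layer of ReLU units gives a piecewise linear approximation
model = MLPRegressor(hidden_layer_sizes=(20,), activation="relu",
                     max_iter=5000, random_state=0)
model.fit(x, y)
print("Mean absolute error:", np.abs(model.predict(x) - y).mean())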
(no fixed functional form)
Figure: four models with similar training accuracy (in the white region) fitting a target function. Example from https://writings.stephenwolfram.com
learn representations and capture data patterns effectively
• Dense (fully connected): each neuron connected to every neuron in the adjacent layer
• Convolutional: filter applied to grid-like input, extracting features
• Pooling: reduce spatial dimensions, retaining key information
• Recurrent: incorporate feedback loops for sequential data flow
• Dropout: randomly zero out inputs to mitigate overfitting in training
• Embedding: map categorical variables into continuous vectors
• Upscaling: increase spatial resolution of feature maps
Self-study is needed if you want to get deeper into these (beyond our scope); a sketch combining a few of them is shown below
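A minimal sketch (using PyTorch as an assumed framework) of how several of these layer types can be combined; the channel counts, input size and ordering are illustrative only:

import torch
import torch.nn as nn

# Illustrative model mixing several layer types for an 8-channel 32x32 input
model = nn.Sequential(
    nn.Conv2d(8, 16, kernel_size=3, padding=1),  # convolutional: extract features
    nn.ReLU(),
    nn.MaxPool2d(2),                             # pooling: reduce spatial dimensions
    nn.Flatten(),
    nn.Dropout(p=0.2),                           # dropout: mitigate overfitting
    nn.Linear(16 * 16 * 16, 10),                 # dense (fully connected) output
)

x = torch.randn(4, 8, 32, 32)  # batch of 4 inputs
print(model(x).shape)          # torch.Size([4, 10])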
Computer Vision Model
Softmax is an activation function common in the output layer of a neural network for classification tasks
Modern deep learning models combine many layer types with 10^3–10^12 parameters
combine many layer types with 10^3–10^12 parameters. Example from https://towardsdatascience.com
Softmax: σ(zᵢ) = exp(zᵢ) / ∑ⱼ exp(zⱼ), where the sum over classes j plays the role of a partition function
Note the appearance of the Boltzmann distribution (recall your thermodynamics lectures)
Example: input vector (3, 6, 7, 11, 4) → class probabilities (0.00, 0.03, 0.04, 0.92, 0.01)
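A small NumPy sketch of the softmax defined above; subtracting the maximum is a standard numerical-stability trick, not part of the definition:

import numpy as np

def softmax(z):
    """sigma(z_i) = exp(z_i) / sum_j exp(z_j)"""
    e = np.exp(z - np.max(z))   # shift for numerical stability
    return e / e.sum()          # the denominator plays the role of a partition function

z = np.array([3.0, 6.0, 7.0, 11.0, 4.0])
print(softmax(z).round(3))      # the largest input dominates the class probability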
a neural network using gradient descent (from the output layer). I. Goodfellow, Y. Bengio and A. Courville, “Deep Learning”
Figure: application of the chain rule from the output layer at each training iteration
parameters by error minimisation. I. Goodfellow, Y. Bengio and A. Courville, “Deep Learning”
Limitations
• Slow training
• Failure to converge
• Local minima
Improvements
• Stochastic gradient descent (use random subset of data)
• Batch normalisation (avoid vanishing gradients)
• Adaptive learning rates (more robust convergence)
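A schematic stochastic gradient descent loop for a simple linear model, to show where the random subset (mini-batch) and learning rate enter; the data, learning rate and batch size are illustrative assumptions:

import numpy as np

rng = np.random.default_rng(0)

# Synthetic regression data: y = 3x + 1 with noise
X = rng.uniform(-1, 1, size=(200, 1))
y = 3 * X[:, 0] + 1 + 0.1 * rng.normal(size=200)

w, b, lr, batch_size = 0.0, 0.0, 0.1, 16

for epoch in range(50):
    idx = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        batch = idx[start:start + batch_size]   # random subset of the data
        xb, yb = X[batch, 0], y[batch]
        error = (w * xb + b) - yb               # prediction error on the mini-batch
        w -= lr * (2 * error * xb).mean()       # gradient of the mean squared error
        b -= lr * (2 * error).mean()

print(w, b)   # should approach 3 and 1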
e.g. 1 MP image = 1,000,000 pixels. Image: Eric Eaton and Dinesh Jayaraman (Penn Engineering)
Figure: an image is an array of pixels that can be flattened into a feature vector
Decision boundaries are difficult to define, e.g. to distinguish between animals based on pixels alone
in the field of computer vision
Algorithmic edge detection (e.g. intensity gradients). J. Canny, IEEE Trans. Pattern Anal. Mach. Intell. 8, 679 (1986)
Popular packages for classical filters include ImageJ and scikit-image
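A short sketch of classical edge detection with scikit-image; the built-in test image and the sigma value are arbitrary choices:

from skimage import data, feature

image = data.camera()                        # built-in grayscale test image
edges = feature.canny(image, sigma=2.0)      # Canny edge detector (intensity gradients)
print(edges.shape, edges.dtype, edges.sum()) # boolean edge map and number of edge pixels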
for processing data with a grid-like topology (images, time series…) Y. LeCun et al, Neural Computation 1, 541 (1989) US Postal Service Challenge Computer recognition of handwritten zip codes
for processing data with a grid-like topology (images, time series…). Y. LeCun et al, Neural Computation 1, 541 (1989)
Input: 16×16 pixel images used directly rather than feature vectors
Output: {0,1,2,3,4,5,6,7,8,9}
Model: 1000 neurons, 9760 parameters
for processing data with a grid-like topology (images, time series…). Y. LeCun et al, Neural Computation 1, 541 (1989)
Training: 7291 examples
Testing: 2007 examples (included hand-chosen ambiguous or unclassifiable samples)
this network and became a standard in the field. Y. LeCun et al, Proc. IEEE 86, 2278 (1998)
Higher resolution input (MNIST dataset)
Cn = convolutional layer n (extract features)
Sn = sub-sampling layer n (reduce spatial dimensions)
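A sketch of a small LeNet-style network in PyTorch for 28×28 MNIST-like input, alternating convolutional (Cn) and sub-sampling (Sn) layers; the layer sizes here are illustrative and differ from the original publication:

import torch
import torch.nn as nn

lenet_style = nn.Sequential(
    nn.Conv2d(1, 6, kernel_size=5, padding=2),   # C1: extract features (28x28 -> 28x28)
    nn.Tanh(),
    nn.AvgPool2d(2),                             # S2: reduce spatial dimensions (14x14)
    nn.Conv2d(6, 16, kernel_size=5),             # C3 (14x14 -> 10x10)
    nn.Tanh(),
    nn.AvgPool2d(2),                             # S4 (5x5)
    nn.Flatten(),
    nn.Linear(16 * 5 * 5, 120),                  # fully connected
    nn.Tanh(),
    nn.Linear(120, 10),                          # output: digits 0-9
)

x = torch.randn(1, 1, 28, 28)   # one MNIST-like image
print(lenet_style(x).shape)     # torch.Size([1, 10])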
network with “skip” connections for effective training. Images from https://www.baeldung.com/cs/residual-networks
Figure: residual block, with a skip connection that carries the input x around the block
This approach avoids the vanishing gradient problem for training of deep networks
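A minimal residual block sketch in PyTorch, showing the skip connection that adds the input x back to the block output; the channel count and kernel size are illustrative:

import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.relu = nn.ReLU()

    def forward(self, x):
        out = self.relu(self.conv1(x))
        out = self.conv2(out)
        return self.relu(out + x)   # skip connection: add the input x back

block = ResidualBlock(16)
x = torch.randn(1, 16, 8, 8)
print(block(x).shape)   # same shape as the input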
Graph Convolutional Neural Networks (GCNN) are designed for graph-structured data
Information is stored on each piece of the graph, i.e. vectors associated with the nodes, edges and global attributes
Figure: information exchange between these components
for regression or classification tasks (V = vertex; E = edge; U = global attribute). Images from https://distill.pub/2021/understanding-gnns
Figure: graph update followed by end-to-end prediction
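A toy NumPy sketch of one graph-update step: node vectors (V) are updated from messages passed along edges (E), and a global attribute (U) is read out. The adjacency, feature sizes and averaging rule are illustrative assumptions, not a specific published scheme:

import numpy as np

# Toy graph: 4 nodes, edges as index pairs, one global attribute
node_feats = np.random.rand(4, 3)
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]

# One round of information exchange: each node averages messages from its neighbours
messages = np.zeros_like(node_feats)
counts = np.zeros(4)
for i, j in edges:
    messages[j] += node_feats[i]   # send i's vector along the edge to j
    messages[i] += node_feats[j]   # and vice versa (undirected exchange)
    counts[i] += 1
    counts[j] += 1

node_feats = 0.5 * (node_feats + messages / counts[:, None])  # graph update
global_attr = node_feats.mean(axis=0)                         # global attribute U

print(node_feats.shape, global_attr)   # the readout could feed a regression head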
being used for materials modelling. C. Chen et al, Chem. Mater. 31, 3564 (2019)
MEGNet (two-body connections); https://github.com/materialsvirtuallab/matgl
that can predict energies and forces for any material. C. Chen and S. P. Ong, Nature Computational Science 2, 718 (2022)
Input: graph → Output: energy, forces, stress
M3GNet (beyond pairwise interactions); https://github.com/materialsvirtuallab/matgl
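A hedged usage sketch of a pretrained M3GNet potential through matgl and ASE; the model tag, module path and calculator name below are assumptions based on the matgl documentation and may differ between versions, so check the repository above for the current API:

# Sketch only: verify against https://github.com/materialsvirtuallab/matgl
import matgl
from matgl.ext.ase import PESCalculator      # ASE calculator wrapper (assumed name)
from ase.build import bulk

potential = matgl.load_model("M3GNet-MP-2021.2.8-PES")   # pretrained universal potential (assumed tag)

atoms = bulk("Si", "diamond", a=5.43)        # any crystal structure
atoms.calc = PESCalculator(potential)        # graph in -> energy, forces, stress out
print(atoms.get_potential_energy())          # eV
print(atoms.get_forces())                    # eV/Angstrom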
between different types of deep learning architectures
3. Specify how convolutional neural networks work and can be applied to materials problems
Activity: Learning microstructure