Machine Learning for Materials (Lecture 4)

Aron Walsh
January 30, 2024

Slides linked to https://github.com/aronwalsh/MLforMaterials. Updated for 2025.

Transcript

  1. Aron Walsh, Department of Materials, Centre for Processable Electronics
    Machine Learning for Materials, 4. Crystal Representations (Module MATE70026)
  2. Module Contents
    1. Introduction
    2. Machine Learning Basics
    3. Materials Data
    4. Crystal Representations
    5. Classical Learning
    6. Artificial Neural Networks
    7. Building a Model from Scratch
    8. Accelerated Discovery
    9. Generative Artificial Intelligence
    10. Recent Advances
  3. Representation of Materials
    Model performance depends on the choice of compositional and structural features.
    Minimal representation: input atomic numbers (Z) and coordinates (R); output properties. Ab initio quantum mechanics (QM) solves $\hat{\mathbf{H}}|\Psi\rangle = E|\Psi\rangle$ for the electronic wavefunction $\Psi$.
    Effective representation: input feature vector ($\mathbf{X}$); output properties. Supervised machine learning (ML) fits $y = f(\mathbf{X}, \mathbf{\Theta})$, where $\mathbf{\Theta}$ are the learned weights.
    K. T. Butler et al, Nature 559, 547 (2018)
  4. How to Best Represent a Molecule?
    Networks of atoms (nodes) connected by bonds (edges).
    J. J. Sylvester, Am. J. Math. 1, 64 (1878)
  5. How to Best Represent a Material?
    Many possible materials features, from atomistic to macroscopic length scales:
    • Electronic (Å): wavefunctions or electron density
    • Atomic scale (nm): local atomic connectivity
    • Microstructure (µm): grain size and orientation
    • Macroscale (cm): shape
    Image after Taylor Sparks (University of Utah)
  6. Hot Encoding
    We can use an n-dimensional vector, with one entry per element, to encode which elements are present in a compound. '1' indicates the presence of that specific element and '0' its absence.
    Element (one-hot), e.g. H:           [1 0 0 0 0 0 0 0 0 …] over H He Li Be B C N O F …
    Compound (multi-hot), e.g. C and O:  [0 0 0 0 0 1 0 1 0 …] over H He Li Be B C N O F …
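A minimal sketch of the two encodings on this slide, assuming a hypothetical element list truncated to H through F (a practical encoder would span the whole periodic table). The multi-hot vector for a compound containing C and O reproduces the slide's [000001010...] pattern.

```python
import numpy as np

# Hypothetical truncated element list (H to F) for illustration only;
# a practical encoder would include every element.
ELEMENTS = ["H", "He", "Li", "Be", "B", "C", "N", "O", "F"]
INDEX = {symbol: i for i, symbol in enumerate(ELEMENTS)}


def one_hot(element):
    """Vector with a single 1 at the position of `element`."""
    vec = np.zeros(len(ELEMENTS), dtype=int)
    vec[INDEX[element]] = 1
    return vec


def multi_hot(elements):
    """Vector with a 1 for every element present in a compound."""
    vec = np.zeros(len(ELEMENTS), dtype=int)
    for element in elements:
        vec[INDEX[element]] = 1
    return vec


print(one_hot("H"))           # [1 0 0 0 0 0 0 0 0]
print(multi_hot(["C", "O"]))  # [0 0 0 0 0 1 0 1 0]
```

Because multi-hot encoding records only presence, CO and CO2 map to the same vector: stoichiometry is lost.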
  7. Hand-Built (Local) Representations
    We can define elemental feature vectors based on standard properties of the elements, e.g. the 22-dimensional Magpie representation from L. Ward et al, npj Comp. Mater. 2, 16028 (2016).
    https://github.com/WMD-group/ElementEmbeddings
  8. Hand-Built (Local) Representations
    We can also define compound feature vectors based on standard properties of the elements, for example as a stoichiometry-weighted mean:
    X(Fe2O3) = [2X(Fe) + 3X(O)]/5
                 X1     X2     X3     …   Xn
        Fe       0.52   0.11   0.01   …   0.80
        O        0.32   0.23   0.14   …   0.64
        Fe2O3    0.40   0.18   0.09   …   0.70
    Different types of pooling are possible (e.g. max, min, mean).
    https://github.com/WMD-group/ElementEmbeddings
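The weighted mean X(Fe2O3) = [2X(Fe) + 3X(O)]/5 and the alternative pooling modes can be sketched in NumPy as follows. The Fe and O vectors are the illustrative values from this slide (not real Magpie data), and the `pool` helper is hypothetical, not part of the ElementEmbeddings package.

```python
import numpy as np

# Illustrative element vectors (X1, X2, X3, Xn) taken from the slide's table
FEATURES = {
    "Fe": np.array([0.52, 0.11, 0.01, 0.80]),
    "O": np.array([0.32, 0.23, 0.14, 0.64]),
}


def pool(composition, mode="mean"):
    """Combine element vectors into one compound vector (hypothetical helper)."""
    symbols = list(composition)
    vectors = np.stack([FEATURES[s] for s in symbols])
    if mode == "mean":
        # Stoichiometry-weighted mean: sum_i n_i X_i / sum_i n_i
        counts = np.array([composition[s] for s in symbols], dtype=float)
        return counts @ vectors / counts.sum()
    if mode == "max":
        return vectors.max(axis=0)  # ignores stoichiometry
    if mode == "min":
        return vectors.min(axis=0)  # ignores stoichiometry
    raise ValueError(f"unknown pooling mode: {mode}")


fe2o3 = pool({"Fe": 2, "O": 3})
print(fe2o3.round(2))  # approximately [0.40, 0.18, 0.09, 0.70], the Fe2O3 row
```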
  9. Learned (Distributed) Representations
    We can learn continuous feature vectors that encode elemental information as part of model training, for example:
    • SkipSpecies (200-D): structure graph pooling
    • Mat2Vec (200-D): literature word embedding
    https://github.com/WMD-group/ElementEmbeddings
  10. Element Embeddings
    A toolkit to access and modify elemental and compositional representations for machine learning: https://github.com/WMD-group/ElementEmbeddings
    Latest embeddings: CrystaLLM, SkipSpecies, CGNF
    Dr Anthony Onwuli
  11. Learned Chemical Similarity
    Similarity between element embeddings can be quantified with distance (e.g. Chebyshev), similarity (e.g. cosine), or correlation (e.g. Pearson) metrics. Cosine similarity between vectors A and B:
    $\cos\theta = \frac{\mathbf{A}\cdot\mathbf{B}}{\|\mathbf{A}\|\,\|\mathbf{B}\|}$
        Name        Dimension   Type
        Magpie      22          Element properties
        Mat2Vec     200         Chemical abstracts
        SkipAtom    200         Crystal structure graphs
        MEGNet      16          Graph neural network
        CrystaLLM   512         Crystal structure text
    Anthony Onwuli et al, Digital Discovery 2, 1558 (2023)
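The three kinds of metric named on this slide take a few lines of NumPy each. This is a minimal sketch assuming hypothetical 3-dimensional embeddings (the embeddings in the table have 16 to 512 dimensions); SciPy's `scipy.spatial.distance` module also provides `cosine` (a distance, equal to 1 minus the similarity) and `chebyshev`.

```python
import numpy as np


def cosine_similarity(a, b):
    # cos(theta) = A·B / (|A| |B|)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def chebyshev_distance(a, b):
    # Largest absolute difference over all embedding dimensions
    return float(np.max(np.abs(a - b)))


def pearson_correlation(a, b):
    # Off-diagonal entry of the 2x2 correlation matrix
    return float(np.corrcoef(a, b)[0, 1])


# Hypothetical 3-dimensional embeddings for two elements
a = np.array([3.0, 4.0, 0.0])
b = np.array([4.0, 3.0, 0.0])
print(cosine_similarity(a, b))    # 0.96
print(chebyshev_distance(a, b))   # 1.0
print(pearson_correlation(a, b))  # about 0.885
```

Note that cosine similarity ignores vector length, whereas the Chebyshev distance does not; which behaviour is preferable depends on how the embedding was trained.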
  12. Learned Chemical Similarity
    Dimensionality reduction by principal component analysis (PCA) confirms a natural clustering of elements into “groups”.
    Anthony Onwuli et al, Digital Discovery 2, 1558 (2023)
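A minimal sketch of the PCA projection used for such plots, written with NumPy's SVD. The embedding table here is random placeholder data (an assumption standing in for, e.g., 200-D Mat2Vec vectors), so it will not show chemical groups; in practice `sklearn.decomposition.PCA` is the usual choice.

```python
import numpy as np


def pca(X, n_components=2):
    """Project rows of X onto their leading principal components via SVD."""
    Xc = X - X.mean(axis=0)                      # centre each feature
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:n_components].T            # low-dimensional coordinates
    explained = S**2 / np.sum(S**2)              # variance fraction per component
    return scores, explained[:n_components]


# Placeholder embedding table: 20 hypothetical "elements" x 200 dimensions
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 200))
scores, explained = pca(X)
print(scores.shape)  # (20, 2)
```

Each row of `scores` is a 2D point that can be scatter-plotted and coloured by periodic-table group.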
  13. Learn from Crystallography
    High-symmetry crystal: MgO, cubic, 8-atom unit cell, a = b = c
    Low-symmetry crystal: BiVO4, monoclinic, 24-atom unit cell, a ≠ b ≠ c
  14. Learn from Crystallography
    7 crystal systems, 14 Bravais lattices, 230 space groups, 103 prototype structures
    Conventional description:
    • Unit cell (ℒ): a, b, c, α, β, γ
    • Fractional coordinates (𝒳): (x1, y1, z1), …
    • Atom types (𝒜): Sn, Ti, O, …
    Problem for ML: the conventional description lacks invariance with respect to atomic permutation, unit cell rotations, and translations.
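Turning the conventional (ℒ, 𝒳) description into Cartesian positions requires choosing how to orient the cell. The sketch below uses one common convention (a along x, b in the xy-plane); other choices are equally valid, which is itself an instance of the invariance problem noted above. The function names are illustrative.

```python
import numpy as np


def lattice_matrix(a, b, c, alpha, beta, gamma):
    """Lattice vectors as rows, from lengths and angles in degrees.

    Convention (one of several): a along x, b in the xy-plane.
    """
    al, be, ga = np.radians([alpha, beta, gamma])
    cx = c * np.cos(be)
    cy = c * (np.cos(al) - np.cos(be) * np.cos(ga)) / np.sin(ga)
    cz = np.sqrt(c**2 - cx**2 - cy**2)
    return np.array([
        [a, 0.0, 0.0],
        [b * np.cos(ga), b * np.sin(ga), 0.0],
        [cx, cy, cz],
    ])


def frac_to_cart(frac, lattice):
    # r = x*a_vec + y*b_vec + z*c_vec for each row of fractional coordinates
    return np.asarray(frac, dtype=float) @ lattice


L = lattice_matrix(4, 5, 6, 90, 90, 90)
print(frac_to_cart([[0, 0, 0], [0.5, 0.5, 0.5]], L).round(6))
```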
  15. Unit Cell Transformations
    The same structure is described in each case:
                                      a   b   c   x1    y1    z1    x2    y2    z2
        Two-atom orthorhombic cell    4   5   6   0     0     0     0.5   0.5   0.5
        Atomic permutation            4   5   6   0.5   0.5   0.5   0     0     0
        Crystal rotation              5   4   6   0.5   0.5   0.5   0     0     0
        Unit cell translation         4   5   6   0.0   0.5   0.5   0.5   0     0
    ML models built on representations that are not invariant to these transformations struggle to generalise.
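A minimal sketch contrasting the raw vector [a, b, c, x1, …, z2] with an invariant quantity, the minimum-image distance between the two atoms, for the four descriptions in the table. The simple wrapping used here assumes an orthorhombic (right-angled) cell, and a single distance is of course far from a complete descriptor.

```python
import numpy as np

# Lattice lengths (a, b, c) and two sets of fractional coordinates per description
descriptions = {
    "original":    ((4, 5, 6), [[0.0, 0.0, 0.0], [0.5, 0.5, 0.5]]),
    "permutation": ((4, 5, 6), [[0.5, 0.5, 0.5], [0.0, 0.0, 0.0]]),
    "rotation":    ((5, 4, 6), [[0.5, 0.5, 0.5], [0.0, 0.0, 0.0]]),
    "translation": ((4, 5, 6), [[0.0, 0.5, 0.5], [0.5, 0.0, 0.0]]),
}


def raw_vector(lengths, frac):
    # Conventional (variant) representation: [a, b, c, x1, y1, z1, x2, y2, z2]
    return np.concatenate([lengths, np.ravel(frac)])


def min_image_distance(lengths, frac):
    # Shortest interatomic distance under periodic boundary conditions
    diff = np.subtract(frac[1], frac[0])
    diff -= np.round(diff)  # wrap each component into [-0.5, 0.5]
    return float(np.linalg.norm(diff * np.asarray(lengths, dtype=float)))


for name, (lengths, frac) in descriptions.items():
    d = min_image_distance(lengths, frac)
    print(f"{name:12s} {raw_vector(lengths, frac)}  d = {d:.3f}")
```

The raw vectors differ from row to row, while the distance is the same (about 4.387) in every case.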
  16. Structural Representations
    Many structural descriptors have been developed. Several are implemented in https://singroup.github.io/dscribe
    • Atom-Centered Symmetry Functions (Behler, 2011): site expansion of radial and angular terms
    • Coulomb Matrix (Rupp et al, 2012): mimics electrostatic interactions ($q_i q_j / r_{ij}$)
    • Many-Body Tensor Representation (Huo et al, 2017): distribution of local structural motifs
    • Atomic Cluster Expansion (Drautz, 2019): high body-order expansion of atomic environments
  17. Real Space Grid
    Voxels (three-dimensional pixels), as used in computer graphics, can describe a unit cell. They were used in early materials ML, but are not recommended for representing structure.
    Image courtesy of Taylor Sparks (University of Utah)
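A minimal sketch of how a unit cell might be voxelised, assuming Gaussian-smeared atoms weighted by atomic number on a grid in fractional coordinates; the grid size, width and weighting are illustrative choices, not a published scheme. Moving the atoms changes the voxel values, one limitation of voxel grids as a structural representation.

```python
import numpy as np


def voxelise(frac, weights, n=16, sigma=0.05):
    """Sum of periodic Gaussians on an n x n x n grid spanning the unit cell.

    frac: (N, 3) fractional coordinates; weights: one value per atom (e.g. Z).
    sigma is in fractional units, an assumption that ignores cell shape.
    """
    centres = (np.arange(n) + 0.5) / n
    grid = np.stack(np.meshgrid(centres, centres, centres, indexing="ij"), axis=-1)
    density = np.zeros((n, n, n))
    for r, w in zip(np.asarray(frac, dtype=float), weights):
        d = grid - r
        d -= np.round(d)  # minimum-image wrap so the Gaussians respect periodicity
        density += w * np.exp(-np.sum(d**2, axis=-1) / (2 * sigma**2))
    return density


# Two atoms with Z = 12 and Z = 8 at (0, 0, 0) and (0.5, 0.5, 0.5)
vox = voxelise([[0, 0, 0], [0.5, 0.5, 0.5]], weights=[12, 8])
print(vox.shape)  # (16, 16, 16)
```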
  18. Pairwise Interatomic Distances
    The Coulomb matrix is a global descriptor that mimics the electrostatic interaction between nuclei. Implemented in https://singroup.github.io/dscribe
    The sine matrix is a modification that accounts for periodicity.
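A minimal NumPy sketch of the Coulomb matrix for a finite molecule, using the definition of Rupp et al (diagonal $0.5 Z_i^{2.4}$, off-diagonal $Z_i Z_j / |\mathbf{R}_i - \mathbf{R}_j|$). The water-like geometry is hypothetical and positions are assumed to be in bohr. The matrix changes when atoms are reordered, so its sorted eigenvalues are used here as a permutation-invariant summary; dscribe provides ready-made implementations.

```python
import numpy as np


def coulomb_matrix(Z, R):
    """Coulomb matrix for a finite molecule or cluster (no periodicity).

    Z: (N,) atomic numbers; R: (N, 3) Cartesian positions, assumed in bohr.
    """
    Z = np.asarray(Z, dtype=float)
    R = np.asarray(R, dtype=float)
    dist = np.linalg.norm(R[:, None, :] - R[None, :, :], axis=-1)
    with np.errstate(divide="ignore"):
        M = np.outer(Z, Z) / dist        # off-diagonal: Z_i Z_j / |R_i - R_j|
    np.fill_diagonal(M, 0.5 * Z**2.4)    # diagonal: 0.5 Z_i^2.4 (replaces inf)
    return M


def sorted_eigenvalues(M):
    # The eigenvalue spectrum does not depend on atom ordering
    return np.sort(np.linalg.eigvalsh(M))[::-1]


# Hypothetical water-like geometry (O first, then two H)
Z = [8, 1, 1]
R = [[0.0, 0.0, 0.0], [1.8, 0.0, 0.0], [-0.45, 1.74, 0.0]]
M = coulomb_matrix(Z, R)
print(sorted_eigenvalues(M))
```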
  19. Invariant Structural Representations
    The Atomic Cluster Expansion (ACE) provides a systematic representation of atomic environments through radial (R) and angular (Y) terms:
    • Site basis function: $\phi(\mathbf{r}) = R_l(r)\,Y_l^m(\hat{\mathbf{r}})$
    • Permutation invariance: $A_i = \sum_{j \in \mathrm{neighbours}} \phi(\mathbf{r}_{ij})$
    • Rotation (Q) invariance: $B_i = \int A_i \, dQ$
    The product basis B forms a body-order expansion, with $\mathrm{Property} = f(\mathbf{B}, \mathbf{\Theta})$, where $\mathbf{\Theta}$ are the weights. ACE is used in linear and deep learning models for materials.
    R. Drautz, Phys. Rev. B 99, 014104 (2019); arXiv:2311.16326 (2023)
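The two invariance steps can be illustrated with a toy site descriptor. This is a hypothetical sketch, not the ACE formalism or any ACE package: the Gaussian radial functions, the cutoff and the restriction to l ≤ 2 are arbitrary choices. Summing over neighbours gives permutation invariance, as in $A_i$. For the three-body terms, with $A_{nlm} = \sum_j R_n(r_{ij}) Y_l^m(\hat{\mathbf{r}}_{ij})$ (the slide's site basis with an extra radial index n), the spherical-harmonic addition theorem gives $\sum_m A_{nlm} A^{*}_{n'lm} = \frac{2l+1}{4\pi} \sum_{j,k} R_n(r_{ij}) R_{n'}(r_{ik}) P_l(\cos\theta_{jk})$, so Legendre polynomials of neighbour-pair angles yield rotation-invariant products without evaluating $Y_l^m$ explicitly.

```python
import numpy as np


def radial_basis(r, centres=(1.0, 2.0, 3.0), width=0.5, r_cut=4.0):
    """Toy R_n(r): Gaussians times a smooth cosine cutoff; shape (M, n_radial)."""
    cutoff = np.where(r < r_cut, 0.5 * (np.cos(np.pi * r / r_cut) + 1.0), 0.0)
    gauss = np.exp(-((r[:, None] - np.asarray(centres)) ** 2) / (2 * width**2))
    return gauss * cutoff[:, None]


def legendre(l, x):
    # Legendre polynomials P_0, P_1, P_2
    return [np.ones_like(x), x, 0.5 * (3 * x**2 - 1)][l]


def site_descriptor(neighbours):
    """Toy 2-body and 3-body invariants of one atomic site.

    neighbours: (M, 3) vectors r_ij from the central atom to its neighbours.
    """
    r = np.linalg.norm(neighbours, axis=1)
    Rn = radial_basis(r)                                   # (M, n_radial)
    two_body = Rn.sum(axis=0)                              # sum over neighbours
    cos_jk = (neighbours @ neighbours.T) / np.outer(r, r)  # cos(theta_jk)
    three_body = [
        np.einsum("jn,km,jk->nm", Rn, Rn, legendre(l, cos_jk)).ravel()
        for l in range(3)
    ]
    return np.concatenate([two_body, *three_body])


rng = np.random.default_rng(1)
neigh = rng.uniform(-2.0, 2.0, size=(6, 3))
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))  # random orthogonal matrix
B = site_descriptor(neigh)
print(np.allclose(B, site_descriptor(neigh @ Q.T)))  # True: rotation invariant
```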
  20. ML Powered Molecular Dynamics
    Classical models are being complemented by machine learning force fields (MLFFs). Three state-of-the-art implementations based on equivariant neural network regression are MACE, Allegro, and SevenNet.
    J. D. Morrow, J. L. A. Gardner and V. Deringer, J. Chem. Phys. 158, 121501 (2023)
  21. ML Powered Molecular Dynamics
    MLFFs enable large-scale simulations of complex materials such as organic-inorganic solids. Example: a 69,120-atom simulation of CsPbI3 perovskite based on the atomic cluster expansion (ACE).
    Figure: octahedral tilt correlation
    Xia Liang et al, J. Phys. Chem. C 127, 12941 (2023). Animation by Will Baldwin (Small 20, 2303565, 2024)
  22. Graphs
    Graphs are a representation common to many domains and problems.
    Image courtesy of Michael Bronstein (University of Oxford)
  23. Graph Components
    Nodes (vertices), edges, and global attributes. For crystal systems:
    • N: atoms
    • E: bonds
    • G: unit cell or materials properties
    Vectors can be associated with each component to encode and exchange information.
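To make "encode and exchange information" concrete, here is a hypothetical toy graph with node, edge and global vectors, plus one message-passing update using simple sum aggregation. Real graph neural networks replace the fixed scaling and addition with learned functions; the update rule below is an illustrative assumption, not a specific published architecture.

```python
import numpy as np

# Hypothetical toy graph: 3 nodes (atoms), 3 undirected edges (bonds)
node_feats = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])  # N: one vector per atom
edges = [(0, 1), (1, 2), (0, 2)]                            # E: pairs of node indices
edge_feats = [2.0, 1.5, 1.0]                                # E: one scalar per bond
global_feats = np.zeros(2)                                  # G: one vector per graph


def message_passing_step(h, edges, e, u):
    """One update with sum aggregation (no learned weights)."""
    messages = np.zeros_like(h)
    for (i, j), e_ij in zip(edges, e):
        # Undirected edge: each end receives the other's vector scaled by e_ij
        messages[i] += e_ij * h[j]
        messages[j] += e_ij * h[i]
    h_new = h + messages              # node update
    u_new = u + h_new.mean(axis=0)    # global update from pooled nodes
    return h_new, u_new


h1, u1 = message_passing_step(node_feats, edges, edge_feats, global_feats)
print(h1)  # [[2.  3. ] [3.5 2.5] [2.  2.5]] (shown here on one line)
```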
  24. Graph Components
    Nodes (vertices), edges, and global attributes. Graphs can be fully connected (every node connected to every other node), but sparse connections are often used.
    Image from https://distill.pub/2021/gnn-intro
  25. Graph Components
    Nodes (or vertices), edges, and global attributes. For chemical problems, nearest-neighbour connectivity is common, as used in “ball and stick” representations.
    Figure panels: molecule (including H); graph (excluding H nodes)
    Image from https://distill.pub/2021/gnn-intro
  26. Crystal Graphs
    Standard crystallographic representation of materials: fractional positions xyz of atoms within a unit cell formed of lattice vectors abc. Effective for humans.
    Crystal graph representation: nodes (atoms) connected by edges (bonds); multiple edges can describe periodicity. Effective for ML models.
    T. Xie and J. C. Grossman, Phys. Rev. Lett. 120, 145301 (2018)
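A minimal sketch of how a crystal graph with periodic edges might be built, under simple assumptions: a fixed distance cutoff defines the "bonds", and only the neighbouring cells are searched. This is an illustration, not the neighbour search used by Xie and Grossman; libraries such as pymatgen provide robust neighbour finding for real structures.

```python
import itertools

import numpy as np


def crystal_graph(lattice, frac, r_cut):
    """Directed edges (i, j, image, distance) within r_cut, including periodic images.

    lattice: (3, 3) array with lattice vectors as rows.
    frac: (N, 3) fractional coordinates in [0, 1).
    Only image offsets in {-1, 0, 1} are searched, which assumes r_cut does
    not exceed the smallest perpendicular width of the cell.
    """
    lattice = np.asarray(lattice, dtype=float)
    cart = np.asarray(frac, dtype=float) @ lattice
    edges = []
    for image in itertools.product((-1, 0, 1), repeat=3):
        shift = np.array(image) @ lattice
        for i, j in itertools.product(range(len(cart)), repeat=2):
            if i == j and image == (0, 0, 0):
                continue  # an atom is not bonded to itself in the home cell
            d = float(np.linalg.norm(cart[j] + shift - cart[i]))
            if d <= r_cut:
                edges.append((i, j, image, d))
    return edges


# CsCl-like toy: cubic cell a = 4, atoms at the corner and body centre
edges = crystal_graph(4.0 * np.eye(3), [[0, 0, 0], [0.5, 0.5, 0.5]], r_cut=3.5)
print(len(edges))  # 16: each atom links to 8 periodic images of the other
```

A one-atom simple cubic cell with r_cut equal to the lattice constant gives six edges, all from the atom to its own periodic images, which is how multiple edges encode periodicity.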
  27. Materials Graphs
    Nodes can be used to represent larger structural units of a crystal or even entire grains.
    M. Dai et al, npj Comp. Mater. 7, 103 (2021)
  28. Multi-Scale Representations
    Ongoing efforts aim to combine features that bridge from the micro to the macroscale; from atoms to devices.
    S. B. Torrisi et al, APL Machine Learning 1, 020901 (2023)
  29. Class Outcomes
    1. Describe the ways that chemical composition can be expanded into vectors
    2. Explain how the structure of a material can be represented for machine learning
    3. Consider the limitations of a graph-based description of a three-dimensional structure
    Activity: Navigating crystal space