
Crystalformer: Infinitely Connected Attention for Periodic Structure Encoding (ICLR 2024)


Crystalformer: Infinitely Connected Attention for Periodic Structure Encoding
Tatsunori Taniai, Ryo Igarashi, Yuta Suzuki, Naoya Chiba, Kotaro Saito, Yoshitaka Ushiku, and Kanta Ono
In The Twelfth International Conference on Learning Representations (ICLR 2024)
Paper: https://openreview.net/forum?id=fxQiecl9HB
Project page: https://omron-sinicx.github.io/crystalformer/

Tatsunori Taniai

April 16, 2024


Transcript

  1. Crystalformer: Infinitely Connected Attention for Periodic Structure Encoding
     Tatsunori Taniai (OMRON SINIC X Corporation), Ryo Igarashi (OMRON SINIC X Corporation),
     Yuta Suzuki (Toyota Motor Corporation), Naoya Chiba (Tohoku University),
     Kotaro Saito (Randeft Inc. / Osaka University), Yoshitaka Ushiku (OMRON SINIC X Corporation),
     Kanta Ono (Osaka University)
     The Twelfth International Conference on Learning Representations,
     May 7th through 11th, 2024, at Messe Wien Exhibition and Congress Center, Vienna, Austria
  2. Materials science and crystal structures
     Materials science: explore and develop new materials with useful properties and
     functionalities, such as superconductors and battery materials.
     Crystal structure:
     • The "source code" of a material.
     • An infinitely repeating, periodic arrangement of atoms in 3D space.
     • Described by a minimum repeatable pattern called the unit cell.
     (Figure: crystal structure of NaCl and its unit cell.)
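As background for the following slides, here is a minimal sketch (not taken from the paper or its codebase) of how a crystal structure is commonly represented in code: a lattice matrix, fractional atomic coordinates, and atomic species. The NaCl numbers are approximate and the variable names are illustrative only.

```python
import numpy as np

# FCC primitive cell of rock-salt NaCl (conventional lattice constant ~5.64 Å).
lattice = 2.82 * np.array([[0.0, 1.0, 1.0],     # lattice vectors l1, l2, l3 as rows
                           [1.0, 0.0, 1.0],
                           [1.0, 1.0, 0.0]])
frac_coords = np.array([[0.0, 0.0, 0.0],        # Na
                        [0.5, 0.5, 0.5]])       # Cl
species = ["Na", "Cl"]

cart_coords = frac_coords @ lattice             # Cartesian positions inside the unit cell
# The full crystal is the infinite point set {x + n1*l1 + n2*l2 + n3*l3 : n in Z^3}
# for each unit-cell atom position x.
```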
  3. Material property prediction
     Pipeline: crystal structure (unit cell) → neural network → material properties:
     • Formation energy
     • Total energy
     • Bandgap
     • Energy above hull
     • etc.
     Neural network:
     • Much faster than density functional theory (DFT) calculations.
     • Useful for accelerating material discovery and development processes.
  4. Periodic SE(3)-invariant prediction
     Pipeline: crystal structure → interatomic message-passing layers → material properties
     (formation energy, total energy, bandgap, energy above hull, etc.).
     • The layers evolve the state feature of each unit-cell atom via interatomic interactions.
     • For property prediction, networks need to be invariant to periodic SE(3) transformations
       of atomic positions: rotation, translation, and periodic boundary shifts (a small check
       is sketched below).
     • While not our focus, force prediction requires networks to be SE(3) equivariant.
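A small illustrative check (not from the paper) that distance-based features satisfy these requirements: rotations and translations leave interatomic distances unchanged, and a periodic boundary shift of the fractional coordinates only relabels periodic images. The lattice, atom count, and function name are arbitrary choices for this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
lattice = np.array([[4.0, 0.0, 0.0],
                    [0.0, 5.0, 0.0],
                    [1.0, 0.0, 6.0]])        # arbitrary lattice vectors (rows)
frac = rng.uniform(size=(4, 3))              # fractional coordinates of 4 atoms
pos = frac @ lattice                         # Cartesian positions

def smallest_image_distances(pos, lattice, radius=3, k=100):
    """The k smallest distances from unit-cell atoms to periodic images with |n_k| <= radius."""
    grid = np.arange(-radius, radius + 1)
    shifts = np.stack(np.meshgrid(grid, grid, grid, indexing="ij"), axis=-1).reshape(-1, 3)
    images = pos[None, :, :] + (shifts @ lattice)[:, None, :]            # (S, N, 3)
    d = np.linalg.norm(images[:, None, :, :] - pos[None, :, None, :], axis=-1)
    return np.sort(d.ravel())[:k]

ref = smallest_image_distances(pos, lattice)

# Rotation + translation (an SE(3) transform) leave all distances unchanged.
theta = 0.3
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0, 0.0, 1.0]])
assert np.allclose(ref, smallest_image_distances(pos @ R.T + np.array([1.0, -2.0, 0.5]),
                                                 lattice @ R.T))

# A periodic boundary shift (translate in fractional space, then wrap into [0, 1))
# changes which image is "nearest" but keeps the same overall distance set.
assert np.allclose(ref, smallest_image_distances(((frac + 0.25) % 1.0) @ lattice, lattice))
```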
  5. Advances in ML and material representation learning
     Advances in ML: CNNs (2011-) [ResNet, He+ 2015]; GNNs (2015-) [PointNet, Qi+ 2016;
     DeepSets, Zaheer+ 2017; GCN, Kipf & Welling 2017; GIN, Xu+ 2018]; Transformers (2017-)
     [Transformer, Vaswani+ 2017; BERT, Devlin+ 2018; image generation, Parmar+ 2018;
     ViT, Dosovitskiy+ 2020].
     Molecules (3D arrangement of finite atoms): many geometric GNNs [Duvenaud+ 2015;
     Kearnes+ 2016; Gilmer+ 2017]; success of Transformers [Graphormer, Ying+ 2021;
     Equiformer, Liao+ 2023].
     Crystals (3D arrangement of infinite atoms): many geometric GNNs [CGCNN, Xie & Grossman 2018;
     SchNet, Schütt+ 2018; MEGNet, Chen+ 2019]; emergence of Transformers [Matformer, Yan+ 2022].
     Graphormer (2021) showed that fully connected self-attention networks are effective for
     molecules. However, whether such dense attention is applicable to crystal structures is still
     an open question because of their infinite structure sizes.
  6. Atomic state evolution by self-attention
     Molecule: fully connected self-attention for finite elements.
     Atom-wise state features:
     • $q_i$, $k_i$, and $v_i$: linear projections of the input atom-wise state feature $x_i$.
     • $y_i$: output atom-wise state feature.
     Relative position representations $\phi_{ij}$ and $\psi_{ij}$:
     • Encode the relative position between atoms $i$ and $j$.
     • $\phi_{ij}$: scalar bias for the softmax logits.
     • $\psi_{ij}$: vector bias for the value features.
     • Distance-based representations ensure SE(3) invariance.
     (A code sketch follows below.)
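In the paper's notation this finite-element update is $y_i = \frac{1}{Z_i}\sum_j \exp(q_i^\top k_j/\sqrt{d_K}+\phi_{ij})(v_j+\psi_{ij})$. Below is a minimal PyTorch sketch of such attention, assuming a Gaussian distance-decay form for $\phi_{ij}$ and a placeholder form for $\psi_{ij}$; the function name and concrete choices are illustrative, not the authors' implementation.

```python
import torch

def relpos_self_attention(x, pos, sigma=1.0):
    """x: (N, d) atom-wise state features; pos: (N, 3) Cartesian positions."""
    N, d = x.shape
    Wq, Wk, Wv = (torch.nn.Linear(d, d, bias=False) for _ in range(3))
    q, k, v = Wq(x), Wk(x), Wv(x)                       # (N, d) each

    r = torch.cdist(pos, pos)                           # (N, N) interatomic distances
    phi = -r.pow(2) / (2 * sigma**2)                    # scalar bias on the logits (Gaussian decay)
    psi = torch.exp(-r)[..., None].expand(N, N, d)      # vector bias on the values (placeholder form)

    logits = q @ k.t() / d**0.5 + phi                   # (N, N) softmax logits
    attn = torch.softmax(logits, dim=-1)                # attention weights a_ij
    return attn @ v + (attn[..., None] * psi).sum(dim=1)   # y_i = sum_j a_ij (v_j + psi_ij)

y = relpos_self_attention(torch.randn(5, 16), torch.randn(5, 3))   # (5, 16) updated states
```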
  7. Atomic state evolution by self-attention
     Molecule: fully connected self-attention for finite elements.
     Crystal structure: infinitely connected self-attention for periodic elements, where each
     unit-cell atom $j$ has images $j(\mathbf{n})$ under periodic unit-cell shifts $\mathbf{n}$.
     $\phi_{ij(\mathbf{n})}$ and $\psi_{ij(\mathbf{n})}$ encode the relative position
     $r_{ij(\mathbf{n})}$ to reflect the periodic unit-cell shifts $\mathbf{n}$ (written out below).
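Written out, the infinitely connected attention extends the finite sum over atoms $j$ with a sum over all lattice translations $\mathbf{n}$, where $j(\mathbf{n})$ denotes the image of unit-cell atom $j$ shifted by $\mathbf{n}$ and $l_1, l_2, l_3$ are the lattice vectors (following the paper's formulation, up to minor notational differences):

```latex
y_i = \frac{1}{Z_i} \sum_{j=1}^{N} \sum_{\mathbf{n} \in \mathbb{Z}^3}
      \exp\!\left( \frac{q_i^\top k_j}{\sqrt{d_K}} + \phi_{ij(\mathbf{n})} \right)
      \left( v_j + \psi_{ij(\mathbf{n})} \right),
\qquad
Z_i = \sum_{j=1}^{N} \sum_{\mathbf{n} \in \mathbb{Z}^3}
      \exp\!\left( \frac{q_i^\top k_j}{\sqrt{d_K}} + \phi_{ij(\mathbf{n})} \right),
\qquad
p_{j(\mathbf{n})} = p_j + n_1 l_1 + n_2 l_2 + n_3 l_3 .
```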
  8. Interpretation as neural potential summation
     With distance-decay attention, the infinitely connected attention is interpreted as
     interatomic energy calculations in an abstract feature space:
     • $\exp\!\big(q_i^\top k_j/\sqrt{d_K} + \phi_{ij(\mathbf{n})}\big)$: abstract interatomic
       potential between atoms $i$ and $j(\mathbf{n})$, decaying with their distance
       $r_{ij(\mathbf{n})}$.
     • $v_j + \psi_{ij(\mathbf{n})}$: abstract influences on atom $i$ from atom $j(\mathbf{n})$.
     Analogy to potential summation in physics simulations: for example, the electric potential
     at one point produced by many point charges $q_j$ at distances $r_j$ is calculated as
     $\sum_j \frac{1}{4\pi\epsilon_0} \frac{q_j}{r_j}$.
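A compact way to see the analogy (left: the attention update from the previous slide; right: the standard Coulomb potential from electrostatics, which is general physics background rather than text from the slide):

```latex
\underbrace{\; y_i = \frac{1}{Z_i} \sum_{j} \sum_{\mathbf{n}}
    \exp\!\Big( \tfrac{q_i^\top k_j}{\sqrt{d_K}} + \phi_{ij(\mathbf{n})} \Big)
    \big( v_j + \psi_{ij(\mathbf{n})} \big) \;}_{\text{neural potential summation in feature space}}
\quad \longleftrightarrow \quad
\underbrace{\; V_i = \sum_{j} \frac{1}{4\pi\epsilon_0} \, \frac{q_j}{r_{ij}} \;}_{\text{electric potential summation}}
```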
  9. Performed as finite-element self-attention
     Infinitely connected attention can be performed just like standard self-attention for finite
     elements with new position encodings: the periodic spatial encoding $\alpha_{ij}$ (a scalar
     bias on the softmax logits) and the periodic edge encoding $\beta_{ij}$ (a vector bias on the
     value features), which absorb the sums over unit-cell shifts $\mathbf{n}$ (see the sketch below).
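A minimal PyTorch sketch of how such encodings could be computed, assuming a Gaussian distance decay $\phi_{ij(\mathbf{n})} = -r_{ij(\mathbf{n})}^2/2\sigma^2$, a truncated range of unit-cell shifts, and a placeholder form for $\psi$; the function name and details are illustrative, not the released implementation.

```python
import itertools
import torch

def periodic_encodings(pos, lattice, sigma=1.0, R=2, d_psi=16):
    """pos: (N, 3) unit-cell Cartesian positions; lattice: (3, 3) lattice vectors as rows."""
    shifts = torch.tensor(list(itertools.product(range(-R, R + 1), repeat=3)),
                          dtype=pos.dtype)                        # (S, 3) integer shifts n
    offsets = shifts @ lattice                                    # (S, 3) lattice translations
    # r[s, i, j] = distance from atom i to the n_s-shifted image of atom j
    r = (pos[None, None, :, :] + offsets[:, None, None, :]
         - pos[None, :, None, :]).norm(dim=-1)                    # (S, N, N)
    phi = -r.pow(2) / (2 * sigma**2)                              # Gaussian distance decay
    alpha = torch.logsumexp(phi, dim=0)                           # alpha_ij = log sum_n exp(phi_ij(n))
    w = torch.softmax(phi, dim=0)                                 # exp(phi_ij(n)) / sum_n exp(phi_ij(n))
    psi = torch.exp(-r)[..., None].expand(*r.shape, d_psi)        # placeholder edge features psi_ij(n)
    beta = (w[..., None] * psi).sum(dim=0)                        # beta_ij = weighted sum of psi_ij(n)
    return alpha, beta                                            # (N, N) and (N, N, d_psi)

# alpha then replaces the scalar bias and beta the vector bias in an otherwise
# standard finite self-attention over the N unit-cell atoms.
```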
  10. Evaluations on the Materials Project dataset
      MAE results (lower is better):

      Method                            | Form. E. (eV/atom) | Bandgap (eV) | Bulk mod. (log(GPa)) | Shear mod. (log(GPa))
      Train/Val/Test                    | 60000/5000/4239    | 60000/5000/4239 | 4664/393/393      | 4664/392/393
      CGCNN [Xie & Grossman, 2018]      | 0.031   | 0.292 | 0.047  | 0.077
      SchNet [Schütt+, 2018]            | 0.033   | 0.345 | 0.066  | 0.099
      MEGNet [Chen+, 2019]              | 0.03    | 0.307 | 0.06   | 0.099
      GATGNN [Louis+, 2020]             | 0.033   | 0.28  | 0.045  | 0.075
      M3GNet [Chen & Ong, 2022]         | 0.024   | 0.247 | 0.05   | 0.087
      ALIGNN [Choudhary & DeCost, 2021] | 0.022   | 0.218 | 0.051  | 0.078
      Matformer [Yan+, 2022]            | 0.021   | 0.211 | 0.043  | 0.073
      PotNet [Lin+, 2023]               | 0.0188  | 0.204 | 0.04   | 0.065
      Crystalformer                     | 0.0198  | 0.201 | 0.0399 | 0.0692

      Consistently outperforms most of the existing methods in various property prediction tasks,
      while competitive with the GNN-based SOTA, PotNet [Lin+, 2023].
  11. Evaluations on the JARVIS-DFT 3D 2021 dataset
      MAE results (lower is better):

      Method                            | Form. E. (eV/atom) | Total E. (eV/atom) | Bandgap OPT (eV) | Bandgap MBJ (eV) | E hull (eV)
      Train/Val/Test                    | 44578/5572/5572    | 44578/5572/5572    | 44578/5572/5572  | 14537/1817/1817  | 44296/5537/5537
      CGCNN [Xie & Grossman, 2018]      | 0.063  | 0.078  | 0.2   | 0.41  | 0.17
      SchNet [Schütt+, 2018]            | 0.045  | 0.047  | 0.19  | 0.43  | 0.14
      MEGNet [Chen+, 2019]              | 0.047  | 0.058  | 0.145 | 0.34  | 0.084
      GATGNN [Louis+, 2020]             | 0.047  | 0.056  | 0.17  | 0.51  | 0.12
      M3GNet [Chen & Ong, 2022]         | 0.039  | 0.041  | 0.145 | 0.362 | 0.095
      ALIGNN [Choudhary & DeCost, 2021] | 0.0331 | 0.037  | 0.142 | 0.31  | 0.076
      Matformer [Yan+, 2022]            | 0.0325 | 0.035  | 0.137 | 0.3   | 0.064
      PotNet [Lin+, 2023]               | 0.0294 | 0.032  | 0.127 | 0.27  | 0.055
      Crystalformer                     | 0.0319 | 0.0342 | 0.131 | 0.275 | 0.0482

      Consistently outperforms most of the existing methods in various property prediction tasks,
      while competitive with the GNN-based SOTA, PotNet [Lin+, 2023].
  12. Model efficiency comparison
      Our model has high model efficiency compared to the GNN- and Transformer-based SOTA methods,
      PotNet and Matformer. Furthermore, the overall network architecture is simple and closely
      follows the original Transformer encoder architecture (a schematic sketch follows below).

      Method                  | Arch. type  | Train/Epoch | Total train | Test/Mater. | # Params | # Params/Block
      PotNet [Lin+, 2023]     | GNN         | 43 s        | 5.9 h       | 313 ms      | 1.8 M    | 527 K
      Matformer [Yan+, 2022]  | Transformer | 60 s        | 8.3 h       | 20.4 ms     | 2.9 M    | 544 K
      Crystalformer           | Transformer | 32 s        | 7.2 h       | 6.6 ms      | 853 K    | 206 K

      Train and test times are evaluated on the JARVIS-DFT 3D (formation energy) dataset.
      (Architecture figure: a stack of four self-attention blocks, each consisting of multi-head
      attention (Figure 2), a concat linear layer, and a feed-forward layer, followed by pooling
      and a final feed-forward head.)
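As a rough schematic of that architecture description, the sketch below stacks self-attention blocks over unit-cell atoms, pools them, and applies a feed-forward head. The class names are hypothetical, and a plain multi-head attention stands in for the actual blocks, which use the pseudo-finite periodic attention with the $\alpha$/$\beta$ encodings; this is an illustration of the overall layout, not the released model.

```python
import torch
import torch.nn as nn

class AttentionBlock(nn.Module):
    """Transformer-encoder-style block (placeholder for the periodic attention block)."""
    def __init__(self, d=128, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(d, heads, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(d, d), nn.GELU(), nn.Linear(d, d))
        self.n1, self.n2 = nn.LayerNorm(d), nn.LayerNorm(d)

    def forward(self, x):
        x = self.n1(x + self.attn(x, x, x, need_weights=False)[0])
        return self.n2(x + self.ff(x))

class PropertyPredictor(nn.Module):
    """Embedding -> stacked attention blocks -> pooling -> feed-forward head."""
    def __init__(self, n_species=100, d=128, blocks=4):
        super().__init__()
        self.embed = nn.Embedding(n_species, d)              # atom species -> state feature
        self.blocks = nn.ModuleList(AttentionBlock(d) for _ in range(blocks))
        self.head = nn.Sequential(nn.Linear(d, d), nn.GELU(), nn.Linear(d, 1))

    def forward(self, species):                              # species: (B, N) atomic numbers
        x = self.embed(species)
        for blk in self.blocks:
            x = blk(x)
        return self.head(x.mean(dim=1))                      # pool over unit-cell atoms

pred = PropertyPredictor()(torch.randint(1, 100, (2, 6)))    # (2, 1) predicted property values
```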
  13. Summary and future work
      Crystalformer: a Transformer encoder framework for crystal structures.
      • Simple, with minimal modifications to the original Transformer [Vaswani+ 2017].
      • Good performance with high model efficiency.
      • Outperforms most of the existing methods in various property prediction tasks.
      New interpretation of self-attention as neural potential summation.
      • Key concept that enables dense self-attention between infinitely many atoms.
      • Can more explicitly bridge ML and physics.
      Possible extensions.
      • Incorporate long-range interactions via Fourier-space attention (demo in the paper).
      • Incorporate known interatomic potential forms into the models.
      • Extend to SE(3)-equivariant networks for force prediction.
      Visit our project site at https://omron-sinicx.github.io/crystalformer/