
IJCNN 2025



Olivier Lézoray

July 14, 2025

Transcript

  1. SELF-ATTENTION BASED MULTI-SCALE GRAPH AUTO-ENCODER NETWORK OF 3D MESHES

    Saqib Nazir, Olivier Lézoray, Sébastien Bougleux
  2. Introduction ❑ 3D Imaging: 3D Models and Devices

    • 3D models have appeared everywhere in our lives as a result of recent advances in 3D scanning devices (laser, LiDAR, RGB-D) and in rendering technologies.
    • 3D meshes are a fundamental data representation for capturing complex geometric shapes in computer vision and graphics applications.
    (Images: LiDAR scanner, MS Kinect)
  3. Introduction ❑ 3D Meshes : « Applications »

    • AR/VR
    • Robotics
    • Medical imaging
    • Cultural heritage preservation, and many more
  4. Introduction ❑ 3D Meshes : « Reconstruction using AI »

    • 3D mesh reconstruction is the process of reconstructing the input mesh as closely as possible.
    • 3D mesh reconstruction is a pre-processing step that can serve many downstream tasks, such as:
      • Mesh completion
      • Mesh interpolation / extrapolation
      • Mesh denoising, etc.
  5. Introduction ❑ 3D Meshes : « Reconstruction using AI »

    • On Euclidean domains such as images, CNNs have proven highly effective for tasks like image classification and natural language processing.
    • 3D mesh reconstruction presents unique challenges due to the non-Euclidean nature of the data.
    • Alternative representations such as voxels and point clouds struggle to capture the fine geometric details of meshes due to their high memory and computational costs.
    • 3D meshes are typically represented as graphs, where the number and orientation of neighbours vary for each vertex.
    • Graph Convolutional Networks (GCNs) have emerged as a promising solution, since they can operate directly on graph-structured data. (Z. Chen et al., 2023), (W. Wang et al., 2018)
  6. Proposed Model ❑ 3DGeoMeshNet: « Model Architecture » (T. Li et al., 2024)

    (Figure: model architecture, with attention module and FeaStConv() layers)
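To make the building block concrete, here is a minimal PyTorch Geometric sketch of a FeaStConv-based mesh encoder. It is an illustration under our assumptions, not the released implementation: the class name, number of attention heads, and activations are ours, while the feature sizes 8, 16, and 32 follow the local path reported in the experimental setup.

    import torch
    import torch.nn.functional as F
    from torch_geometric.nn import FeaStConv

    class LocalMeshEncoder(torch.nn.Module):
        """Illustrative FeaStConv encoder; names and hyper-parameters are assumptions."""
        def __init__(self, in_channels=3, heads=8):
            super().__init__()
            # Each vertex carries its 3D coordinates; three graph-convolution
            # stages progressively enlarge the per-vertex feature dimension.
            self.conv1 = FeaStConv(in_channels, 8, heads=heads)
            self.conv2 = FeaStConv(8, 16, heads=heads)
            self.conv3 = FeaStConv(16, 32, heads=heads)

        def forward(self, x, edge_index):
            # x: (N, 3) vertex positions, edge_index: (2, E) mesh connectivity
            x = F.relu(self.conv1(x, edge_index))
            x = F.relu(self.conv2(x, edge_index))
            return self.conv3(x, edge_index)   # (N, 32) per-vertex features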
  7. Proposed Model ❑ 3DGeoMeshNet: « Attention-based adaptive fusion »

    • Implemented with PyTorch Geometric.
    • Encoders and decoders are GCNs based on Feature-Steered Graph Convolution (FeaStConv).
    • The attention module Att concatenates the features of each vertex and computes attention weights through two linear layers, with a ReLU activation after the first layer and a Softmax after the second; the fused features are then combined with the decoder D.
    • The attention mechanism was introduced by Oktay et al. for U-Net architectures, and is used to focus the attention of the network on specific target structures and regions. (Y. Li & O. Oktay et al.)
    (Figure: attention module)
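A minimal sketch of such a per-vertex attention fusion, written from the slide's description only (module and variable names and the hidden size are our assumptions): local and global features are concatenated, passed through two linear layers with ReLU then Softmax, and the resulting weights blend the two branches before decoding.

    import torch
    import torch.nn as nn

    class AttentionFusion(nn.Module):
        """Illustrative attention module; layer sizes are assumptions."""
        def __init__(self, feat_dim, hidden_dim=64):
            super().__init__()
            self.fc1 = nn.Linear(2 * feat_dim, hidden_dim)
            self.fc2 = nn.Linear(hidden_dim, 2)   # one weight per branch

        def forward(self, local_feat, global_feat):
            # local_feat, global_feat: (N, feat_dim) per-vertex features
            h = torch.relu(self.fc1(torch.cat([local_feat, global_feat], dim=-1)))
            w = torch.softmax(self.fc2(h), dim=-1)              # (N, 2), rows sum to 1
            fused = w[:, :1] * local_feat + w[:, 1:] * global_feat
            return fused                                        # (N, feat_dim), passed to the decoder D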
  8. Experiments ❑ 3DGeoMeshNet: « Loss Function »

    • 3DGeoMeshNet is trained using a combination of two losses.
    • During training, the reconstruction loss is the Mean Squared Error (MSE).
    • To enforce a smooth latent space and maintain a spherical distribution of the latent representations, we use a custom spherical regularization loss.
    • This ensures that the latent features z lie close to a spherical manifold, promoting better generalization. (Elmoataz, Quéau, Clouard, et al.)

    Total loss: L_total = L_MSE + λ_reg · L_reg, with λ_reg = 0.0001
    L_MSE = (1/N) ‖x_out − y‖_F², where ‖x‖_F = √(Σ_i Σ_j x_ij²)
    L_reg = (‖z‖_2 − 1)²
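A minimal sketch of this objective, assuming x_out is the reconstructed mesh, y the ground-truth mesh and z the latent code (the variable names and tensor shapes are ours):

    import torch

    def total_loss(x_out, y, z, lambda_reg=1e-4):
        # x_out, y: (N, 3) reconstructed and ground-truth vertex positions
        # z: (d,) latent code of the mesh
        n = x_out.shape[0]                                # number of vertices N
        l_mse = torch.sum((x_out - y) ** 2) / n           # (1/N) * ||x_out - y||_F^2
        l_reg = (torch.linalg.vector_norm(z) - 1.0) ** 2  # (||z||_2 - 1)^2, spherical constraint
        return l_mse + lambda_reg * l_reg                 # lambda_reg = 0.0001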
  9. Experiments ❑ 3DGeoMeshNet: « Experimental Setup, Datasets and Training Details »

    • 3DGeoMeshNet is trained on the COMA dataset.
    • COMA is a human facial dataset consisting of 12 classes of extreme expressions from 12 different subjects.
    • The dataset contains 20,466 3D meshes registered to a common reference template with N = 5023 vertices.
    • The learning rate is set to 0.0005 and halved every 50 epochs.
    • The batch size is 32 and the total number of epochs is 300.
    • Local convolutional layers are configured with feature sizes of 8, 16, and 32.
    • Global convolutional layers are designed with feature sizes of 32, 64, and 128.
    • Training uses the Adam optimizer within the PyTorch framework; the entire training process takes approximately 24 hours on an NVIDIA GeForce RTX 4090 GPU. (Elmoataz, Quéau, Clouard, et al.)
    (Figure: COMA facial dataset)
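The reported optimization setup translates into a short sketch like the one below; the Linear module is only a stand-in for the actual 3DGeoMeshNet network, and the inner loop over COMA batches is elided.

    import torch

    model = torch.nn.Linear(3, 3)   # stand-in for the actual 3DGeoMeshNet network
    optimizer = torch.optim.Adam(model.parameters(), lr=5e-4)
    # Halve the learning rate every 50 epochs.
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=50, gamma=0.5)

    for epoch in range(300):
        # ... one pass over the COMA training set with batch size 32 ...
        scheduler.step()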
  10. Experiments ❑ 3DGeoMeshNet: « Comparison with SOTA »

    Method                    Z     Mean             Median
    PCA [10]                  –     1.639 ± 1.638    1.101
    FLAME [12]                –     1.451 ± 1.64     0.87
    Jiang et al. [22]         –     1.413 ± 1.639    1.017
    COMA [5]                  8     0.845 ± 0.994    0.496
    Zhang et al. [23]         –     0.665 ± 0.748    0.434
    Sun et al. [30]           –     0.663 ± 0.215    0.643
    Gu et al. [32]            256   0.651 ± 0.208    0.625
    Yuan et al. [21]          128   0.583 ± 0.436    –
    SpiralNet++ [28]          –     0.54 ± 0.66      0.32
    FaceCom (modified) [36]   256   0.516 ± 0.708    0.255
    LSA-Conv [9]              32    0.153 ± 0.217    0.077
    3DGeoMeshNet (Ours)       256   0.171 ± 0.187    0.105
    Methods range from linear models (PCA, FLAME) to deep-learning (DL) approaches.
    (V. Blanz, T. Li, Jiang, A. Ranjan, Zhang, Sun, Gu, Yuan, S. Gong, Z. Gao, et al.)
  11. Experiments ❑ 3DGeoMeshNet: « Ablations »

    Method                            Z     Mean            Median   L2
    3DGeoMeshNet                      32    0.831 ± 0.964   0.489    0.735
    3DGeoMeshNet                      64    0.632 ± 0.795   0.342    0.695
    3DGeoMeshNet                      128   0.522 ± 0.721   0.259    0.514
    3DGeoMeshNet                      256   0.516 ± 0.708   0.255    0.506
    3DGeoMeshNet (Attn)               256   0.223 ± 0.217   0.153    0.179
    3DGeoMeshNet (Attn+Res)           256   0.177 ± 0.191   0.110    0.150
    3DGeoMeshNet (Attn+Res+Mean)      256   0.171 ± 0.187   0.105    0.146
    3DGeoMeshNet (Local path only)    256   0.688 ± 0.527   0.563    0.797
    3DGeoMeshNet (Global path only)   256   1.520 ± 0.705   1.425    0.973
    (Figure: per-vertex error maps, colour scale from 0 mm to 3 mm)
  12. Experiments ❑ Interpolation : « Identity Interpolation (Biometrics), Facial Animation, Facial Morphing, Aging Simulation, Emotion Modelling » (G. Bouritsas et al.)

    z = α · z1 + (1 − α) · z2, with α ∈ [0, 1]
    (Figure: interpolation between input meshes x1 and x2)
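In practice, interpolation only requires blending the two latent codes before decoding; a minimal sketch, where encoder and decoder stand for the trained 3DGeoMeshNet and the function name is ours:

    def interpolate_latents(z1, z2, alpha):
        # alpha in [0, 1]: alpha = 1 recovers z1's mesh, alpha = 0 recovers z2's mesh.
        assert 0.0 <= alpha <= 1.0
        return alpha * z1 + (1.0 - alpha) * z2

    # z1, z2 = encoder(x1), encoder(x2)
    # x_morph = decoder(interpolate_latents(z1, z2, 0.5))   # halfway morph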
  13. Experiments ❑ Extrapolation : « Identity Interpolation (Biometrics), Facial Animation, Facial Morphing, Aging Simulation, Emotion Modelling » (G. Bouritsas et al., CVF 2019)

    z = z1 + α · (z2 − z1), with α ∈ ℝ outside the interval [0, 1]
    (Figure: extrapolation between input meshes x1 and x2)
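The companion operation for extrapolation takes α outside [0, 1], so the decoded mesh exaggerates the change from x1 towards x2 (again a sketch with our naming):

    def extrapolate_latents(z1, z2, alpha):
        # alpha < 0 or alpha > 1 moves beyond the segment joining z1 and z2.
        return z1 + alpha * (z2 - z1)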
  14. Experiments ❑ Denoising : « 3D Mesh Denoising » (Tschumperlé, Porquet, Mahboubi et al.)

    ► As a second application of our model, we introduce synthetic Gaussian noise into the input mesh data by sampling from a Gaussian distribution with a mean of 0.0 and a standard deviation of 0.001, in order to assess the model's ability to remove noise effectively.
    (Figure: denoising results, colour scale from 0 mm to 3 mm)
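The synthetic corruption used for this test amounts to additive Gaussian noise on the vertex coordinates; a minimal sketch (the function name and tensor shapes are assumptions):

    import torch

    def add_gaussian_noise(vertices, std=0.001):
        # vertices: (N, 3) tensor of mesh vertex positions; noise has mean 0.0, std 0.001
        return vertices + torch.randn_like(vertices) * std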
  15. Conclusion & Future Works

    • We present 3DGeoMeshNet, a multi-scale GCN for 3D mesh representation and reconstruction.
    • It uses an autoencoder architecture comprising complementary local and global networks.
    • The global network is optimized for capturing macro-level mesh characteristics, including overall shape topology.
    • The local network focuses on micro-level details, such as facial expressions.
    • Experiments were conducted on the widely used COMA human facial dataset.
    • Future work: the current implementation is constrained to non-textured, monochromatic meshes; a logical extension of this research is to incorporate textured mesh data.
    • Furthermore, while the present approach operates on meshes with fixed vertex counts and topologies, future investigations will focus on generalizing the framework to meshes with heterogeneous topologies and variable vertex counts, thereby enhancing the robustness and versatility of the proposed methodology.
    • Applications: such models provide an efficient and scalable way to represent surfaces, making them highly suitable for 3D shape reconstruction, shape modelling, face recognition, and shape segmentation.