Learning
Takayuki Shinohara, Haoyi Xiu and Masashi Matsuoka
Tokyo Institute of Technology
2019.12.09, Wyndham San Diego Bay Side, San Diego, California, USA
Second International Workshop on Artificial Intelligence for 3D Big Spatial Data Processing (AI3D 2019)
signal “waveform” discretely.
• Provides not only 3D point clouds but also additional information about the target properties.
The shape and power of the backscattered waveform are related to the geometry and reflection characteristics of the surface.
Cited from “Urban land cover classification using airborne LiDAR data: A review”
• ALS presents significant advantages for large-area observation.
• Manually extracting spatial information from point clouds and their waveforms is costly and time-consuming.
=> Automatic data analysis methods are necessary.
■ Automatic analysis methods for full-waveform data
• Divided into two types of method
‣ 3D point clouds with man-made waveform features
‣ Raw waveforms
In this study, we investigate raw waveform analysis.
The first data-driven feature extraction method.
■ Deep learning (Zorzi et al., 2019)
• The first method to combine raw waveforms with a 2D grid.
The main limitation:
• Each waveform is learned individually.
‣ Difficult to deal with spatially irregular data.
‣ Most deep learning methods are developed for regular data such as images, audio and language.
Research question: Can deep neural networks learn from spatially irregular full-waveform LiDAR data?
Net (Qi et al., 2017)
‣ One of the deep learning methods for spatially irregular data
■ Investigating the power of deep learning methods
• Autoencoder
‣ A representation learning method.
‣ A data-driven feature extraction method.
Objective: To develop a new deep-learning-based representation learning method for spatially irregular raw full-waveform data.
from raw data automatically.
■ Autoencoder (AE)
• The AE is an architecture that learns to reconstruct its input.
• The input is reconstructed via a latent vector.
• The latent vector holds the essential information in a low-dimensional form.
If training is successful, the bottleneck layer provides a low-dimensional latent vector for each input.
[Figure: Input → Encode → Latent Vector → Decode → Reconstructed Input]
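The encode/decode path through a latent vector can be sketched in a few lines of plain Python. This is a toy linear autoencoder, not the FWNetAE model: the weights `W_ENC` and `W_DEC` are hand-picked assumptions (a real AE learns them by minimizing the reconstruction error), and the 4-D input and 2-D latent sizes are chosen only to keep the example readable.

```python
# Toy linear autoencoder. Weights are hand-picked for illustration
# (an assumption); in practice they are learned during training.
W_ENC = [[0.5, 0.0, 0.5, 0.0],
         [0.0, 0.5, 0.0, 0.5]]   # 4-D input -> 2-D latent vector
W_DEC = [[1.0, 0.0],
         [0.0, 1.0],
         [1.0, 0.0],
         [0.0, 1.0]]             # 2-D latent vector -> 4-D reconstruction


def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def encode(x):
    """Compress the input into the low-dimensional latent vector."""
    return matvec(W_ENC, x)


def decode(z):
    """Reconstruct the input from the latent vector."""
    return matvec(W_DEC, z)


def mse(a, b):
    """Mean squared reconstruction error between two vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)


x = [1.0, 2.0, 1.0, 2.0]   # input with the redundancy these weights exploit
z = encode(x)              # 2-D latent vector
x_hat = decode(z)          # reconstruction of x
print(z, x_hat, mse(x, x_hat))
```

Inputs with the repeated structure `(a, b, a, b)` are reconstructed exactly from two latent values; any other input loses information in the bottleneck, which shows up as a non-zero reconstruction error.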
• Rule-based
• Man-made features and classic machine learning
=> Full-waveform laser scanners are highly advantageous for point cloud classification, but these approaches rely on man-made features that are fed to statistical classifiers or simple machine-learning algorithms.
■ Raw waveform data
• Unsupervised classification
‣ Self-organizing map
• Supervised classification
‣ Deep learning-based approach
• 1st stage: Waveform analysis by 1D CNN
‣ Features are extracted from each waveform individually.
⁃ High misclassification rate.
• 2nd stage: Spatial analysis by 2D CNN
‣ Raw classification results are converted to grid data with height information.
‣ The class of each grid cell is predicted.
Individual waveform learning is ineffective.
Grid-based spatial learning is effective.
=> Spatial learning methods for raw waveform data are needed.
[Figure labels: Waveform, Spatial]
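The conversion of per-point results into grid data with height information can be sketched as follows. The slide does not say how points falling into one cell are combined, so the majority-vote label, the mean height, the cell size and the class names below are all assumptions made for illustration, not the procedure of the cited work.

```python
from collections import Counter, defaultdict
from math import floor


def rasterize(points, cell_size):
    """Bin (x, y, z, predicted_class) points into a 2D grid.

    Each cell stores the mean height and the majority class of the
    points that fall inside it (both aggregation rules are assumptions).
    """
    cells = defaultdict(list)
    for x, y, z, label in points:
        key = (floor(x / cell_size), floor(y / cell_size))
        cells[key].append((z, label))

    grid = {}
    for key, members in cells.items():
        heights = [z for z, _ in members]
        votes = Counter(label for _, label in members)
        grid[key] = {
            "height": sum(heights) / len(heights),
            "label": votes.most_common(1)[0][0],
        }
    return grid


# Hypothetical first-stage output: x, y, z and a predicted class per point.
points = [
    (0.2, 0.3, 10.0, "roof"),
    (0.7, 0.1, 12.0, "roof"),
    (0.5, 0.5, 11.0, "tree"),
    (1.5, 0.2, 2.0, "ground"),
]
grid = rasterize(points, cell_size=1.0)
print(grid)
```

The resulting grid is regular, so a 2D CNN can be applied to it, but the waveforms themselves never see their spatial neighbours, which is the gap the slide points out.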
first method for raw point cloud (set of x, y, z) data.
• PointNet can deal with spatially irregular data.
We extend PointNet (Qi et al., 2017) to the analysis of spatially irregular raw full-waveform data.
Encoder: transforms full-waveform data into a latent vector z
Decoder: transforms the latent vector z back into the input data
[Figure: Input data (each x, y has a waveform) → Encoder → Latent vector z → Decoder → Reconstructed data; the spatial distribution and its waveform are reconstructed]
Encoder: PointNet-based
• Decoder: simple multilayer perceptron (MLP)
[Figure: Input full-waveform LiDAR data (2,048 × 62) → PointNet-based encoder (T-nets, 1D Conv, Max Pool) → Latent vector → Decoder (MLP) → Output full-waveform LiDAR data (2,048 × 62)]
local features for each dataset.
• Several 1D convolution layers.
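In PointNet, the 1D convolutions use a kernel size of 1, which amounts to applying the same small fully connected layer to every point independently. The sketch below illustrates that weight sharing in plain Python; the layer sizes and the weight values are arbitrary assumptions, not those of FWNetAE.

```python
def relu(v):
    """Element-wise rectified linear unit."""
    return [max(0.0, a) for a in v]


def shared_layer(points, W, b):
    """Apply one set of weights (W, b) to every point separately.

    This is what a 1D convolution with kernel size 1 computes: each
    point's output depends only on that point's own input channels.
    """
    features = []
    for p in points:
        pre_activation = [sum(w * x for w, x in zip(row, p)) + bias
                          for row, bias in zip(W, b)]
        features.append(relu(pre_activation))
    return features


# Illustrative sizes: 3 input channels per point -> 2 feature channels.
W = [[1.0, -1.0, 0.5],
     [0.0, 2.0, -1.0]]
b = [0.0, 0.5]
points = [[1.0, 2.0, 3.0],
          [0.5, 0.0, 1.0],
          [2.0, 1.0, 0.0]]
local_features = shared_layer(points, W, b)
print(local_features)
```

Because the weights are shared, the layer accepts any number of points and produces one local feature vector per point.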
causes the points to be independent of rigid transformations.
• We can obtain features that are robust to object orientation.
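The alignment idea can be illustrated by applying a transformation matrix to every point. In PointNet the T-net predicts this matrix from the data; in the sketch below it is supplied by hand, and restricting it to the (x, y) coordinates while leaving the waveform values untouched is an assumption made for illustration.

```python
import math


def rotation(theta):
    """2x2 rotation matrix for an angle theta in radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]


def apply_transform(points, T):
    """Multiply the (x, y) part of each point by the 2x2 matrix T.

    Remaining channels (here: waveform values) are passed through
    unchanged. A T-net would predict T; here it is given explicitly.
    """
    out = []
    for x, y, *waveform in points:
        out.append([T[0][0] * x + T[0][1] * y,
                    T[1][0] * x + T[1][1] * y,
                    *waveform])
    return out


# Two points: x, y followed by two illustrative waveform values.
canonical = [[1.0, 0.0, 5.0, 9.0],
             [0.0, 2.0, 4.0, 8.0]]
rotated = apply_transform(canonical, rotation(math.pi / 3))
# An ideal alignment undoes the rotation before feature extraction.
aligned = apply_transform(rotated, rotation(-math.pi / 3))
print(aligned)
```

If the alignment matrix cancels the rotation of the scene, the layers that follow see the same coordinates regardless of the original orientation.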
global features (latent vector) over all the data by a max pooling layer as a symmetric function.
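Max pooling is a symmetric function: its result does not depend on the order of the points. A minimal sketch in plain Python, with made-up feature values:

```python
def global_max_pool(features):
    """Channel-wise maximum over all points.

    `features` is a list of per-point feature vectors of equal length;
    the result is one global vector with the same number of channels.
    """
    return [max(channel) for channel in zip(*features)]


# Three points with three feature channels each (illustrative values).
features = [[0.1, 2.0, 0.3],
            [1.5, 0.2, 0.9],
            [0.4, 0.7, 3.1]]
latent = global_max_pool(features)

# Reordering the points does not change the pooled vector.
reordered = [features[2], features[0], features[1]]
latent_reordered = global_max_pool(reordered)
print(latent, latent_reordered)
```

This order independence is what lets the encoder summarize an unordered, spatially irregular set of points in a single latent vector.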
reconstructed data the same as those of input data.
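The slides describe the decoder only as a simple MLP that maps the latent vector back to data of the same shape as the input. One way to read this is a fully connected layer whose flat output is reshaped into one row per point. The sketch below uses a single linear layer, tiny sizes and arbitrary weights, all of which are assumptions for illustration.

```python
def linear(W, b, v):
    """Fully connected layer: W @ v + b."""
    return [sum(w * x for w, x in zip(row, v)) + bias
            for row, bias in zip(W, b)]


def decode(latent, W, b, n_points, n_channels):
    """Map a latent vector to an (n_points x n_channels) reconstruction."""
    flat = linear(W, b, latent)
    if len(flat) != n_points * n_channels:
        raise ValueError("decoder output size does not match target shape")
    return [flat[i * n_channels:(i + 1) * n_channels]
            for i in range(n_points)]


# Illustrative sizes; the slides use 2,048 points with 62 channels.
N_POINTS, N_CHANNELS, LATENT_DIM = 3, 4, 2
W = [[0.1 * (r + 1), -0.05 * r] for r in range(N_POINTS * N_CHANNELS)]
b = [0.0] * (N_POINTS * N_CHANNELS)

reconstruction = decode([1.0, 2.0], W, b, N_POINTS, N_CHANNELS)
print(len(reconstruction), len(reconstruction[0]))
```

With the sizes from the slides, the flat output would have 2,048 × 62 = 126,976 values before reshaping.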
density
‣ About 300 points/m²
• Area used
‣ One of the flight paths
■ Training data
• Sample size:
‣ Train, Val, Test: 300,000, 100,000, 100,000
• Input dimension: 2,048 × 62 (x, y, waveform)
‣ To simplify the problem, we selected only data with 60 returns.
[Figure: Data used, observed from one flight path]
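The input dimension can be read as 2 coordinates plus 60 waveform values per point, giving 62 channels for each of the 2,048 points in a sample. The sketch below assembles one such sample from synthetic random values; the helper `make_row` and the value ranges are hypothetical, and no real LiDAR data is involved.

```python
import random

N_POINTS = 2048                # points per input sample
N_WAVEFORM = 60                # waveform values kept per point
N_CHANNELS = 2 + N_WAVEFORM    # x, y + waveform = 62


def make_row(x, y, waveform):
    """Concatenate coordinates and waveform into one 62-channel row."""
    if len(waveform) != N_WAVEFORM:
        raise ValueError("expected exactly 60 waveform values")
    return [x, y, *waveform]


# Synthetic placeholder data, seeded for repeatability.
rng = random.Random(0)
sample = [
    make_row(rng.uniform(0.0, 10.0),
             rng.uniform(0.0, 10.0),
             [rng.random() for _ in range(N_WAVEFORM)])
    for _ in range(N_POINTS)
]
print(len(sample), len(sample[0]))
```

Fixing both the number of points and the waveform length keeps every sample the same shape, which matches the simplification noted on the slide.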
novel representation learning method for spatially distributed full-waveform data observed from an ALS using an AE-based architecture called FWNetAE.
• The results demonstrate generalization to unseen test data.
• Moreover, FWNetAE encoded a meaningful latent vector, and the decoder reconstructed the spatial geometry and its waveform values from the encoded latent vector.
• However, the PointNet-based encoder could not handle varying input dimensions or extract features at multiple resolutions.
■ Future study
• Modern hierarchical learning: PointNet++, Dynamic Graph CNN
• Application to supervised learning