remote sensing data: synthetic aperture radar (SAR) and electro-optical imagery
▪ AOI: Rotterdam, the Netherlands (~120 km²)

Building footprint annotations overlaid on electro-optical imagery (left) and SAR imagery (right) (from CosmiQ Works blog post titled “SpaceNet 6: Dataset Release”)
illumination setting (day or night)
▪ Cloud-penetrating

Cons of SAR
▪ Various types of scattering
▪ Complex geometric distortions, e.g., layover

See the SAR 101 blog post if interested in SAR!

SAR intensity (R: HH, G: VV, B: VH) of three areas from the SpaceNet 6 dataset (from CosmiQ Works blog post titled “SpaceNet 6: Dataset Release”)
Captured by an aerial platform
▪ 4 channels (quad polarization)
▪ Spatial resolution: ~0.5 m
▪ Off-nadir look angle: ~35°

SAR intensity (R: HH, G: VV, B: VH) of three areas from the SpaceNet 6 dataset (from CosmiQ Works blog post titled “SpaceNet 6: Dataset Release”)
Captured by the WorldView-2 satellite
▪ 4 channels (RGB + NIR)
▪ Spatial resolution: ~0.5 m
▪ Off-nadir look angle: ~17°
Only available in the training set

Visible spectrum imagery (R, G, B) of three areas from the SpaceNet 6 dataset (from CosmiQ Works blog post titled “SpaceNet 6: Dataset Release”)
building footprints in the Rotterdam AOI
▪ The footprint of each building is represented as a polygon
Only available in the training set

Building footprint annotations overlaid on electro-optical imagery (from CosmiQ Works blog post titled “SpaceNet 6: Dataset Release”)
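Since the annotations are polygons while segmentation networks need pixel masks, a typical first step is to rasterize the footprints per tile. A minimal sketch using geopandas and rasterio; the file paths here are illustrative, not the actual SpaceNet 6 layout:

```python
import numpy as np
import geopandas as gpd
import rasterio
from rasterio import features

# Hypothetical paths for illustration; the real SpaceNet 6 naming differs.
tile_path = "SAR-Intensity/tile_0001.tif"
label_path = "geojson_buildings/tile_0001.geojson"

with rasterio.open(tile_path) as src:
    transform = src.transform
    height, width = src.height, src.width
    crs = src.crs

# Re-project footprints into the tile's CRS, then burn them into a binary mask.
footprints = gpd.read_file(label_path).to_crs(crs)
mask = np.zeros((height, width), dtype="uint8")
if len(footprints):
    mask = features.rasterize(
        ((geom, 1) for geom in footprints.geometry),
        out_shape=(height, width),
        transform=transform,
        fill=0,
        dtype="uint8",
    )
```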
truth polygon is greater than 0.5, it is counted as a TP (otherwise it is counted as a FP)
▪ A ground truth polygon with no matched proposal is counted as a FN
▪ Compute the F1 score (= the evaluation metric)

IoU = area(GT polygon ∩ Proposed polygon) / area(GT polygon ∪ Proposed polygon)
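To make the metric concrete, here is a greedy IoU-based matching between ground truth and proposed polygons using shapely. This is a sketch for intuition, not the official SpaceNet scorer (which may break ties differently):

```python
from shapely.geometry import Polygon

def f1_score(gt_polys, proposed_polys, iou_threshold=0.5):
    """Greedy IoU matching + F1; a sketch, not the official scorer."""
    matched_gt = set()
    tp = 0
    for prop in proposed_polys:
        best_iou, best_idx = 0.0, None
        for i, gt in enumerate(gt_polys):
            if i in matched_gt:
                continue
            union = prop.union(gt).area
            iou = prop.intersection(gt).area / union if union > 0 else 0.0
            if iou > best_iou:
                best_iou, best_idx = iou, i
        if best_iou > iou_threshold:
            tp += 1                         # matched pair: one TP
            matched_gt.add(best_idx)
    fp = len(proposed_polys) - tp           # proposals with no matched GT
    fn = len(gt_polys) - tp                 # GT polygons with no matched proposal
    precision = tp / max(tp + fp, 1)
    recall = tp / max(tp + fn, 1)
    return 2 * precision * recall / max(precision + recall, 1e-9)
```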
Method | Public LB (%) | Private LB (%)
Baseline (EfficientNet-B7 × 5) | 39.29 | -
+ Watershed | 42.94 | -
+ Ensemble (EfficientNet-B7 × 10 + EfficientNet-B8 × 5) | 44.38 | -
+ LightGBM | 44.80 | 39.61

▪ U-Net with an EfficientNet-B7 encoder (on 5 folds) achieves a score comparable to the top 15 in the public LB
▪ Applying the watershed algorithm greatly improves the F-score (+3.65) compared to the simpler alternative used in the baseline: binarizing the score map with a threshold and then extracting isolated contours as polygons
▪ Ensembling U-Net models with EfficientNet-B7/B8 encoders gives a moderate improvement (+1.44)
▪ Post-processing with LightGBM models shows only a marginal improvement (+0.42)
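The watershed step is what separates touching buildings in the predicted score map into instances. A minimal sketch with scikit-image, assuming `score` is the per-pixel sigmoid output of the network; the function name and both thresholds are illustrative, not taken from the solution:

```python
from scipy import ndimage
from skimage.segmentation import watershed

def score_map_to_instances(score, seed_thresh=0.7, mask_thresh=0.4):
    """Split a building score map into instances via watershed.

    A sketch; the actual thresholds/marker choice in the solution may differ.
    """
    mask = score > mask_thresh          # pixels considered "building"
    seeds = score > seed_thresh         # confident cores used as markers
    markers, _ = ndimage.label(seeds)   # one marker per connected core
    # Flood from the markers over the inverted score map, limited to the mask,
    # so adjacent buildings get split along low-confidence ridges.
    return watershed(-score, markers, mask=mask)  # int array, 0 = background
```

Each labeled instance can then be vectorized into a polygon, e.g., with rasterio.features.shapes.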
best performance while having fewer parameters
▪ I ensembled U-Net models with EfficientNet-B7/B8 encoders, which achieved the best segmentation score
▪ All encoders were pre-trained on ImageNet: this makes convergence much faster and improves accuracy
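For reference, such a model can be assembled with the segmentation_models_pytorch library; this sketch is an assumption about the setup, not the solution's actual code:

```python
import segmentation_models_pytorch as smp

# A sketch of a U-Net with an ImageNet-pretrained EfficientNet-B7 encoder.
model = smp.Unet(
    encoder_name="efficientnet-b7",
    encoder_weights="imagenet",  # ImageNet pre-training speeds up convergence
    in_channels=4,               # 4 SAR channels (quad polarization)
    classes=1,                   # binary building mask
)
```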
▪ L_bce: binary cross-entropy loss
▪ L_dice: dice loss (= 1 − dice coefficient)
▪ As the dice coefficient evaluates spatial overlap (like the IoU metric) for each class, it works well on class-imbalanced data
▪ Combining dice loss with binary cross-entropy made convergence faster and improved accuracy

Optimizer: Adam
▪ Adam worked better than other optimizers
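A minimal PyTorch sketch of the combined loss; the 1:1 weighting of the two terms is an assumption:

```python
import torch
import torch.nn.functional as F

def bce_dice_loss(logits, targets, eps=1e-6):
    """BCE + dice loss, as described above (sketch).

    logits, targets: tensors of shape (N, 1, H, W); targets in {0, 1}.
    """
    bce = F.binary_cross_entropy_with_logits(logits, targets)

    probs = torch.sigmoid(logits)
    dims = (1, 2, 3)
    # Dice coefficient per sample, averaged over the batch.
    intersection = (probs * targets).sum(dims)
    dice = (2 * intersection + eps) / (probs.sum(dims) + targets.sum(dims) + eps)
    return bce + (1 - dice).mean()
```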
appear laid out toward the direction from which the image was captured
▪ Direction: either north (upward in the image) or south (downward in the image)
▪ I selectively rotated SAR images before inputting them to the networks so that the layover direction stayed the same in every image

Example of layover (from CosmiQ Works blog post titled “SAR 101: An Introduction to Synthetic Aperture Radar”)
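A sketch of this selective rotation; how the capture direction is obtained from the SpaceNet 6 metadata is not shown in the slides, so the `capture_direction` argument here is a stand-in:

```python
import numpy as np

def normalize_layover(image, capture_direction):
    """Rotate a SAR tile 180° when it was captured from the south.

    A sketch of the selective rotation described above.
    image: (H, W, C) array; capture_direction: "north" or "south".
    """
    if capture_direction == "south":
        # A 180° rotation flips both axes, so layover points the same way
        # as in north-captured tiles. The predicted mask must be rotated
        # back after inference.
        return np.rot90(image, k=2, axes=(0, 1)).copy()
    return image
```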
Input SAR strip ID and Y-coordinate to the U-Net decoder

Loss design
▪ Focal loss + dice loss
▪ Weight the loss per image based on the number of buildings inside the image

Pre-processing
▪ Cut out the black part of images for faster training and stable BN statistics

Augmentation
▪ Random left-right flip

Test time augmentation (sketched below)
▪ Left-right flip
▪ Resize (1×, 0.8×, and 1.5×)

See the YouTube video by zbigniewwojna for details
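As an illustration of the TTA part of this solution, here is a sketch that averages sigmoid outputs over a left-right flip at the three scales; the interpolation mode and plain averaging are assumptions, and in practice inputs may need padding to sizes the network accepts (e.g., multiples of 32):

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def predict_tta(model, image, scales=(1.0, 0.8, 1.5)):
    """Average predictions over {identity, LR flip} × three scales (sketch).

    image: (N, C, H, W) tensor; model returns logits of the same spatial size.
    """
    n, c, h, w = image.shape
    probs = []
    for s in scales:
        scaled = image if s == 1.0 else F.interpolate(
            image, scale_factor=s, mode="bilinear", align_corners=False)
        for flip in (False, True):
            x = torch.flip(scaled, dims=[3]) if flip else scaled
            p = torch.sigmoid(model(x))
            if flip:
                p = torch.flip(p, dims=[3])   # undo the flip on the output
            # Resize the prediction back to the original resolution.
            p = F.interpolate(p, size=(h, w), mode="bilinear",
                              align_corners=False)
            probs.append(p)
    return torch.stack(probs).mean(0)
```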
because:
▪ participants must submit complete training and inference code with a Dockerfile, in the TopCoder Marathon Match style
▪ the hosts re-train and evaluate the models on their own server to determine the final score

I gave an ID to each experiment so that the config, trained weights, logs, git hash, and inference results are all linked to this ID:

/mnt/efs/
├── exp_0099/
├── exp_0100/
│   ├── config_all.yaml
│   ├── best_weight.pth
│   ├── tensorboard.event
│   ├── git_hash.txt
│   └── inference_results/
└── exp_0101/

./run_training.py --config config_specific.yaml --exp_id 100
./run_inference.py --exp_id 100
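A sketch of this bookkeeping; the directory layout and file names follow the slide, while the helper itself is illustrative:

```python
import shutil
import subprocess
from pathlib import Path

EXP_ROOT = Path("/mnt/efs")  # experiment root from the slide

def setup_experiment(exp_id: int, config_path: str) -> Path:
    """Create an experiment directory keyed by ID and record the git hash.

    A sketch of the bookkeeping described above, not the actual script.
    """
    exp_dir = EXP_ROOT / f"exp_{exp_id:04d}"
    exp_dir.mkdir(parents=True, exist_ok=True)

    # Snapshot the full config so the run can be reproduced later.
    shutil.copy(config_path, exp_dir / "config_all.yaml")

    # Record the exact code revision used for this run.
    git_hash = subprocess.check_output(
        ["git", "rev-parse", "HEAD"], text=True
    ).strip()
    (exp_dir / "git_hash.txt").write_text(git_hash + "\n")
    return exp_dir
```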
▪ training: 48 hours with p3.8xlarge (V100 × 4)
▪ inference: 3 hours with p3.8xlarge (V100 × 4)

I took a single-GPU training strategy:
▪ in the model development phase, I used p3.2xlarge (V100 × 1), which is much cheaper and more efficient for trial and error
▪ in the final testing on p3.8xlarge, 4 models were trained in parallel (each trained on one V100 card)
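A sketch of that parallel launch, pinning each training process to one V100 via CUDA_VISIBLE_DEVICES; the experiment IDs and their pairing with GPUs are illustrative, with the script invocation taken from the earlier slide:

```python
import os
import subprocess

# Launch 4 single-GPU trainings in parallel on p3.8xlarge (sketch).
procs = []
for gpu, exp_id in enumerate([100, 101, 102, 103]):  # hypothetical IDs
    env = {**os.environ, "CUDA_VISIBLE_DEVICES": str(gpu)}  # pin to one V100
    procs.append(subprocess.Popen(
        ["./run_training.py",
         "--config", "config_specific.yaml",
         "--exp_id", str(exp_id)],
        env=env,
    ))
for p in procs:
    p.wait()  # block until all 4 trainings finish
```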