EarthSynth: Generating Informative Earth Observation with Diffusion Models

Helios  Kenichi Sasaki (佐々木謙一)
11th SatAI.challenge Study Session

Contents
• Self-introduction
• One-page summary of the study
• Background (Introduction)
• Method
• Experiments
• Conclusion

Author Introduction

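Kenichi Sasaki (佐々木謙一)
• 2012-2016: Tokyo Institute of Technology, Department of Mechanical and Aerospace Engineering
• 2016-2019: Tokyo Institute of Technology Graduate School, Matsunaga Laboratory
• 2019-2023: CU Boulder Aerospace Engineering, Ph.D. in Remote Sensing, Marine pollution monitoring
• Internship
• 2023-2025: Esri, product engineer in spatial analysis team
• 2025: Helios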

Summary

EarthSynth: a diffusion-based method for cross-task synthetic image generation
• Builds a generative model that supports multiple tasks, including classification, detection, and segmentation
• CF-Comp (Counterfactual Composition): logically recomposes objects and backgrounds taken from multiple images
• R-Filter: uses CLIP scores to select only high-quality synthetic data
• Effective for pre-training downstream models and for data augmentation
Source: Jiancheng Pan et al. (2025), "EarthSynth: Generating Informative Earth Observation with Diffusion Models", arXiv.

Paper Introduction

Introduction
Challenges of remote sensing images (RSI)
• High labeling cost
• Class imbalance (e.g., cars and buildings are common, but helipads are rare)
• Inefficiency of using a separate synthesis model for each task
Role of generative models
• Synthesis of high-quality data with diffusion models
• Improved data diversity and generalization performance

Method: Overview of EarthSynth
• Conditional diffusion (text + semantic mask)
• Builds "EarthSynth-180K", a multi-source, multi-category dataset
• The generated output is an (image, mask, text) triplet

Method: EarthSynth
1. Data collection and construction of EarthSynth-180K
• Integrates public datasets: OEM, LoveDA, DeepGlobe, and others
• For each image:
  ◦ Semantic mask (m)
  ◦ Text description (t), generated automatically or semi-automatically
• 180,000 (image, mask, text) triplets
2. Model training
• Retrained with Stable Diffusion v1.5 as the base
• Conditioning inputs: semantic mask m and text t
• Semantic enhancement
  ◦ CF-Comp (dynamic composition of objects and backgrounds)
  ◦ Spatial control with Local/Global Loss

Method: EarthSynth
3. Sample generation (Inference)
• Given an arbitrary mask and text as input, the model generates a new RS image x
• Output: an (x, m, t) triplet (image, mask, text)
• Quality check with R-Filter
4. Application to downstream tasks
• Scene Classification: image + category label (extracted from the text)
• Object Detection: BBoxes extracted from the mask
• Semantic Segmentation: the mask is used as-is
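The mask-to-BBox step for object detection can be sketched as below. This is a minimal illustration and not the authors' code: the function name `mask_to_bboxes`, the use of 4-connectivity, and the inclusive `(x_min, y_min, x_max, y_max)` box format are all assumptions made for this example.

```python
from collections import deque

import numpy as np


def mask_to_bboxes(mask, class_id):
    """Return one (x_min, y_min, x_max, y_max) box per 4-connected region of class_id.

    Hypothetical helper: the connectivity and box format are assumptions.
    """
    target = mask == class_id
    visited = np.zeros_like(target, dtype=bool)
    height, width = target.shape
    boxes = []
    for y0, x0 in zip(*np.nonzero(target)):
        if visited[y0, x0]:
            continue
        # Breadth-first flood fill over one connected region.
        queue = deque([(y0, x0)])
        visited[y0, x0] = True
        x_min = x_max = x0
        y_min = y_max = y0
        while queue:
            y, x = queue.popleft()
            x_min, x_max = min(x_min, x), max(x_max, x)
            y_min, y_max = min(y_min, y), max(y_max, y)
            for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                inside = 0 <= ny < height and 0 <= nx < width
                if inside and target[ny, nx] and not visited[ny, nx]:
                    visited[ny, nx] = True
                    queue.append((ny, nx))
        boxes.append((int(x_min), int(y_min), int(x_max), int(y_max)))
    return boxes
```

Calling `mask_to_bboxes(mask, class_id)` once per category in a semantic mask would give detection labels for that category; for segmentation the mask needs no conversion.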

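Method: EarthSynth
Counterfactual Composition (CF-Comp)
• Dynamically generates semantically consistent composite images
• Uses Copy-Paste to combine objects and backgrounds from different images
• Compatibility criteria: ICS (color impression), MOR (mask overlap), TSS (text similarity)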
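The Copy-Paste idea behind CF-Comp can be sketched as follows. This is only an illustration under stated assumptions: `copy_paste` and `mask_overlap_ratio` are hypothetical helpers, and the overlap measure is a generic one, not the definition of MOR used in the paper (the slides do not give the ICS, MOR, or TSS formulas).

```python
import numpy as np


def copy_paste(object_image, object_mask, background_image):
    """Paste the masked object pixels onto the background; return (image, mask).

    Hypothetical helper: assumes H x W x C images of equal shape and an H x W mask.
    """
    if object_image.shape != background_image.shape:
        raise ValueError("object and background images must share a shape")
    keep = object_mask.astype(bool)[..., None]  # broadcast the mask over channels
    composite = np.where(keep, object_image, background_image)
    return composite, object_mask.copy()


def mask_overlap_ratio(mask_a, mask_b):
    """Fraction of mask_a's foreground that is also foreground in mask_b.

    Generic overlap measure for illustration only (not the paper's MOR).
    """
    a = mask_a.astype(bool)
    b = mask_b.astype(bool)
    area = a.sum()
    return float((a & b).sum() / area) if area else 0.0
```

A check such as `mask_overlap_ratio` could be used to reject pairings where the pasted object would land on objects already present in the background; the actual acceptance rule is not stated in the slides.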

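Method: EarthSynth
Rule-based Filtering (R-Filter)
• Evaluates the generated synthetic data with CLIP scores
• Scores the whole image, the object regions, and the background regions
• Only data whose score is at or above the threshold is used for training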
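The thresholding step of R-Filter can be sketched as below. This is a minimal illustration, assuming that image and text embeddings have already been computed by a CLIP-style encoder and that a sample is kept only when all three scores pass; the function names, the dictionary keys, the threshold values, and the "all three must pass" rule are assumptions, not details confirmed by the slides.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors (stand-in for a CLIP score)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def passes_filter(scores, thresholds):
    """True when every score is at or above its threshold (assumed rule)."""
    return all(scores[key] >= thresholds[key] for key in thresholds)


def r_filter(samples, thresholds):
    """Keep only samples whose overall, object, and background scores all pass."""
    return [s for s in samples if passes_filter(s["scores"], thresholds)]


# Hypothetical threshold values chosen only for this example.
thresholds = {"overall": 0.25, "object": 0.20, "background": 0.20}
samples = [
    {"id": "a", "scores": {"overall": 0.31, "object": 0.28, "background": 0.22}},
    {"id": "b", "scores": {"overall": 0.30, "object": 0.15, "background": 0.25}},
]
kept = r_filter(samples, thresholds)
```

Here sample "b" is dropped because its object score falls below the object threshold, which mirrors the idea of scoring the whole image, the object regions, and the background regions separately.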

Results: Downstream task
• Classification: CLIP
• Detection: GroundingDINO
• Segmentation: GSNet


Ablation study: Key modules contribution
- Best configuration
- 128 samples/class
- R-Filter: 1 pt improvement
- CF-Comp: 1.4 pt improvement

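Wrap-up
Conclusion
• EarthSynth achieves cross-task synthesis with a single diffusion model
• CF-Comp and R-Filter strengthen semantic and structural control
• Applicable as a foundation for pre-training and few-shot learning in remote sensing
Impressions
• Training took 45 h on four A100 GPUs; how training and generation efficiency compare with the computational cost remains a question
• The method is described as multi-task, but the tasks are handled only by adjustments in post-processing
• Application to time-series data is something to look forward to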
