koide3
June 28, 2024

[IROS2022] Scalable Fiducial Tag Localization on a 3D Prior Map Via Graph-Theoretic Global Tag-Map Registration

Kenji Koide, Shuji Oishi, Masashi Yokozuka, and Atsuhiko Banno
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS2022)

Transcript

  1. Scalable Fiducial Tag Localization on a 3D Prior Map Via Graph-Theoretic Global Tag-Map Registration
    Kenji Koide, Shuji Oishi, Masashi Yokozuka, and Atsuhiko Banno
    National Institute of Advanced Industrial Science and Technology (AIST), Japan
  2. Background
    • Map-based visual localization has been attracting much attention
    • It is, however, sometimes necessary to rely on visual fiducial tags (aka visual markers) for initialization and fail-safe [Oishi, 2020]
  3. Motivation
    • Deploying many tags on a 3D prior map is sometimes difficult and tedious
    • Tag positions are often measured by hand, which takes a large effort and gives inaccurate results
    • We aim to develop an accurate and automatic method to determine tag poses in the environment
  4. Proposed Method
    1. VIO-based Tag-Relative-Pose Estimation: We use an agile camera to observe tags in the environment and estimate the relative poses between tags via landmark SLAM
    2. Global Tag-Map Registration: We then roughly align the tags and a prior map by establishing tag-plane correspondences via graph-theoretic correspondence estimation
    3. Estimation Refinement via Direct Camera-Map Alignment: Tag and camera poses are refined by directly aligning agile camera images with the prior map and re-optimizing all variables under all constraints
  9. VIO-based Tag-Relative-Pose Estimation
    • We use an agile camera and observe each tag in the environment at least once
    • The tag poses in the VIO frame are estimated via landmark SLAM
    Figure labels: VIO (VINS-Mono); tag detections (AprilTags); pose graph optimization
  10. Global Tag-Map Registration
    • We want to align the estimated tag poses with a prior 3D map without an initial guess
    • The modality difference makes it difficult to apply image matching…
    Figure labels: prior 3D map (sparse point cloud); estimated tag poses (visually detected); align without an initial guess
  11. Geometry-based Tag-Plane Matching
    • We assume that most tags are placed on a plane in the environment
    • We establish tag-plane correspondences to determine the tag-map transformation
    Detecting planes in the environment:
      1. Region growing segmentation
      2. RANSAC plane detection
      3. Fit oriented BBoxes to plane points
  14. Geometry-based Tag-Plane Matching
    • We assume that most tags are placed on a plane in the environment
    • We establish tag-plane correspondences to determine the tag-map transformation
    Detecting planes in the environment:
      1. Region growing segmentation
      2. RANSAC plane detection
      3. Fit oriented BBoxes to plane points
    Plane = (center, normal, lengths)
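
The RANSAC plane detection step listed on this slide can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the 0.02 m inlier threshold, and the iteration count are assumptions chosen for illustration, and the region growing and oriented bounding box steps are omitted.

```python
import random

def fit_plane(p, q, r):
    """Plane n.x + d = 0 through three points; returns (unit normal, d) or None if collinear."""
    u = [q[i] - p[i] for i in range(3)]
    v = [r[i] - p[i] for i in range(3)]
    # The cross product of two in-plane vectors gives the plane normal.
    n = [u[1] * v[2] - u[2] * v[1],
         u[2] * v[0] - u[0] * v[2],
         u[0] * v[1] - u[1] * v[0]]
    norm = sum(c * c for c in n) ** 0.5
    if norm < 1e-12:
        return None
    n = [c / norm for c in n]
    d = -sum(n[i] * p[i] for i in range(3))
    return n, d

def ransac_plane(points, threshold=0.02, iterations=200, seed=0):
    """Return the plane supported by the most points, plus the indices of its inliers.

    threshold (meters) and iterations are hypothetical defaults, not values from the talk.
    """
    rng = random.Random(seed)
    best_model, best_inliers = None, []
    for _ in range(iterations):
        model = fit_plane(*rng.sample(points, 3))
        if model is None:
            continue
        n, d = model
        # A point is an inlier when its point-to-plane distance is below the threshold.
        inliers = [i for i, x in enumerate(points)
                   if abs(n[0] * x[0] + n[1] * x[1] + n[2] * x[2] + d) < threshold]
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = model, inliers
    return best_model, best_inliers
```

From the inlier points of each detected plane, the (center, normal, lengths) description on the slide could then be derived by fitting an oriented bounding box, which this sketch leaves out.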
  15. Max-Clique-based Correspondence Estimation
    • Tag-Plane Correspondence Consistency Graph
      Vertex: tag-plane correspondence hypothesis
      Edge: consistency between correspondence hypotheses
    Figure: vertex h_ij (tag i corresponds to plane j) and vertex h_kl (tag k corresponds to plane l) are connected by an edge when h_ij does not contradict h_kl (i.e., they are consistent)
  17. Max-Clique-based Correspondence Estimation
    • Tag-Plane Correspondence Consistency Graph
      Vertex: tag-plane correspondence hypothesis
      Edge: consistency between correspondence hypotheses
    • The largest subset of hypotheses that are all mutually consistent (i.e., the maximum clique) gives the best explanation for the tag placement in the given map
    Figure labels: h_ij, h_kl
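
The maximum clique idea on this slide can be sketched on a toy consistency graph. The code below uses a plain Bron-Kerbosch search with pivoting and a size bound, which is a swapped-in textbook technique and not the parallel solver the talk cites; the helper names and the example hypotheses are hypothetical.

```python
def build_adjacency(edges):
    """Undirected adjacency map: hypothesis -> set of hypotheses consistent with it."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency

def maximum_clique(adjacency):
    """Return one largest set of mutually adjacent vertices (Bron-Kerbosch with pivoting)."""
    best = []

    def expand(clique, candidates, excluded):
        nonlocal best
        if not candidates and not excluded:
            # The clique is maximal; keep it if it is the largest seen so far.
            if len(clique) > len(best):
                best = list(clique)
            return
        if len(clique) + len(candidates) <= len(best):
            return  # Even taking every candidate cannot beat the current best.
        pivot = max(candidates | excluded,
                    key=lambda v: len(adjacency[v] & candidates))
        for v in list(candidates - adjacency[pivot]):
            expand(clique + [v], candidates & adjacency[v], excluded & adjacency[v])
            candidates = candidates - {v}
            excluded = excluded | {v}

    expand([], set(adjacency), set())
    return best

# Hypothetical example: each vertex is a (tag index, plane label) hypothesis.
edges = [((0, "A"), (1, "B")), ((0, "A"), (2, "C")), ((1, "B"), (2, "C")),
         ((0, "B"), (1, "A"))]
correspondences = maximum_clique(build_adjacency(edges))
```

In this toy graph the three hypotheses (0, "A"), (1, "B"), and (2, "C") are all pairwise consistent, so they form the selected correspondence set, while the competing pair (0, "B"), (1, "A") is discarded.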
  18. Tag-Plane Correspondence Consistency
    • Consistency between tag-plane correspondence hypotheses is determined based on a geometric consistency check
    Figure labels: h_ij, h_kl; tag i, tag k; plane j, plane l
  20. Tag-Plane Correspondence Consistency
    • Consistency between tag-plane correspondence hypotheses is determined based on a geometric consistency check
    • We align tag i with plane j and then measure the distance between tag k and plane l
    • If the normal and translation errors between tag k and plane l are smaller than their thresholds, these hypotheses are mutually consistent
    Figure labels: plane j, plane l; normal error, translation error
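
The check described on this slide can be sketched as follows, under several assumptions that are not stated in the talk: the rigid transform (R, t) that aligns tag i with plane j is taken as a given input, the translation error is measured as the point-to-plane distance, and the threshold values (10 degrees, 0.1 m) and all function names are hypothetical.

```python
import math

def rotate(R, v):
    """Apply a 3x3 rotation matrix (list of rows) to a 3-vector."""
    return [sum(R[r][c] * v[c] for c in range(3)) for r in range(3)]

def transform_point(R, t, p):
    """Apply the rigid transform x -> R x + t to a point."""
    rotated = rotate(R, p)
    return [rotated[i] + t[i] for i in range(3)]

def is_consistent(R, t, tag_k_position, tag_k_normal, plane_l_center, plane_l_normal,
                  max_normal_error_deg=10.0, max_translation_error=0.1):
    """Check hypothesis h_kl under the alignment (R, t) implied by hypothesis h_ij.

    Normals are assumed to be unit vectors; thresholds are illustrative only.
    """
    normal = rotate(R, tag_k_normal)
    position = transform_point(R, t, tag_k_position)
    # Normal error: angle between the transformed tag normal and the plane normal.
    cos_angle = sum(a * b for a, b in zip(normal, plane_l_normal))
    cos_angle = max(-1.0, min(1.0, cos_angle))
    normal_error_deg = math.degrees(math.acos(cos_angle))
    # Translation error: distance from the transformed tag position to plane l.
    translation_error = abs(sum(plane_l_normal[i] * (position[i] - plane_l_center[i])
                                for i in range(3)))
    return (normal_error_deg < max_normal_error_deg
            and translation_error < max_translation_error)
```

Each pair of hypotheses that passes such a check would contribute one edge to the consistency graph.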
  21. Example Result
    Figure labels: planes, tags
    • While the consistency graph contains many edges, the max-clique can be found very efficiently [Rossi, 2015]
  23. Example Result
    Figure labels: planes, tags
    Consistency graph contains 429,735 hypothesis pairs
    Maximum clique consists of 56 tag-plane correspondences, found in 92 msec
    • While the consistency graph contains many edges, the max-clique can be found very efficiently [Rossi, 2015]
    • Given the tag-plane correspondences, we estimate the tag-map transformation by minimizing normal-to-normal ICP distance [Rusinkiewicz, 2019]
  24. Estimation Refinement
    • We refine the tag poses by directly aligning agile camera images with the map
    Figure labels: VIO; tag detections; pose graph; direct alignment
  25. Estimation Refinement
    • We refine the tag poses by directly aligning agile camera images with the map
    • We use the normalized information distance (NID), a mutual information-based cross-modal metric, to maximize the co-occurrence of pixel and map intensity values
    • Tag and camera poses are re-optimized under all the constraints
    Figure labels: agile camera image; map rendered with optimized camera pose
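
The NID metric named on this slide has a standard definition, NID(X, Y) = (H(X, Y) - I(X; Y)) / H(X, Y), where H is the Shannon entropy and I the mutual information. The sketch below computes it from two equally long sequences of already quantized intensity values; the function names and the binning assumption are illustrative and say nothing about how the authors evaluate or optimize the metric.

```python
import math
from collections import Counter

def entropy(counts, total):
    """Shannon entropy (in nats) of a histogram given as raw counts."""
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

def normalized_information_distance(xs, ys):
    """NID between paired samples, e.g. image pixels and rendered map intensities.

    Inputs are assumed to be quantized into discrete bins beforehand.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and of equal length")
    total = len(xs)
    h_x = entropy(Counter(xs).values(), total)
    h_y = entropy(Counter(ys).values(), total)
    h_xy = entropy(Counter(zip(xs, ys)).values(), total)  # joint entropy
    if h_xy == 0.0:
        return 0.0  # Both signals are constant, so there is nothing to distinguish.
    mutual_information = h_x + h_y - h_xy
    return (h_xy - mutual_information) / h_xy
```

NID lies between 0 and 1. It is 0 when one signal fully determines the other, even under a different intensity mapping, and approaches 1 when the two are statistically independent, so minimizing it favors poses where pixel and map intensities co-occur consistently.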
  26. Evaluation in Simulation
    • The method is evaluated on the Replica dataset [Savva, 2019]
    Global tag-map registration: 0.039 m / 1.021°
    Tag localization accuracy: 98% success rate
    Baseline (FPFH+RANSAC/Teaser): 26% and 70%
    Figure label: robustness to outlier tags
  27. Evaluation in Real Environment
    • 117 tags were placed in the environment
    • Tag poses were estimated in 22 minutes (16 min for VIO recording, 6 min for post-processing)
    • Average tag pose error: 0.019 m and 2.382°
    Figure label: final estimation result
  28. Conclusion
    • An accurate and scalable method for fiducial tag localization on a 3D prior environmental map is proposed
    • VIO-based tag relative pose estimation via landmark SLAM
    • Global tag-map registration based on tag-plane correspondence estimation via maximum clique finding
    • Estimation refinement via NID-based direct camera-map alignment
    • The proposed method could localize over 100 tags in 22 minutes
    • The average tag localization error was about 2 cm