Upgrade to Pro — share decks privately, control downloads, hide ads and more …

cjdb: a simple, fast, and lean database solutio...

Hugo Ledoux
September 14, 2023

cjdb: a simple, fast, and lean database solution for the CityGML data model

Presentation about cjdb (https://github.com/cityjson/cjdb) made at the 3DGeoInfo conference in Munich on 2023-09-14.

Paper there: https://arxiv.org/abs/2307.06621

Hugo Ledoux

September 14, 2023
Tweet

More Decks by Hugo Ledoux

Other Decks in Research

Transcript

  1. Leon Powałka, Chris Poon, Yitong Xia, Siebren Meines, Lan Yan,

    Yuduan Cai, Gina Stavropoulou, Balázs Dukai, and Hugo Ledoux TU Delft + 3DGI.xyz cjdb a simple, fast, and lean database solution for the CityGML data model 3DGeoInfo 2023 Munich, Germany 2023-09-14
  2. Leon Powałka, Chris Poon, Yitong Xia, Siebren Meines, Lan Yan,

    Yuduan Cai, Gina Stavropoulou, Balázs Dukai, and Hugo Ledoux TU Delft + 3DGI.xyz cjdb a simple, fast, and lean database solution for the CityGML data model 3DGeoInfo 2023 Munich, Germany 2023-09-14
  3. is liked and used because it: 3 • is a

    database (duh!) • implements the CityGML data model • can be used with PostgreSQL and Oracle • is an ecosystem: importer, exporter, web-viewer (Cesium), web-server • o ff ers export to several formats
  4. BUT 3DCityDB has a very complex data model 4 •

    66 tables! • nested tables • attributes scattered across di ff tables • ==> complex queries It abstracts some the complexity with views, but size of database increases even more Full details in next presentation! Stay seated!
  5. Student project at TUDelft: “Just fix this! Oh, and you

    have 9 weeks.” • == simplified+compacter encoding • Our idea was put CityJSON directly in the database • Make use of PostgreSQL type jsonb • And our intuition was that it would be better • At least for a viewer + web-server for exporting files ~7X compacter
  6. CityJSON == CityGML without the GML 6 Data model GML

    encoding 3 encodings 3DCityDB { "type": “CityJSON", "version": “1.1”, "transform": { "scale": [1.0, 1.0, 1.0], "translate": [0.0, 0.0, 0.0] }, "metadata": { "referenceSystem": "https://www.opengis.net/def/crs/EPSG/0/7415" }, "CityObjects": { "id-1": { "type": "Building", "attributes": { "measuredHeight": 22.3, "owner": “Elvis Presley" }, "geometry": [ { "type": “MultiSurface", "lod": “2.1", "boundaries": [ [[0, 3, 2, 1]], [[4, 5, 6, 7]], [[0, 1, 5, 4]] ] } ] } }, "vertices": [ [231, 2321, 11], [1111, 321, 12], ... ], "appearance": { "materials": [], "textures":[] } }
  7. cjdb (CityJSON database) 8 PostgreSQL database schema Python importer+exporter city_object

    + id: integer + type: text + object_id: text + attributes: jsonb + geometry: jsonb + ground_geometry: geometry city_object_ relationships cj_metadata + id: integer + version: text + source_file: text + metadata: jsonb + transform: jsonb + srid: integer + extensions: jsonb + extra_properties: jsonb + geometry_templates: jsonb + bbox: geometry + started_at: timestamp + finished_at: timestamp cj_metadata_id 0..* 1 0..* 1..*
  8. 1. CityGML model is fl attened 9 city_object + id:

    integer + type: text + object_id: text + attributes: jsonb + geometry: jsonb + ground_geometry: geometry city_object_ relationships cj_metadata + id: integer + version: text + source_file: text + metadata: jsonb + transform: jsonb + srid: integer + extensions: jsonb + extra_properties: jsonb + geometry_templates: jsonb + bbox: geometry + started_at: timestamp + finished_at: timestamp cj_metadata_id 0..* 1 0..* 1..* city_object city_object_relationship
  9. 2. CityJSON Lines is used (geometries have local coordinates) 10

    Fig. 1. UML diagram of cjdb. 1 [ 2 { 3 "type ": "Solid", 4 "lod ": "2.2" , 5 " boundaries ": [ 6 [ [[ [11.1 , 22.6 , 9.9] , [16.21 , 42.8 , 19.9] , ... ] 7 ], 8 "semantics ": { 9 "surfaces" : [ 10 { "type ": " RoofSurface " }, 11 { "type ": " WallSurface " }, 12 ... 13 ], 14 "values ": [ [0, 1, ...] ] 15 } 16 } 17 ] 18 Fig. 2. Example snippet stored in the ‘geometry’ column: an array of CityJSON ge- ometries. 3D city models are 2D queries (all buildings inside a given area, within a given distance, etc). 3 Data model, software, and engineering decisions 3.1 Data model As shown in Figure 1, the cjdb data model is simple and akin to using the Simple Feature paradigm (OGC, 2006), as PostGIS does. Each row in the table vertices are de-referenced jsonb stored directly in a column CityJSONL array because 1+ geometries
  10. 3. 2D footprints of Buildings is extracted and stored with

    PostGIS type 11 2D queries (very frequent!) are speed up
  11. 4. Importer + exporter are Python-based 12 Query to select

    exported features https://github.com/cityjson/cjdb & pip install cjdb
  12. Benchmark with 3 different (open) datasets 13 8 Powa lka

    et al. Table 1. The 3 datasets used for the benchmark. # Building # BuildingPart LoDs present # attributes 3DBAG 112 673 110 387 0/1.2/1.3/2.2 30 NYC 23 777 0 2 3 Vienna 307 1015 2 7 Table 2. Import and export times, from/to CityJSONL. All times in seconds. 3DCityDB cjdb import export import export 3DBAG 6780 721 1260 412 NYC 273 161 23 25 Vienna 12 7 2 2.5 4.1 Import and export times We compared the import time for all 3 datasets and we found that cjdb is considerably faster than 3DCityDB. As an example, the 100 tiles of the 3DBAG were imported in 21 min in cjdb whereas it took 113 min with 3DCityDB; see Table 2 for all details. This is expected, since the storage in the cjdb database is
  13. Import/export from/to CityJSONL 15 mport and export times, from/to CityJSONL.

    All times in 3DCityDB cjdb import export import export 3DBAG 6780 721 1260 412 NYC 273 161 23 25 Vienna 12 7 2 2.5 all times in seconds
  14. Import/export from/to CityJSONL 15 mport and export times, from/to CityJSONL.

    All times in 3DCityDB cjdb import export import export 3DBAG 6780 721 1260 412 NYC 273 161 23 25 Vienna 12 7 2 2.5 5X 12X 6X 1.8X 6.4X 2.8X all times in seconds
  15. Database sizes (in MB) 16 together in the ‘geometry’ column.

    Table 3. Database size comparison for 100 tiles of the 3DBAG dataset, all values in MB. 3DCityDB cjdb tables indexes TOAST total tables indexes TOAST total 3DBAG 5463 4322 112 9898 257 57 755 1070 NYC 590 735 0.5 1326 26 4 25 54 Vienna 30 42 0.5 73 1.5 0.5 4 6 all values in MB
  16. Database sizes (in MB) 16 together in the ‘geometry’ column.

    Table 3. Database size comparison for 100 tiles of the 3DBAG dataset, all values in MB. 3DCityDB cjdb tables indexes TOAST total tables indexes TOAST total 3DBAG 5463 4322 112 9898 257 57 755 1070 NYC 590 735 0.5 1326 26 4 25 54 Vienna 30 42 0.5 73 1.5 0.5 4 6 9X 25X 12X all values in MB
  17. 8 typical queries 17 dataset are listed in Appendix A;

    similar queries were used for the other 2 datasets. Table 4. The 8 queries we used for the benchmark. Q1 Retrieve the ids of all buildings based on one attribute (roof height higher than 20 m) Q2 Retrieve all buildings within a 2D bounding box Q3 Retrieve building intersecting with a 2D point Q4 Retrieve the number of parts for each building Q5 Retrieve all buildings having a specific LoD geometry Q6 Add new ‘footprint area’ attribute Q7 Update ‘footprint area’ attribute by adding 10m Q8 Delete ‘footprint area’ attribute Q1. Query based on attributes: 3DCityDB o↵ers a list of predefined build-
  18. cjdb database 8 typical queries 18 datasets. Table 4. The

    8 queries we used for the benchmark. Q1 Retrieve the ids of all buildings based on one attribute (roof height higher than 20 m) Q2 Retrieve all buildings within a 2D bounding box Q3 Retrieve building intersecting with a 2D point Q4 Retrieve the number of parts for each building Q5 Retrieve all buildings having a specific LoD geometry Q6 Add new ‘footprint area’ attribute Q7 Update ‘footprint area’ attribute by adding 10m Q8 Delete ‘footprint area’ attribute Q1. Query based on attributes: 3DCityDB o↵ers a list of predefined build- ing attributes within the building table, which include ‘year of construction’ and ‘roof type’—attributes that are not in this list are stored in the table cityobject genericattrib. Cjdb on the other hand o↵ers more flexibility since all the attributes remain in JSON format in the attributes column, regardless of the attribute name. Since none of our datasets have attributes from the 3DCityDB’s predefined list, we decided to compare the attribute-based data retrieval for both databases based on non-listed attributes. In this specific example, we queried all the build- ings with roof height (‘h dak max’ for BAG ‘HoeheDach’ for Vienna) higher than 20 m. The New York dataset was not taken into account for this query, since there is no specific attribute about the roof height. For cjdb no join is necessary since the attributes are stored together with the city object but the equivalent in 3DCityDB requires a join between the city object and the cityobject genericattrib tables. As shown in Table 5, cjdb it is faster than 3DcityDB for Vienna but performs almost the same as
  19. Latest version is 2.0, we promise to continue its development

    19 TODOs 1. Builtin functions to extract semantic surfaces + 3D geometries (eg in PostGIS- SFCGAL) 2. QGIS support 3. Multi-threaded export 4. Textures + materials 5. Update for upcoming CityJSON v2.0 🚀
  20. Latest version is 2.0, we promise to continue its development

    19 TODOs 1. Builtin functions to extract semantic surfaces + 3D geometries (eg in PostGIS- SFCGAL) 2. QGIS support 3. Multi-threaded export 4. Textures + materials 5. Update for upcoming CityJSON v2.0 🚀 OGC just unofficially approved CityJSON v2.0 { 🌳 🏢 🏠 } Same as v1.1 with a few key things fixed