ABCS25: Full Steam Ahead: Engineering a Modern Data Platform at Rhaetian Railway by Simon Schwab & Lukas Heusser

Full Steam Ahead: Engineering a Modern Data Platform at Rhaetian Railway
Discover how Rhaetian Railway is modernizing its data landscape using Azure Databricks, Terraform Infrastructure-as-Code, and Azure DevOps. We’ll explore how CI/CD pipelines streamline development, testing, and deployment across multiple environments, while a configuration file–driven approach brings flexibility and agility to data pipeline management. Learn about our key design principles, best practices for parameterizing data flows at scale, how we incorporate data quality checks to ensure reliable analytics, and the lessons we’ve learned on our journey toward a fully automated, high-performance data platform.
SIMON SCHWAB, Senior Data & Analytics Consultant @ Swisscom
LUKAS HEUSSER, Senior Data & AI Consultant @ Swisscom


Transcript

  1. Full Steam Ahead: Engineering a Modern Data Platform at Rhaetian Railway
     5 June 2025, Lukas Heusser & Simon Schwab, Swisscom Data & AI Consulting
  2. Swisscom Data & AI Consulting: Drive Transformation
     Approach: Frame, Explore, Realize, Scale
     Our mission is to help our customers fully exploit the potential of their data. To this end, we design and implement data-based, analytical systems that sustainably improve their core business.
  3. About us: Lukas Heusser, Senior Data & AI Consultant
     lukas.heusser1@swisscom.com, +41 79 549 78 72
     «Implementing data and analytics solutions, with a focus on Databricks, Snowflake, and Azure»
     • Swisscom AG, B2B Data & AI Consulting Unit
     • BSc Business Information Technology
     • Certified in Databricks, Snowflake & Azure
  4. About us: Simon Schwab, Senior Data & AI Consultant
     simon.schwab1@swisscom.com, +41 79 840 35 69
     «Swiss data and cloud professional with a passion for designing and implementing modern data platforms»
     • Swisscom AG, B2B Data & AI Consulting Unit
     • MSc Business Information Technology
     • Certified in Azure, AWS, Databricks & Project Management
  5. Customer: Rhaetian Railway, modernizing RhB’s data landscape
     • Operating the existing Azure data platform
     • Improving user interaction with the data and the data architecture to enable use case development
     • Developing and running a data platform for analytics and reporting
     Coming from an Azure-based data platform, RhB faced architectural and operational challenges that limited scalability and transparency. After evaluating the existing setup, Swisscom Data & AI Consulting proposed a greenfield approach with Azure Databricks to modernize the data landscape and address core design issues.
  6. Rhaetian Railway at a glance
     • Running the Bernina and Glacier Express
     • 102 train stations and stops
     • 385 km of track length with > 1,000 vehicles
     • > 15 million passengers and 4 million commuters
  7. Project setup (overview diagram with the areas Data Architecture, Data Sources, BI, IT Infrastructure, Data Platform, Data Engineering, Operations and Data Science; the data engineering and data science engagements run since 2024, partly in an advisory role)
  8. Challenges with the existing solution
     • No separation between development and production environments
     • Long and tedious implementation time for new use cases, primarily caused by a lack of structure
     • Overall data architecture was not in focus
     • Long data loading times, sometimes taking more than 5 hours to complete
     • Transformations already applied during data ingestion
     • Many assumptions (for example about data types) that led to frequent, unnecessary errors
     • Capacity problems with SQL Server, worked around by temporary upscaling
  9. Sources of the challenges
     • Requirements and communication were unclear
     • Testing and test definitions were neglected (under pressure)
     • Implementation requires domain knowledge
     • (Too) fast reverse engineering of data sources due to missing or insufficient documentation
     • Time pressure that demanded compromises
  10. Infrastructure as Code
     RhB IT: providing a basic landing zone with network connectivity and managing Entra ID
     • Step 1, Setup: the resources needed for Terraform
     • Step 2, Deployment: the platform itself with all services and permissions
     • Step 3, CI/CD pipelines: automating the validation and deployment of the IaC
     • Step 4, Development of use cases: building the data lakehouse and realizing use cases on the data
  11. Ingestion/landing as a layer
     • Azure Data Factory as the central ingestion tool
     • Storing data in the landing layer
     • Parquet files are stored in a Storage Account
     • Orchestration through Databricks Workflows
     • After ingestion, data is processed with Databricks into the bronze layer (see the sketch below)
     • Data is moved to an archive after 30 days
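     A minimal PySpark sketch of the landing-to-bronze step described above; the container path, metadata columns and bronze table name are illustrative assumptions, since the deck only shows the architecture, not the code:

         # Landing-to-bronze sketch; paths and table names are illustrative, not RhB's actual code
         from pyspark.sql import SparkSession, functions as F

         spark = SparkSession.builder.getOrCreate()

         # Parquet files written by Azure Data Factory into the landing container (placeholder account name)
         landing_path = "abfss://landing@<storageaccount>.dfs.core.windows.net/sap/person/"
         df = spark.read.parquet(landing_path)

         # add ingestion metadata so later layers can trace load time and source file
         df = (
             df.withColumn("_ingested_at", F.current_timestamp())
               .withColumn("_source_file", F.input_file_name())
         )

         # append into the bronze layer as a Delta table
         df.write.format("delta").mode("append").saveAsTable("bronze.sap_person")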
  12. Our development setup
     • Visual Studio Code with the Databricks extension for local development of Databricks assets (e.g. pipelines, jobs, custom Python packages)
     • Azure DevOps for version control of delivery objects and CI/CD between the dev, testing and production environments
     • Databricks, including Asset Bundles, for code-based definition of Databricks assets and resources and for executing dev and testing resources
  13. Databricks Asset Bundles
     What it is:
     • Assets (jobs, compute, notebooks)
     • Definitions in .yml files
     • The tool is built on Terraform (similar workflow)
     Why we use it:
     • Helps with automation (CI/CD pipelines)
     • Consistent deployment
     • Better collaboration
  14. Databricks Asset Bundles: example
     (Screenshots: Azure DevOps pipeline .yml, Databricks job definition, dev deployment of the Databricks job, prod deployment of the Databricks job)
  15. Deployment pipelines
     Each branch (feature, release, main) carries the same Asset Bundle contents:
     • databricks.yml
     • Notebooks
     • Pipelines
     • Other resources
     A deployment pipeline sits between the feature and release branches, and another between the release and main branches.
  16. End-2-End Data Orchestration
     Orchestrated with Databricks Workflows across the Landing, Bronze, Silver and Gold layers:
     • Trigger Azure Data Factory to load SAP, source system n and operational data
     • Clean & transform data
     • Load use-case-specific models
     • Data quality checks (see the sketch below)
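     The deck lists data quality checks as a step in this workflow but does not show their implementation; as a hedged illustration, a simple check on a silver table might look like this (the is_current flag and everything other than silver.person and PersonNummerId are made up for the example):

         # Illustrative data quality check, not RhB's actual implementation
         from pyspark.sql import SparkSession, functions as F

         spark = SparkSession.builder.getOrCreate()
         df = spark.table("silver.person")

         # rule 1: the SCD2 business key must never be null
         null_keys = df.filter(F.col("PersonNummerId").isNull()).count()

         # rule 2: at most one current record per business key
         # (assumes a hypothetical is_current flag maintained by the SCD2 load)
         duplicate_current = (
             df.filter(F.col("is_current"))
               .groupBy("PersonNummerId")
               .count()
               .filter(F.col("count") > 1)
               .count()
         )

         # fail the workflow task so downstream gold loads do not run on bad data
         if null_keys > 0 or duplicate_current > 0:
             raise ValueError(
                 f"Data quality check failed for silver.person: "
                 f"{null_keys} null keys, {duplicate_current} duplicated current records"
             )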
  17. Metadata-driven framework: entity config > source-to-target mapping > parametrized pipeline > custom transformations per entity
     Entity configuration (entities.json):
         {
           "entity": "Person",
           "target_table": "person",
           "source_system": "sap",
           "source_tables": ["ZBI_I_PA0002"],
           "scd2_key_columns": ["PersonNummerId"],
           "scd2_table": "silver.person"
         }
     Source-to-target mapping (CSV):
         column_name_old,column_name_new,data_type
         PERNR,PersonNummerId,INTEGER
         SUBTY,Subtyp,STRING
         OBJPS,ObjektIdentifikation,STRING
     main_notebook.py (snippet of parameter initialization):
         catalog = dbutils.widgets.get("catalog_silver")
         schema = dbutils.widgets.get("schema_silver")
         domain = dbutils.widgets.get("domain_silver")
         entity = dbutils.widgets.get("entity_silver")

         # load entities config per domain from the config json
         path_entities_config = f"file:{path}/{domain}/entities.json"
         entities_config = spark.read.option("multiline", "true").json(path_entities_config)

         # check if the current entity is defined in the config
         entity_config = (
             entities_config.filter(entities_config.entity == entity)
             .collect()
         )
     Custom transformations per entity (a sketch of applying the mapping follows below)
  18. Benefits and Lessons Learned: what we are proud of and what we could have done differently
  19. Rhaetian Railway: benefits and lessons learned
     Benefits:
     • Faster use case development (Analytics & Reporting), from months to days
     • Enabling the RhB data team to realize use cases on their own (Analytics & Data Science)
     • Understanding what actually happens with their data in the platform
     Lessons learned:
     • Overlap between Terraform and Databricks Asset Bundles
     • 2 instead of 3 environments does not really reduce overhead
     • Efficiency through automation and parameterization, applied wisely, not blindly
     • Evaluation of Azure Data Factory as the ingestion tool