ABCS25: Full Steam Ahead: Engineering a Modern Data Platform at Rhaetian Railway by Simon Schwab & Lukas Heusser

Full Steam Ahead: Engineering a Modern Data Platform at Rhaetian Railway
Discover how Rhaetian Railway is modernizing its data landscape using Azure Databricks, Terraform Infrastructure-as-Code, and Azure DevOps. We’ll explore how CI/CD pipelines streamline development, testing, and deployment across multiple environments, while a configuration file–driven approach brings flexibility and agility to data pipeline management. Learn about our key design principles, best practices for parameterizing data flows at scale, how we incorporate data quality checks to ensure reliable analytics, and the lessons we’ve learned on our journey toward a fully automated, high-performance data platform.
SIMON SCHWAB, Senior Data & Analytics Consultant @ Swisscom
LUKAS HEUSSER, Senior Data & AI Consultant @ Swisscom


Transcript

  1. Full Steam Ahead: Engineering a Modern Data Platform at Rhaetian Railway
     5 June 2025, Lukas Heusser & Simon Schwab, Swisscom Data & AI Consulting
  2. Swisscom Data & AI Consulting: Drive Transformation
     Approach: Frame, Explore, Realize, Scale
     Our mission is to help our customers fully exploit the potential of their data. To this end, we design and implement data-based, analytical systems that sustainably improve their core business.
  3. About us: Lukas Heusser, Senior Data & AI Consultant
     lukas.heusser1@swisscom.com, +41 79 549 78 72
     «Implementing data and analytics solutions, with a focus on Databricks, Snowflake, and Azure»
     • Swisscom AG, B2B Data & AI Consulting Unit
     • BSc Business Information Technology
     • Certified in Databricks, Snowflake & Azure
  4. About us: Simon Schwab, Senior Data & AI Consultant
     simon.schwab1@swisscom.com, +41 79 840 35 69
     «Swiss data and cloud professional with a passion for designing and implementing modern data platforms»
     • Swisscom AG, B2B Data & AI Consulting Unit
     • MSc Business Information Technology
     • Certified in Azure, AWS, Databricks & Project Management
  5. Customer: Rhaetian Railway, modernizing RhB’s data landscape
     • Operating the existing Azure data platform
     • Improving user interaction with the data and the data architecture to enable use case development
     • Developing and running a data platform for analytics and reporting
     Coming from an Azure-based data platform, RhB faced architectural and operational challenges that limited scalability and transparency. After evaluating the existing setup, Swisscom Data & AI Consulting proposed a greenfield approach with Azure Databricks to modernize the data landscape and address core design issues.
  6. Rhaetian Railway at a glance
     • Running the Bernina and Glacier Express
     • 102 train stations and stops
     • 385 km of track length with > 1,000 vehicles
     • > 15 million passengers and 4 million commuters
  7. Project setup (overview diagram with the areas Data Architecture, Data Sources, BI, IT Infrastructure, Data Platform, Data Engineering, Operations and Data Science; the data engineering and data science engagements run since 2024, partly in an advisory role)
  8. Challenges with the existing solution
     • No separation between development and production environments
     • Long and tedious implementation time for new use cases, primarily caused by a lack of structure
     • Overall data architecture was not in focus
     • Long data loading times, sometimes taking more than 5 hours to complete
     • Transformations already applied during data ingestion
     • Many assumptions (for example about data types) that led to frequent, unnecessary errors
     • Capacity problems with SQL Server, worked around by temporary upscaling
  9. Sources of the challenges
     • Requirements and communication were unclear
     • Testing and test definitions were neglected (under pressure)
     • Implementation requires domain knowledge
     • (Too) fast reverse engineering of data sources due to missing or insufficient documentation
     • Time pressure that demanded compromises
  10. Infrastructure as Code
     RhB IT: providing a basic landing zone with network connectivity and managing Entra ID
     • Step 1, Setup: the resources needed for Terraform
     • Step 2, Deployment: the platform itself with all services and permissions
     • Step 3, CI/CD pipelines: automating the validation and deployment of the IaC
     • Step 4, Development of use cases: building the data lakehouse and realizing use cases on the data
  11. Ingestion/landing as a layer
     • Azure Data Factory as the central ingestion tool
     • Storing data in the landing layer
     • Parquet files are stored in a Storage Account
     • Orchestration through Databricks Workflows
     • After ingestion, data is processed with Databricks into the bronze layer (see the sketch below)
     • Data is moved to an archive after 30 days
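     A minimal PySpark sketch of the landing-to-bronze step described above; the container path, metadata columns and bronze table name are illustrative assumptions, since the deck only shows the architecture, not the code:

         # Landing-to-bronze sketch; paths and table names are illustrative, not RhB's actual code
         from pyspark.sql import SparkSession, functions as F

         spark = SparkSession.builder.getOrCreate()

         # Parquet files written by Azure Data Factory into the landing container (placeholder account name)
         landing_path = "abfss://landing@<storageaccount>.dfs.core.windows.net/sap/person/"
         df = spark.read.parquet(landing_path)

         # add ingestion metadata so later layers can trace load time and source file
         df = (
             df.withColumn("_ingested_at", F.current_timestamp())
               .withColumn("_source_file", F.input_file_name())
         )

         # append into the bronze layer as a Delta table
         df.write.format("delta").mode("append").saveAsTable("bronze.sap_person")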
  12. Our development setup
     • Visual Studio Code with the Databricks extension for local development of Databricks assets (e.g. pipelines, jobs, custom Python packages)
     • Azure DevOps for version control of delivery objects and CI/CD between the dev, testing and production environments
     • Databricks, including Asset Bundles, for code-based definition of Databricks assets and resources and for executing dev and testing resources
  13. Databricks Asset Bundles
     What it is:
     • Assets (jobs, compute, notebooks)
     • Definitions in .yml files
     • The tool is built on Terraform (similar workflow)
     Why we use it:
     • Helps with automation (CI/CD pipelines)
     • Consistent deployment
     • Better collaboration
  14. Databricks Asset Bundles: example
     (Screenshots: Azure DevOps pipeline .yml, Databricks job definition, dev deployment of the Databricks job, prod deployment of the Databricks job)
  15. Deployment pipelines
     Each branch (feature, release, main) carries the same Asset Bundle contents:
     • databricks.yml
     • Notebooks
     • Pipelines
     • Other resources
     A deployment pipeline sits between the feature and release branches, and another between the release and main branches.
  16. End-2-End Data Orchestration
     Orchestrated with Databricks Workflows across the Landing, Bronze, Silver and Gold layers:
     • Trigger Azure Data Factory to load SAP, source system n and operational data
     • Clean & transform data
     • Load use-case-specific models
     • Data quality checks (see the sketch below)
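     The deck lists data quality checks as a step in this workflow but does not show their implementation; as a hedged illustration, a simple check on a silver table might look like this (the is_current flag and everything other than silver.person and PersonNummerId are made up for the example):

         # Illustrative data quality check, not RhB's actual implementation
         from pyspark.sql import SparkSession, functions as F

         spark = SparkSession.builder.getOrCreate()
         df = spark.table("silver.person")

         # rule 1: the SCD2 business key must never be null
         null_keys = df.filter(F.col("PersonNummerId").isNull()).count()

         # rule 2: at most one current record per business key
         # (assumes a hypothetical is_current flag maintained by the SCD2 load)
         duplicate_current = (
             df.filter(F.col("is_current"))
               .groupBy("PersonNummerId")
               .count()
               .filter(F.col("count") > 1)
               .count()
         )

         # fail the workflow task so downstream gold loads do not run on bad data
         if null_keys > 0 or duplicate_current > 0:
             raise ValueError(
                 f"Data quality check failed for silver.person: "
                 f"{null_keys} null keys, {duplicate_current} duplicated current records"
             )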
  17. Metadata-driven framework: entity config > source-to-target mapping > parametrized pipeline > custom transformations per entity
     Entity configuration (entities.json):
         {
           "entity": "Person",
           "target_table": "person",
           "source_system": "sap",
           "source_tables": ["ZBI_I_PA0002"],
           "scd2_key_columns": ["PersonNummerId"],
           "scd2_table": "silver.person"
         }
     Source-to-target mapping (CSV):
         column_name_old,column_name_new,data_type
         PERNR,PersonNummerId,INTEGER
         SUBTY,Subtyp,STRING
         OBJPS,ObjektIdentifikation,STRING
     main_notebook.py (snippet of parameter initialization):
         catalog = dbutils.widgets.get("catalog_silver")
         schema = dbutils.widgets.get("schema_silver")
         domain = dbutils.widgets.get("domain_silver")
         entity = dbutils.widgets.get("entity_silver")

         # load entities config per domain from the config json
         path_entities_config = f"file:{path}/{domain}/entities.json"
         entities_config = spark.read.option("multiline", "true").json(path_entities_config)

         # check if the current entity is defined in the config
         entity_config = (
             entities_config.filter(entities_config.entity == entity)
             .collect()
         )
     Custom transformations per entity (a sketch of applying the mapping follows below)
  18. Benefits and Lessons Learned: what we are proud of and what we could have done differently
  19. Rhaetian Railway: benefits and lessons learned
     Benefits:
     • Faster use case development (Analytics & Reporting), from months to days
     • Enabling the RhB data team to realize use cases on their own (Analytics & Data Science)
     • Understanding what actually happens with their data in the platform
     Lessons learned:
     • Overlap between Terraform and Databricks Asset Bundles
     • 2 instead of 3 environments does not really reduce overhead
     • Efficiency through automation and parameterization, applied wisely, not blindly
     • Evaluation of Azure Data Factory as the ingestion tool