When working in the embedded and observability domains, I’ve used Python scripts to retrieve and pre-process data from external sources. One recurring issue is how hard it is to reliably test data pipelines against external services: API rate limits, pay-per-use costs, service outages, and so on. So, can we model (aka “mock”) these services to reliably test our data ingestion pipelines? Sure we can!
In this talk, I will show a few ways to build test services, databases, and API providers using Testcontainers and WireMock, both available for Python and built on container technology. Then we will extend the approach by generating fake data with the help of Faker libraries or Synthesized, which can be used for both relational data and data sequences.
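
To give a flavor of the idea before any containers are involved, here is a minimal sketch that uses only the Python standard library. It is a stand-in, not the Testcontainers or WireMock API: a tiny in-process HTTP stub plays the role of the mocked service, and a seeded `random.Random` plays the role of a fake-data generator. The `/readings` endpoint, the record schema, and the names `fake_readings`, `fetch_readings`, and `run_against_stub` are all hypothetical and chosen only for illustration.

```python
import json
import random
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


def fake_readings(count, seed=42):
    """Generate reproducible fake sensor readings (hypothetical schema)."""
    rng = random.Random(seed)  # fixed seed keeps test data deterministic
    return [
        {"sensor_id": f"sensor-{i}", "temperature_c": round(rng.uniform(15.0, 30.0), 2)}
        for i in range(count)
    ]


class StubHandler(BaseHTTPRequestHandler):
    """Serves canned JSON for one hypothetical endpoint, /readings."""

    def do_GET(self):
        if self.path == "/readings":
            body = json.dumps(fake_readings(3)).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, format, *args):
        pass  # keep test output quiet


def fetch_readings(base_url):
    """The ingestion step under test: fetch and decode readings."""
    # An empty ProxyHandler bypasses any system proxy for the local stub.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(f"{base_url}/readings", timeout=5) as resp:
        return json.loads(resp.read().decode("utf-8"))


def run_against_stub():
    """Start the stub on a free port, run the ingestion step, then shut down."""
    server = HTTPServer(("127.0.0.1", 0), StubHandler)  # port 0 = any free port
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        host, port = server.server_address
        return fetch_readings(f"http://{host}:{port}")
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    readings = run_against_stub()
    print(len(readings), "readings ingested")
```

The ingestion code only sees a base URL, so the same `fetch_readings` function could later be pointed at a containerized mock or at the real service without changes.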