DGIQ West 2025 Karen Lopez Data Frameworks

Karen Lopez

May 07, 2025

Transcript

  1. Karen Lopez: Microsoft MVP, Data Platform; Microsoft Certified Trainer; vExpert

    Data management expert, space enthusiast, and #TeamData evangelist. www.datamodel.com, @datachick
  4. 2025 Challenges for Data Management Strategies

    • Data Volume and Complexity
    • Real-time Access
    • Data Privacy & Compliance
    • Data Security
    • New Data Architectures & Frameworks
    • AI & ML
  5. Data Management Complexity

    • Data quality
    • Data platform trends
    • Multi-platforms
    • Security & protection
    • Cloud + on-premises + hybrid architectures
    • Lack of governance
    • Too much governance
    • Duplication
    • Integration
    • Data movement & gravity
    • Data lifecycles
    • Data literacy
    • People
  6. More Focus on AI and Data

    Artificial intelligence (AI) is growing in popularity as a tool for data analysis. Machine learning, a subset of AI, is used to make predictions and decisions based on data. Neural networks, a subset of machine learning, apply pattern recognition to data. There is a growing need for ethical considerations and the responsible use of data.
  7. Greater Focus on Security, Privacy, and Compliance

    Data privacy is becoming increasingly important, as regulations place greater emphasis on data protection. The need for data security is growing as cyberattacks and data breaches increase. Compliance with regulations and standards is crucial for avoiding legal and financial penalties.
  8. What Do I Mean by Data Framework?

    • Patterns of Architecture
    • Governance
    • Specific Goals
    • Modern Infrastructure
    • More Focused on Analytics
  9. What Do I Mean by Data Framework?

    • Logical and Physical
    • Processes and Policies
    • Standards
    • Integration
    • Vendor Agnostic
  10. Please give me a short definition of “data framework.”

    Sure! A data framework is a structured system or approach used to organize, manage, and utilize data efficiently, ensuring it can be accessed and analyzed effectively for decision-making.
  11. Data Framework Trends

    • Greater Focus on Security, Privacy, and Compliance
    • More Focus on Data Separate from Applications
    • Demands for Better Data Trust
    • Data Architectures
  12. Frameworks, sort of

    • Data Fabric
    • Data Lake
    • Data Mesh
    • Zachman Framework
    • Data Vault
    • Data Warehouse (Star, Snowflake)
    • Big Data
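
The star schema named in the data warehouse entry keeps numeric measures in a central fact table and descriptive attributes in surrounding dimension tables. The sketch below illustrates that pattern with Python's built-in sqlite3 module; all table names, columns, and rows are hypothetical examples, not anything from the talk.

```python
import sqlite3

# In-memory database for illustration; table names and data are hypothetical.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Dimension tables hold descriptive attributes.
cur.execute("CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT)")
cur.execute("CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER, month INTEGER)")

# The fact table holds measures plus a foreign key to each dimension.
cur.execute("""
    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES dim_product(product_id),
        date_id    INTEGER REFERENCES dim_date(date_id),
        quantity   INTEGER,
        amount     REAL
    )
""")

cur.executemany("INSERT INTO dim_product VALUES (?, ?, ?)",
                [(1, "Telescope", "Optics"), (2, "Star Chart", "Books")])
cur.executemany("INSERT INTO dim_date VALUES (?, ?, ?)",
                [(20250101, 2025, 1), (20250201, 2025, 2)])
cur.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
                [(1, 20250101, 2, 600.0), (2, 20250101, 5, 50.0), (1, 20250201, 1, 300.0)])

# A typical star join: aggregate the facts, slice by dimension attributes.
rows = cur.execute("""
    SELECT p.category, d.month, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product p ON f.product_id = p.product_id
    JOIN dim_date d    ON f.date_id = d.date_id
    GROUP BY p.category, d.month
    ORDER BY p.category, d.month
""").fetchall()
print(rows)  # [('Books', 1, 50.0), ('Optics', 1, 600.0), ('Optics', 2, 300.0)]
```

In a snowflake schema, a dimension attribute such as `category` would itself be normalized into a separate table that `dim_product` references.
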
  13. Understanding Data Mesh

    • Decentralized data
    • Domain-oriented ownership of data
    • Self-serve data infrastructure
    • Cross-functional teams
    • Data Products
    • Data accessibility focus
    • Data Quality focus
  14. Data Products

    • Goal of Data Mesh
    • Decentralized ownership and governance of that data
    • New mindset
    • New culture
    • No central governance
  15. Key Data Mesh Processes

    • Data Product Creation: Domains produce well-documented, discoverable, and high-quality data products.
    • Interoperability: Common standards, metadata conventions, and API-based access enable seamless data integration.
    • Decentralized Data Processing: Data processing responsibilities are distributed across domains to ensure scalability.
    • Automated Governance and Observability: Governance policies are enforced through automation, ensuring data compliance and monitoring.
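
To make "well-documented, discoverable" data products and automated governance more concrete, here is a minimal sketch of a data product descriptor checked by an automated policy function. The descriptor fields (`owner`, `description`, `schema`, `classification`) and the policy rules are hypothetical assumptions for illustration; they are not part of any data mesh standard.

```python
from dataclasses import dataclass, field

@dataclass
class DataProduct:
    # Hypothetical descriptor a domain team publishes alongside its data.
    name: str
    domain: str
    owner: str
    description: str
    schema: dict[str, str]  # column name -> declared type
    classification: str     # e.g. "public", "internal", "confidential"
    tags: list[str] = field(default_factory=list)

# Hypothetical organization-wide standard shared by all domains.
ALLOWED_CLASSIFICATIONS = {"public", "internal", "confidential"}

def governance_violations(product: DataProduct) -> list[str]:
    """Return policy violations for one data product (hypothetical rules)."""
    problems = []
    if not product.owner:
        problems.append("missing owner")
    if len(product.description.strip()) < 20:
        problems.append("description too short to be discoverable")
    if not product.schema:
        problems.append("no published schema")
    if product.classification not in ALLOWED_CLASSIFICATIONS:
        problems.append(f"unknown classification: {product.classification}")
    return problems

orders = DataProduct(
    name="orders_daily", domain="sales", owner="sales-data-team",
    description="Daily order totals per store, refreshed nightly.",
    schema={"store_id": "int", "order_date": "date", "total": "decimal"},
    classification="internal",
)
draft = DataProduct(name="tmp", domain="ops", owner="", description="wip",
                    schema={}, classification="secret")

print(governance_violations(orders))  # []
print(governance_violations(draft))   # four violations
```

Running a check like this in each domain's publishing pipeline is one way to keep governance decentralized in ownership but consistent in rules.
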
  16. Implications of Data Decentralization

    • Governance by Data Product
    • Data Product owners need data professionals and engineers
    • Their own preferred tools?
    • Decentralization helps a subset of the organization work faster
    • More difficult integration
    • Data movement
  17. More Difficult Parts

    • Complexity in Governance
    • Data Quality Variability
    • Oversight
    • Integration Challenges
    • Increased Operational Costs
    • Skill / Culture Shift
  18. Data Lakes

    • Open-source Formats
    • Schema on Read, not Write
    • Scalable
    • Structured Data
    • Unstructured Data
    • Semi-structured Data
    • Non-transactional
    • Difficult to Govern
    • Data quality is up to many people
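
"Schema on read" means raw records land in the lake unchanged and each consumer applies its own schema when it reads them. A minimal sketch using in-memory JSON Lines records; the field names and the `read_with_schema` helper are hypothetical.

```python
import json

# Raw events land in the lake as-is; no schema is enforced at write time.
raw_lines = [
    '{"sensor": "a1", "temp_c": "21.5", "ts": "2025-04-10T19:13:00"}',
    '{"sensor": "a2", "temp_c": 19}',
    '{"sensor": "a3", "humidity": 40}',
    'not even json',
]

def read_with_schema(lines, schema):
    """Apply a schema at read time: cast the listed fields, skip records that don't fit."""
    rows, rejected = [], 0
    for line in lines:
        try:
            record = json.loads(line)
            rows.append({col: cast(record[col]) for col, cast in schema.items()})
        except (json.JSONDecodeError, KeyError, TypeError, ValueError):
            rejected += 1
    return rows, rejected

# This consumer only cares about temperature readings.
temperature_schema = {"sensor": str, "temp_c": float}
rows, rejected = read_with_schema(raw_lines, temperature_schema)
print(rows)      # [{'sensor': 'a1', 'temp_c': 21.5}, {'sensor': 'a2', 'temp_c': 19.0}]
print(rejected)  # 2
```

Another consumer could read the same raw lines with a different schema. That flexibility is the appeal of a lake, and the silently rejected records are one reason data quality ends up being "up to many people."
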
  19. A data lake is a system or repository of data stored in its natural/raw format, usually object blobs or files.

    A data lake is usually a single store of data including raw copies of source system data, sensor data, social data etc., and transformed data used for tasks such as reporting, visualization, advanced analytics, and machine learning. A data lake can include structured data from relational databases (rows and columns), semi-structured data (CSV, logs, XML, JSON), unstructured data (emails, documents, PDFs), and binary data (images, audio, video). A data lake can be established on premises (within an organization's data centers) or in the cloud (using cloud services).

    Wikipedia contributors. (2025, March 14). Data lake. In Wikipedia, The Free Encyclopedia. Retrieved 19:13, April 10, 2025, from https://en.wikipedia.org/w/index.php?title=Data_lake&oldid=1280454172
  20. Bill Inmon on Data Lakes

    Not a fan, and writes books about them: “Very few can turn the data lake into an information gold mine. Most wind up with garbage dumps.”
  21. Data Lakehouse

    • Combines the best parts of data warehouses and data lakes
    • Multiple formats
    • Transactional support
    • Schema on write
    • Data integrity & quality
    • Data catalog
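
Unlike the schema-on-read approach of a plain data lake, a lakehouse table enforces its schema on write and commits changes transactionally. The sketch below imitates that behavior with a hypothetical in-memory `Table` class that validates a whole batch before appending it, so a single bad row rejects the entire batch. Real lakehouse table formats (for example Delta Lake or Apache Iceberg) implement this with transaction logs; this code does not model them.

```python
class SchemaError(ValueError):
    pass

class Table:
    """Hypothetical in-memory table with schema-on-write and all-or-nothing appends."""

    def __init__(self, schema: dict[str, type]):
        self.schema = schema
        self.rows: list[dict] = []

    def _validate(self, row: dict) -> None:
        if set(row) != set(self.schema):
            raise SchemaError(f"columns {sorted(row)} do not match {sorted(self.schema)}")
        for col, expected in self.schema.items():
            if not isinstance(row[col], expected):
                raise SchemaError(
                    f"{col}: expected {expected.__name__}, got {type(row[col]).__name__}")

    def append(self, batch: list[dict]) -> None:
        # Validate every row first, so one bad row rejects the whole batch.
        for row in batch:
            self._validate(row)
        self.rows.extend(batch)

orders = Table({"order_id": int, "amount": float})
orders.append([{"order_id": 1, "amount": 9.99}, {"order_id": 2, "amount": 20.0}])
try:
    # The second row has a string order_id, so nothing from this batch is written.
    orders.append([{"order_id": 3, "amount": 5.0}, {"order_id": "4", "amount": 1.0}])
except SchemaError as err:
    print("rejected:", err)  # rejected: order_id: expected int, got str
print(len(orders.rows))      # 2
```
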
  22. Implications of Data Lakes

    • Data isn’t hidden away
    • Data isn’t formatted for just one use
    • Schema-free data also documentation-free?
    • Data swampiness
    • Often too large to govern well
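
One way to notice "schema-free, documentation-free" data before a lake turns into a swamp is to compare what is stored against what is catalogued. A minimal sketch; the object listing, the catalog dictionary, and the rule that every dataset needs an owner are all hypothetical assumptions.

```python
from pathlib import PurePosixPath

# Hypothetical listing of objects in the lake and entries in the data catalog.
lake_objects = [
    "raw/sales/orders/2025-04-01.parquet",
    "raw/sales/orders/2025-04-02.parquet",
    "raw/iot/sensors/batch-17.json",
    "scratch/tmp_export_final_v2.csv",
]
catalog = {
    "raw/sales/orders": {"owner": "sales-data-team", "description": "Daily order extracts"},
    "raw/iot/sensors": {"owner": "", "description": "Sensor readings"},
}

def audit(objects, catalog):
    """Treat each folder as a dataset; report uncatalogued and ownerless datasets."""
    datasets = sorted({str(PurePosixPath(p).parent) for p in objects})
    undocumented = [d for d in datasets if d not in catalog]
    ownerless = [d for d in datasets if d in catalog and not catalog[d].get("owner")]
    return undocumented, ownerless

undocumented, ownerless = audit(lake_objects, catalog)
print(undocumented)  # ['scratch']
print(ownerless)     # ['raw/iot/sensors']
```
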
  23. Data Fabric Architectures

    • Data Fabric: an architecture where data is consistently available and accessible to applications
    • Data Fabric allows for a unified view of decentralized data
    • Data Fabric is a key component of many analytics and AI initiatives
    • Moves many data management professionals to business units
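
The "unified view of decentralized data" can be pictured as a virtual access layer: consumers request a logical dataset by name and the fabric routes the request to whichever system holds it. A minimal sketch assuming two hypothetical sources, a SQLite database and a CSV extract owned by another business unit; it illustrates the idea only and is not how any particular fabric product works.

```python
import csv
import io
import sqlite3

# Source 1: an operational database (hypothetical customers table).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE customers (id INTEGER, name TEXT, region TEXT)")
db.executemany("INSERT INTO customers VALUES (?, ?, ?)",
               [(1, "Ada", "West"), (2, "Grace", "East")])

# Source 2: a file extract owned by another business unit (hypothetical).
regions_csv = "region,manager\nWest,Lee\nEast,Patel\n"

def read_sqlite(table):
    # Table names come from the catalog below, never from user input.
    cur = db.execute(f"SELECT * FROM {table} ORDER BY rowid")
    cols = [c[0] for c in cur.description]
    return [dict(zip(cols, row)) for row in cur.fetchall()]

def read_csv(text):
    return list(csv.DictReader(io.StringIO(text)))

# The fabric's catalog maps logical dataset names to how each source is read.
CATALOG = {
    "customers": lambda: read_sqlite("customers"),
    "regions": lambda: read_csv(regions_csv),
}

def get_dataset(name):
    """Consumers ask for a logical dataset; the fabric resolves where it lives."""
    return CATALOG[name]()

managers = {r["region"]: r["manager"] for r in get_dataset("regions")}
report = [(c["name"], managers[c["region"]]) for c in get_dataset("customers")]
print(report)  # [('Ada', 'Lee'), ('Grace', 'Patel')]
```

The consumer code never mentions SQLite or CSV; moving a dataset to a new system would only change its catalog entry.
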
  24. Microsoft Fabric (a Platform)

    • Databases
    • Data Warehouse
    • Storage
    • Data Lake
    • Power BI
    • Data Factory
    • Real-Time Intelligence
    • Data Engineering
    • Data Science
  26. Implications of Data Fabric

    • Data isn’t hidden away
    • Data isn’t formatted for just one use
    • Schema + raw data + processes
    • Data redundancy?
    • Easier to enforce policies and standards
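
Policies are easier to enforce when every read passes through one layer, because a rule can be defined once and applied to all consumers. A minimal sketch of central column masking; the policy format, column names, and roles are hypothetical.

```python
# Hypothetical central policy: mask these columns unless the caller's role is allowed.
MASKING_POLICY = {
    "email": {"allowed_roles": {"support"}},
    "ssn": {"allowed_roles": set()},  # no role sees raw values
}

def apply_policy(rows, role):
    """Mask restricted columns in every row served through the access layer."""
    masked = []
    for row in rows:
        out = {}
        for col, value in row.items():
            rule = MASKING_POLICY.get(col)
            if rule is not None and role not in rule["allowed_roles"]:
                out[col] = "***"
            else:
                out[col] = value
        masked.append(out)
    return masked

customers = [{"id": 1, "email": "ada@example.com", "ssn": "123-45-6789"}]
print(apply_policy(customers, role="analyst"))  # [{'id': 1, 'email': '***', 'ssn': '***'}]
print(apply_policy(customers, role="support"))  # [{'id': 1, 'email': 'ada@example.com', 'ssn': '***'}]
```
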
  27. Data Architecture Opportunities

    • Using AI for Data Management
    • Data-driven Projects Will Increase Demand
    • Hybrid Cloud Projects
    • No-Code/Low-Code Technologies
  28. One more time…

    Every design decision must be based on Cost, Benefit, and Risk. www.datamodel.com
  29. Karen is happy about the challenges

    Just like in space exploration, the universe is made of data.