Apidays Singapore 2024 - Privacy Enhancing Technologies for AI by Mark Choo, Federated Learning - AI Singapore

Privacy Enhancing Technologies for AI
Mark Choo, Head, Federated Learning - AI Singapore

Apidays Singapore 2024: Connecting Customers, Business and Technology (April 17 & 18, 2024)

apidays

May 04, 2024

Transcript

  1. © 2024 AI Singapore The Future of Federated Learning

    Is there a case for Federated Learning (FL)? Since 2016, more than 3,900 papers on FL have been published to arXiv alone. In recent years, big tech companies have made a growing number of investments in FL. In March 2023, the OECD published a paper on emerging Privacy Enhancing Technologies (PETs) and recognized FL as one of the four PET categories. This sets the stage for FL to be adopted for data privacy protection as part of Privacy by Design. https://www.oecd.org/publications/emerging-privacy-enhancing-technologies-bf121be4-en.htm
  2. How does federated learning work?

    Step 1: A group of parties, each with its own local data, come together and form a network with the common goal of training a model together.

    [Diagram: Party A, Party B and Party C, each holding its own data]
  3. How does federated learning work?

    Step 2: The Trusted Third Party (TTP) acts as the coordinator; it does not contribute data. It sends an initial model to all participating parties. This model serves as the baseline from which each party starts training, using only its local data.

    [Diagram: the Trusted Third Party holds the model; Party A, Party B and Party C each hold their own data]
  4. How does federated learning work?

    Step 3: Each participating party trains the given model on its own local data.

    [Diagram: the Trusted Third Party and Parties A, B and C each hold a copy of the model; the data stays with each party]
  5. How does federated learning work?

    Step 4: Periodically, all parties send what they have learned (weights, gradients, losses, etc.) to the TTP. No local data is ever exposed.

    [Diagram: Parties A, B and C send model weights (W) to the Trusted Third Party]
  6. How does federated learning work?

    Step 5: The TTP aggregates the new learnings from the parties and uses them to improve the shared model.

    [Diagram: the Trusted Third Party combines the weights (W) received from Parties A, B and C]
  7. How does federated learning work?

    Step 6: The new shared model is sent back to the participating parties and the cycle repeats. With each iteration, the shared model maintained by the TTP gets better.

    [Diagram: the Trusted Third Party returns the model to Parties A, B and C; no raw data is received by the TTP]
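
    The cycle in the steps above can be sketched in a few lines of Python. This is a minimal sketch, not AI Singapore's implementation: the slides do not name an aggregation rule, so the sample-weighted averaging below (in the style of federated averaging) is an assumption, as are the toy linear model, the hypothetical function names `local_train`, `aggregate` and `federated_training`, and the data.

```python
# Toy federated training of y = w*x + b across three parties.
# Assumptions (not from the slides): weighted averaging as the aggregation
# rule, a linear model, and the learning rate / epoch / round settings.

def local_train(weights, data, lr=0.05, epochs=5):
    """A party refines the shared model on its own (x, y) pairs only."""
    w, b = weights
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / len(data)
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / len(data)
        w -= lr * grad_w
        b -= lr * grad_b
    return [w, b]

def aggregate(updates):
    """The coordinator averages weights, weighted by each party's sample count."""
    total = sum(n for _, n in updates)
    dim = len(updates[0][0])
    return [sum(wts[i] * n for wts, n in updates) / total for i in range(dim)]

def federated_training(parties, rounds=100):
    """Repeat: send model out, train locally, send weights back, aggregate."""
    global_weights = [0.0, 0.0]  # initial model sent out by the coordinator
    for _ in range(rounds):
        # Only weights and sample counts leave a party, never the raw data.
        updates = [(local_train(global_weights, data), len(data)) for data in parties]
        global_weights = aggregate(updates)
    return global_weights

# Hypothetical local datasets, all drawn from y = 2x + 1.
parties = [
    [(0.0, 1.0), (1.0, 3.0)],
    [(2.0, 5.0), (3.0, 7.0), (4.0, 9.0)],
    [(-1.0, -1.0), (5.0, 11.0)],
]
model = federated_training(parties)
print(model)
```

    The coordinator in this sketch only ever sees `(weights, sample_count)` pairs, which mirrors the point that no raw data is received by the TTP.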
  8. Operational Impact of Data Collaboration

    What is the typical organization structure during Data Centralization?

    [Diagram labels: Raw Data; Data Processor; Data Controller; Organization 1; Organization 2; Third Party Organization; Consolidator; Storage]
  9. Operational Impact of Data Collaboration

    Structure 1: The Organization is both the Data Controller and the Data Processor

    Organization 1 controls the full workflow. Organization 1 of the federated grid is ultimately bound by the legal requirements on raw data, since users' personal data is:
    • Processed
    • Stored
    • Organised
    • Collected
    • Disseminated

    Intra-organizational workflow:
    • Regulations are between departments
    • Access is relatively easier to obtain, since contracts, request workflows and audits are all consistent within the organisation

    Data is often immobile due to data lakes.

    [Diagram labels: Raw Data; Data Processor; Data Controller; Organization 1; Consolidator; Storage]
  10. Operational Impact of Data Collaboration

    Structure 2: The Organization is the Data Controller and a Third Party is the Data Processor

    Inter-organization efforts:

    The Third Party processes the raw data when it receives operations from the orchestrator, and is bound by the legal requirements on raw data, since users' personal data is:
    • Processed
    • Stored
    • Organised

    Organization 2 controls the ingestion and usage of the raw data, and hence is bound by the legal requirements on raw data, since users' personal data is:
    • Collected
    • Stored
    • Organised
    • Disseminated

    [Diagram labels: Raw Data; Data Processor; Data Controller; Organization 2; Third Party Organization; Consolidator; Storage]
  11. A Different Structure for Data Collaboration

    Federated Grid to Truly Retain Control of Private and Proprietary Data

    [Diagram labels: Federated Grid; Raw Data; Federated Node; Data Controller; Organization 1; Organization 2; Third Party Organization; Orchestrator; Analytics | Mathematical Weights]
  12. Federated Learning Beyond Privacy

    What other problems can Federated Learning solve?

    Data Immobility
    • The volume of data can make duplicating and transferring it for centralized model training inefficient and expensive.
      ◦ E.g., medical images can be very large even at the individual level.
      ◦ E.g., financial transaction data, where the nature of the business is data hungry.

    Collaborative AI
    • The veracity of data can be improved: one party's data may not contain the whole picture, but multiple parties' data can potentially improve the quality of the data.
      ◦ E.g., multiple banks coming together to build a global fraud detection model.
    • The value and variety of data can be exploited more.
      ◦ E.g., a supermarket and a hospital coming together to build a model that identifies someone at risk of getting diabetes.

    Images generated with Microsoft Copilot
  13. Unlock Data with Federated Learning

    How can companies collaborate with their datasets?

    Vertical Federated Learning
    • A and B are from different industries and the dataset use case is likely different.
    • There is an overlap in the sample set (row-wise).
    • Vertical FL allows a global model to be trained with a larger number of data features, using the overlapped sample set.

    [Diagram: datasets from A and B, each with Sample ID, Features and Labels; the overlapped sample set is marked with a red box]

    Horizontal Federated Learning
    • A and B are likely in the same industry and the dataset use case is the same, with overlapping data features.
    • This allows horizontal FL to train a global model on the enlarged overlapped feature set (red box).
    • In other words, the model is trained on a larger number of samples using the overlapped data features.

    [Diagram: datasets from A and B, each with Sample ID, Features and Labels; the overlapped feature set is marked with a red box]
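
    The difference between the two partitionings can be illustrated with plain dictionaries. This is a rough sketch under stated assumptions: the datasets, feature names and the helpers `feature_names`, `horizontal_view` and `vertical_view` are all hypothetical. The merge is done here only to show which slice of the combined data the global model effectively learns from; in an actual federated deployment the rows and columns stay with their owners.

```python
# Hypothetical toy datasets in the form {sample_id: {feature_name: value}}.
bank_a = {"u1": {"age": 34, "balance": 1200}, "u2": {"age": 51, "balance": 300}}
bank_b = {"u3": {"age": 27, "balance": 880}, "u4": {"age": 45, "balance": 40}}
supermarket = {"u1": {"sugar_spend": 75}, "u2": {"sugar_spend": 20}, "u9": {"sugar_spend": 5}}
hospital = {"u1": {"bmi": 31}, "u2": {"bmi": 24}, "u7": {"bmi": 28}}

def feature_names(dataset):
    """Union of the feature names that appear in any row of the dataset."""
    return set().union(*(row.keys() for row in dataset.values()))

def horizontal_view(a, b):
    """Horizontal FL: same features, different samples -> more rows."""
    shared = sorted(feature_names(a) & feature_names(b))
    rows = []
    for dataset in (a, b):
        for row in dataset.values():
            rows.append({name: row[name] for name in shared})
    return rows

def vertical_view(a, b):
    """Vertical FL: same samples, different features -> more columns."""
    shared_ids = sorted(set(a) & set(b))
    return {sid: {**a[sid], **b[sid]} for sid in shared_ids}

print(horizontal_view(bank_a, bank_b))       # four rows, two shared features
print(vertical_view(supermarket, hospital))  # two shared samples, two features each
```

    The two banks share a schema, so the horizontal view grows in rows; the supermarket and the hospital share customers, so the vertical view grows in columns.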
  14. Centralized vs Federated Collaboration

    Cost Analysis of Centralized Collaborative AI vs Federated Learning Collaboration

    When different parties collaborate on training, the cost of centralizing data can get hefty because of data transfer and duplicated storage.

    * Data are taken from a commercial cloud provider's public pricing calculator
    * Data transfer is within the same cloud provider and between Asia regions
  15. Centralized vs Federated Collaboration

    Cost Analysis of Centralized Collaborative AI vs Federated Learning Collaboration

    [Chart: cost comparison for dataset sizes of 16 GB, 161 GB and 800 GB]

    Since the collaborator's cost is linear, the cost of central collaboration is approximately 5 times more than if the collaborator trains on its own (no data duplication and no data transfer). Adopting Federated Learning as a collaboration method reduces the cost of collaboration as the dataset size scales.
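
    A back-of-the-envelope model shows why the gap widens with dataset size. This is only a sketch: the unit prices, the model update size and the number of rounds are hypothetical placeholders, not the figures behind the chart, and it does not attempt to reproduce the 5x ratio quoted on the slide. It assumes centralization pays transfer plus duplicate storage per GB, while federated training only moves model updates.

```python
# Hypothetical unit prices in USD per GB; NOT the prices used on the slide.
STORAGE_PER_GB = 0.02
TRANSFER_PER_GB = 0.09
MODEL_UPDATE_GB = 0.1  # assumed size of one model update
ROUNDS = 50            # assumed number of federated rounds

def centralized_overhead(dataset_gb):
    """Extra cost of copying a dataset to a central store:
    one transfer plus one duplicate copy in storage, linear in size."""
    return dataset_gb * (TRANSFER_PER_GB + STORAGE_PER_GB)

def federated_overhead(rounds=ROUNDS, update_gb=MODEL_UPDATE_GB):
    """Extra cost of federated training: only model updates cross the
    network (one download and one upload per round), independent of data size."""
    return 2 * rounds * update_gb * TRANSFER_PER_GB

# Dataset sizes taken from the slide.
for size_gb in (16, 161, 800):
    print(size_gb, centralized_overhead(size_gb), federated_overhead())
```

    Under these assumptions the centralized overhead grows linearly with the dataset while the federated overhead stays flat, which is the qualitative trend the slide describes.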
  16. Federated Learning for Public Good

    There is a lot more to learn if we can come together.

    Individual organizations can only solve public good problems with their own data. With data collaboration, we increase the solution space and the benefit. In the example, each organization has 100 PB of data; through collaboration, each organization now has 300 PB of data to learn from (three organizations contributing 100 PB each). The total benefit is 900 PB of data for public good.
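
    The arithmetic in this example generalizes to any number of participants. The sketch below assumes, as the example implicitly does, that every organization contributes the same amount of non-overlapping data; the function names are hypothetical.

```python
def data_per_org_after_collaboration(n_orgs, data_per_org_pb):
    """Each organization can learn from everyone's data, its own included."""
    return n_orgs * data_per_org_pb

def total_benefit(n_orgs, data_per_org_pb):
    """Sum of what every organization can learn from: n * (n * d)."""
    return n_orgs * data_per_org_after_collaboration(n_orgs, data_per_org_pb)

print(data_per_org_after_collaboration(3, 100))  # 300
print(total_benefit(3, 100))                     # 900
```

    Because the total is n * n * d, the benefit grows quadratically with the number of participants under these assumptions.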
  17. Federated Learning for Public Good

    There is a lot more to learn if we can come together.

    There is a scaling effect as more participants come together to share knowledge. The more data we share, the more good we can potentially do.
  18. AISG 100E Federated Learning Projects

    Examples of Federated Learning Use Cases

    Federated Learning for ICU in-hospital mortality prediction
    https://link.springer.com/chapter/10.1007/978-3-030-63076-8_18
    A large multi-centre critical care database, made available by Philips Healthcare in partnership with the MIT Laboratory for Computational Physiology, was used. The data is anonymised and cleansed. The three hospitals with the most ICU stays and the most complete data were kept. A mortality prediction model was built for the research paper.

    Federated Predictive Maintenance for Telemetry Log Data
    A Global Data Infrastructure Service Provider wanted to improve the availability of its services and minimize disruption for its customers by predicting when maintenance is required. However, the telemetry log data is housed at customer sites, and centralizing it is expensive and faces regulatory hurdles on data residency. The FL team implemented a Federated Learning solution that allows model customization.

    Federated Image Segmentation on Large OCT Scans
    A Global Pharmaceutical Company faces data privacy hurdles and a high cost of data centralization for research. Medical images are highly sensitive and can be very large, even for centralized ML training. The team customized and optimized Synergos with new engineering features that not only orchestrate image segmentation in a federated setting but also handle large medical data.