
Manojit Nandi - Measures and Mismeasures of algorithmic fairness

Within the last few years, researchers have come to understand that machine learning systems may display discriminatory behavior with regard to certain protected characteristics, such as gender or race. To combat these harmful behaviors, we have created multiple definitions of fairness to enable equity in machine learning algorithms. In this talk, I will cover these different definitions of algorithmic fairness and discuss both the strengths and limitations of these formalizations. In addition, I will cover other best practices to better mitigate the unintended bias of data products.

https://us.pycon.org/2019/schedule/presentation/226/

PyCon 2019

May 04, 2019

Transcript

  1. About Me (According to Google Cloud Vision API) • Dancer?

    Aerial dancer and circus acrobat. • Entertaining? Hopefully. • Fun? Most of the time. • Girl?!?
  2. Algorithmic Fairness • Algorithmic Fairness is a growing field of

    research that aims to mitigate the effects of unwarranted bias/discrimination on people in machine learning. • Primarily focused on mathematical formalisms of fairness and developing solutions for these formalisms. • IMPORTANT: Fairness is inherently a social and ethical concept. Source: Fairness and Abstraction in Socio-technical Systems; Selbst, boyd, Friedler, Venkatasubramanian & Vertesi (2018)
  3. BuT mAtH cAn’T bE rAcist!! • No one is sincerely

    arguing that mathematics or computer science is inherently discriminatory. • However, the way people apply mathematical models or algorithms to real-world problems can reinforce societal inequalities.
  4. Fairness, Accountability, Transparency (FAT*) ML • Interdisciplinary research area that

    focuses on creating machine-learning systems that work towards goals such as fairness and justice. • Many open-source libraries (FairTest, themis-ml, AI Fairness 360) developed based on this research. • ACM FAT* 2019 Conference held in Atlanta, GA back in January. Photo credits: Moritz Hardt
  5. Legal Regulations In the United States, many industries have legal

    regulations to prevent disparate impact against vulnerable populations. • Education (Education Amendments Act) • Employment (Civil Rights Act) • Credit (Equal Credit Opportunity Act) • Housing (Fair Housing Act)
  6. Types of Algorithmic Biases

    Kate Crawford (Microsoft Research) • Hanna Wallach (Microsoft Research) • Solon Barocas (Cornell University) • Aaron Shapiro (Microsoft Research)
  7. Bias in Allocation • Most commonly researched family of algorithmic

    fairness problems (why we invented the math definitions). • Algorithmic Idea: How do models perform in binary classification problems across different groups? • Fundamental Idea: When allocating finite resources (credit loans, gainful employment), we often favor the privileged class over the more vulnerable. Source: Reuters News
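
Below is a minimal, hypothetical sketch (made-up scores and group labels, not from the talk) of the allocation setting this slide describes: a fixed budget of approvals is handed out by model score, and the resulting approval rate is then compared across groups.

```python
# Illustrative sketch only: scores and group labels are invented.
import numpy as np

scores = np.array([0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2])   # model scores
group  = np.array(["A", "A", "A", "B", "A", "B", "B", "B"])    # protected group

budget = 4                                      # finite resource: only 4 approvals
approved = np.zeros(len(scores), dtype=bool)
approved[np.argsort(-scores)[:budget]] = True   # approve the top-scoring applicants

for g in np.unique(group):
    rate = approved[group == g].mean()
    print(f"group {g}: approval rate = {rate:.2f}")
# group A: approval rate = 0.75
# group B: approval rate = 0.25
```
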
  8. Bias in Representation • Focused on how harmful

    labels/representations are propagated. • Often related to language and computer vision problems. • Harder to quantify error compared to bias in allocation problems.
  9. Weaponization of Machine Learning • As data scientists, we are

    often not taught to think about how models could be used inappropriately. • With the increasing usage of AI in high-stakes situations, we must be careful not to harm vulnerable populations. Source: Why Stanford Researchers Tried to Create a “Gaydar” Machine; New York Times
  10. “21 Definitions of Algorithmic Fairness” • There are more than

    30 different mathematical definitions of fairness in the academic literature. • There isn’t one true definition of fairness. • These definitions can be grouped together into three families: ◦ Anti-Classification ◦ Classification Parity ◦ Calibration Pictured: Princeton CS professor Arvind Narayanan
  11. Anti-Classification • Heuristic: Algorithmic decisions “ignore” protected attributes. (Individual Fairness)

    • In addition to excluding protected attributes, one must also be concerned about learning proxy features. • Useful for defining the loss function of fairness-aware models. (Slide diagram: “unprotected” features → same outcome)
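
A hedged sketch of the two steps described above, using a hypothetical pandas DataFrame and column names (`race`, `income`, `zip_code`): drop the protected attribute before training, then audit the remaining features for proxies. Plain correlation is only a crude first-pass proxy check.

```python
import pandas as pd

def flag_proxy_features(df: pd.DataFrame, protected: str, threshold: float = 0.4) -> pd.Series:
    """Numeric features whose |correlation| with the protected attribute exceeds threshold."""
    codes = df[protected].astype("category").cat.codes           # encode the protected attribute
    numeric = df.drop(columns=[protected]).select_dtypes("number")
    corr = numeric.corrwith(codes).abs()
    return corr[corr > threshold].sort_values(ascending=False)

# Tiny made-up example: zip_code tracks the protected attribute almost perfectly.
df = pd.DataFrame({
    "race":     ["a", "a", "a", "b", "b", "b"],
    "income":   [40, 55, 50, 45, 60, 52],
    "zip_code": [1, 1, 1, 2, 2, 2],
})
X = df.drop(columns=["race"])               # anti-classification: "ignore" the protected attribute
print(flag_proxy_features(df, "race"))      # ...but zip_code is flagged as a likely proxy
```
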
  12. Fairness-Aware Algorithms • Given a set of features X, labels

    Y, and protected characteristics Z, we want to create a model that learns to predict the labels Y, but also doesn’t “accidentally” learn to predict the protected characteristics Z. • Can view this constrained optimization as akin to regularization. Sometimes referred to as the accuracy-fairness trade-off. Source: Towards Fairness in ML with Adversarial Networks (GoDataDriven). (Slide diagram: is it a good classifier? is it learning the protected attributes?)
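
A rough numpy illustration of the constrained objective described above; it is not the API of the library in the cited source, and all names and numbers are invented. The classifier's loss on Y is offset by the adversary's loss at recovering Z from the model's outputs, weighted by a trade-off parameter lambda.

```python
import numpy as np

def log_loss(y, p, eps=1e-12):
    """Binary cross-entropy."""
    p = np.clip(p, eps, 1 - eps)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))

def fairness_aware_objective(y, y_hat, z, z_hat, lambda_=1.0):
    """Minimise prediction loss on Y while *maximising* the adversary's loss on Z."""
    return log_loss(y, y_hat) - lambda_ * log_loss(z, z_hat)

y     = np.array([1, 0, 1, 0])            # labels
y_hat = np.array([0.9, 0.2, 0.8, 0.1])    # classifier's predicted probabilities
z     = np.array([1, 1, 0, 0])            # protected attribute
z_good_adversary = np.array([0.9, 0.8, 0.1, 0.2])   # adversary recovers Z well
z_poor_adversary = np.array([0.5, 0.5, 0.5, 0.5])   # adversary can't recover Z

print(fairness_aware_objective(y, y_hat, z, z_good_adversary))  # higher objective: penalised
print(fairness_aware_objective(y, y_hat, z, z_poor_adversary))  # lower objective: preferred
```
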
  13. Dangers of Anti-Classification Measures • By “removing” protected features, we

    ignore the underlying processes that affect different demographics. • Fairness metrics are focused on making outcomes equal. • DANGER! Sometimes making outcomes equal adversely impacts a vulnerable demographic. Source: Corbett-Davies, Goel (2019)
  14. Classification Parity • Given some traditional classification measure (accuracy, false

    positive rate), is our measure equal across different protected groups? (Group Fairness) • Most commonly used to audit algorithms from a legal perspective. Source: Gender Shades, Buolamwini & Gebru (2018)
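
A small, hypothetical sketch of a group-fairness audit in this spirit: compute the same classification measure (here, accuracy) separately for each protected group and compare the numbers.

```python
import numpy as np

def metric_by_group(y_true, y_pred, group, metric):
    """Evaluate `metric` separately for each protected group."""
    return {g: metric(y_true[group == g], y_pred[group == g]) for g in np.unique(group)}

def accuracy(y_t, y_p):
    return float((y_t == y_p).mean())

# Invented data for illustration only.
y_true = np.array([1, 0, 1, 0, 1, 0, 1, 0])
y_pred = np.array([1, 0, 1, 0, 0, 1, 0, 1])
group  = np.array(["A", "A", "A", "A", "B", "B", "B", "B"])

print(metric_by_group(y_true, y_pred, group, accuracy))
# {'A': 1.0, 'B': 0.0}  -- a large accuracy gap between the two groups
```
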
  15. Demographic Parity • Demographic Parity looks at the proportion of

    positive outcomes by protected attribute group. • Demographic Parity is used to audit models for disparate impact (80% rule). • DANGER! Satisfying the immediate constraint may have negative long-term consequences. Source: Delayed Impact of Fair Machine Learning, Liu et al. (2018)
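
A minimal sketch of the four-fifths (80%) rule check mentioned above, with illustrative decision and group arrays: compare each group's rate of positive decisions to the most-favoured group's rate.

```python
import numpy as np

def disparate_impact_ratio(y_pred, group):
    """Ratio of the lowest group selection rate to the highest (80% rule compares this to 0.8)."""
    rates = {g: float(y_pred[group == g].mean()) for g in np.unique(group)}
    return min(rates.values()) / max(rates.values()), rates

y_pred = np.array([1, 1, 1, 0, 1, 0, 0, 0])      # positive decisions (e.g. loans granted)
group  = np.array(["A", "A", "A", "A", "B", "B", "B", "B"])

ratio, rates = disparate_impact_ratio(y_pred, group)
print(rates)                                                       # {'A': 0.75, 'B': 0.25}
print("passes 80% rule" if ratio >= 0.8 else "fails 80% rule")     # fails 80% rule
```
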
  16. Parity of False Positive Rates • As the name suggests,

    this measure looks at the false positive rate across different protected groups. • Sometimes called “Equal Opportunity.” • It’s possible to improve the false positive rate simply by increasing the number of true negatives (ignore the number of false positives, just increase the denominator). • DANGER! If we don’t take societal factors into consideration, we may end up harming vulnerable populations.
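
A short sketch of the caveat in that bullet, with made-up arrays: because FPR = FP / (FP + TN), adding easy true negatives shrinks the rate even though the same number of people are falsely flagged.

```python
import numpy as np

def false_positive_rate(y_true, y_pred):
    fp = int(np.sum((y_pred == 1) & (y_true == 0)))
    tn = int(np.sum((y_pred == 0) & (y_true == 0)))
    return fp / (fp + tn) if (fp + tn) else 0.0

y_true = np.array([0, 0, 0, 0, 1, 1])
y_pred = np.array([1, 1, 0, 0, 1, 0])
print(false_positive_rate(y_true, y_pred))               # 2 FP / 4 negatives = 0.5

# Same 2 false positives, but 6 extra easy true negatives "improve" the rate:
y_true_padded = np.concatenate([y_true, np.zeros(6, dtype=int)])
y_pred_padded = np.concatenate([y_pred, np.zeros(6, dtype=int)])
print(false_positive_rate(y_true_padded, y_pred_padded))  # 2 FP / 10 negatives = 0.2
```
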
  17. Calibration • In the case of risk assessments (recidivism, child protective

    services), we use a scoring function s(x) to estimate the true risk to the individual. • We define some threshold t to make a decision when s(x) > t. • Example: Child Protective Services (CPS) assigns a risk score (1-20) to child. CPS intervenes if the perceived risk to the child is high enough.
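
A tiny sketch of the threshold rule s(x) > t described on this slide; the 1-20 scores and the threshold are invented for illustration.

```python
import numpy as np

risk_scores = np.array([3, 12, 18, 7, 16])    # s(x) for five hypothetical cases, on a 1-20 scale
t = 14                                        # hypothetical intervention threshold

intervene = risk_scores > t                   # decision rule: act when s(x) > t
print(intervene)                              # [False False  True False  True]
```
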
  18. Statistical Calibration • Heuristic: Two individuals with the same risk

    score s have the same likelihood of receiving the outcome. • A risk score of 10 should mean the same thing for a white individual as it does for a black individual.
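
A hedged sketch of a calibration-by-group check on a hypothetical DataFrame: among individuals who received the same risk score, the observed outcome rate should be roughly equal across groups.

```python
import pandas as pd

# Invented data: eight individuals who all received risk score 10.
df = pd.DataFrame({
    "score":   [10, 10, 10, 10, 10, 10, 10, 10],
    "group":   ["white"] * 4 + ["black"] * 4,
    "outcome": [1, 0, 1, 0, 0, 1, 1, 0],      # e.g. observed reoffence
})

# Empirical estimate of P(outcome | score, group); calibration asks these to
# match across groups at every score level.
print(df.groupby(["score", "group"])["outcome"].mean())
# score  group
# 10     black    0.5
#        white    0.5
```
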
  19. Debate about Northpointe’s COMPAS • COMPAS is used to assign

    a recidivism risk score to prisoners. • ProPublica Claim: Black defendants have higher false positive rates. • Northpointe Defense: Risk scores are well-calibrated by groups.
  20. Datasheets for Data Sets • Taking inspiration from safety standards

    in other industries, such as automobile testing and clinical drug trials, Gebru et al. (2017) propose standards for documenting datasets. • Documentation questions include: ◦ How was the data collected? Over what time frame? ◦ Why was the dataset created? Who funded its creation? ◦ Does the data contain any sensitive information? ◦ How was the dataset pre-processed/cleaned? ◦ If the data relates to people, were they informed about the intended use of the data? • What makes for a good dataset?
  21. Model Cards for Model Reporting • Google researchers propose a

    standard for documenting deployed models. • Sections include: ◦ Intended Use ◦ Factors (evaluation amongst demographic groups) ◦ Ethical Concerns ◦ Caveats and Recommendations • More transparent model reporting will allow users to better understand when they should (or should not) use your model. Mitchell et al. (2019)
  22. Deon: Ethical Checklist for Data Science • Deon (by DrivenData)

    is an ethics checklist for data projects. ◦ Data Collection ◦ Data Storage ◦ Analysis ◦ Modeling ◦ Deployment • The CLI tool creates a Markdown file in your repo with this checklist.
  23. AI Now Institute • New York University research institute that

    focuses on understanding the societal and cultural impact of AI and machine learning. • Hosts an annual symposium on Ethics, Organizing, and Accountability. • Recently produced a report on the diversity crisis in AI and how it affects the development of technical systems.
  24. Papers Referenced

    1. The Measures and Mismeasures of Fairness: A Critical Review of Fair Machine Learning; https://5harad.com/papers/fair-ml.pdf
    2. The Misgendering Machines: Trans/HCI Implications of Automatic Gender Recognition; https://ironholds.org/resources/papers/agr_paper.pdf
    3. Delayed Impact of Fair Machine Learning; https://arxiv.org/pdf/1803.04383.pdf
    4. Datasheets for Datasets; https://arxiv.org/pdf/1803.09010.pdf
    5. Model Cards for Model Reporting; https://arxiv.org/pdf/1810.03993.pdf
    6. Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification; http://proceedings.mlr.press/v81/buolamwini18a/buolamwini18a.pdf
    7. Fairness and Abstraction in Sociotechnical Systems; https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3265913
    8. Discriminating Systems: Gender, Race, and Power in AI; https://ainowinstitute.org/discriminatingsystems.pdf