Fair Machine Learning for Health Care

🎉 La Cava and Lett’s fair ML tool, Interfair, won first place ($250K) in the 2023 NIH Challenge, “Bias Detection Tools for Clinical Decision Making”.

When models are deployed in healthcare settings, it is important that they are fair, i.e., that they do not cause harm or unjustly benefit specific subgroups of a population. Improving the fairness of computational models is a complex and nuanced challenge that requires decision makers to carefully reason about multiple, sometimes conflicting criteria. Specific definitions of fairness can vary considerably (e.g. prioritizing equivalent error rates across patient groups vs. similar treatment of similar individuals) and must be contextually appropriate to each application. Inherent conflicts may arise when striving to maximize multiple types of fairness simultaneously (e.g. calibration by group vs. equalized odds [1]). There are often fundamental trade-offs between the overall error rate of a model and its fairness, and it is important to clearly and intuitively characterize and present these trade-offs to stakeholders in the health system. For example, one might care more about fairly prioritizing patients in patient triage settings, but care more about error rates in predicting individual treatment plans and outcomes. Furthermore, it is computationally challenging to audit and improve model fairness when considering a large set of intersecting patient attributes, including gender, race, ethnicity, age, and socioeconomic status, among others [2]; yet preventing worst-case performance for minoritized groups is often a central ethical imperative. Thus, it is critical for investigators to consider not only fairness by what measure, but also fairness for whom, and with what trade-offs against other measures of model performance and fairness.
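To make "equivalent error rates across patient groups" concrete, here is a minimal sketch of an equalized-odds audit over intersecting attributes. It is an illustration only, not the Interfair tool or any method from the publications listed on this page. The function names, the toy labels, and the attribute values are all hypothetical, and a real audit would also need confidence intervals for small subgroups.

```python
import math
from collections import defaultdict


def subgroup_error_rates(y_true, y_pred, groups):
    """Return {group: (false_positive_rate, false_negative_rate)}.

    `groups` holds one hashable key per sample, e.g. a tuple of
    intersecting attributes such as (gender, age_band).
    """
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "tn": 0, "fn": 0})
    for t, p, g in zip(y_true, y_pred, groups):
        if t == 1:
            counts[g]["tp" if p == 1 else "fn"] += 1
        else:
            counts[g]["fp" if p == 1 else "tn"] += 1

    rates = {}
    for g, c in counts.items():
        negatives = c["fp"] + c["tn"]
        positives = c["tp"] + c["fn"]
        # A rate is undefined (NaN) when a subgroup has no such cases.
        fpr = c["fp"] / negatives if negatives else math.nan
        fnr = c["fn"] / positives if positives else math.nan
        rates[g] = (fpr, fnr)
    return rates


def _spread(values):
    """Max minus min over the defined (non-NaN) values; 0.0 if fewer than two."""
    defined = [v for v in values if not math.isnan(v)]
    return max(defined) - min(defined) if len(defined) > 1 else 0.0


def equalized_odds_gap(rates):
    """Largest between-group spread in FPR or FNR (0.0 means no gap)."""
    fpr_spread = _spread([fpr for fpr, _ in rates.values()])
    fnr_spread = _spread([fnr for _, fnr in rates.values()])
    return max(fpr_spread, fnr_spread)


# Toy data (hypothetical): two intersectional subgroups of four patients each.
y_true = [1, 0, 1, 0, 1, 0, 1, 0]
y_pred = [1, 0, 0, 0, 1, 1, 1, 1]
groups = [("woman", "65+")] * 4 + [("man", "under 65")] * 4

rates = subgroup_error_rates(y_true, y_pred, groups)
gap = equalized_odds_gap(rates)
```

In this toy data the first subgroup's errors are missed positives, while the second subgroup's errors are false alarms. That is the kind of difference a single overall error rate hides. The number of subgroups grows multiplicatively as attributes are intersected, which is one reason such audits become computationally demanding.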

Providing a set of models by jointly optimizing for fairness and accuracy is one way to help a decision maker understand how an algorithm will affect the people it interacts with once it is deployed. As we describe in a perspective on intersectionality in machine learning, achieving fairness also requires a broader ethical analysis that extends beyond the model development process (data collection, preprocessing, training, deployment) to the wider context of an algorithm’s use as a socio-technical artifact, for example by eliciting community participation in defining project goals and by establishing criteria for monitoring downstream outcomes of the model’s use throughout its complete lifecycle.
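One simple way to picture "a set of models" that trades off fairness against accuracy is a Pareto front: the candidates for which no other candidate is at least as good on both criteria and strictly better on one. The sketch below only filters an already-trained pool of models. It is not the multiobjective optimization procedure used in the publications listed on this page, and the model names and scores are made up for illustration.

```python
def pareto_front(models):
    """Return the non-dominated models, sorted by error.

    `models` is a list of (name, error, unfairness) tuples, where lower
    is better for both error and unfairness.
    """
    front = []
    for name, err, unf in models:
        dominated = any(
            e <= err and u <= unf and (e < err or u < unf)
            for _, e, u in models
        )
        if not dominated:
            front.append((name, err, unf))
    return sorted(front, key=lambda m: m[1])


# Hypothetical candidates: (name, overall error, unfairness measure).
candidates = [
    ("m1", 0.10, 0.30),
    ("m2", 0.12, 0.15),
    ("m3", 0.15, 0.20),  # worse than m2 on both criteria
    ("m4", 0.20, 0.05),
]

front = pareto_front(candidates)
for name, err, unf in front:
    print(f"{name}: error={err:.2f}, unfairness={unf:.2f}")
```

Each model left on the front represents a different balance of the two criteria. Choosing among them is a contextual judgment for stakeholders, not something the filter can decide.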

Related Publications

Effects of Race and Gender Classifications on Atherosclerotic Cardiovascular Disease Risk Estimates for Clinical Decision-Making in a Cohort of Black Transgender Women
Poteat, T., Lett, E., Rich, A. J., Jiang, H., Wirtz, A. L., Radix, A., Reisner, S. L., Harris, A. B., Malone, J., La Cava, W. G., Lesko, C. R., Mayer, K. H., & Streed, C. G. (2023)
Health Equity
Optimizing fairness tradeoffs in machine learning with multiobjective meta-models
La Cava, W. G. (2023)
Genetic and Evolutionary Computation Conference (GECCO)
Fair admission risk prediction with proportional multicalibration
La Cava, W., Lett, E., and Wan, G. (2023)
Conference on Health, Inference, and Learning
Proceedings of Machine Learning Research
Translating intersectionality to fair machine learning in health sciences
Lett, E. and La Cava, W. G. (2023)
Nature Machine Intelligence
Algorithmic Fairness: Mitigating Bias in Healthcare AI
Lett, E. (2022)
Genetic programming approaches to learning fair classifiers
La Cava, W. and Moore, J. H. (2020)
Genetic and Evolutionary Computation Conference (GECCO)


  1. Kleinberg, J., Mullainathan, S., & Raghavan, M. (2016). Inherent Trade-Offs in the Fair Determination of Risk Scores. arXiv:1609.05807

  2. Kearns, M., Neel, S., Roth, A., & Wu, Z. S. (2018). Preventing Fairness Gerrymandering: Auditing and Learning for Subgroup Fairness. Proceedings of the 35th International Conference on Machine Learning, 2564–2572. PMLR