Our research focuses on developing multi-objective learning methods and using them to explain the principles underlying complex biomedical processes. We use these methods to learn predictive models from electronic health records (EHRs) that are both interpretable to clinicians and fair to the populations on which they are deployed. Our long-term goal is to positively impact human health by developing methods flexible enough to automate the entire computational workflows underlying scientific discovery and medicine.
Automatically extracting useful models from large, messy data collections like EHRs is a major challenge. We would like models to tease apart the factors of variation (i.e., features) that drive the observed response, and we would like to understand what those features mean and how they relate to the underlying process. When deployed in healthcare settings, it is also important that models are fair, i.e., that they do not cause harm to, or unjustly benefit, specific subgroups of a population. Otherwise, models deployed to assist in patient triage, for example, could exacerbate existing inequities in the health system. Our research addresses these challenges by developing flexible search methods that explicitly optimize multiple criteria: in this case, model accuracy, conciseness, and fairness.

Our long-term interest is to incorporate the manual tasks of today's data scientists, including data cleaning, feature engineering, code writing, and result generation, into a robust learning framework. Automating these time-intensive aspects of data science would allow domain experts to move quickly from data collection to knowledge discovery. The biomedical sciences are particularly well suited to these advances, since many experts in highly specialized disciplines are eager to shed light on the processes they study by leveraging large data collections.
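To make the idea of explicitly optimizing multiple criteria concrete, here is a minimal sketch of multi-objective model selection via Pareto dominance: among candidate models scored on accuracy, conciseness, and fairness, we keep only those not outperformed on every criterion by some other candidate. The candidate names and scores are hypothetical illustrations, not results from our work.

```python
def dominates(a, b):
    """True if score tuple a is at least as good as b on every objective
    and strictly better on at least one (higher is better)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(candidates):
    """Return the candidates not dominated by any other candidate."""
    return [
        (name, scores)
        for name, scores in candidates
        if not any(dominates(other, scores) for _, other in candidates)
    ]

# Hypothetical candidates: (accuracy, conciseness, fairness), each in [0, 1].
candidates = [
    ("deep_model",   (0.92, 0.20, 0.70)),
    ("linear_model", (0.85, 0.90, 0.80)),
    ("biased_model", (0.90, 0.50, 0.40)),
    ("weak_model",   (0.70, 0.80, 0.60)),
]

front = pareto_front(candidates)
print([name for name, _ in front])  # weak_model is dominated by linear_model
```

Rather than collapsing the three criteria into one weighted score, the Pareto front preserves the full set of defensible trade-offs, so a clinician or data scientist can choose among them.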