About our recent HUMIES award-winning algorithm for clinical prediction models
Interpreting a glass-box clinical prediction model of resistant hypertension
In July, my co-authors and I were awarded a HUMIES silver award for generating human-competitive results using evolutionary computation. The award was based on our recent paper:
In this work, we adapt a symbolic regression algorithm to the task of building accurate and interpretable clinical prediction models. We show that this algorithm can generate remarkably accurate prediction models of varying forms of hypertension that are much easier to interpret than black-box ML models and expert-derived heuristic solutions. By interpreting the model for treatment-resistant hypertension, we identified a useful factor that doesn’t appear in any existing clinical guidelines: the presence of a high calcium lab result in a patient’s health record.
Below I walk through the background and use case, and discuss the model we generated.
Electronic Health Records (EHRs) are messy
EHR data used to train ML/AI models are messy, sometimes leading to strange behavior [1]. One aspect of health care that is difficult to gather from these records is medication adherence. EHRs typically record which medications are prescribed, and claims data can be used to determine whether medications were picked up, but it’s quite hard to know from these data alone whether a patient is actually taking a medication. This becomes important to our story later on.
Need for explainability in Medicine
Some AI models do not need to be explained; evidence of their reliability is enough. But in many medical applications of AI, the explainability of models is crucial. This view is shared by the FDA, whose regulatory frameworks categorize AI/ML-based software as “Software as a Medical Device” when it is used in clinical decision support. According to these guidelines, ML recommendations must enable a health care provider “… to independently review the basis for such recommendations”.
Screening for Primary Aldosteronism (PA), a.k.a. Conn’s Syndrome
Primary Aldosteronism (PA) is an adrenal gland disorder that causes hypertension in about 1% of US adults. It’s estimated to be undiagnosed in 1-4 million people in the US. PA is straightforward to treat once it is detected, but it is difficult to find. Our objective in this work is to identify patients who are at risk for PA so that they can be screened for it.
There are some tell-tale signs of PA. Based on these signs, we defined three increasingly complicated disease definitions:
- Hypertension (HTN), a.k.a. high blood pressure
- Hypertension with unexplained hypokalemia (low potassium)
- Treatment-resistant hypertension: the patient does not respond to treatment for hypertension.
Using these definitions, the clinicians on our team performed chart review, a process in which experts review patient records against specific criteria. These expert labels were used as “ground truth” to assess the quality of the resulting models.
FEAT: an interpretable machine learning algorithm
To train interpretable models, we adapted a tool known as the Feature Engineering Automation Tool (FEAT) [2]. At a basic level, FEAT uses genetic programming-based symbolic regression to optimize candidate models. However, FEAT also incorporates many upgrades: each program is a logistic regression model; it uses \(\epsilon\)-lexicase selection [3] and NSGA-II survival [4]; it incorporates parameter optimization [5] and post-run simplification [6]; it uses specialized splitting operators à la decision trees [2]; and it uses special operators for mutation and crossover [7].
Interpreting the treatment-resistant hypertension model
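To make the selection step concrete, here is a minimal sketch of \(\epsilon\)-lexicase parent selection in Python. It is not FEAT's implementation: the function name, the error-matrix layout, and the choice to compute \(\epsilon\) once per case as the median absolute deviation over the whole population (while taking the best error from the current pool) are assumptions made for illustration.

```python
import numpy as np

def epsilon_lexicase_select(errors, rng):
    """Pick one parent index from an (n_individuals, n_cases) error matrix.

    Illustrative sketch only; names and layout are assumptions.
    """
    n_individuals, n_cases = errors.shape

    # Per-case epsilon: median absolute deviation of errors across the population.
    med = np.median(errors, axis=0)
    eps = np.median(np.abs(errors - med), axis=0)

    pool = np.arange(n_individuals)
    # Visit the training cases in a fresh random order for each selection event.
    for case in rng.permutation(n_cases):
        case_errors = errors[pool, case]
        best = case_errors.min()
        # Keep individuals within epsilon of the best error on this case.
        pool = pool[case_errors <= best + eps[case]]
        if len(pool) == 1:
            break
    # If several individuals survive every case, choose among them at random.
    return int(rng.choice(pool))

rng = np.random.default_rng(0)
errors = rng.random((5, 4))  # synthetic errors: 5 individuals, 4 training cases
parent = epsilon_lexicase_select(errors, rng)
print(parent)
```

Because each selection event sees the cases in a different order, individuals that do well on different subsets of patients can all be chosen as parents, rather than only those with the best average error.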
FEAT produces an archive of candidate models varying in complexity, shown below. We select a final model using performance on validation data (the blue line). As expected, more complex models tend to overfit to training data, and the validation score closely predicts performance on held-out test data (the black line).
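The selection step described above can be sketched in a few lines of Python. The `Candidate` class, its field names, the assumption that a higher score is better, and the tie-break toward simpler models are all hypothetical choices for illustration; the scores in the example archive are made up and are not results from the paper.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    complexity: int     # e.g. number of nodes in the model (hypothetical measure)
    train_score: float  # score on training data (higher is better, by assumption)
    val_score: float    # score on validation data

def select_final_model(archive):
    """Return the candidate with the best validation score.

    Ties are broken in favor of the simpler model (an assumption).
    """
    return max(archive, key=lambda m: (m.val_score, -m.complexity))

# Made-up archive: training score keeps rising with complexity,
# while validation score peaks and then falls (overfitting).
archive = [
    Candidate(complexity=3, train_score=0.70, val_score=0.68),
    Candidate(complexity=8, train_score=0.80, val_score=0.77),
    Candidate(complexity=20, train_score=0.90, val_score=0.72),
]
final = select_final_model(archive)
print(final.complexity)
```

Selecting on validation rather than training score is what keeps the most complex, most overfit candidates from being chosen.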
The final model of resistant hypertension is shown below. It contains six interpretable factors, five of which are thresholded, all feeding into a logistic regression model.
The first five factors were tell-tale signs of treatment-resistant hypertension: many visits on multiple anti-hypertensive medications, a high mean systolic blood pressure (SBP), etc., but one stood out. The model considered having a maximum blood calcium measurement over 10.1 mg/dL to be a risk factor.
The high calcium feature was initially surprising. Calcium does not appear in any clinical standards for treatment-resistant hypertension. The threshold learned by the model, 10.1 mg/dL, is elevated, but not very high. Based on a SHAP explanation of the model (see the paper), we could see that it was identifying a small number of patients as positive cases.
A post-hoc analysis of this feature shed some light on the situation. Subjects with treatment-resistant HTN were in fact more likely to have an elevated maximum calcium. Those elevations, in turn, were associated with the number of days prescribed thiazide diuretics and beta-blockers, two types of anti-hypertensive medications. Anti-hypertensive medications, particularly diuretics, can dysregulate calcium homeostasis [8]. In addition, hyperparathyroidism, which causes elevated blood calcium, is associated with hypertension.
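As a rough illustration of this model form, the sketch below thresholds some inputs into 0/1 indicators and passes a weighted sum through the logistic function. Only the 10.1 mg/dL calcium threshold comes from this post; the feature names, weights, intercept, and patient values are hypothetical placeholders, not the fitted model.

```python
import math

def predict_risk(features, thresholds, weights, intercept):
    """Logistic regression over (optionally thresholded) features."""
    z = intercept
    for name, w in weights.items():
        x = features[name]
        if name in thresholds:
            # Thresholded factor: 1 if the raw value exceeds the cutoff, else 0.
            x = 1.0 if x > thresholds[name] else 0.0
        z += w * x
    return 1.0 / (1.0 + math.exp(-z))

# Calcium cutoff is from the post; everything else below is a placeholder.
thresholds = {"max_calcium_mg_dl": 10.1}
weights = {"max_calcium_mg_dl": 1.0, "n_visits_on_htn_meds": 0.1}
intercept = -2.0

patient_a = {"max_calcium_mg_dl": 10.4, "n_visits_on_htn_meds": 6}
patient_b = {"max_calcium_mg_dl": 9.6, "n_visits_on_htn_meds": 6}
print(predict_risk(patient_a, thresholds, weights, intercept))
print(predict_risk(patient_b, thresholds, weights, intercept))
```

Each thresholded factor contributes a fixed amount to the log-odds when its condition is met, which is what makes a model of this shape easy to read off factor by factor.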
Recall that electronic health records are notoriously messy. It is difficult to be certain that a patient is taking their medication as prescribed. The calcium factor, then, appears to signal that some patients are, in fact, taking their anti-hypertensive prescriptions. We suspect this rule enabled the model to identify a few affected subjects, either on intensive anti-hypertensive regimens or with underlying hyperparathyroidism, who were missed by expert-curated heuristics that only consider medication prescriptions and blood pressure.
Building models with glass-box ML is a promising way of providing trustworthy clinical recommendations. This application also illustrates one utility of including non-causal predictors in ML models: they may complement other predictors to fill knowledge gaps in available data.
[2] La Cava, W., & Moore, J. H. (2019). Learning concise representations for regression by evolving networks of trees. International Conference on Learning Representations (ICLR).
[4] Deb, K., Agrawal, S., Pratap, A., & Meyarivan, T. (2000). A Fast Elitist Non-dominated Sorting Genetic Algorithm for Multi-objective Optimization: NSGA-II. Parallel Problem Solving from Nature PPSN VI (Vol. 1917, pp. 849–858). Springer Berlin Heidelberg.
[5] Topchy, A., & Punch, W. F. (2001). Faster genetic programming based on local gradient search of numeric leaf values. Proceedings of the Genetic and Evolutionary Computation Conference (GECCO-2001), 155–162.
[6] Helmuth, T., McPhee, N. F., Pantridge, E., & Spector, L. (2017). Improving generalization of evolved programs through automatic simplification. GECCO 2017.
[7] La Cava, W., & Moore, J. H. (2019). Semantic variation operators for multidimensional genetic programming. Genetic and Evolutionary Computation Conference (GECCO).