Relaxing the definition of equivalent mathematical expressions to get simpler and more interpretable models
Simplifying symbolic regression models without simplification rules
Symbolic regression (SR) is a non-parametric technique for finding models that present a good balance between predictive power and model simplicity, and it is usually done with evolutionary algorithms1. Symbolic regression creates models by using trees to compose a mathematical expression, like so:
SR is suitable when the user wants the final model to be interpretable, but these models don’t always come in their simplest form; there are problems such as overparameterization of models2, introns (non-coding parts)3, and semantic redundancies4 in the model. Although it is simple for humans to catch many simple cases of these problems at a glance, it is challenging for the computer to simplify these models automatically.
Our recent work5 explores a simplification approach based on memoization. By mapping similar expressions to the same hash, we create a table that stores similar expressions in increasing order of size, allowing us to quickly identify simpler and approximately equivalent substitute symbols.
In this post, I will discuss the main idea and then analyze its effects in a case study, focusing on the benefits of memorization as a strategy to learn to simplify.
Background
Many SR algorithms have a simplification step, but a non-linear problem has infinite solutions, and simplifying these models is difficult. Simplifying a model can be as complicated as finding it in the first instance.
Some previous works have formalized different strategies to simplify models. Still, these strategies are not perfect, either because they are simple enough to run fast or because they are so complex that they increase the algorithm’s execution time (or they are so complicated that it is better not to implement them!).
Approach
Instead of defining simplification rules, we can learn them during the algorithm’s execution. This is done by iterating over parts of the models and learning which ones are similar enough to be considered equivalent during runtime.
If we create a map associating these model parts with different keys, we can map similar parts to the same key. If each key saves an ordered list of similar expressions, we can always replace the occurrence of any part with the smallest of the same key. This is where the idea of using LSH comes in, an approximate way of mapping similar objects to the same key.
Note that this depends on having good diversity in the first generations and that the number of individuals and generations is large enough to populate the table with the occurrences. It’s the price you pay for not implementing a single rule!
Locality-Sensitive Hashing (LSH)
Locality-sensitive hashing (LSH) techniques were first introduced by Gionis et al.6, and aim to map similar structures to the same hash without worrying too much about the accuracy of the similarity calculation: it is a trade-off between having a faster execution by relaxing the definition of equality of two large structures. This is particularly interesting when we want to work with a very large amount of data, such as when saving several parts of mathematical expressions.
Inexact simplification
Simply put, the inexact simplification algorithm works by mapping similar expressions to the same hash, and retrieving the simplest one when doing the simplification of an expression5.
The structure being mapped in our case is the prediction vector of a sub-tree. As we traverse the expression, we calculate its prediction vector for each sub-tree and use it in LSH to obtain a hash. The obtained hash is used in a key-value dictionary, where the value is a list ordered by size — so that there is at least one equivalent expression for each key added to it. When the list is not empty, the first element is used to replace the originating tree. If the list is empty, the tree is inserted as the simplest for that specific hash.
Although it is not necessary to store all occurrences of the same hash in a list, this brings some benefits, such as the possibility of identifying isomorphic trees that were created during execution, doing an analysis of the diversity and redundancy that exists in the data, find insights into data-based simplifications (e.g., a simplification that only makes sense when evaluating the data).
The inexact part comes from the fact that LSH has an stochastic way of mapping similar inputs to the same key, and we also tried to force some collisions (every near-constant prediction vector is mapped to the same hash by multiplying all values by zero if the variance of predictions is smaller than a very low value).
We allow the LSH to store an expression in the hashmap if the difference between the vectors is less than 0.01 (this is what we call tolerance threshold, and it was intended to give the models a small tolerance to numeric rounding, truncating, and calculation differences).
Another example
As a proof of concept, we ran the algorithm for a database with and without simplification and observed the following results.
The simplification led the SR algorithm to find a smaller expression with a better goodness-of-fit on the test data than without simplifying. With only 9 nodes, the simplified tree does not contain any redundancy (such as taking the square root of the square, as the other tree does). It also has one less parameter to adjust, and both models share the same features, which means they rely on the same information to make predictions. Cool!
Takeaways
It is possible to eliminate the task of defining ways to simplify expressions at the cost of relaxing a little on how rigorous we are when making simplifications. This technique is easy to implement and can be pretty convenient for implementations where i) defining simplifications is a tedious and laborious task or ii) it is not known for sure what possible simplifications we can make.
References
-
William La Cava, Patryk Orzechowski, Bogdan Burlacu, Fabricio Franca, Marco Virgolin, Ying Jin, Michael Kommenda, Jason Moore (2021). Contemporary Symbolic Regression Methods and their Relative Performance. Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks ↩
-
Fabricio Olivetti de Franca and Gabriel Kronberger. 2023. Reducing Overparameterization of Symbolic Regression Models with Equality Saturation. In Proceedings of the Genetic and Evolutionary Computation Conference (Lisbon, Portugal) (GECCO ’23). Association for Computing Machinery, New York, NY, USA, 1064–1072. https://doi.org/10.1145/3583131.3590346 ↩
-
Michael Affenzeller, Stephan M Winkler, Gabriel Kronberger, Michael Kommenda, Bogdan Burlacu, and Stefan Wagner. 2014. Gaining deeper insights in symbolic regression. Genetic Programming Theory and Practice XI (2014), 175–190. ↩
-
Sean Luke and Liviu Panait. 2006. A Comparison of Bloat Control Methods for Genetic Programming. Evolutionary Computation 14, 3 (Sept. 2006), 309–344. https://doi.org/10.1162/evco.2006.14.3.309 ↩
-
Guilherme Seidyo Imai Aldeia, Fabricio Olivetti Franca, William G. La Cava (2024). Inexact Simplification of Symbolic Regression Expressions with Locality-sensitive Hashing. Proceedings of the Genetic and Evolutionary Computation Conference ↩ ↩2
-
Aristides Gionis, Piotr Indyk, and Rajeev Motwani. [n. d.]. Similarity Search in High Dimensions via Hashing. ([n. d.]). ↩
2024
2023
About our recent HUMIES award-winning algorithm for clinical prediction models
A new perspective on how this social theory relates to fair machine learning.
2022
We consistently observe lexicase selection running times that are much lower than its worst-case bound of \(O(NC)\). Why?