Hi @kai,
Thanks again for the latest reply.
> We don’t support explicit constraints, as it’s a relatively niche feature (it doesn’t come up very often). That’s why we leave it to the users to implement it in these cases. It would be great to hear how this situation comes up though. If this is requested more often, we can absolutely look into it!
Well, constraints can be regarded as a way to implement conditional search, which, in turn, I think has obvious use cases for common tasks like picking a classifier from a set of alternatives (say SVM, logistic regression, a neural net, a random forest, etc.), each parametrized by a different set of parameters (perhaps with partial overlap).
For example, I assume that the most frequent use case for the Ray framework is training neural nets. One may want to try, e.g., different gradient-based optimizers for the same task, but most of SGD’s parameters differ from NAdam’s (the learning rate is used by both, but beta1 and beta2 only make sense for NAdam). Ideally, one would like the searcher to explore both options by trying out various configurations of the respective parameter sets.
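To make this concrete, here is a minimal sketch of what such a conditional space could look like, written in hyperopt-style notation (all parameter names and ranges below are made up purely for illustration); the same nesting pattern would cover the classifier-selection case above:

```python
from hyperopt import hp

# Illustrative conditional space: the sampled optimizer determines which
# hyperparameters are drawn alongside it. Names and ranges are invented
# for the sake of the example.
space = {
    "optimizer": hp.choice("optimizer", [
        {
            "name": "sgd",
            # loguniform bounds are in log space: exp(-7) .. exp(-1)
            "lr": hp.loguniform("sgd_lr", -7, -1),
            "momentum": hp.uniform("sgd_momentum", 0.0, 0.99),
        },
        {
            "name": "nadam",
            "lr": hp.loguniform("nadam_lr", -7, -1),
            # beta1/beta2 only exist when the NAdam branch is sampled
            "beta1": hp.uniform("nadam_beta1", 0.8, 0.999),
            "beta2": hp.uniform("nadam_beta2", 0.9, 0.9999),
        },
    ]),
}
```

With a space like this, beta1 is simply never sampled for an SGD trial, so there is nothing to penalize in the first place.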
I guess this takes the problem closer to the AutoML scenario, but it doesn’t necessarily fall into that category. I mean, I imagine one would like the searcher to be able to consider neural architectures with varying numbers of layers, and that’s not necessarily AutoML, right?
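The varying-depth case is easy to express in a define-by-run interface like Optuna’s, where the sampled depth determines how many per-layer parameters even exist (again, the names, ranges, and the dummy objective below are just placeholders):

```python
import optuna

def objective(trial):
    # The depth is itself a hyperparameter; the per-layer widths are only
    # sampled once the depth is known.
    n_layers = trial.suggest_int("n_layers", 1, 5)
    hidden_sizes = [
        trial.suggest_int(f"units_layer_{i}", 16, 512, log=True)
        for i in range(n_layers)
    ]
    # Placeholder objective: a real trial would build and train a network
    # with `hidden_sizes` and return its validation loss.
    return float(sum(hidden_sizes))

study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=20)
```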
Now, other use cases might involve even more awkward search spaces. Most of the literature on black-box optimization recognizes this issue and addresses it, at least to a certain extent, because it does come up quite often, and not just in machine learning. A lot of algorithms worth their salt support conditional search spaces: check out SMAC3 (BOHB), hyperopt (TPE), Optuna (TPE), OpenBox, etc. Even most of the ones that don’t support conditional sampling, or that only handle float variables (like the Gaussian-process-based ones), still offer some support for constraints: HEBO, Spearmint (really old), GPflowOpt, HyperMapper, BoTorch, etc. I think that most of the algorithms Ray itself supports as backends also offer some support for this.
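As far as I can tell, some of this already composes with Ray Tune when the backend handles the conditionality, e.g. by handing a hyperopt space like the one above to HyperOptSearch. A rough sketch of what I mean (the import path differs between Ray versions, and the trainable is just a stub):

```python
from hyperopt import hp
from ray import tune
# Newer Ray versions expose this as ray.tune.search.hyperopt instead.
from ray.tune.suggest.hyperopt import HyperOptSearch

# Same conditional SGD-vs-NAdam space as in the earlier sketch.
space = {
    "optimizer": hp.choice("optimizer", [
        {"name": "sgd",
         "lr": hp.loguniform("sgd_lr", -7, -1),
         "momentum": hp.uniform("sgd_momentum", 0.0, 0.99)},
        {"name": "nadam",
         "lr": hp.loguniform("nadam_lr", -7, -1),
         "beta1": hp.uniform("nadam_beta1", 0.8, 0.999),
         "beta2": hp.uniform("nadam_beta2", 0.9, 0.9999)},
    ]),
}

def trainable(config):
    # config["optimizer"] is one of the nested dicts above, so only the
    # parameters that make sense for the chosen optimizer are present.
    opt = config["optimizer"]
    # ... train a model with `opt` here; a constant stands in for the
    # real validation loss in this stub.
    tune.report(loss=1.0)

tune.run(
    trainable,
    search_alg=HyperOptSearch(space, metric="loss", mode="min"),
    num_samples=50,
)
```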
Now, in all fairness, we can obtain the same result by assigning a very high loss to incompatible configurations, as you suggested, either explicitly or implicitly (because that’s what algorithms usually do when they receive a FAILURE result). However, this approach might pollute the error surface with artificially inserted extreme values, which might end up hurting search performance (it might take longer to converge to a good result). HyperMapper is the only optimizer I’m aware of that addresses this differently, by training a second model to predict feasibility as a classification problem. Defining the search space conditionally, though, avoids this caveat entirely, in my opinion.
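For contrast, the workaround over a flat space would look roughly like this (a plain-Python sketch; the sentinel value and the feasibility check are arbitrary choices of mine):

```python
INFEASIBLE_LOSS = 1e9  # arbitrary "very high" sentinel

def objective(config):
    # Flat space: every parameter is always sampled, even when it is
    # meaningless for the sampled optimizer.
    if config["optimizer"] == "sgd" and config.get("beta1") is not None:
        # Incompatible combination: report an artificial extreme cost
        # (or a FAILURE). These are exactly the values that pollute the
        # optimizer's model of the error surface.
        return INFEASIBLE_LOSS
    # ... otherwise train for real; a placeholder stands in for the
    # actual validation loss.
    return 0.1
```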
To illustrate the problem, let’s go back to the earlier example with NAdam vs. SGD. If SGD happens to be sampled many times with a particular value of beta1, and that consistently yields bad results (an extreme cost artificially assigned because beta1 and SGD are incompatible), the optimizer might learn to avoid that value of beta1 altogether, even when it would be valid in conjunction with NAdam and might, perhaps, yield great results in that context. The point is that one wants to feed the optimizer as much prior information about the parameters as possible, and conditional sampling is something I believe is quite powerful for many problems.
Thanks,
Bogdan