Sample_from and learning

How severely does this issue affect your experience of using Ray?

  • Medium: It contributes to significant difficulty to complete my task, but I can work around it.

In the docs example for sample_from, when the value is stochastic rather than deterministic, sampling is done using numpy.random.uniform rather than tune.uniform.

Is there a reason for that? I.e., is there a case to be made against using the same sampler inside sample_from as outside of it?

Actually, I think the case is rather against using independent numpy.random sampling, because that way the optimizer doesn’t get to learn how to sample the variable defined via sample_from so as to minimize the cost, which, in my case, is not what I need (I’m trying to sample a random integer between 0 and the value sampled for another random integer). I don’t think this is an exotic use case, so there are probably other users and use cases that require this functionality.

Is conditional search space what you are looking for?
https://docs.ray.io/en/latest/tune/tutorials/tune-search-spaces.html#how-to-use-custom-and-conditional-search-spaces-in-tune
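
For reference, the conditional pattern from that page looks roughly like this (a sketch with placeholder bounds; the dependent parameter is drawn with numpy inside the lambda and reads the already-resolved value of the other parameter via spec.config):

import numpy as np
from ray import tune

param_space = {
    # 'alpha' is resolved first ...
    'alpha': tune.sample_from(lambda _: np.random.randint(0, 10)),
    # ... and 'beta' can then depend on the sampled value of 'alpha'
    'beta': tune.sample_from(lambda spec: np.random.randint(0, spec.config.alpha + 1)),
}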

Yes, that’s exactly what I was asking about. Is it ok to use tune.uniform instead of np.random.uniform within the lambda passed to sample_from?

e.g.:

param_space = {
    'alpha': tune.randint(0, 10),
    'beta': tune.sample_from(lambda spec: tune.randint(0, spec.config.alpha)),
}

I mean, I’d like the optimizer to optimize beta as well, not just use random values generated by np.random.

P.S.: Sorry for the belated follow-up.

Hi @bbudescu,

tune.sample_from only works with random/grid search, so the optimizer doesn’t learn anything anyway. Thus, it doesn’t matter whether you sample with numpy or not.

If you want to use a “learning” optimizer, then support for constraints or hierarchical search definitions depends on the optimizer you use - Tune is just an interface for these. I don’t know any that support what you’re looking for, though.

Another way to achieve what you want is to specify a regular search space, and enforce the constraints in the trainable. E.g. you can do

def train(config):
    beta = min(config["alpha"], config["beta"])
    # ...

to adjust beta to be less than or equal to alpha, or

def train(config):
    if config["beta"] > config["alpha"]:
        raise RuntimeError("Abort")
    # ...

to abort the trial if the constraint is not met. Please note that in the first case your optimizer could learn a suboptimal model, as it doesn’t know that you are updating the parameter.

Hi @kai,

Thanks for the reply. This is actually quite valuable, as I hadn’t found anywhere in the docs a way to signal to ray[tune] that a particular parameter configuration is invalid.

Btw, can you maybe point me to where this is mentioned, so that I can better understand what I missed and how?

Or, if it’s not mentioned anywhere, might I suggest adding this to the docs somewhere?

Also, as a follow-up question, how does ray handle this situation? Does it signal the optimizer in any way that a particular configuration is invalid?

And, if it does, I guess it’s up to the optimization backend how it uses that information, right?

Also, does ray have any common “language” that can be used to specify configuration constraints or hierarchical spaces, and that can be converted for any of the specific backends that support this kind of search space feature?

Hi @bbudescu,

It’s not an “official” way in that sense - concretely it will mean the trial shows up as “ERROR” and won’t be evaluated. Another option would be to return a large loss immediately and stop processing afterwards.
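
For the second option, a rough sketch could look like this (the metric name and the penalty value are just placeholders; a large finite value is used because some searchers don’t handle infinities well):

from ray import tune

def train(config):
    if config["beta"] > config["alpha"]:
        # Report a large penalty and stop immediately instead of raising an error
        tune.report(loss=1e9)
        return
    # ... regular training with tune.report(loss=...) calls ...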

Ray Tune will inform the searcher that the trial completed with an error. And yes, it’s up to the backend how to use that information. For instance, Optuna will consider the state as “OptunaTrialState.FAIL” - the existence of that state suggests it can be taken into account.

We don’t support explicit constraints, as it’s a relatively niche feature (it doesn’t come up very often). That’s why we leave it to the users to implement it in these cases. It would be great to hear how this situation comes up though. If this is requested more often, we can absolutely look into it!

Hi @kai,

Thanks again for the latest reply.

We don’t support explicit constraints, as it’s a relatively niche feature (it doesn’t come up very often). That’s why we leave it to the users to implement it in these cases. It would be great to hear how this situation comes up though. If this is requested more often, we can absolutely look into it!

Well, constraints can be regarded as a way to implement conditional search, which, in turn, I think has obvious use cases for common tasks like picking a classifier from a set of alternatives (say, SVM, logistic regression, neural net, random forest, etc.), each being parametrized by a different set of parameters (perhaps with partial overlap).

For example, I assume that the most frequent use case for the ray framework is training neural nets. One may want to try, e.g., different gradient-based optimization methods for the same task, but most of the parameters of SGD are different from those of NAdam (the learning rate is used by both, but beta1 and beta2 only make sense for NAdam). Ideally, one would like the searcher to explore both options by trying out various configurations of the respective parameter sets.
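
To make that concrete, with a flat search space one ends up branching inside the trainable, roughly like this (a sketch using PyTorch optimizers; names and bounds are purely illustrative):

import torch
from ray import tune

param_space = {
    'optimizer': tune.choice(['sgd', 'nadam']),
    'lr': tune.loguniform(1e-4, 1e-1),   # shared by both optimizers
    'beta1': tune.uniform(0.8, 0.95),    # only meaningful for NAdam ...
    'beta2': tune.uniform(0.9, 0.999),   # ... but sampled regardless
}

def train(config):
    model = torch.nn.Linear(10, 1)  # placeholder model
    if config['optimizer'] == 'nadam':
        opt = torch.optim.NAdam(model.parameters(), lr=config['lr'],
                                betas=(config['beta1'], config['beta2']))
    else:
        # SGD silently ignores beta1/beta2, which were sampled anyway
        opt = torch.optim.SGD(model.parameters(), lr=config['lr'])
    # ... training loop, tune.report(...) ...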

I guess this takes the problem closer to the AutoML scenario, but it doesn’t necessarily fall into that category. I mean, I imagine one would like the searcher to be able to consider neural architectures with varying numbers of layers, and that’s not necessarily AutoML, right?

Now, other use cases might involve awkward search spaces. Most literature on black-box optimization recognizes this issue and addresses it, at least to a certain extent, because it does come up quite often, and not just in machine learning. A lot of algorithms worth their salt support conditional search spaces. Check out SMAC3 (BOHB), hyperopt (TPE), Optuna (TPE), OpenBox, etc. Even most of the ones that don’t support conditional sampling, and even if they support only float variables (like the ones using Gaussian Processes), still offer some support for constraints: HEBO, Spearmint (really old), GPflowOPT, HyperMapper, BoTorch, etc. I think that most of the algorithms supported as backends by Ray itself also offer some support for this.

Now, in all fairness, we can obtain the same result by assigning a very high loss to incompatible configurations, as you suggested, explicitly or implicitly (because that’s what algos usually do when receiving a FAILURE result). However, this approach might pollute the error surface with artificially inserted extreme values, which might end up hurting the search performance (it might take longer to converge to a good result). HyperMapper is the only optimizer that I’m aware of that trains a second model for predicting feasibility as a classification problem. However, conditional search space definition avoids this caveat entirely, in my opinion.

Maybe, to illustrate the problem, let’s go back to the old example with NAdam vs SGD. If, let’s say, SGD happens to be sampled many times with a particular value for beta1, and that consistently yields bad results (extreme cost artificially assigned because beta1 and SGD are incompatible), the optimizer might learn to avoid that particular value for beta1 altogether, even when it would be valid in conjunction with NAdam, and, perhaps, yield great results in that context. The point is that one wants to feed as much info about the priors of the parameters to the optimizer as possible, and conditional sampling is something that I believe is quite powerful for many problems.

Thanks,

Bogdan