How severely does this issue affect your experience of using Ray?
- High: It blocks me from completing my task.
I am trying to use ray.tune.Tuner with ray.tune.search.optuna.OptunaSearch and ray.tune.schedulers.ASHAScheduler in Ray 2 to find the hyper-parameters that maximize the mean reward of a PPO policy while also terminating bad trials early.
The code snippet below shows my current approach, but it generates the following error:
ray.tune.error.TuneError: No trial resources are available for launching the actor ray.rllib.evaluation.rollout_worker.RolloutWorker.__init__. To resolve this, specify the Tune option: resources_per_trial=tune.PlacementGroupFactory([{'CPU': 1.0}] + [{'CPU': 1.0}] * N)
The tuning resources documentation (A Guide To Parallelism and Resources — Ray 2.0.0) provides an example of how to specify resources when passing a trainable to ray.tune.Tuner (see the sketch right after this paragraph), but I haven't found documentation on how to do this when passing an objective function that is driven by Optuna or another hyper-parameter search algorithm.
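For reference, the pattern from that guide looks roughly like this (a minimal sketch from memory, assuming the Ray 2.0 tune.with_resources API; my_trainable and the lr parameter are hypothetical stand-ins):

from ray import tune

# hypothetical stand-in for a regular function trainable
def my_trainable(config):
    ...

# wrap the trainable so each trial reserves the requested resources
tuner = tune.Tuner(
    tune.with_resources(my_trainable, resources={"cpu": 2, "gpu": 0}),
    param_space={"lr": tune.loguniform(1e-4, 1e-1)},
)

My current code is below: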
# (methods below are part of a larger class; imports shown for completeness)
from ray import air, tune
from ray.air import session
from ray.tune.schedulers import ASHAScheduler
from ray.tune.search.optuna import OptunaSearch

def train_policy(self):
    # create hyper-parameter search space
    search_space = self.create_search_space()
    # create search algorithm
    algo = OptunaSearch(
        metric=self.metric,
        mode=self.mode
    )
    # create scheduler that enables aggressive early stopping of bad trials
    scheduler = ASHAScheduler(...)
    # create tuner
    tuner = tune.Tuner(
        # objective function that trains PPO policy using hyper-parameters selected by Optuna
        self.objective,
        # specify tune configuration
        tune_config=tune.TuneConfig(
            num_samples=self.num_samples,
            search_alg=algo,
            scheduler=scheduler
        ),
        # specify run configuration
        run_config=air.RunConfig(
            stop=dict(training_iteration=self.num_train_iters),
            verbose=3
        ),
        # specify hyper-parameter search space
        param_space=search_space,
    )
    # run tuner
    result_grid = tuner.fit()

def objective(self, config):
    # create PPO trainer
    trainer = self.create_ppo_trainer(config)
    # iterate over training iterations
    for _ in range(self.num_train_iters):
        # train policy
        results = trainer.train()
        # report the metric back to Tune after each iteration
        session.report(dict(
            episode_reward_mean=results[self.metric]
        ))
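Based on the error message, my best guess is to wrap self.objective in tune.with_resources with a PlacementGroupFactory that reserves one bundle for the trial plus one per RolloutWorker, along the lines of the sketch below. This is purely a guess on my part (num_rollout_workers is a made-up attribute), and I haven't confirmed it is the intended approach for an objective function:

# hypothetical sketch: reserve 1 CPU for the trial itself plus 1 CPU per RolloutWorker
trainable_with_resources = tune.with_resources(
    self.objective,
    tune.PlacementGroupFactory(
        [{"CPU": 1.0}] + [{"CPU": 1.0}] * self.num_rollout_workers  # hypothetical attribute
    ),
)
tuner = tune.Tuner(
    trainable_with_resources,
    tune_config=tune.TuneConfig(
        num_samples=self.num_samples,
        search_alg=algo,
        scheduler=scheduler
    ),
    run_config=air.RunConfig(
        stop=dict(training_iteration=self.num_train_iters),
        verbose=3
    ),
    param_space=search_space,
)

Is that the right approach for an objective function, or is there a different mechanism I should be using?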
I'd greatly appreciate any help with this.
Thanks,
Stefan