This is a use case at the intersection of Ray and RLlib.
So, when I run something like:
result = tune.run(es.ESTrainer, config=config, stop=stop, checkpoint_at_end=True,
                  local_dir=os.path.join('..', 'rllib_results'))
the call blocks inside tune.run(), and I have to wait for it to finish before the rest of the program can continue.
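The only workaround I can think of, assuming tune.run() behaves like any other blocking Python call, is to push it onto a background thread so the driver keeps control. This is just a sketch reusing the arguments above, not something I have verified with Tune:

import concurrent.futures

executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
# tune.run() still blocks, but only inside the worker thread;
# the driver gets back a Future it can wait on later.
analysis_future = executor.submit(
    tune.run, es.ESTrainer, config=config, stop=stop,
    checkpoint_at_end=True, local_dir=os.path.join('..', 'rllib_results'))
# ... do other work here ...
analysis = analysis_future.result()  # block only when the result is actually needed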
I did see the tune.run_experiments() function, but that will also block until all the experiments finish. That's acceptable, but I was wondering how I could do something like this:
from itertools import product
import numpy as np
import ray

agents = [Agent1, Agent2, ..., Agentk]  # placeholder agent handles
envs = [Env1, Env2, ..., Envk]          # placeholder environment handles
k = len(agents)
N = 500
for n in range(N):
    # per-agent evaluation and optimization, launched as non-blocking futures
    eval_futures = [env.evaluate.remote(agent) for env, agent in zip(envs, agents)]
    opt_futures = [agent.optimize.remote(env, stop={'frames': 100000}) for env, agent in zip(envs, agents)]
    # each agent builds an updated copy of itself from the optimization results
    updated_agents = [agent.update.remote(opt_futures) for agent in agents]
    # cross-evaluate every updated agent on every env
    combinatorial_eval_futures = [env.evaluate.remote(agent) for env, agent in product(envs, updated_agents)]
    # this should force the various futures to all resolve here
    combo_eval_res = ray.get(combinatorial_eval_futures)
    for i, env in enumerate(envs):
        # product(envs, ...) yields results grouped env by env, k entries per env
        best_agent_index = int(np.argmax(combo_eval_res[i * k:(i + 1) * k]))
        agents[i].update_weights(ray.get(updated_agents[best_agent_index]).get_weights())
So, right now, if I wanted to implement that agent.optimize step with ray.tune.run_experiments(), it would block until all the experiments finish and would return analysis objects rather than updated trainer objects. Alternatively, there's e.g. the Trainer.train() method, but that also seems to block.
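Is the intended pattern something like wrapping each Trainer as a Ray actor, so that train() calls come back as futures instead of blocking the driver? A rough sketch of what I have in mind (the name RemoteESTrainer and the count of three trainers are just for illustration, and I haven't verified that wrapping a Trainer with ray.remote() is supported):

import ray
from ray.rllib.agents import es

# wrap the Trainer class as an actor class so train() calls become futures
RemoteESTrainer = ray.remote(es.ESTrainer)
# reusing the same config as above; three trainers purely for illustration
trainers = [RemoteESTrainer.remote(config=config) for _ in range(3)]

# each .remote() call returns immediately with an object ref (future)
train_futures = [t.train.remote() for t in trainers]
# ... the driver keeps control here and can schedule other work ...
results = ray.get(train_futures)  # block only when the results are actually needed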