How to have multiple Trainers remotely train simultaneously?

This is a use-case at the intersection of Ray and RLlib.

So, when I run something like:

result = tune.run(es.ESTrainer, config=config, stop=stop, checkpoint_at_end=True,
                  local_dir=os.path.join('..', 'rllib_results'))

this blocks inside tune.run() and I have to wait until it finishes before the rest of my program can continue.
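One workaround I've considered (my own sketch, not an official Tune API, and I'm not certain Tune behaves well off the main thread) is to push the blocking call onto a background thread so the main program keeps control:

import os
import threading
from ray import tune
from ray.rllib.agents import es

def run_experiment():
    # same blocking call as above, just moved off the main thread
    tune.run(es.ESTrainer, config=config, stop=stop, checkpoint_at_end=True,
             local_dir=os.path.join('..', 'rllib_results'))

t = threading.Thread(target=run_experiment)
t.start()
# ... the main program continues here while the experiment runs ...
t.join()  # block only when the results are actually needed

But that only hides the blocking; it doesn't give me back trainer objects I can coordinate.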

I did see the tune.run_experiments function, but that also blocks until all the experiments finish. That's acceptable, but I was wondering how I could do something like this:

import ray
import numpy as np
from itertools import product

agents = [Agent1, Agent2, ..., Agentk]  # Ray actor handles
envs   = [Env1, Env2, ..., Envk]        # Ray actor handles
N = 500
for n in range(N):
    # per-env baseline evaluations, all non-blocking
    eval_futures = [env.evaluate.remote(agent) for env, agent in zip(envs, agents)]
    # one optimization step per (env, agent) pair, running in parallel
    opt_futures = [agent.optimize.remote(env, stop={'frames': 100000})
                   for env, agent in zip(envs, agents)]
    # apply each optimization result to its own agent
    updated_agents = [agent.update.remote(opt) for agent, opt in zip(agents, opt_futures)]
    # cross-evaluate every updated agent in every env
    combinatorial_eval_futures = [env.evaluate.remote(agent)
                                  for env, agent in product(envs, updated_agents)]
    # this should force the various futures to all resolve here
    combo_eval_res = np.reshape(ray.get(combinatorial_eval_futures), (len(envs), len(agents)))
    for i in range(len(envs)):
        # each agent adopts the weights of whichever updated agent did best in its env
        best_agent_index = np.argmax(combo_eval_res[i])
        agents[i].update_weights.remote(ray.get(updated_agents[best_agent_index]).get_weights())
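As an aside on that final barrier: instead of one big ray.get, I know ray.wait can hand futures back as they complete, so the per-env bookkeeping could start early (a sketch using Ray's ray.wait API; the pair tracking is elided):

# consume evaluation results as they finish rather than all at once
remaining = list(combinatorial_eval_futures)
while remaining:
    done, remaining = ray.wait(remaining, num_returns=1)
    score = ray.get(done[0])
    # ... record the score for the corresponding (env, agent) pair ...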

So right now, if I wanted to call that agent.optimize step: based on what I've seen, ray.tune.run_experiments() blocks until everything finishes and returns analysis objects rather than updated trainer objects. Alternatively, there's the Trainer.train() method, but that also blocks.
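The closest pattern I can think of (a sketch, assuming the trainer class can be wrapped as a Ray actor; I haven't verified this works for every trainer, and 'CartPole-v0' is just a placeholder env) is to wrap ESTrainer with ray.remote, so each trainer lives in its own actor process and train() returns a future instead of blocking:

import ray
from ray.rllib.agents import es

ray.init()

# wrapping the class produces an actor class; each .remote() call below
# starts a trainer in its own worker process
RemoteESTrainer = ray.remote(es.ESTrainer)
trainers = [RemoteESTrainer.remote(config=config, env='CartPole-v0')
            for _ in range(4)]

# all train() calls run concurrently; nothing blocks until ray.get
result_futures = [t.train.remote() for t in trainers]
results = ray.get(result_futures)

Is something like this the intended way to run several trainers at once, or is there a supported API for it?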
