This is a use case at the intersection of Ray and RLlib.
So, when I run something like:
result = tune.run(es.ESTrainer, config=config, stop=stop, checkpoint_at_end=True,
                  local_dir=os.path.join('..', 'rllib_results'))
the call blocks inside tune.run(), and I have to wait for it to finish before the rest of the program can continue.
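The only workaround I can think of, assuming tune.run() behaves like any other blocking Python call, is to push it onto a background thread so the driver keeps control. This is just a sketch reusing the arguments above, not something I have verified with Tune:

import concurrent.futures

executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
# tune.run() still blocks, but only inside the worker thread;
# the driver gets back a Future it can wait on later.
analysis_future = executor.submit(
    tune.run, es.ESTrainer, config=config, stop=stop,
    checkpoint_at_end=True, local_dir=os.path.join('..', 'rllib_results'))
# ... do other work here ...
analysis = analysis_future.result()  # block only when the result is actually needed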
I did see the tune.run_experiments() function, but that will also block until all the experiments finish. That's acceptable, but I was wondering how I could do something like this:
from itertools import product
import numpy as np
import ray

agents = [Agent1, Agent2, ..., Agentk]  # placeholder agent handles
envs = [Env1, Env2, ..., Envk]          # placeholder environment handles
k = len(agents)
N = 500
for n in range(N):
    # per-agent evaluation and optimization, launched as non-blocking futures
    eval_futures = [env.evaluate.remote(agent) for env, agent in zip(envs, agents)]
    opt_futures = [agent.optimize.remote(env, stop={'frames': 100000}) for env, agent in zip(envs, agents)]
    # each agent builds an updated copy of itself from the optimization results
    updated_agents = [agent.update.remote(opt_futures) for agent in agents]
    # cross-evaluate every updated agent on every env
    combinatorial_eval_futures = [env.evaluate.remote(agent) for env, agent in product(envs, updated_agents)]
    # this should force the various futures to all resolve here
    combo_eval_res = ray.get(combinatorial_eval_futures)
    for i, env in enumerate(envs):
        # product(envs, ...) yields results grouped env by env, k entries per env
        best_agent_index = int(np.argmax(combo_eval_res[i * k:(i + 1) * k]))
        agents[i].update_weights(ray.get(updated_agents[best_agent_index]).get_weights())
So, right now, if I wanted to implement that agent.optimize step with ray.tune.run_experiments(), it would block until all the experiments finish and would return analysis objects rather than updated trainer objects. Alternatively, there's e.g. the Trainer.train() method, but that also seems to block.
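Is the intended pattern something like wrapping each Trainer as a Ray actor, so that train() calls come back as futures instead of blocking the driver? A rough sketch of what I have in mind (the name RemoteESTrainer and the count of three trainers are just for illustration, and I haven't verified that wrapping a Trainer with ray.remote() is supported):

import ray
from ray.rllib.agents import es

# wrap the Trainer class as an actor class so train() calls become futures
RemoteESTrainer = ray.remote(es.ESTrainer)
# reusing the same config as above; three trainers purely for illustration
trainers = [RemoteESTrainer.remote(config=config) for _ in range(3)]

# each .remote() call returns immediately with an object ref (future)
train_futures = [t.train.remote() for t in trainers]
# ... the driver keeps control here and can schedule other work ...
results = ray.get(train_futures)  # block only when the results are actually needed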