Thanks, team!
I notice that sometimes only one experiment is running, and sometimes two experiments run at the same time. May I know which arguments decide this?
Did you use Tune? What are your resource allocation settings?
Hi Roller44, thanks for your reply!! My script looks like this:
config["num_gpus"] = 4
config["num_workers"] = 30
config["num_envs_per_worker"] =4
config["rollout_fragment_length"] =100
...skip a few other configs...
ray.init(num_cpus=40)
analysis = tune.run(
ppo.PPOTrainer,
config=config,
local_dir=log_dir_play_ray,
stop=stop,
checkpoint_at_end=True,
name=exp_name,
)
ray.shutdown()
Your configuration requires 30 * 4 = 120 CPUs to run all experiments in parallel (i.e., each of the 30 workers needs 4 CPUs to run its 4 environments, since each environment normally needs 1 CPU), but you only have 40 CPUs. So you should decrease num_workers or num_envs_per_worker.
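For example, a hypothetical resizing (keeping ray.init(num_cpus=40) and the 1-CPU-per-environment accounting above, not the only valid choice) that lets two trials share your 40 CPUs would be to shrink each trial's rollout footprint:

# Hypothetical adjustment: each trial then reserves about 4 x 4 + 1 = 17 CPUs
# (4 workers x 4 envs plus 1 CPU for the driver), so two trials fit into 40 CPUs.
config["num_workers"] = 4
config["num_envs_per_worker"] = 4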
The bottom line is, the number of experiments running in parallel is equal to:
min(
    total CPUs / (num_cpus_per_worker x num_workers x num_envs_per_worker + num_cpus_for_driver),
    total GPUs / (num_gpus_per_worker x num_workers + num_gpus)
)

where "total CPUs" and "total GPUs" are the resources available to Ray (e.g., the num_cpus you pass to ray.init()).
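As a rough sketch of that rule of thumb (a hypothetical helper written for this thread, not part of Ray or Tune; check the per-trial terms against your own config defaults):

# Hypothetical helper: estimates how many Tune trials fit, using the accounting above.
def max_parallel_trials(total_cpus, total_gpus,
                        num_cpus_per_worker, num_workers, num_envs_per_worker,
                        num_cpus_for_driver, num_gpus_per_worker, num_gpus):
    # Resources a single trial reserves.
    cpus_per_trial = num_cpus_per_worker * num_workers * num_envs_per_worker + num_cpus_for_driver
    gpus_per_trial = num_gpus_per_worker * num_workers + num_gpus
    cpu_bound = total_cpus // cpus_per_trial
    gpu_bound = total_gpus // gpus_per_trial if gpus_per_trial else float("inf")
    return min(cpu_bound, gpu_bound)

# Example with made-up numbers: 4 workers x 4 envs + 1 driver CPU = 17 CPUs and
# 2 GPUs per trial, so two trials can share 40 CPUs / 4 GPUs:
# max_parallel_trials(40, 4, 1, 4, 4, 1, 0, 2)  ->  min(40 // 17, 4 // 2) = 2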
Great! Thanks! I will have a try.