I use a custom environment with num_workers=5 and print the ID of each worker. There are 6 environments in total: 5 with worker indices 1 to 5, plus 1 with index 0. When sampling starts, environments 1 to 5 are sampled in parallel, while environment 0 only begins sampling after those five have finished. It looks like environment 0 then has to collect the same amount of data by itself as the first five before sampling switches back to running the 5 environments in parallel, and this slows down my overall sampling speed.
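This is roughly how I print the worker IDs inside my env (a simplified single-agent stand-in for brevity; my real Myenv is multi-agent and has more logic, but the point is just the worker_index print, which RLlib provides via the EnvContext it passes to the env constructor):

import gym
from gym.spaces import Discrete

class Myenv(gym.Env):
    def __init__(self, config):
        # RLlib passes an EnvContext here; worker_index is 0 for the
        # driver (local worker) and 1..num_workers for remote workers.
        print(f"Myenv created on worker_index={config.worker_index}")
        self.horizon = config.get("episode_horizon", 1000)
        self.observation_space = Discrete(2)
        self.action_space = Discrete(2)
        self.t = 0

    def reset(self):
        self.t = 0
        return 0

    def step(self, action):
        self.t += 1
        done = self.t >= self.horizon
        return 0, 0.0, done, {}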
I found this note in the default trainer config:
# When `num_workers` > 0, the driver (local_worker; worker-idx=0) does not
# need an environment. This is because it doesn't have to sample (done by
# remote_workers; worker_indices > 0) nor evaluate (done by evaluation
# workers; see below).
"create_env_on_driver": False,
but this setting does not work (Ray 1.8.0, Windows).
Can you help me see where I set it incorrectly? Thank you very much! Here is my code:
import ray
from ray import tune

ray.init(num_cpus=6, num_gpus=1)
tune.register_env(
    "myenv",
    lambda config: Myenv(config),
)
config = {
    "env": "myenv",
    "env_config": {
        "episode_horizon": 1000,
    },
    "num_workers": 5,  # 5 remote rollout workers; the driver is worker index 0
    "create_env_on_driver": False,  # should prevent env creation on the driver
    "lambda": 0.98,
    "gamma": 0.99,
    "sgd_minibatch_size": 256,
    "train_batch_size": 1024,
    "num_gpus": 1,
    "num_sgd_iter": 50,
    "rollout_fragment_length": 64,
    "clip_param": 0.25,
    "multiagent": {
        "policies": policies,
        "policy_mapping_fn": policy_mapping_fn,
    },
    # "callbacks": MyCallbacks,
    "evaluation_num_workers": 0,
    "evaluation_interval": 5,
    "evaluation_num_episodes": 10,
    "framework": "torch",
    "no_done_at_end": False,
}
trials = tune.run(
    "PPO",
    config=config,
    stop=stop,
    verbose=3,
    checkpoint_freq=10,
    checkpoint_at_end=True,
    restore=args.from_checkpoint,
)
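To double-check whether the driver actually builds an env, I can also bypass Tune and inspect the local worker directly (a sketch assuming the Ray 1.8 PPOTrainer / WorkerSet API):

from ray.rllib.agents.ppo import PPOTrainer

trainer = PPOTrainer(config=config)
# With "create_env_on_driver": False and num_workers > 0, the local
# worker (index 0) should not hold an env, i.e. this should be None.
print("driver env:", trainer.workers.local_worker().env)
# The remote workers (indices 1..5) each hold their own env copy.
print("num remote workers:", len(trainer.workers.remote_workers()))

Even with this, I still see the worker_index=0 env being created and sampled.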