When num_workers=n , Why the total number of environments is n+1

robot-xyh · November 10, 2021, 2:36pm

I use a custom environment and set num_workers=5, and then I will print the id of each worker. I found that there are 5 (id numbers from 1 to 5) + 1 (id 0) = 6 environments at this time. I found that when the environment starts to run, the 1st to 5th environments are sampled in parallel, and the 0th environment is sampled separately after the first five samples are over. It seems that the 0th environment needs to sample up to the same amount of data for the first 5, and then it will automatically switch to parallel sampling of the 5 environments, which slows down my sampling speed.

I found

    # When `num_workers` > 0, the driver (local_worker; worker-idx=0) does not
    # need an environment. This is because it doesn't have to sample (done by
    # remote_workers; worker_indices > 0) nor evaluate (done by evaluation
    # workers; see below).
    "create_env_on_driver": False,

but this setting does not work（ray=1.8.0,windows）
Can you help me see where I set it incorrectly, thank you very much！

ray.init(num_cpus=6, num_gpus=1)
tune.register_env(
	"myenv",
	lambda config: Myenv(            
		config,
	))
config = {
	"env": "myenv",
	"env_config": {
		"episode_horizon": 1000,
	},

	"num_workers": 5,
	"create_env_on_driver": False,
	"lambda": 0.98,
	"gamma": 0.99,
	"sgd_minibatch_size": 256,
	"train_batch_size": 1024,
	"num_gpus" :1,
	"num_sgd_iter": 50,
	"rollout_fragment_length": 64,
	"clip_param": 0.25,
    "multiagent": {
            "policies": policies,
            "policy_mapping_fn": policy_mapping_fn,
        },

	
	#"callbacks": MyCallbacks,
	"evaluation_num_workers": 0,
	"evaluation_interval": 5,
	"evaluation_num_episodes": 10,

	"framework": "torch" ,
	"no_done_at_end": False,
}
trials= tune.run(
	"PPO",
	config=config,
	stop=stop,
	verbose=3,
	checkpoint_freq=10,
	checkpoint_at_end=True,
	restore=args.from_checkpoint)

Roller44 · November 10, 2021, 3:12pm

I cannot provide the answer since I am pretty new to RLlib.
I am curious about how you find the sampling mechanism out. Can you tell me?

mannyv · November 10, 2021, 3:36pm

Hi @robot-xyh,

The sith environment is the evaluation environment.

This is being created on the driver because of this combination of settings.

"evaluation_num_workers": 0,
"evaluation_interval": 5

robot-xyh · November 12, 2021, 2:02am

Very useful help. After I turned off the evaluation, the sampling speed increased by 10 times.
At the same time, I found that I added "evaluation_num_workers": 3, the evaluation is not run in parallel. I also noticed

    # === Evaluation Settings ===
    # Evaluate with every `evaluation_interval` training iterations.
    # The evaluation stats will be reported under the "evaluation" metric key.
    # Note that evaluation is currently not parallelized, and that for Ape-X
    # metrics are already only reported for the lowest epsilon workers.

Perhaps this is the reason.
Thank you for helping me solve the problem

robot-xyh · November 12, 2021, 2:06am

The most intuitive method is to observe the actual PID of the console during training. You can also add a command to print the environment ID in the environment.

mannyv · November 12, 2021, 2:40am

@robot-xyh,
There is an experimental setting you could try.

github.com

ray-project/ray/blob/6f85af435f0ef7ff465fa14465e550acba93423a/rllib/agents/trainer.py#L284-L290

    
      
          # Whether to run evaluation in parallel to a Trainer.train() call
          # using threading. Default=False.
          # E.g. evaluation_interval=2 -> For every other training iteration,
          # the Trainer.train() and Trainer.evaluate() calls run in parallel.
          # Note: This is experimental. Possible pitfalls could be race conditions
          # for weight synching at the beginning of the evaluation loop.
          "evaluation_parallel_to_training": False,

hridayns · December 21, 2023, 7:13pm

Hello @mannyv, sorry to revive this topic but I really need help. I am having trouble with the concurrency and multithreading aspects of RLlib when I try to use it with SUMO, so I wanted to make sure that there is only one worker (or only one driver) executing everything instead of splitting the work. I only need one local driver and one environment. What would I have to set the options to? Please advise.

Thanks again

Topic		Replies	Views
[RLlib] Questions about loading checkpoint and asynchrone evaluation workers RLlib	3	598	May 26, 2021
Use a remote worker for Evaluation RLlib	5	549	July 5, 2021
Required resources should be shared between train and eval workers RLlib	5	526	March 31, 2021
Environments with VectorEnv not able to run in parallel RLlib	10	875	June 7, 2022
Different Environment for training and evaluation RLlib	5	1234	July 13, 2021

When num_workers=n , Why the total number of environments is n+1

Related topics