When num_workers=n, why is the total number of environments n+1?

I use a custom environment with num_workers=5 and print the ID of each worker. I found that there are 6 environments in total: 5 with IDs 1 to 5, plus 1 with ID 0. When training starts, environments 1 to 5 sample in parallel, and environment 0 samples on its own after those five have finished. It seems that environment 0 has to collect as much data as the first five before sampling switches back to the five environments in parallel, which slows down my sampling.

I found this in the default config:

    # When `num_workers` > 0, the driver (local_worker; worker-idx=0) does not
    # need an environment. This is because it doesn't have to sample (done by
    # remote_workers; worker_indices > 0) nor evaluate (done by evaluation
    # workers; see below).
    "create_env_on_driver": False,

but setting it has no effect (Ray 1.8.0, Windows).
Can you help me see what I set incorrectly? Thank you very much!

    import ray
    from ray import tune

    # Myenv, policies, policy_mapping_fn, stop and args are defined elsewhere in my script.
    ray.init(num_cpus=6, num_gpus=1)
    tune.register_env(
        "myenv",
        lambda config: Myenv(config),
    )
    config = {
        "env": "myenv",
        "env_config": {
            "episode_horizon": 1000,
        },
        "num_workers": 5,
        "create_env_on_driver": False,
        "lambda": 0.98,
        "gamma": 0.99,
        "sgd_minibatch_size": 256,
        "train_batch_size": 1024,
        "num_gpus": 1,
        "num_sgd_iter": 50,
        "rollout_fragment_length": 64,
        "clip_param": 0.25,
        "multiagent": {
            "policies": policies,
            "policy_mapping_fn": policy_mapping_fn,
        },
        # "callbacks": MyCallbacks,
        "evaluation_num_workers": 0,
        "evaluation_interval": 5,
        "evaluation_num_episodes": 10,
        "framework": "torch",
        "no_done_at_end": False,
    }
    trials = tune.run(
        "PPO",
        config=config,
        stop=stop,
        verbose=3,
        checkpoint_freq=10,
        checkpoint_at_end=True,
        restore=args.from_checkpoint,
    )

I can't answer this since I'm pretty new to RLlib, but I'm curious how you figured out the sampling mechanism. Can you tell me?

Hi @robot-xyh,

The sixth environment is the evaluation environment.

It is created on the driver (the local worker, index 0) because of this combination of settings:

    "evaluation_num_workers": 0,
    "evaluation_interval": 5
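
To make the count concrete, here is a rough model of how many environment instances the settings in this thread produce. It is a simplification based only on the behaviour described in this thread (remote rollout workers 1..n each get an env, and with evaluation_num_workers=0 the evaluation env lives on the driver side as index 0), not RLlib's actual code, and the function name is hypothetical.

```python
def count_env_instances(config: dict) -> int:
    """Hypothetical, simplified model of env instances for the settings in this thread.

    Assumptions (not taken from RLlib source):
    - each remote rollout worker (indices 1..num_workers) creates one env;
    - the driver (index 0) creates a rollout env only if num_workers == 0
      or create_env_on_driver is True;
    - if evaluation is enabled and evaluation_num_workers == 0, one
      evaluation env is created on the driver side;
    - if evaluation_num_workers > 0, each evaluation worker creates one env.
    """
    num_workers = config["num_workers"]
    envs = num_workers
    if num_workers == 0 or config.get("create_env_on_driver", False):
        envs += 1  # rollout env on the driver
    if config.get("evaluation_interval"):
        eval_workers = config.get("evaluation_num_workers", 0)
        envs += eval_workers if eval_workers > 0 else 1
    return envs


thread_config = {
    "num_workers": 5,
    "create_env_on_driver": False,
    "evaluation_interval": 5,
    "evaluation_num_workers": 0,
}
print(count_env_instances(thread_config))  # 6
print(count_env_instances(dict(thread_config, evaluation_interval=None)))  # 5
```

With evaluation switched off, the model drops back to one env per rollout worker, which is the n versus n+1 difference asked about in the title.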

That was very helpful. After I turned off evaluation, sampling became about 10 times faster.
I also found that when I set "evaluation_num_workers": 3, evaluation still does not run in parallel, and I noticed this in the default config:

    # === Evaluation Settings ===
    # Evaluate with every `evaluation_interval` training iterations.
    # The evaluation stats will be reported under the "evaluation" metric key.
    # Note that evaluation is currently not parallelized, and that for Ape-X
    # metrics are already only reported for the lowest epsilon workers.

Perhaps this is the reason.
Thank you for helping me solve the problem!
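
For anyone landing here later, a minimal sketch of turning evaluation off in a config dict like the one above. It assumes that setting evaluation_interval to None disables periodic evaluation (None is the default in the Ray 1.x configs I am aware of, so check it against your version). The helper name is hypothetical, and it only edits the dict; it does not call Ray.

```python
def without_evaluation(config: dict) -> dict:
    """Hypothetical helper: return a copy of an RLlib-style config dict
    with periodic evaluation switched off."""
    new_config = dict(config)  # shallow copy; the caller's dict is left unchanged
    # Assumption: with evaluation_interval=None no evaluation round is scheduled,
    # so (per the answer above) no extra evaluation env is created on the driver.
    new_config["evaluation_interval"] = None
    return new_config


config = {"num_workers": 5, "evaluation_num_workers": 0, "evaluation_interval": 5}
fast_config = without_evaluation(config)
print(fast_config["evaluation_interval"])  # None
print(config["evaluation_interval"])  # 5
```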

The most intuitive method is to watch the process IDs (PIDs) shown in the console during training. You can also add a print statement to the environment that prints its worker ID.
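
A minimal sketch of the second approach. In RLlib the env creator receives an EnvContext (a dict subclass) that exposes worker_index; the class below reads it with getattr so it also works when a plain dict is passed. MyEnv here is a hypothetical stand-in for the custom environment: the reset/step logic is left out and it does not import gym, so it runs on its own.

```python
import os


class MyEnv:
    """Hypothetical stand-in for a custom env that logs where it was created."""

    def __init__(self, config):
        # RLlib passes an EnvContext carrying worker_index (0 = driver side,
        # 1..num_workers = remote rollout workers). A plain dict has no such
        # attribute, so fall back to 0.
        self.worker_index = getattr(config, "worker_index", 0)
        self.pid = os.getpid()
        print(f"env created: worker_index={self.worker_index}, pid={self.pid}")


if __name__ == "__main__":
    # Plain dict: no worker_index attribute, so this reports 0.
    MyEnv({"episode_horizon": 1000})
```

Registered the same way as in the question (tune.register_env("myenv", lambda config: MyEnv(config))), each env instance should print one line when it is created, so the indices and PIDs show which worker owns which env.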

@robot-xyh,
There is an experimental setting you could try.