Evaluation vs training speed

How severely does this issue affect your experience of using Ray?

  • Medium: It contributes to significant difficulty in completing my task, but I can work around it.

Why does evaluation sampling run faster than training sampling even when I use a policy that has no model and I remove the training code from training_step()?

To further elaborate, I have sampling running on its own process, separate from the driver that controls training. Even if I had a model and left in the training code, I would expect that the driver process wouldn't slow down the sampler process. Is there a specific setting to keep these processes async?

Hi @bdpooles,

Do you have a reproduction script you can share? You can use the MockEnv if your environment is proprietary.

It is hard to know without seeing any configuration or code, but with the default settings for most algorithms the driver controls the timing of sample collection. It is a synchronous but parallel process.

In RLlib this means the driver tells each worker to sample new data. Each worker has its own copy of the environment and samples data in parallel; once it has collected at least rollout_fragment_length samples, it returns them to the driver.

The driver collects the samples from all of the workers, checks whether it has train_batch_size of them, and if so enters the training loop; otherwise it starts another round of sample collection.
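Schematically, the loop looks something like this (pure illustration in Python, none of these names are real RLlib APIs; it just shows how the driver paces the workers):

```python
# Schematic only -- not RLlib source. Illustrates the synchronous
# "sample, then maybe train" loop driven from the driver process.
ROLLOUT_FRAGMENT_LENGTH = 50
TRAIN_BATCH_SIZE = 200
NUM_WORKERS = 2


def worker_sample(worker_id, n):
    """Stand-in for a rollout worker stepping its own env copy n times."""
    return [(worker_id, t) for t in range(n)]


def train_on(batch):
    """Stand-in for the loss/SGD update on the driver."""
    print(f"training on {len(batch)} samples")


batch = []
for iteration in range(3):
    # The driver blocks here until every worker has returned its fragment
    # (this is the "synchronous but parallel" part).
    for wid in range(NUM_WORKERS):
        batch.extend(worker_sample(wid, ROLLOUT_FRAGMENT_LENGTH))
    # Only once train_batch_size samples have accumulated does training run;
    # otherwise the driver immediately starts another round of sampling.
    if len(batch) >= TRAIN_BATCH_SIZE:
        train_on(batch)
        batch = []
```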

This is the general workflow for actor-critic algorithms. It is similar for Q-learning-based algorithms, or others that use a replay buffer, but with those a training batch can be created after each sample, so the training step usually runs after every sample collection of num_workers * rollout_fragment_length timesteps.
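To make that cadence concrete with made-up numbers:

```python
# Made-up numbers, just to show the cadence for replay-buffer algorithms:
num_workers = 2
rollout_fragment_length = 50
print(num_workers * rollout_fragment_length)  # -> 100 env steps between training steps
```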

There are lots of edge cases depending on the algorithm and config.

There is a setting called sample_async that you may be interested in, but the config says the following:

# === Advanced Rollout Settings ===
# Use a background thread for sampling (slightly off-policy, usually not
# advisable to turn on unless your env specifically requires it).
"sample_async": False,
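If you want to experiment with it, flipping the flag would look roughly like this (config-dict style; the env name and algorithm class are placeholders, not a recommendation):

```python
# Sketch only -- the env name and algorithm class are placeholders.
config = {
    "env": "CartPole-v1",
    "num_workers": 1,
    # Sampling then runs on a background thread of the worker, so the
    # driver's training_step() no longer gates when the env is stepped,
    # at the cost of being slightly off-policy.
    "sample_async": True,
}
# trainer = SomeAlgorithm(config=config)  # plug in your own Algorithm/Trainer class
```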

I can link some code here shortly.

I should state that my use case is rather different from your typical RL setup. I'm interested in interactive RL, which involves human feedback. For instance, one thing I'm attempting to test is how the game performs while an interactive RL algorithm runs in the background. Currently, I have just been testing the performance while someone plays the game without any Torch or TF model involved (hence my removing the training code from the training_step() method of my Algorithm).
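Roughly, what I mean by "removing the training code" is something like the sketch below (the class name and the exact sampling call are schematic and depend on the RLlib version; this is not my actual code):

```python
from ray.rllib.algorithms.algorithm import Algorithm  # older versions: ray.rllib.agents.trainer.Trainer


class NoTrainAlgo(Algorithm):  # hypothetical name
    def training_step(self):
        # Still collect a batch from the rollout worker so sampling keeps running...
        batch = self.workers.local_worker().sample()
        # ...but skip any loss computation / back-propagation entirely.
        return {"num_env_steps_sampled": batch.count}
```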

Thus, I've been running into this issue where something in the training code slows the game down (from 60 to 20 FPS), while in evaluation mode there is no slowdown (a constant 60 FPS). The drop in FPS seems to occur whenever the training_step() method is called, but it can't be back-propagation because I have removed the code that does any training. Clearly I am missing something, so I'm still trying to debug exactly what causes the slowdown.

To add a little more detail: since a human will be controlling the game in my tests, I have limited the config to 1 worker for sampling, hoping that the driver would not slow down the worker (as stated above, that has not been the case so far). I have also tried running both the sampling and the driver on the same process, with exactly the same slowdown.
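Sketched as config dicts, these are the two layouts I tried (I'm assuming "same process" corresponds to num_workers = 0, i.e. sampling on the driver):

```python
# Sketch of the two setups; everything else left at defaults.
separate_sampler = {
    "num_workers": 1,   # one dedicated sampling process
}
same_process = {
    "num_workers": 0,   # sampling happens on the driver itself
}
# Both show the same FPS drop whenever training_step() runs.
```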

After some additional testing, setting sample_async = True seems to work and behaves much closer to what I was expecting. It works regardless of whether num_workers is 0 or 1. Do you know why it is described as "not advisable"?

Actually, it seems the slowdown is being caused by log generation? I increased the timesteps_per_iteration parameter to 100, and now the slowdown seems tied to whatever this parameter controls; I'm guessing log generation, based on some brief discussions I've seen around it.

So if I set timesteps_per_iteration = 100, the slowdown (drop in FPS) occurs every 100 frames/steps. Note that I am also setting train_batch_size = 1, rollout_fragment_length = 1, and sample_async = False.
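For reference, these are the settings for this last test (everything else left at its default):

```python
config = {
    "timesteps_per_iteration": 100,   # the FPS drop now happens every ~100 steps
    "train_batch_size": 1,
    "rollout_fragment_length": 1,
    "sample_async": False,
}
```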