Increase wait time for trainer when using PolicyServer data input

How severely does this issue affect your experience of using Ray?

  • High: It blocks me from completing my task.

Hello,

I am trying to adapt the example here to my own environment, which is a Unity game (not using MLAgents, though I don't think that's relevant to my issue). My problem is that, between starting the game and the PolicyClient, it can take a fairly long time before the PolicyServers start receiving samples.
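For reference, my server-side setup is essentially the one from that example; here is a simplified sketch (the spaces, address, and port are placeholders, and I may be glossing over version differences):

```python
import gymnasium as gym
import numpy as np
from ray.rllib.algorithms.ppo import PPOConfig
from ray.rllib.env.policy_server_input import PolicyServerInput

# Placeholder spaces -- the real ones come from my Unity game.
obs_space = gym.spaces.Box(-1.0, 1.0, (10,), np.float32)
act_space = gym.spaces.Discrete(4)

SERVER_ADDRESS = "localhost"
SERVER_BASE_PORT = 9900

def _input(ioctx):
    # One PolicyServerInput (an HTTP server) per rollout worker,
    # following the server example I am adapting.
    if ioctx.worker_index > 0 or ioctx.worker.num_workers == 0:
        return PolicyServerInput(
            ioctx,
            SERVER_ADDRESS,
            SERVER_BASE_PORT + ioctx.worker_index - (1 if ioctx.worker_index > 0 else 0),
        )
    return None

config = (
    PPOConfig()
    # No env on the trainer side; the Unity game connects through a PolicyClient.
    .environment(env=None, observation_space=obs_space, action_space=act_space)
    .offline_data(input_=_input)
)
algo = config.build()  # then algo.train() in a loop
```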

I get the following warning:

```
WARNING rollout_ops.py:112 -- No samples returned from remote workers. If you have a slow environment or model, consider increasing the `sample_timeout_s` or decreasing the `rollout_fragment_length` in `AlgorithmConfig.env_runners().
```

followed by an error like:

```
\ray\rllib\policy\sample_batch.py", line 950, in __getitem__
    value = dict.__getitem__(self, key)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
KeyError: 'obs'
```

I have tried setting `idle_timeout` in the `PolicyServerInput` to a very large number (10e8), and have also tried setting `sample_timeout_s` in `AlgorithmConfig.env_runners()` to 10e6, with the same results.
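Concretely, those attempts looked roughly like this (continuing the sketch above; I am not certain this is the intended way to pass `idle_timeout`):

```python
# Continuing the sketch above (SERVER_ADDRESS, SERVER_BASE_PORT, config defined there).
def _input(ioctx):
    if ioctx.worker_index > 0 or ioctx.worker.num_workers == 0:
        return PolicyServerInput(
            ioctx,
            SERVER_ADDRESS,
            SERVER_BASE_PORT + ioctx.worker_index - (1 if ioctx.worker_index > 0 else 0),
            idle_timeout=10e8,  # default is 3.0 seconds
        )
    return None

config = config.offline_data(input_=_input).env_runners(
    sample_timeout_s=10e6,
)
```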

The only thing that works is decreasing `rollout_fragment_length` to something fairly small (32). Is there any drawback to this other than network overhead? My main concern is whether `rollout_fragment_length` is related to the horizon used when computing returns for PPO + GAE. If the two are not related at all, is there any reason not to make `rollout_fragment_length` as small as possible, other than the extra requests to the PolicyServers?
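For completeness, the only change that currently gets training running:

```python
# Workaround: a much smaller rollout fragment length avoids the
# "No samples returned" warning and the KeyError for me.
config = config.env_runners(rollout_fragment_length=32)
```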