How severely does this issue affect your experience of using Ray?
- High: It blocks me from completing my task.
Hello,
I am trying to adapt the example here to my own environment, which is a Unity game (not using MLAgents, but I don’t think that’s relevant to my issue). My issue is that, between starting up the game and the PolicyClient, it can take a fairly long time before the PolicyServers start receiving samples.
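For context, the client side looks roughly like this (heavily simplified; the game-facing helpers, dummy observations, and the address/port are placeholders, not my actual code):

```python
from ray.rllib.env.policy_client import PolicyClient

# Placeholder hooks into the Unity game; in reality these talk to the game
# process, so the names and return values here are purely illustrative.
def reset_game():
    return [0.0] * 10  # dummy observation

def step_game(action):
    return [0.0] * 10, 0.0, False  # dummy (obs, reward, done)

client = PolicyClient("http://localhost:9900", inference_mode="remote")

obs = reset_game()
episode_id = client.start_episode(training_enabled=True)
while True:
    action = client.get_action(episode_id, obs)
    obs, reward, done = step_game(action)
    client.log_returns(episode_id, reward)
    if done:
        client.end_episode(episode_id, obs)
        obs = reset_game()
        episode_id = client.start_episode(training_enabled=True)
```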
I get the following warning:
WARNING rollout_ops.py:112 -- No samples returned from remote workers. If you have a slow environment or model, consider increasing the `sample_timeout_s` or decreasing the `rollout_fragment_length` in `AlgorithmConfig.env_runners().
followed by an error like
\ray\rllib\policy\sample_batch.py", line 950, in __getitem__
value = dict.__getitem__(self, key)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
KeyError: 'obs'
I have tried setting `idle_timeout` in the `PolicyServerInput` to a large number (10e8), and have also tried setting `sample_timeout_s` in `AlgorithmConfig.env_runners()` to a large number (10e6), with the same results.
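For reference, this is roughly how I’m configuring the server side; the spaces, host, and port below are placeholders standing in for my actual setup, and the two timeout values are the ones mentioned above:

```python
import gymnasium as gym
import numpy as np

from ray.rllib.algorithms.ppo import PPOConfig
from ray.rllib.env.policy_server_input import PolicyServerInput

# Placeholder spaces standing in for my actual Unity observation/action spaces.
obs_space = gym.spaces.Box(-1.0, 1.0, (10,), np.float32)
act_space = gym.spaces.Discrete(4)

config = (
    PPOConfig()
    .environment(
        env=None,  # no local env; samples come from the external PolicyClient
        observation_space=obs_space,
        action_space=act_space,
    )
    .offline_data(
        # idle_timeout raised far above the 3.0s default
        input_=lambda ioctx: PolicyServerInput(
            ioctx, "localhost", 9900, idle_timeout=10e8
        )
    )
    .env_runners(
        num_env_runners=0,
        sample_timeout_s=10e6,  # also tried raising this; same warning + KeyError
    )
)
```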
The only thing that works is to decrease `rollout_fragment_length` to something pretty small (32). Is there any drawback to this other than network overhead? My main concern is whether `rollout_fragment_length` is related to the horizon used when computing returns for PPO + GAE. If these two things are not related at all, is there any reason not to make `rollout_fragment_length` as small as possible, other than that there will be more requests to the PolicyServers?
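Concretely, the only change that gets training going for me is something like this (the value 32 was picked more or less arbitrarily):

```python
# Small fragments make the server return samples almost immediately,
# at the cost of many more client/server round trips.
config = config.env_runners(rollout_fragment_length=32)
```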