Increase wait time for trainer when using PolicyServer data input

How severely does this issue affect your experience of using Ray?

  • High: It blocks me from completing my task.

Hello,

I am trying to adapt the example here to my own environment, which is a Unity game (not using MLAgents, though I don't think that's relevant to my issue). My problem is that, between starting the game and the PolicyClient, it can take a fairly long time before the PolicyServers start receiving samples.
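For reference, my server-side setup is essentially the one from that example; here is a simplified sketch (the spaces, address, and port are placeholders, and I may be glossing over version differences):

```python
import gymnasium as gym
import numpy as np
from ray.rllib.algorithms.ppo import PPOConfig
from ray.rllib.env.policy_server_input import PolicyServerInput

# Placeholder spaces -- the real ones come from my Unity game.
obs_space = gym.spaces.Box(-1.0, 1.0, (10,), np.float32)
act_space = gym.spaces.Discrete(4)

SERVER_ADDRESS = "localhost"
SERVER_BASE_PORT = 9900

def _input(ioctx):
    # One PolicyServerInput (an HTTP server) per rollout worker,
    # following the server example I am adapting.
    if ioctx.worker_index > 0 or ioctx.worker.num_workers == 0:
        return PolicyServerInput(
            ioctx,
            SERVER_ADDRESS,
            SERVER_BASE_PORT + ioctx.worker_index - (1 if ioctx.worker_index > 0 else 0),
        )
    return None

config = (
    PPOConfig()
    # No env on the trainer side; the Unity game connects through a PolicyClient.
    .environment(env=None, observation_space=obs_space, action_space=act_space)
    .offline_data(input_=_input)
)
algo = config.build()  # then algo.train() in a loop
```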

I get the following warning:

```
WARNING rollout_ops.py:112 -- No samples returned from remote workers. If you have a slow environment or model, consider increasing the `sample_timeout_s` or decreasing the `rollout_fragment_length` in `AlgorithmConfig.env_runners().
```

followed by an error like:

```
\ray\rllib\policy\sample_batch.py", line 950, in __getitem__
    value = dict.__getitem__(self, key)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
KeyError: 'obs'
```

I have tried setting `idle_timeout` in the `PolicyServerInput` to a very large number (10e8), and have also tried setting `sample_timeout_s` in `AlgorithmConfig.env_runners()` to 10e6, with the same results.
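Concretely, those attempts looked roughly like this (continuing the sketch above; I am not certain this is the intended way to pass `idle_timeout`):

```python
# Continuing the sketch above (SERVER_ADDRESS, SERVER_BASE_PORT, config defined there).
def _input(ioctx):
    if ioctx.worker_index > 0 or ioctx.worker.num_workers == 0:
        return PolicyServerInput(
            ioctx,
            SERVER_ADDRESS,
            SERVER_BASE_PORT + ioctx.worker_index - (1 if ioctx.worker_index > 0 else 0),
            idle_timeout=10e8,  # default is 3.0 seconds
        )
    return None

config = config.offline_data(input_=_input).env_runners(
    sample_timeout_s=10e6,
)
```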

The only thing that works is decreasing `rollout_fragment_length` to something fairly small (32). Is there any drawback to this other than network overhead? My main concern is whether `rollout_fragment_length` is related to the horizon used when computing returns for PPO + GAE. If the two are not related at all, is there any reason not to make `rollout_fragment_length` as small as possible, other than the extra requests to the PolicyServers?
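For completeness, the only change that currently gets training running:

```python
# Workaround: a much smaller rollout fragment length avoids the
# "No samples returned" warning and the KeyError for me.
config = config.env_runners(rollout_fragment_length=32)
```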