States of Recurrent models for multiple workers/envs

alonit · April 12, 2021, 7:28pm

Hi everyone,
I would like to confirm that the state argument passed to the forward() function of recurrent models never mixes observations from different envs. I'll explain my question in detail:
I have a custom RNN model, a custom env, and PPOTrainer. I call ray.tune with 1 worker and 10 envs per worker (rollout_fragment_length=300 and train_batch_size=1500). In addition, because the model is heavy and the env performs a large number of steps per episode, I set remote_worker_envs to True and remote_env_batch_wait_ms to 50.
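For reference, the settings described above would look roughly like the following config dict. This is only a sketch of the values named in this post; the model and env entries are left out, and the variable name config is just for illustration.

```python
# Sketch of the settings described in the post, written as a plain dict.
# Only the keys named in the post are shown; the custom model and env
# entries are deliberately omitted.
config = {
    "num_workers": 1,                # one rollout worker
    "num_envs_per_worker": 10,       # ten envs stepped by that worker
    "rollout_fragment_length": 300,
    "train_batch_size": 1500,
    "remote_worker_envs": True,      # each env runs as a remote actor
    "remote_env_batch_wait_ms": 50,  # wait time when polling the remote envs
}
```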
This setup raised the following concern: in each forward call, I concatenate the current observation with the last N observations, which are passed in the “state” argument of forward(), exactly like the GTrXLNet model in attention_net.py. I would like to make sure that the concatenated observations from the state argument belong to the same env as the observation in input_dict (along the time axis, of course; I don't mind having observations from different envs along the batch axis). Otherwise, observations from different environments would be mixed into a single tensor, which would be a big mess. Does RLlib guarantee this property, or should I take care of it myself?
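To make the property concrete, here is a minimal NumPy sketch of the concatenation described above. It is not the actual model code: the function name concat_with_memory, the shapes, and the fake observations are all assumptions made for illustration. Each batch row stands for one env, and the check at the end is the property in question: row b of the memory only ever holds observations from env b.

```python
import numpy as np

def concat_with_memory(obs, memory):
    """Hypothetical helper, not an RLlib function.

    obs:    [B, obs_dim]     current observation, one row per env
    memory: [B, N, obs_dim]  last N observations, one row per env
    Returns the [B, N+1, obs_dim] window and the updated memory.
    """
    window = np.concatenate([memory, obs[:, None, :]], axis=1)
    new_memory = window[:, 1:, :]  # drop the oldest step, keep the last N
    return window, new_memory

B, N, OBS_DIM = 3, 4, 2
memory = np.zeros((B, N, OBS_DIM))
for t in range(1, 7):
    # Fake observations: env b emits the value 100 * b + t at step t,
    # so the source env of any entry can be read off from its value.
    obs = np.stack([np.full(OBS_DIM, 100.0 * b + t) for b in range(B)])
    window, memory = concat_with_memory(obs, memory)

# Desired property: every time step stored in row b came from env b.
for b in range(B):
    assert np.all(memory[b] // 100 == b)
```

The sketch only holds if the caller hands back row b of the state together with the next observation of env b, which is exactly what the question asks RLlib to guarantee.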

Thanks

sven1977 · April 14, 2021, 9:40am

Great question, @alonit! Yes, RLlib should guarantee this. The reason is that our Sampler/SampleCollector classes keep every episode completely separate from all others (e.g. when carrying over the last internal states), so the states passed to forward() for a given batch row come from the same episode as that row's observation.

For example, see _env_runner and _process_observations in ray/rllib/evaluation/sampler.py.
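As a rough illustration of the idea (not RLlib's actual implementation), per-episode bookkeeping can be sketched as a memory keyed by env id. The class PerEnvMemory and its step method are hypothetical names. The sketch also covers the remote-env case from the question, where only a subset of envs may return on a given poll.

```python
from collections import deque

class PerEnvMemory:
    """Toy stand-in for per-episode state tracking; not an RLlib class."""

    def __init__(self, n):
        self.n = n         # number of past observations to keep per env
        self.buffers = {}  # env_id -> deque of that env's recent observations

    def step(self, ready):
        """ready: {env_id: (obs, done)} for the envs that returned this poll.

        Returns {env_id: past observations + current observation}.
        """
        windows = {}
        for env_id, (obs, done) in ready.items():
            buf = self.buffers.setdefault(env_id, deque(maxlen=self.n))
            windows[env_id] = list(buf) + [obs]
            if done:
                # Episode ended: start the next episode with empty memory.
                self.buffers[env_id] = deque(maxlen=self.n)
            else:
                buf.append(obs)
        return windows

mem = PerEnvMemory(n=2)
mem.step({0: ("a0", False), 1: ("b0", False)})
mem.step({1: ("b1", False)})                        # only env 1 was ready
w = mem.step({0: ("a1", False), 1: ("b2", True)})   # env 1 finishes its episode
w2 = mem.step({1: ("b3", False)})                   # env 1 starts a new episode
```

Because the memory is looked up by env id rather than by position in the batch, it does not matter which envs happen to be ready on a given poll, and an episode boundary clears only the memory of the env that finished.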
