Initialise loss from dummy batch method in policy.py

mannyv · May 5, 2021, 3:01pm

Hi @rob65 welcome to the community,

The dummy-batch loss initialisation method is not used during training, only during the initial setup of the policy. It tries to determine automatically which items, and how many timesteps, are required in the sample_batch during rollouts and training. This process uses dummy data that is not obtained from the environment.
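To make the idea concrete, here is a minimal sketch in plain Python of how a framework could discover which batch items a loss needs by running it once on dummy data. This is not RLlib's actual code: `TrackingBatch`, `toy_loss`, `make_dummy_batch` and the key names are all hypothetical, and the sketch only covers the "which items" part, not the timestep inference.

```python
# Illustrative sketch only; NOT RLlib's implementation.
# All names here (TrackingBatch, toy_loss, make_dummy_batch) are hypothetical.


class TrackingBatch(dict):
    """A dict that records which keys are read from it."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.accessed_keys = set()

    def __getitem__(self, key):
        # Remember every key the loss function looks up.
        self.accessed_keys.add(key)
        return super().__getitem__(key)


def toy_loss(batch):
    # Hypothetical loss that only reads rewards and action log-probs.
    return -sum(r * lp for r, lp in zip(batch["rewards"], batch["action_logp"]))


def make_dummy_batch(keys, batch_size=4):
    # Zero-filled placeholder data; nothing here comes from an environment.
    return TrackingBatch({k: [0.0] * batch_size for k in keys})


candidate_keys = ["obs", "actions", "rewards", "action_logp", "infos"]
dummy = make_dummy_batch(candidate_keys)

# One-off setup pass: the loss value is meaningless, only the accesses matter.
toy_loss(dummy)

required_keys = sorted(dummy.accessed_keys)
unused_keys = sorted(set(candidate_keys) - dummy.accessed_keys)
print(required_keys)
print(unused_keys)
```

The loss value computed on the dummy batch is discarded; the only useful output of the setup pass is the set of keys that were touched.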

When the loss is actually computed during training, the entire sample_batch is used. Does this help, or are you still seeing an issue?
