How severely does this issue affect your experience of using Ray?
- Medium: It makes my task significantly harder to complete, but I can work around it.
TL;DR, two questions:
- Is there a technical reason that SAC sets rollout_fragment_length to 1? I'd like to make it longer than an episode in my environment, and I am wondering whether changing this default will cause problems.
- Is there a way to add data to the sample_batch used by a custom_loss without writing a postprocess_fn (which is limited by the rollout_fragment_length setting)? If possible, this would be preferable to changing rollout_fragment_length.
More detail:
I am looking at computing statistics over entire episodes to build a loss function for an RL system. In SAC, rollout_fragment_length is set to 1, but the parent classes use different values (SimpleQ has rollout_fragment_length = 4, and it looks like DQN inherits this value).
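To make the inheritance chain concrete, here is a minimal sketch of how a subclass default can shadow a parent default and how a user-supplied value then overrides both. The dicts and `resolve_config` are hypothetical illustrations, not RLlib's actual config classes; only the values 4 and 1 come from the defaults described above, and 200 is an assumed episode length.

```python
# Illustrative only: plain dicts standing in for algorithm defaults.
# These are NOT RLlib's real config objects.
SIMPLEQ_DEFAULTS = {"rollout_fragment_length": 4}

# DQN starts from SimpleQ's defaults and keeps the inherited value.
DQN_DEFAULTS = {**SIMPLEQ_DEFAULTS}

# SAC overrides the inherited value with 1.
SAC_DEFAULTS = {**DQN_DEFAULTS, "rollout_fragment_length": 1}


def resolve_config(defaults, user_overrides):
    """Later keys win: user overrides take precedence over algorithm defaults."""
    return {**defaults, **user_overrides}


# Hypothetical: a user who wants fragments longer than a 200-step episode.
user_config = resolve_config(SAC_DEFAULTS, {"rollout_fragment_length": 200})
print(user_config["rollout_fragment_length"])  # 200
```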
The reason this matters is that a policy's postprocess_fn receives a sample_batch whose length equals rollout_fragment_length. This means, for instance, that I can't do what is described here in SAC without changing rollout_fragment_length.
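The effect of the fragment length on whole-episode statistics can be seen with a small pure-Python sketch. It assumes, as described above, that each postprocess call only sees one fragment of at most rollout_fragment_length steps; `split_into_fragments` and `per_fragment_totals` are hypothetical helpers, not RLlib's sampler code, and the rewards are made up.

```python
from typing import List


def split_into_fragments(rewards: List[float], fragment_length: int) -> List[List[float]]:
    """Chop one episode's rewards into consecutive fragments of at most
    `fragment_length` steps, mimicking what a per-fragment postprocess
    function would see (an assumption, not RLlib's implementation)."""
    if fragment_length < 1:
        raise ValueError("fragment_length must be >= 1")
    return [rewards[i:i + fragment_length] for i in range(0, len(rewards), fragment_length)]


def per_fragment_totals(rewards: List[float], fragment_length: int) -> List[float]:
    # The "episode return" each postprocess call could compute from its own batch.
    return [sum(frag) for frag in split_into_fragments(rewards, fragment_length)]


episode_rewards = [1.0, 0.0, 2.0, 1.0, 3.0]  # hypothetical 5-step episode
print(sum(episode_rewards))                     # 7.0 (true episode return)
print(per_fragment_totals(episode_rewards, 1))  # [1.0, 0.0, 2.0, 1.0, 3.0]
print(per_fragment_totals(episode_rewards, 4))  # [4.0, 3.0]
print(per_fragment_totals(episode_rewards, 8))  # [7.0]
```

Only when the fragment covers the whole episode (length 8 here) does a single postprocess call see the true episode return.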
Of course, changing rollout_fragment_length merely so I can add data to the dataset is a bit indirect, and it affects other, generally more important, training factors (e.g. if the train batch size is 256 and rollout_fragment_length is set to 512, a lot of computation is wasted). If there is an alternative way to apply a supervised loss to an entire trajectory without changing rollout_fragment_length, that would be better.
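One generic way to get whole-trajectory statistics while keeping a short rollout_fragment_length is to buffer fragments per episode and compute the statistics only once the terminal step arrives. The sketch below is plain Python under stated assumptions: `EpisodeAccumulator` is a hypothetical helper, not an RLlib class, and the column names `eps_id`, `rewards`, and `dones` only mimic SampleBatch-style columns (the exact names may differ between RLlib versions).

```python
from collections import defaultdict
from typing import Dict, List


class EpisodeAccumulator:
    """Hypothetical helper (not part of RLlib): buffers per-step rewards
    keyed by episode id and returns whole-episode statistics once the
    fragment containing that episode's terminal step arrives."""

    def __init__(self) -> None:
        self._rewards: Dict[int, List[float]] = defaultdict(list)

    def add_fragment(self, fragment: Dict[str, list]) -> Dict[int, dict]:
        """`fragment` holds equal-length columns named "eps_id", "rewards"
        and "dones" (assumed names). Returns stats for every episode that
        finished inside this fragment."""
        finished = {}
        for eps_id, reward, done in zip(fragment["eps_id"], fragment["rewards"], fragment["dones"]):
            self._rewards[eps_id].append(reward)
            if done:
                # Episode complete: compute stats over the full trajectory.
                rewards = self._rewards.pop(eps_id)
                finished[eps_id] = {"return": sum(rewards), "length": len(rewards)}
        return finished


acc = EpisodeAccumulator()
# Two fragments of length 2: episode 7 spans both, episode 8 starts in the second.
print(acc.add_fragment({"eps_id": [7, 7], "rewards": [1.0, 0.0], "dones": [False, False]}))  # {}
print(acc.add_fragment({"eps_id": [7, 8], "rewards": [2.0, 5.0], "dones": [True, False]}))
# {7: {'return': 3.0, 'length': 3}}
```

The trade-off is timing: earlier fragments of an episode may already have been consumed by the learner before the episode ends, so the whole-episode statistics only become available afterwards, and where to attach them (for example, to data stored in a replay buffer) is left open here.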