Horizon and No_Done_At_End

How severely does this issue affect your experience of using Ray?

  • Medium: It contributes significant difficulty to completing my task, but I can work around it.

According to this, is there a way to set the done flag to True only when the env returns done, and otherwise, if the horizon is hit, reset the env but leave the flag as False?

All Best
Engin

Hi @etekin,

That is tricky. It’s not possible at the time of writing with these configuration values.
I had to do some digging here myself, and I don't think there is a super-pretty solution to this. Here is what I would do:

  1. Implement your own sample collector. Have a look at what ray.rllib.evaluation.collectors.simple_list_collector.SimpleListCollector does. You can simply copy the SimpleListCollector into your own code and modify it to postprocess batches the way you like.
  2. These batches will likely need a separate “done” flag in the info field, which you can set in your environment’s step method when the episode reaches a natural end. That way, your sample collector can reliably tell which episodes ended naturally and which ones it should manually set the done flag to False for.
  3. Pass your sample collector class to the trainer config, replacing the default "sample_collector": SimpleListCollector with your own.
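The logic of steps 2 and 3 can be sketched without Ray at all. This is a minimal, hypothetical illustration, not RLlib code: ToyEnv, fix_dones and the natural_done info key are made-up names, and a plain dict of lists stands in for the batch your copied SimpleListCollector would build. In a real collector you would apply the same rule wherever it assembles the batch.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical info key: the env sets it to True only on a *natural* end.
NATURAL_END_KEY = "natural_done"


class ToyEnv:
    """Hypothetical env whose episodes end naturally after `natural_len` steps."""

    def __init__(self, natural_len: int) -> None:
        self.natural_len = natural_len
        self.t = 0

    def reset(self) -> int:
        self.t = 0
        return self.t

    def step(self, action: int) -> Tuple[int, float, bool, Dict[str, Any]]:
        self.t += 1
        natural_end = self.t >= self.natural_len
        # Report the natural end in `info` as well, so it survives even if
        # the framework later overwrites the done flag at the horizon.
        return self.t, 1.0, natural_end, {NATURAL_END_KEY: natural_end}


def fix_dones(batch: Dict[str, List[Any]]) -> Dict[str, List[Any]]:
    """Keep done=True only where the env reported a natural end.

    `batch` is a plain dict of equal-length lists standing in for the
    batch a custom collector builds; horizon-only dones become False.
    """
    batch["dones"] = [
        bool(done and info.get(NATURAL_END_KEY, False))
        for done, info in zip(batch["dones"], batch["infos"])
    ]
    return batch


# Usage: the episode is cut off by a horizon of 3 before its natural end at 5.
env = ToyEnv(natural_len=5)
env.reset()
dones, infos = [], []
for _ in range(3):
    _, _, done, info = env.step(0)
    dones.append(done)
    infos.append(info)
dones[-1] = True  # simulate the done flag being set because the horizon was hit
batch = fix_dones({"dones": dones, "infos": infos})
print(batch["dones"])  # [False, False, False]
```

Because the natural-end marker travels in the info dict rather than in the done flag itself, the collector can still distinguish the two cases after the horizon logic has run.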

Is there a particular reason why you would not want the flag to be True, even if the environment is reset? Or do you just want to see what happens? Just asking to understand the use case here.

Best

Thank you for the clarification.

My goal is to be able to reset the env after max_steps but still bootstrap the ADVANTAGES from the value prediction (VF_PRED) rather than from last_r, which is 0 when the done flag is True.
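To illustrate why this matters, here is a minimal sketch of the standard GAE recursion in plain Python. It is not RLlib's own compute_advantages; it assumes last_r is the bootstrap value for the state after the final step, 0.0 when that step is treated as terminal and a value-function estimate otherwise. The numbers, including the bootstrap value 5.0, are made up.

```python
from typing import List


def compute_gae(rewards: List[float], vf_preds: List[float], last_r: float,
                gamma: float = 0.99, lam: float = 0.95) -> List[float]:
    """Generalized advantage estimation over one trajectory fragment.

    `vf_preds[t]` is the value estimate for the state at step t and
    `last_r` is the bootstrap value for the state after the last step.
    """
    values = list(vf_preds) + [last_r]
    advantages = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards, accumulating discounted TD errors.
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages


rewards = [1.0, 1.0]
vf_preds = [0.0, 0.0]
# Treated as terminal (done=True): nothing is bootstrapped after the last step.
print(compute_gae(rewards, vf_preds, last_r=0.0, gamma=1.0, lam=1.0))  # [2.0, 1.0]
# Truncated by the horizon (done=False): bootstrap from a value estimate.
print(compute_gae(rewards, vf_preds, last_r=5.0, gamma=1.0, lam=1.0))  # [7.0, 6.0]
```

With done=True at the horizon, the truncated tail is valued at zero; with a value-function bootstrap, every advantage in the fragment picks up a discounted share of the estimated future return.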

Btw, looking at the sample collector, the done flag is set as follows:
False if no_done_at_end or (hit_horizon and soft_horizon) else env_done
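Read in isolation, the quoted expression can be tabulated with a few lines of Python. This sketch only restates that one line, using the config name no_done_at_end for the first flag. It treats env_done as an opaque input and deliberately does not model how the sampler computes that value (for example, whether hitting the horizon already influences it) before it reaches this expression.

```python
from itertools import product


def collected_done(env_done: bool, no_done_at_end: bool,
                   hit_horizon: bool, soft_horizon: bool) -> bool:
    """Done value stored for a step, mirroring the quoted expression."""
    if no_done_at_end or (hit_horizon and soft_horizon):
        return False
    return env_done


# Enumerate all 16 input combinations and print the resulting stored flag.
for env_done, ndae, hit, soft in product([False, True], repeat=4):
    print(f"env_done={env_done} no_done_at_end={ndae} "
          f"hit_horizon={hit} soft_horizon={soft} -> "
          f"{collected_done(env_done, ndae, hit, soft)}")
```

The stored flag equals env_done unless no_done_at_end is set or hit_horizon and soft_horizon are both True, in which case it is forced to False.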

Furthermore, the env is reset if hit_horizon is True.

Looking at the code, it seems that if I just hit the horizon, the env will be reset but the done flag will be left as the env returned it? But this contradicts the config docs.