Horizon and No_Done_At_End

etekin · April 28, 2022, 7:36pm

How severe does this issue affect your experience of using Ray?

Medium: It contributes to significant difficulty to complete my task, but I can work around it.

According to this, is there a way to set the done flag to True only when the env itself returns done and, when the horizon is hit, reset the env but leave the flag as False?

All Best
Engin

arturn · April 29, 2022, 12:08pm

Hi @etekin,

That is tricky. It’s not possible at the time of writing with these configuration values.
I had to do some digging here myself. There is no super-pretty solution to this I think. What I would do:

1. Implement your own sample collector. Have a look at what ray.rllib.evaluation.collectors.simple_list_collector.SimpleListCollector does. You can simply copy the SimpleListCollector into your own code and modify it to postprocess batches the way you like.
2. These batches will likely need a separate "done" flag in the info field, which you can set in your environment's step method when the episode reaches a natural end. That gives your sample collector a proper indicator for which episodes it should manually set the done flag to False.
3. Pass your sample collector to the trainer config via the "sample_collector" key, i.e. replace SimpleListCollector in "sample_collector": SimpleListCollector with your own class.
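Step 2 can be sketched without touching RLlib at all. The snippet below is only an illustration in plain Python: the info key `natural_done`, the toy `CountdownEnv`, and the helper `relabel_dones` are all hypothetical names, not part of RLlib, and the relabeling would have to live inside your modified copy of the collector.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical info key; RLlib does not define it.
NATURAL_DONE_KEY = "natural_done"


class CountdownEnv:
    """Toy env that ends naturally once its counter reaches zero."""

    def __init__(self, start: int = 3) -> None:
        self.start = start
        self.counter = start

    def reset(self) -> int:
        self.counter = self.start
        return self.counter

    def step(self, action: int) -> Tuple[int, float, bool, Dict[str, Any]]:
        self.counter -= 1
        natural_end = self.counter <= 0
        # Report the natural end separately so a collector can tell it
        # apart from a reset that was forced by the horizon.
        info = {NATURAL_DONE_KEY: natural_end}
        return self.counter, 1.0, natural_end, info


def relabel_dones(dones: List[bool], infos: List[Dict[str, Any]]) -> List[bool]:
    """Keep done=True only where the env itself reported a natural end."""
    return [bool(d and i.get(NATURAL_DONE_KEY, False)) for d, i in zip(dones, infos)]
```

For example, a batch whose last step was marked done only because the horizon was hit would have `natural_done` set to False on that step, so `relabel_dones` turns its done flag back to False.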

Is there a particular reason why you would not want the flag to be True, even if the environment is reset? Or do you just want to see what happens? Just asking to understand the use case here.

Best

etekin · April 29, 2022, 6:03pm

Thank you for the clarification.

My goal is to be able to reset an env after max_steps but still use VF_PRED to calculate ADVANTAGES, rather than last_r = 0 (which is what is used when the done flag is True).
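To make the difference concrete, here is a small NumPy sketch of how the bootstrap value last_r enters the discounted returns. It is a simplified illustration, not RLlib's postprocessing code: the function name `discounted_returns` and the sample numbers are assumptions, and a full advantage estimate would additionally subtract the value predictions (or use GAE).

```python
import numpy as np


def discounted_returns(rewards, last_r, gamma):
    """Discounted returns for one trajectory fragment.

    last_r is the value used for everything after the final step:
    0.0 for a truly finished episode, or the value-function prediction
    when the episode was merely cut off.
    """
    returns = np.zeros(len(rewards), dtype=np.float64)
    running = last_r
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


rewards = [1.0, 1.0, 1.0]
gamma = 0.5
vf_pred_last = 4.0  # hypothetical value estimate of the state after the last step

# done flag True: the tail of the episode is treated as worth nothing.
terminated = discounted_returns(rewards, last_r=0.0, gamma=gamma)

# done flag False: the tail is bootstrapped from the value prediction.
truncated = discounted_returns(rewards, last_r=vf_pred_last, gamma=gamma)
```

With these numbers the terminated case gives [1.75, 1.5, 1.0] and the truncated case gives [2.25, 2.5, 3.0], which is why the done flag at a horizon reset matters for the advantages.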

Btw, looking at the sample collector, the done flag is set to the following:
False if not_done_at_end or (hit_horizon and soft_horizon) else env_done

Furthermore, the env is reset if hit_horizon is true.

From the code, it seems that if only hit_horizon is set, the env is reset but the done flag is left as returned by the env. Is that right? This contradicts the config docs.
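One way to check this reading is to evaluate the quoted expression on its own. The sketch below wraps exactly that expression in a function (the name `batch_done_flag` is hypothetical) and tabulates the cases. It only covers the quoted line; whether env_done has already been overridden elsewhere in the sampler by the time the horizon is hit is not shown in this thread.

```python
from itertools import product


def batch_done_flag(env_done, hit_horizon, soft_horizon, no_done_at_end):
    """The expression quoted from the sample collector, unchanged."""
    return False if no_done_at_end or (hit_horizon and soft_horizon) else env_done


# Enumerate every combination to see when the flag follows env_done.
for env_done, hit, soft, no_done in product([False, True], repeat=4):
    flag = batch_done_flag(env_done, hit, soft, no_done)
    print(f"env_done={env_done!s:5} hit_horizon={hit!s:5} "
          f"soft_horizon={soft!s:5} no_done_at_end={no_done!s:5} -> {flag}")
```

Taken in isolation, the expression returns env_done whenever no_done_at_end is False and the horizon is hard, regardless of hit_horizon.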
