How severely does this issue affect your experience of using Ray?
- None: Just asking a question out of curiosity
Can I define the initial action at t = 0 (i.e., right after `obs = env.reset()`)?
- I'm using `use_prev_action` and `use_prev_reward` in my model config (a minimal sketch of my setup follows this list).
- I want to know how `prev_action` and `prev_reward` are defined at the first action inference (e.g., whether they are random, zero-filled, or something else).
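For reference, here is roughly what my setup looks like. This is a sketch, not my exact code; the `lstm_use_prev_action`/`lstm_use_prev_reward` key names follow recent RLlib model defaults and may differ in other Ray versions (older releases used a single `lstm_use_prev_action_reward` flag):

```python
# Sketch of my model config (key names assumed from RLlib's model defaults;
# adjust to your Ray version).
config = {
    "env": "CartPole-v1",  # placeholder for my custom env
    "model": {
        "use_lstm": True,
        "lstm_use_prev_action": True,   # feed a_{t-1} into the model
        "lstm_use_prev_reward": True,   # feed r_{t-1} into the model
    },
}
```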
I think the fastest workaround is to manually include the initial action and initial reward in my custom env's observation, e.g. via a wrapper like the sketch below.
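A rough sketch of that workaround (my own illustration, not an RLlib API: `PrevActionRewardWrapper` is a hypothetical name, and it assumes a Box action space and the old-style Gym API where `reset()` returns only the observation):

```python
import gym
import numpy as np
from gym import spaces


class PrevActionRewardWrapper(gym.Wrapper):
    """Appends the previous action and reward to the observation,
    so the values at t = 0 are explicit and under my control."""

    def __init__(self, env):
        super().__init__(env)
        # Assumes a Box action space; Discrete would need e.g. one-hot handling.
        self.observation_space = spaces.Dict({
            "obs": env.observation_space,
            "prev_action": env.action_space,
            "prev_reward": spaces.Box(-np.inf, np.inf, shape=(1,), dtype=np.float32),
        })

    def reset(self, **kwargs):
        obs = self.env.reset(**kwargs)
        # Explicit initial action/reward at t = 0 -- zeros here, but any
        # fixed choice would do.
        init_action = np.zeros(self.action_space.shape, dtype=self.action_space.dtype)
        return {
            "obs": obs,
            "prev_action": init_action,
            "prev_reward": np.zeros(1, dtype=np.float32),
        }

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        wrapped = {
            "obs": obs,
            "prev_action": np.asarray(action, dtype=self.action_space.dtype),
            "prev_reward": np.array([reward], dtype=np.float32),
        }
        return wrapped, reward, done, info
```

The point is just that whatever I put in `reset()` becomes the well-defined t = 0 value, instead of relying on whatever RLlib fills in internally.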