How is prev_action defined at the env.reset() step?

How severely does this issue affect your experience of using Ray?

  • None: Just asking a question out of curiosity

Can I define the initial action at t = 0 (obs = env.reset())?

  1. I’m using “use_prev_action” and “use_prev_reward” in the model config.
  2. I want to know how prev_action and prev_reward are defined at the first action inference
    (whether they are random or something else).

I think the fastest solution is to manually include the initial action and initial reward in my custom env’s observation.
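A minimal sketch of that workaround, using a toy plain-Python env rather than a real gymnasium env (the class and field names here are illustrative, not RLlib API): the env itself puts prev_action (one-hot) and prev_reward into a dict observation, so at reset you get to choose the initial values explicitly instead of relying on the library's defaults.

```python
import numpy as np

class ToyEnvWithPrevInObs:
    """Toy Discrete(3) env whose observation carries prev_action/prev_reward."""

    N_ACTIONS = 3

    def reset(self):
        self.t = 0
        # At reset WE define the "previous" action/reward: here an all-zero
        # one-hot (no action taken yet) and a reward of 0.0. Any other
        # initial values could be substituted here.
        return {
            "obs": np.zeros(2, np.float32),
            "prev_action": np.zeros(self.N_ACTIONS, np.float32),
            "prev_reward": np.zeros(1, np.float32),
        }

    def step(self, action):
        self.t += 1
        reward = 1.0  # dummy reward for illustration
        one_hot = np.zeros(self.N_ACTIONS, np.float32)
        one_hot[action] = 1.0
        obs = {
            "obs": np.full(2, self.t, np.float32),
            "prev_action": one_hot,
            "prev_reward": np.array([reward], np.float32),
        }
        done = self.t >= 5
        return obs, reward, done, {}

env = ToyEnvWithPrevInObs()
first_obs = env.reset()
print(first_obs["prev_action"])  # [0. 0. 0.] -- your chosen initial value
```

In a real setup the same idea fits into a `gym.Wrapper` with a `spaces.Dict` observation space, which keeps the base env untouched.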

Hi @suro_yoon,

A lot has changed in the library since I last checked, but as far as I know they are still set to all zeros.

@mannyv Thanks,
I think I checked it several years ago too,
but I’m not sure of my memory.

That “set to all zeros” doesn’t mean action = 0, am I right?
Let’s say the action space is Discrete(3);
then the one-hot tensor is [0, 0, 0], not [1, 0, 0].
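A quick NumPy check of that distinction (plain NumPy, not RLlib's actual preprocessing code): an all-zero vector encodes "no action yet", which is not the same tensor as the one-hot encoding of action 0.

```python
import numpy as np

n = 3  # Discrete(3)

def one_hot(action, n):
    """One-hot encode a discrete action into a length-n vector."""
    vec = np.zeros(n, np.float32)
    vec[action] = 1.0
    return vec

# "Set to all zeros" as the initial prev_action: no action is encoded at all.
init_prev_action = np.zeros(n, np.float32)
print(init_prev_action)  # [0. 0. 0.]

# Actually having taken action 0 looks different:
print(one_hot(0, n))     # [1. 0. 0.]
```

So the zero initialization lies outside the valid one-hot simplex, which lets the model distinguish "first step" from "previous action was 0".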