Hi there,
Imagine we have a `Discrete(2)` action space consisting of `action_0` and `action_1`.
In the global `Environment` we don't want to act on every `env_step`, so assume `action_0` is a "skip action" (we know in advance when we do not need to learn). But the `Environment` returns its state to the `Policy` on every step, so the `observation_history` inside the policy fills up with a lot of data we don't want to learn on.
In the example of computing actions for a pretrained agent it is possible to take control over the actions and observations. Is it possible to customize the same part in the training code, to get something like "every 100th step use the policy, otherwise the skip action" inside the train loop, roughly:
```python
state = env.reset()
done = False
while not done:
    current_step: int = env.get_current_step()
    # every 100th step query the policy, otherwise send the "skip action" (action_0)
    action = policy.compute_action(state) if current_step % 100 == 0 else 0
    next_state, done = env.step(action)
    if current_step % 100 == 0:
        # only keep transitions from real decision steps
        observation_history.push(state, next_state)
    state = next_state
```
(Sorry for the formatting, I didn't find a way to indent the code properly.)
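Or should this live in the environment rather than in the train loop? Here is a minimal sketch of what I have in mind, assuming a Gymnasium-style `Wrapper`; `SkipWrapper`, `skip_interval`, and the reward accumulation are just my own guesses, not anything from the library:

```python
import gymnasium as gym


class SkipWrapper(gym.Wrapper):
    """Hypothetical wrapper: after each real action it internally sends the
    "skip action" (action_0) until the next decision point, so the policy
    only ever sees every `skip_interval`-th observation."""

    def __init__(self, env: gym.Env, skip_interval: int = 100, skip_action: int = 0):
        super().__init__(env)
        self.skip_interval = skip_interval
        self.skip_action = skip_action

    def step(self, action):
        # apply the policy's action once ...
        obs, total_reward, terminated, truncated, info = self.env.step(action)
        # ... then fill the gap with skip actions, accumulating the reward
        for _ in range(self.skip_interval - 1):
            if terminated or truncated:
                break
            obs, reward, terminated, truncated, info = self.env.step(self.skip_action)
            total_reward += reward
        return obs, total_reward, terminated, truncated, info
```

With something like this the policy would only be queried on every 100th underlying step, and only those observations would ever reach its history. Is that a reasonable way to do it, or is there a supported hook in the training code itself?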