Imagine we have a Discrete(2) action space consisting of two actions, 0 and 1. In the global environment we don't want to act on every env_step, so assume action_0 is a "skip action" (we know in advance on which steps we do not need to learn). But the environment still returns its state on every step, so a lot of data we don't want to learn on ends up in the observation_history inside the policy.
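For concreteness, here is a minimal sketch of the kind of environment I mean (all names are hypothetical, old 4-tuple gym-style API assumed):

```python
import gym
import numpy as np
from gym import spaces

class SkipActionEnv(gym.Env):
    """Hypothetical env: action 0 is the "skip action", action 1 actually acts."""

    def __init__(self):
        self.action_space = spaces.Discrete(2)  # 0 = skip, 1 = act
        self.observation_space = spaces.Box(-np.inf, np.inf, shape=(4,), dtype=np.float32)
        self._t = 0

    def reset(self):
        self._t = 0
        return np.zeros(4, dtype=np.float32)

    def step(self, action):
        self._t += 1
        obs = np.random.randn(4).astype(np.float32)  # a state comes back every step
        reward = 0.0  # dummy; real reward logic omitted
        done = self._t >= 1000
        return obs, reward, done, {}
```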
In the example of computing actions with a pretrained agent it is possible to take full control over the actions and observations, along the lines of the sketch below. Is it possible to customize the same part in the training code?
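For reference, this is roughly the kind of manual control I mean (a sketch only; `algo` stands for an already-trained agent, and `compute_single_action` assumes an RLlib-style API, so adjust to whatever the library actually exposes):

```python
# Manual inference: full control over which actions are sent and which
# observations are kept (4-tuple gym-style step assumed).
obs = env.reset()
done = False
while not done:
    action = algo.compute_single_action(obs)  # we decide when/whether to call this
    obs, reward, done, info = env.step(action)
```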
Concretely, I'd like something like "on every 100th step query the policy, otherwise emit the skip action" inside the train loop:
```python
state = env.reset()
current_step, done = 0, False
while not done:
    # Query the policy only on every 100th step; otherwise emit the skip action (0).
    action = policy.compute_action(state) if current_step % 100 == 0 else 0
    next_state, reward, done, info = env.step(action)
    if current_step % 100 == 0:
        # Only store transitions for the steps where the policy actually acted.
        observation_history.push(state, next_state)
    state, current_step = next_state, current_step + 1
```

The point is that transitions from skipped steps never enter the observation_history, so the policy trains only on the steps where it actually chose an action.
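The only alternative I can think of is to fold the skipping into the environment itself, so the policy never sees the skipped steps at all. A rough sketch (the wrapper is hypothetical, old 4-tuple gym API assumed):

```python
import gym

class SkipWrapper(gym.Wrapper):
    """Hypothetical wrapper: replay the skip action internally so the
    policy only ever sees every `skip`-th step."""

    def __init__(self, env, skip=100):
        super().__init__(env)
        self._skip = skip

    def step(self, action):
        obs, total_reward, done, info = self.env.step(action)
        # Emit the skip action (0) for the next skip-1 steps, summing rewards.
        for _ in range(self._skip - 1):
            if done:
                break
            obs, reward, done, info = self.env.step(0)
            total_reward += reward
        return obs, total_reward, done, info
```

With such a wrapper the observation_history would only ever contain the steps the policy acted on, but I'd prefer to keep the skipping logic in the train loop if the library supports customizing it.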