Different step alignment for agent and environment

Hi there,

Imagine we have a Discrete(2) action space consisting of action_0 and action_1.
In the global Environment we don't want to act on every env step, so assume action_0 is a “skip action” (we know when we do not need to learn).
But the Environment returns its state to the Policy on every step, so there is a lot of data we don't want to learn on in the observation_history inside the policy.

In the example of computing actions for a pretrained agent it is possible to take control over the actions and observations.
Is it possible to customize the same part in the training code?
I.e., do something like this (query the policy on every 100th step, otherwise send the “skip action”) inside the train loop:

```python
state = env.reset()
done = False

while not done:
    current_step: int = env.get_current_step()  # assuming the env exposes its step counter
    # Query the policy only on every 100th step, otherwise send the "skip action" (0)
    action = policy.compute_action(state) if current_step % 100 == 0 else 0
    next_state, reward, done, info = env.step(action)
    # Only keep transitions from the steps the policy actually acted on
    if current_step % 100 == 0:
        observation_history.push(state, next_state)
    state = next_state
```


Hi @redlight,

it’s been a while now, but maybe this post here helps.
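
For reference, one common way to handle this kind of agent/environment step mismatch is to wrap the environment so that the skip steps happen inside `env.step()` itself; the trainer then only ever sees the decision steps, and nothing in the training loop needs to change. This is just a rough sketch under my own assumptions (old-style Gym 4-tuple `step()` API to match the loop above; the wrapper name and `interval` argument are made up), not necessarily what the linked post does:

```python
import gym

SKIP_ACTION = 0  # action_0 from the post above


class DecisionIntervalWrapper(gym.Wrapper):
    """Run the skip steps inside step(), so the policy only sees every
    `interval`-th environment step and never stores the skipped ones."""

    def __init__(self, env, interval=100):
        super().__init__(env)
        self.interval = interval

    def step(self, action):
        # Apply the "real" action chosen by the policy once...
        obs, total_reward, done, info = self.env.step(action)
        # ...then advance the env with skip actions until the next decision step.
        for _ in range(self.interval - 1):
            if done:
                break
            obs, reward, done, info = self.env.step(SKIP_ACTION)
            total_reward += reward
        return obs, total_reward, done, info
```

With something like this, every transition the trainer collects corresponds to a real decision step, so the observation_history no longer fills up with skipped steps.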
