Hi there,
Imagine we have a `Discrete(2)` action space consisting of `action_0` and `action_1`.
In the global `Environment` we don't want to act on every `env_step`, so assume `action_0` is a "skip action" (we know in advance when we do not need to learn). But the `Environment` returns its state to the `Policy` on every step, so the `observation_history` inside the policy fills up with a lot of data we don't want to learn on.
In the example of computing actions for a pretrained agent it is possible to take control over the actions and observations. Is it possible to customize the same part in the training code, to get something like "every 100th step use the policy, otherwise the skip action" inside the train loop, roughly:
```python
state = env.reset()
done = False
while not done:
    current_step: int = env.get_current_step()
    # every 100th step query the policy, otherwise send the "skip action" (action_0)
    action = policy.compute_action(state) if current_step % 100 == 0 else 0
    next_state, done = env.step(action)
    if current_step % 100 == 0:
        # only keep transitions from real decision steps
        observation_history.push(state, next_state)
    state = next_state
```
(Sorry for the formatting, I didn't find a way to indent the code properly.)
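Or should this live in the environment rather than in the train loop? Here is a minimal sketch of what I have in mind, assuming a Gymnasium-style `Wrapper`; `SkipWrapper`, `skip_interval`, and the reward accumulation are just my own guesses, not anything from the library:

```python
import gymnasium as gym


class SkipWrapper(gym.Wrapper):
    """Hypothetical wrapper: after each real action it internally sends the
    "skip action" (action_0) until the next decision point, so the policy
    only ever sees every `skip_interval`-th observation."""

    def __init__(self, env: gym.Env, skip_interval: int = 100, skip_action: int = 0):
        super().__init__(env)
        self.skip_interval = skip_interval
        self.skip_action = skip_action

    def step(self, action):
        # apply the policy's action once ...
        obs, total_reward, terminated, truncated, info = self.env.step(action)
        # ... then fill the gap with skip actions, accumulating the reward
        for _ in range(self.skip_interval - 1):
            if terminated or truncated:
                break
            obs, reward, terminated, truncated, info = self.env.step(self.skip_action)
            total_reward += reward
        return obs, total_reward, terminated, truncated, info
```

With something like this the policy would only be queried on every 100th underlying step, and only those observations would ever reach its history. Is that a reasonable way to do it, or is there a supported hook in the training code itself?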