I am coupling RLlib with simulation software. The problem is that the simulation sometimes does not converge, which leads to implausible values (outputs, and therefore a wrong correlation between action and reward).
Is there a setting in RLlib to ignore such a calculation?
RLlib does not know what you mean by implausible values.
You can, however, wrap your environment in your own custom environment class and do the checks yourself. The following code is taken from the official documentation and modified slightly.
import gym, ray
from ray.rllib.agents import ppo

class MyEnv(gym.Env):
    def __init__(self, env_config):
        self.env = SebastiansSimulation()
        # RLlib needs the spaces defined; here assuming your simulation exposes gym spaces.
        self.action_space = self.env.action_space
        self.observation_space = self.env.observation_space

    def reset(self):
        obs = self.env.reset()
        # <your plausibility tests and actions go here>
        return obs

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        # <your plausibility tests and actions go here>
        return obs, reward, done, info

ray.init()
trainer = ppo.PPOTrainer(env=MyEnv, config={
    "env_config": {},  # config to pass to env class
})
Cheers
@arturn Thanks for the suggestion. However, if the simulation did not converge, I can do a plausibility check, but I still always have to return something. That means the step ends up in the buffer and will be used to train the agent.
Is it possible to return nothing while sampling, e.g. with a callback? With on_sample_end it is not working properly.
Thanks for the clarification! To be honest, I cannot tell you for sure whether you are allowed to return nothing. But I do not see how the samplers would catch an exception that you could use there. Is it your goal to eliminate the episode completely, or only the experience of the last step?
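If the goal is only to drop individual steps, one idea (untested, a rough sketch) would be to flag implausible steps in the info dict inside your wrapper and then filter them out in a custom callback before the batch reaches the trainer. This assumes the Ray 1.x callbacks API and a hypothetical "implausible" info key set by your MyEnv.step():

import numpy as np
from ray.rllib.agents.callbacks import DefaultCallbacks

class DropImplausibleSteps(DefaultCallbacks):
    def on_postprocess_trajectory(self, *, worker, episode, agent_id, policy_id,
                                  policies, postprocessed_batch, original_batches,
                                  **kwargs):
        # "implausible" is a hypothetical flag your MyEnv.step() would put into `info`.
        infos = postprocessed_batch.get("infos")
        if infos is None:
            return
        keep = np.array([not info.get("implausible", False) for info in infos])
        if keep.all():
            return
        # SampleBatch behaves like a dict of arrays, so mask every column the same way.
        for key in postprocessed_batch.keys():
            postprocessed_batch[key] = postprocessed_batch[key][keep]

trainer = ppo.PPOTrainer(env=MyEnv, config={
    "env_config": {},
    "callbacks": DropImplausibleSteps,
})

Whether this plays nicely with the rest of the sampling pipeline (e.g. batch-size bookkeeping or RNN sequences) I cannot say for sure, so treat it only as a starting point.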