Ignore step in Environment

I am coupling RLlib with simulation software. The problem is that the simulation sometimes does not converge, which leads to implausible values (wrong outputs and therefore a wrong correlation between action and reward).

Is there a setting in RLlib to ignore such a calculation?

Thanks in advance

Hi @SebastianBo1995

RLlib does not know what you mean by implausible values.
You can, however, wrap your environment in your own custom environment class and do the filtering yourself. The following code is taken from the official website and modified a little bit.

import gym, ray
from ray.rllib.agents import ppo

class MyEnv(gym.Env):
    def __init__(self, env_config):
        self.env = SebastiansSimulation()
        # RLlib needs these two spaces to build the policy network.
        self.observation_space = self.env.observation_space
        self.action_space = self.env.action_space

    def reset(self):
        obs = self.env.reset()
        # <your plausibility tests and actions go here>
        return obs

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        # <your plausibility tests and actions go here>
        return obs, reward, done, info

ray.init()
trainer = ppo.PPOTrainer(env=MyEnv, config={
    "env_config": {},  # config to pass to env class
})
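For the plausibility test itself, something like this could go into step() (a sketch only; the finiteness check and the info["converged"] flag are my assumptions, so replace them with whatever signal your simulation actually gives you when it fails to converge):

import numpy as np

def step(self, action):
    obs, reward, done, info = self.env.step(action)
    # Assumed check: treat NaN/inf outputs as a failed solve.
    converged = bool(np.all(np.isfinite(obs)) and np.isfinite(reward))
    info["converged"] = converged
    if not converged:
        # End the episode so training does not continue from a bad state.
        done = True
    return obs, reward, done, info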

Cheers

@arturn Thanks for the suggestion. However, if the simulation did not converge, I can do a plausibility check, but I will always need to return something. This means the step ends up in the buffer and will be used to train the agent.

Is it possible to return nothing while sampling, e.g. with a callback? With on_sample_end it is not working properly.

Hi @SebastianBo1995 ,

Thanks for the clarification! To be honest, I cannot tell you for sure whether you are allowed to return nothing. But I do not see how the samplers would catch an exception that you could raise there. Is it your goal to eliminate the episode completely, or only the experience of the last step?

I suggest that you return that information in the info dict from your env's step function.
Then write a callback to postprocess that episode.
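For example, a callback along these lines could drop the flagged transitions from the trajectory before it reaches the train batch (a minimal, untested sketch; the info["converged"] key matches the wrapper sketch above and is an assumption, and depending on your RLlib version you may need to adapt the SampleBatch handling):

import numpy as np
from ray.rllib.agents.callbacks import DefaultCallbacks

class DropNonConvergedSteps(DefaultCallbacks):
    """Removes transitions that the env flagged as non-converged.

    Assumes step() stores a boolean under info["converged"]
    (hypothetical key -- use whatever your wrapper actually reports).
    """

    def on_postprocess_trajectory(self, *, worker, episode, agent_id,
                                  policy_id, policies, postprocessed_batch,
                                  original_batches, **kwargs):
        infos = postprocessed_batch.get("infos")
        if infos is None:
            return
        keep = np.array([bool(info.get("converged", True)) for info in infos])
        if keep.all():
            return
        # Keep only the converged rows in every column of the batch.
        for key in list(postprocessed_batch.keys()):
            postprocessed_batch[key] = postprocessed_batch[key][keep]
        postprocessed_batch.count = int(keep.sum())

You would then register it in the trainer config, e.g. "callbacks": DropNonConvergedSteps, next to your "env_config".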