I am coupling RLlib with simulation software. The problem is that the simulation sometimes does not converge, which leads to implausible values (outputs, and therefore a wrong correlation between action and reward).
Is there a setting in RLlib to ignore such a calculation?
RLlib does not know what you mean by implausible values.
You can, however, wrap your environment in your own custom environment class and do the checks yourself. The following code is taken from the official documentation and modified slightly.
import gym, ray
from ray.rllib.agents import ppo

class MyEnv(gym.Env):
    def __init__(self, env_config):
        self.env = SebastiansSimulation()
        # RLlib needs the spaces defined; here assuming your simulation exposes gym spaces.
        self.action_space = self.env.action_space
        self.observation_space = self.env.observation_space

    def reset(self):
        obs = self.env.reset()
        # <your plausibility tests and actions go here>
        return obs

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        # <your plausibility tests and actions go here>
        return obs, reward, done, info

ray.init()
trainer = ppo.PPOTrainer(env=MyEnv, config={
    "env_config": {},  # config to pass to env class
})
Cheers
@arturn Thanks for the suggestion. However, if the simulation did not converge, I can do a plausibility check, but I still always have to return something. That means the step ends up in the buffer and will be used to train the agent.
Is it possible to return nothing while sampling, e.g. with a callback? With on_sample_end it is not working properly.
Thanks for the clarification! To be honest, I cannot tell you for sure whether you are allowed to return nothing. But I do not see how the samplers would catch an exception that you could use there. Is it your goal to eliminate the episode completely, or only the experience of the last step?
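If the goal is only to drop individual steps, one idea (untested, a rough sketch) would be to flag implausible steps in the info dict inside your wrapper and then filter them out in a custom callback before the batch reaches the trainer. This assumes the Ray 1.x callbacks API and a hypothetical "implausible" info key set by your MyEnv.step():

import numpy as np
from ray.rllib.agents.callbacks import DefaultCallbacks

class DropImplausibleSteps(DefaultCallbacks):
    def on_postprocess_trajectory(self, *, worker, episode, agent_id, policy_id,
                                  policies, postprocessed_batch, original_batches,
                                  **kwargs):
        # "implausible" is a hypothetical flag your MyEnv.step() would put into `info`.
        infos = postprocessed_batch.get("infos")
        if infos is None:
            return
        keep = np.array([not info.get("implausible", False) for info in infos])
        if keep.all():
            return
        # SampleBatch behaves like a dict of arrays, so mask every column the same way.
        for key in postprocessed_batch.keys():
            postprocessed_batch[key] = postprocessed_batch[key][keep]

trainer = ppo.PPOTrainer(env=MyEnv, config={
    "env_config": {},
    "callbacks": DropImplausibleSteps,
})

Whether this plays nicely with the rest of the sampling pipeline (e.g. batch-size bookkeeping or RNN sequences) I cannot say for sure, so treat it only as a starting point.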