Hello dear members of the Ray team,
I am an MSc student studying AI. I am trying to compare the performance metrics of different well-known DRL algorithms, such as PPO, SAC, and A2C, in a custom Gym environment. I discovered the RLlib API, and it seems to be the perfect tool for the work I want to do. However, when I try to run an experiment using tune.run with PPO as the algorithm of choice, I get the following error:
TuneError: ('Trials did not complete', [PPO_CustomEnv-v0_a2286_00000])
This happens with any of the algorithms I want to conduct experiments with.
I also tried registering the environment with RLlib using the register_env() function and then using PPOTrainer to conduct some experiments, but this did not work either.
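The pattern I am referring to looks roughly like the sketch below (simplified; the dummy environment here just stands in for my actual custom environment, and the names are placeholders):

import gym
import numpy as np
import ray
from gym.spaces import Box, Discrete
from ray.tune.registry import register_env
from ray.rllib.agents.ppo import PPOTrainer

# Minimal stand-in for my actual custom environment
class MyCustomEnv(gym.Env):
    def __init__(self, env_config=None):
        self.action_space = Discrete(2)
        self.observation_space = Box(-1.0, 1.0, (4,))
        self.steps = 0

    def reset(self):
        self.steps = 0
        return np.zeros(4, dtype=np.float32)

    def step(self, action):
        self.steps += 1
        done = self.steps >= 10
        return np.zeros(4, dtype=np.float32), 1.0, done, {}

ray.init(ignore_reinit_error=True)

# Register the environment under a name RLlib can look up
register_env("CustomEnv-v0", lambda env_config: MyCustomEnv(env_config))

trainer = PPOTrainer(config={"env": "CustomEnv-v0", "num_workers": 1})
print(trainer.train()["episode_reward_mean"])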
Could you help me find out what the problem might be?
What is the best way to integrate custom environments into Ray RLlib so that I can run my experiments flawlessly?
Thank you very much for your help in advance!
Kind regards,
Christos Peridis
What is the stack trace of the error you got? If nothing appears in the console, you can find the error in the error.txt
file for the failed trial under your Ray results directory, usually $HOME/ray_results
on Linux.
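If you want to dump all of them at once, a quick sketch along these lines should work (assuming the default results location):

import glob
import os

# Print the contents of every error.txt written by failed trials
results_root = os.path.expanduser("~/ray_results")
for error_file in glob.glob(os.path.join(results_root, "**", "error.txt"), recursive=True):
    print("====", error_file)
    with open(error_file) as f:
        print(f.read())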
@Christos_Peridis
I hope this will help you:
import numpy as np
import gym
from gym.spaces import Box, Discrete
from ray.tune.registry import register_env
from ray import tune


class BanditEnv(gym.Env):
    """Simple multi-armed bandit environment."""

    def __init__(self, probabilities=[0.1, 0.5, 0.6], number_of_draws=50):
        self.probabilities = probabilities
        self.number_of_draws = number_of_draws
        self.max_avail_actions = len(self.probabilities)
        self.action_space = Discrete(self.max_avail_actions)
        # upper bound 2 instead of 1 because of the noise added to the observation
        self.observation_space = Box(0, 2, (self.max_avail_actions,))
        self.reset()

    def reset(self):
        self.current_draw = 0
        self.done = False
        self.observation = np.ones(self.max_avail_actions)
        return self.observation

    def step(self, action):
        # draw a reward of 1 with the probability assigned to the chosen arm
        val = np.random.uniform(low=0.0, high=1.0)
        if val <= self.probabilities[action]:
            reward = 1.0
        else:
            reward = 0.0
        info = {}
        self.current_draw += 1
        if self.current_draw == self.number_of_draws:
            self.done = True
        self.observation = np.ones(self.max_avail_actions) + 0.01 * np.random.randn(self.max_avail_actions)
        # print(self.current_draw, self.observation, self.done, info, f'reward: {reward}, action={action} val={val}')
        return self.observation, reward, self.done, info


if __name__ == "__main__":
    def env_creator(env_config={}):
        return BanditEnv(probabilities=[0.3, 0.8, 0.8], number_of_draws=50)  # return an env instance

    register_env("my_env", env_creator)

    tune.run("PPO",
             # algorithm-specific configuration
             config={"env": "my_env",
                     "framework": "tf2",
                     "num_gpus": 1,
                     "num_workers": 2,
                     # "model": {"custom_model": "pa_model"},
                     "evaluation_interval": 1,
                     "evaluation_num_episodes": 2,
                     },
             local_dir="cartpole_v1",  # directory to save results
             checkpoint_freq=2,  # frequency between checkpoints
             keep_checkpoints_num=6,
             )
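If you prefer to drive the training loop yourself with PPOTrainer instead of tune.run, a rough sketch would look like this (assuming an RLlib version where ray.rllib.agents.ppo.PPOTrainer still exists, and reusing BanditEnv/env_creator from above):

import ray
from ray.rllib.agents.ppo import PPOTrainer
from ray.tune.registry import register_env

ray.init(ignore_reinit_error=True)
register_env("my_env", env_creator)  # env_creator defined as above

trainer = PPOTrainer(config={"env": "my_env", "framework": "tf2", "num_workers": 2})
for i in range(5):
    result = trainer.train()
    print(i, result["episode_reward_mean"])

checkpoint_path = trainer.save()  # path to the saved checkpoint directory
print(checkpoint_path)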
Dear Nicolas Carrara and dear Peter Pirog,
Thank you both very much for your replies! Dear Nicolas, I am running my code on Windows 10. I will run the code again to reproduce the errors and will send them as soon as possible.
Dear Peter, I will try adapting my code to the example you have provided, and I will let you know the results as soon as possible.
Thank you very much for your help!
Kind regards,
Christos Peridis
Hello dear Nicolas Carrara and dear Peter Pirog,
I hope you and all in your family are well and healthy.
I am writing to let you know that, with your help, I managed to run my experiments with the custom environment.
I wrote a script based on the code sample Peter provided. It produced some errors at first, but by locating the error log files,
following Nicolas's instructions on where to find them, I managed to debug it and get it running.
Thank you both very much for the valuable help you have given me!!!
Kind regards,
Christos Peridis