Issue with Running Experiments with Custom Gym Environment

Hello dear members of the Ray team,

I am an MSc student studying AI. I am trying to compare the performance metrics of several well-known DRL algorithms, such as PPO, SAC, and A2C, in a custom gym environment. I discovered the RLlib API, and it seems to be the perfect fit for the work I want to do. However, when I try to run an experiment using tune.run with PPO as the algorithm of choice, I get the following error:

TuneError: ('Trials did not complete', [PPO_CustomEnv-v0_a2286_00000])

This happens with any of the algorithms I want to conduct experiments with.

I also tried registering the environment with RLlib using the register_env() function and then running some experiments with PPOTrainer, but this did not work either.
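For reference, this is roughly what my PPOTrainer attempt looked like (a simplified sketch; MyCustomEnv stands in for my actual environment class):

from ray.tune.registry import register_env
from ray.rllib.agents.ppo import PPOTrainer

# register the custom env under a name RLlib can look up
register_env("CustomEnv-v0", lambda env_config: MyCustomEnv(env_config))

trainer = PPOTrainer(config={"num_workers": 2}, env="CustomEnv-v0")
for _ in range(5):
    result = trainer.train()
    print(result["episode_reward_mean"])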

Could you help me find out what the problem might be?
What is the best way to integrate custom envs into Ray RLlib so that I can run the experiments smoothly?

Thank you very much for your help in advance!

Kind regards,

Christos Peridis

What is the stack trace of the error you got? If nothing appears, you can find the error in the error.txt file under your Ray results directory, usually $HOME/ray_results on Linux.
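If you cannot locate that file, another quick way to surface the underlying exception is to run Ray in local mode, so everything executes in a single process and the real error is printed directly in the console (a minimal sketch, assuming your env is registered as "CustomEnv-v0"):

import ray
from ray import tune

# local_mode=True runs all tasks in the driver process,
# so the exception raised inside the environment/trainer shows up directly
ray.init(local_mode=True)
tune.run("PPO",
         config={"env": "CustomEnv-v0", "num_workers": 0},
         stop={"training_iteration": 1})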


@Christos_Peridis
I hope this will help you:

import numpy as np
import gym
from gym.spaces import Box, Discrete
from ray.tune.registry import register_env
from ray import tune


class BanditEnv(gym.Env):
    def __init__(self, probabilities=[0.1, 0.5, 0.6], number_of_draws=50):
        self.probabilities = probabilities
        self.number_of_draws = number_of_draws
        self.max_avail_actions = len(self.probabilities)
        self.action_space = Discrete(self.max_avail_actions)
        self.observation_space = Box(low=0.0, high=2.0, shape=(self.max_avail_actions,))  # upper bound 2 (not 1) because noise is added to the observation

        self.reset()

    def reset(self):
        self.current_draw = 0
        self.done = False
        self.observation = np.ones(self.max_avail_actions)

        return self.observation

    def step(self, action):
        val = np.random.uniform(low=0.0, high=1.0)
        if val <= self.probabilities[action]:
            reward = 1.0
        else:
            reward = 0.0

        info = {}
        self.current_draw += 1
        if self.current_draw == self.number_of_draws:
            self.done = True

        self.observation = np.ones(self.max_avail_actions)+0.01*np.random.randn(self.max_avail_actions)

        # print(self.current_draw, self.observation, self.done, info,f'reward: {reward}, action={action} val={val}')
        return self.observation, reward, self.done, info


if __name__ == "__main__":
    def env_creator(env_config={}):
        return BanditEnv(probabilities=[0.3, 0.8, 0.8], number_of_draws=50)  # return an env instance


    register_env("my_env", env_creator)

    tune.run("PPO",
             # algorithm specific configuration
             config={"env": "my_env",  #
                     "framework": "tf2",
                     "num_gpus": 1,
                     "num_workers": 2,
                     # "model": {"custom_model": "pa_model", },
                     "evaluation_interval": 1,
                     "evaluation_num_episodes": 2
                     },
             local_dir="cartpole_v1",  # directory to save results
             checkpoint_freq=2,  # frequency between checkpoints
             keep_checkpoints_num=6,
             )
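If you later want to load one of the saved checkpoints and roll out the policy yourself, something along these lines should work (a sketch; the checkpoint path is only an example of what Tune writes under local_dir, and it assumes register_env("my_env", env_creator) has already been called as above):

from ray.rllib.agents.ppo import PPOTrainer

# build a trainer for the same registered env and restore a saved checkpoint
trainer = PPOTrainer(config={"framework": "tf2"}, env="my_env")
trainer.restore("cartpole_v1/PPO/<trial_name>/checkpoint_000002/checkpoint-2")  # example path

env = env_creator()
obs = env.reset()
done = False
total_reward = 0.0
while not done:
    action = trainer.compute_action(obs)
    obs, reward, done, info = env.step(action)
    total_reward += reward
print("episode reward:", total_reward)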

Dear Nicolas Carrara and dear Peter Pirog,

Thank you both very much for your replies! Dear Nicolas, I am running my code on Windows 10. I will run the code again to reproduce the errors and send them as soon as possible.
Dear Peter, I will try adapting my code along the lines of the code you have provided, and I will let you know the results as soon as possible.

Thank you very much for your help!

Kind regards,

Christos Peridis


Hello dear Nicolas Carrara and dear Peter Pirog,

I hope you and all in your family are well and healthy.
I am writing to let you know that, with your help, I managed to run my experiments with the custom environment.
I made a script based on the code sample Peter provided. The code produced some errors in the beginning, but after finding the error log files,
following Nicolas's instructions on where to look, I managed to debug the script and make it run.

Thank you both very much for the valuable help you have given me!

Kind regards,

Christos Peridis
