Use of custom OpenAI Gym environments from the RoboHive API

Hello dear Ray Team,

I have been using the Ray API to conduct RL experiments on different benchmarks, such as the CT-graph benchmark. Recently I have been tasked with applying Reinforcement Learning to robotic scenarios. For these settings I found the RoboHive API, which offers a plethora of robotic tasks. All RoboHive environments are OpenAI Gym environments; the API uses OpenAI Gym version 0.13.0 to create them. I wanted to use Ray Tune and RLlib to run experiments with RoboHive's pick-and-place environment, so I made a notebook in which I register the custom RoboHive environment with Ray using the register_env function. However, when running the experiment with PPO, SAC, or A2C, I receive the following error message:
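For context, this is roughly what my notebook does (a sketch, not my exact code: the environment id "rh_pick_place-v0", the robohive import, and the config values are placeholders for my actual setup):

```python
def run_pick_and_place():
    # Sketch of my setup; "rh_pick_place-v0" stands in for the
    # actual RoboHive pick-and-place environment id.
    import gym
    from ray import tune
    from ray.tune.registry import register_env

    def env_creator(env_config):
        import robohive  # noqa: F401 -- importing registers the envs with gym
        return gym.make("rh_pick_place-v0")

    # Map a custom name to the creator so RLlib can build the env on workers.
    register_env("RoboHive_Pick_Place", env_creator)

    tune.run(
        "A2C",  # the same failure occurs with "PPO" and "SAC"
        config={"env": "RoboHive_Pick_Place", "num_workers": 1},
        stop={"training_iteration": 1},
    )
```

The error below is raised as soon as the trainer's workers try to build the policy.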

2023-07-15 20:01:46,845 ERROR -- Trial A2C_RoboHive_Pick_Place_0_095a9_00000: Error processing event.
Traceback (most recent call last):
File "/home/cocp5/anaconda3/envs/rhRL3906c/lib/python3.9/site-packages/ray/tune/", line 739, in _process_trial
results = self.trial_executor.fetch_result(trial)
File "/home/cocp5/anaconda3/envs/rhRL3906c/lib/python3.9/site-packages/ray/tune/", line 746, in fetch_result
result = ray.get(trial_future[0], timeout=DEFAULT_GET_TIMEOUT)
File "/home/cocp5/anaconda3/envs/rhRL3906c/lib/python3.9/site-packages/ray/_private/", line 82, in wrapper
return func(*args, **kwargs)
File "/home/cocp5/anaconda3/envs/rhRL3906c/lib/python3.9/site-packages/ray/", line 1623, in get
raise value
ray.exceptions.RayActorError: The actor died because of an error raised in its creation task, ray::A2C.__init__() (pid=14782, ip=
File "/home/cocp5/anaconda3/envs/rhRL3906c/lib/python3.9/site-packages/ray/rllib/agents/", line 136, in __init__
Trainer.__init__(self, config, env, logger_creator)
File "/home/cocp5/anaconda3/envs/rhRL3906c/lib/python3.9/site-packages/ray/rllib/agents/", line 592, in __init__
super().__init__(config, logger_creator)
File "/home/cocp5/anaconda3/envs/rhRL3906c/lib/python3.9/site-packages/ray/tune/", line 103, in __init__
File "/home/cocp5/anaconda3/envs/rhRL3906c/lib/python3.9/site-packages/ray/rllib/agents/", line 146, in setup
File "/home/cocp5/anaconda3/envs/rhRL3906c/lib/python3.9/site-packages/ray/rllib/agents/", line 739, in setup
self._init(self.config, self.env_creator)
File "/home/cocp5/anaconda3/envs/rhRL3906c/lib/python3.9/site-packages/ray/rllib/agents/", line 170, in _init
self.workers = self._make_workers(
File "/home/cocp5/anaconda3/envs/rhRL3906c/lib/python3.9/site-packages/ray/rllib/agents/", line 821, in _make_workers
return WorkerSet(
File "/home/cocp5/anaconda3/envs/rhRL3906c/lib/python3.9/site-packages/ray/rllib/evaluation/", line 103, in __init__
self._local_worker = self._make_worker(
File "/home/cocp5/anaconda3/envs/rhRL3906c/lib/python3.9/site-packages/ray/rllib/evaluation/", line 399, in _make_worker
worker = cls(
File "/home/cocp5/anaconda3/envs/rhRL3906c/lib/python3.9/site-packages/ray/rllib/evaluation/", line 580, in __init__
File "/home/cocp5/anaconda3/envs/rhRL3906c/lib/python3.9/site-packages/ray/rllib/evaluation/", line 1375, in build_policy_map
self.policy_map.create_policy(name, orig_cls, obs_space, act_space,
File "/home/cocp5/anaconda3/envs/rhRL3906c/lib/python3.9/site-packages/ray/rllib/policy/", line 136, in create_policy
self[policy_id] = class_(observation_space, action_space,
File "/home/cocp5/anaconda3/envs/rhRL3906c/lib/python3.9/site-packages/ray/rllib/policy/", line 279, in __init__
File "/home/cocp5/anaconda3/envs/rhRL3906c/lib/python3.9/site-packages/ray/rllib/policy/", line 746, in _initialize_loss_from_dummy_batch
self._dummy_batch = self._get_dummy_batch_from_view_requirements(
File "/home/cocp5/anaconda3/envs/rhRL3906c/lib/python3.9/site-packages/ray/rllib/policy/", line 875, in _get_dummy_batch_from_view_requirements
ret[view_col] = np.zeros_like([
File "/home/cocp5/anaconda3/envs/rhRL3906c/lib/python3.9/site-packages/ray/rllib/policy/", line 876, in <listcomp>
for _ in range(batch_size)
File "/home/cocp5/anaconda3/envs/rhRL3906c/lib/python3.9/site-packages/gym/spaces/", line 42, in sample
return self.np_random.uniform(low=self.low, high=high, size=self.shape).astype(self.dtype)
File "mtrand.pyx", line 1155, in numpy.random.mtrand.RandomState.uniform
OverflowError: Range exceeds valid bounds

The environment's action space is a Box(9,) with dtype float32, and its observation space is a Box(63,) with dtype float64.
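I can reproduce the same OverflowError outside Ray with plain NumPy, which makes me suspect the observation space has unbounded (±inf) limits (a minimal sketch; the (63,) shape just mirrors my observation space):

```python
import numpy as np

# gym 0.13's Box.sample() ultimately calls RandomState.uniform(low, high).
# When the Box bounds are -inf/+inf (common for unbounded observation
# spaces), high - low is not finite and NumPy refuses to sample.
rng = np.random.RandomState(0)

try:
    rng.uniform(low=-np.inf, high=np.inf, size=(63,))
except OverflowError as err:
    print(err)  # prints: Range exceeds valid bounds
```

This is exactly the call that RLlib's dummy-batch construction makes when it samples the observation space to build the policy.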

For compatibility with the RoboHive API dependencies I am using Ray version 1.6.0. The NumPy version on my system is 1.25.1. I am running in a conda environment on Ubuntu 20.04.6 LTS through WSL2, with GPU acceleration set up on Windows 11.

What is the cause of this issue, and how could I resolve it?

Thank you very much in advance for your valuable help and support!

I am always at your disposal for any further queries or clarifications regarding my code and my setup.

Kind regards,

Christos Peridis