Use of custom OpenAI Gym environments from the RoboHive API

Hello dear Ray Team,

I have been using the Ray API to conduct RL experiments on different benchmarks, such as the CT-graph benchmark. Recently I have been tasked with working on reinforcement learning in robotic scenarios. For these settings I found the RoboHive API, which offers a plethora of different robotic tasks. All RoboHive environments are OpenAI Gym environments; the API uses OpenAI Gym version 0.13.0 to create them. I want to use Ray Tune and RLlib to run experiments with the pick-and-place environment. I have made a notebook where I register the custom RoboHive environment with Ray using the register_env function (a simplified sketch of the registration and launch code is included further below). However, when running tune.run with either PPO, SAC, or A2C, I receive the following error message:

2023-07-15 20:01:46,845 ERROR trial_runner.py:773 -- Trial A2C_RoboHive_Pick_Place_0_095a9_00000: Error processing event.
Traceback (most recent call last):
  File "/home/cocp5/anaconda3/envs/rhRL3906c/lib/python3.9/site-packages/ray/tune/trial_runner.py", line 739, in _process_trial
    results = self.trial_executor.fetch_result(trial)
  File "/home/cocp5/anaconda3/envs/rhRL3906c/lib/python3.9/site-packages/ray/tune/ray_trial_executor.py", line 746, in fetch_result
    result = ray.get(trial_future[0], timeout=DEFAULT_GET_TIMEOUT)
  File "/home/cocp5/anaconda3/envs/rhRL3906c/lib/python3.9/site-packages/ray/_private/client_mode_hook.py", line 82, in wrapper
    return func(*args, **kwargs)
  File "/home/cocp5/anaconda3/envs/rhRL3906c/lib/python3.9/site-packages/ray/worker.py", line 1623, in get
    raise value
ray.exceptions.RayActorError: The actor died because of an error raised in its creation task, ray::A2C.__init__() (pid=14782, ip=172.19.132.220)
  File "/home/cocp5/anaconda3/envs/rhRL3906c/lib/python3.9/site-packages/ray/rllib/agents/trainer_template.py", line 136, in __init__
    Trainer.__init__(self, config, env, logger_creator)
  File "/home/cocp5/anaconda3/envs/rhRL3906c/lib/python3.9/site-packages/ray/rllib/agents/trainer.py", line 592, in __init__
    super().__init__(config, logger_creator)
  File "/home/cocp5/anaconda3/envs/rhRL3906c/lib/python3.9/site-packages/ray/tune/trainable.py", line 103, in __init__
    self.setup(copy.deepcopy(self.config))
  File "/home/cocp5/anaconda3/envs/rhRL3906c/lib/python3.9/site-packages/ray/rllib/agents/trainer_template.py", line 146, in setup
    super().setup(config)
  File "/home/cocp5/anaconda3/envs/rhRL3906c/lib/python3.9/site-packages/ray/rllib/agents/trainer.py", line 739, in setup
    self._init(self.config, self.env_creator)
  File "/home/cocp5/anaconda3/envs/rhRL3906c/lib/python3.9/site-packages/ray/rllib/agents/trainer_template.py", line 170, in _init
    self.workers = self._make_workers(
  File "/home/cocp5/anaconda3/envs/rhRL3906c/lib/python3.9/site-packages/ray/rllib/agents/trainer.py", line 821, in _make_workers
    return WorkerSet(
  File "/home/cocp5/anaconda3/envs/rhRL3906c/lib/python3.9/site-packages/ray/rllib/evaluation/worker_set.py", line 103, in __init__
    self._local_worker = self._make_worker(
  File "/home/cocp5/anaconda3/envs/rhRL3906c/lib/python3.9/site-packages/ray/rllib/evaluation/worker_set.py", line 399, in _make_worker
    worker = cls(
  File "/home/cocp5/anaconda3/envs/rhRL3906c/lib/python3.9/site-packages/ray/rllib/evaluation/rollout_worker.py", line 580, in __init__
    self._build_policy_map(
  File "/home/cocp5/anaconda3/envs/rhRL3906c/lib/python3.9/site-packages/ray/rllib/evaluation/rollout_worker.py", line 1375, in _build_policy_map
    self.policy_map.create_policy(name, orig_cls, obs_space, act_space,
  File "/home/cocp5/anaconda3/envs/rhRL3906c/lib/python3.9/site-packages/ray/rllib/policy/policy_map.py", line 136, in create_policy
    self[policy_id] = class_(observation_space, action_space,
  File "/home/cocp5/anaconda3/envs/rhRL3906c/lib/python3.9/site-packages/ray/rllib/policy/policy_template.py", line 279, in __init__
    self._initialize_loss_from_dummy_batch(
  File "/home/cocp5/anaconda3/envs/rhRL3906c/lib/python3.9/site-packages/ray/rllib/policy/policy.py", line 746, in _initialize_loss_from_dummy_batch
    self._dummy_batch = self._get_dummy_batch_from_view_requirements(
  File "/home/cocp5/anaconda3/envs/rhRL3906c/lib/python3.9/site-packages/ray/rllib/policy/policy.py", line 875, in _get_dummy_batch_from_view_requirements
    ret[view_col] = np.zeros_like([
  File "/home/cocp5/anaconda3/envs/rhRL3906c/lib/python3.9/site-packages/ray/rllib/policy/policy.py", line 876, in <listcomp>
    view_req.space.sample() for _ in range(batch_size)
  File "/home/cocp5/anaconda3/envs/rhRL3906c/lib/python3.9/site-packages/gym/spaces/box.py", line 42, in sample
    return self.np_random.uniform(low=self.low, high=high, size=self.shape).astype(self.dtype)
  File "mtrand.pyx", line 1155, in numpy.random.mtrand.RandomState.uniform
OverflowError: Range exceeds valid bounds
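
If I read the traceback correctly, the error is raised by NumPy when the sampling range of a Box space is not finite. As far as I understand, the following minimal snippet (my own guess at what is happening, not code from RoboHive or RLlib) triggers the same OverflowError:

```python
import numpy as np

# Sampling a uniform value over an infinite range raises
# "OverflowError: Range exceeds valid bounds" in NumPy.
np.random.RandomState().uniform(low=-np.inf, high=np.inf, size=(63,))
```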

The action space of the environment is a Box(9,) with dtype float32, and the observation space is a Box(63,) with dtype float64.
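
In case it helps, this is roughly how I inspected the spaces (the gym environment id below is a placeholder for the actual RoboHive id I use), in particular to check whether any of the bounds are infinite:

```python
import gym
import numpy as np
import robohive  # noqa: F401 -- importing robohive should register its gym environments

env = gym.make("FrankaPickPlaceRandom-v0")  # placeholder for the actual environment id

print(env.action_space)       # Box(9,), dtype float32
print(env.observation_space)  # Box(63,), dtype float64

# Check whether any observation bounds are infinite, since Box.sample()
# draws uniformly between low and high.
print(np.isinf(env.observation_space.low).any())
print(np.isinf(env.observation_space.high).any())
```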

For compatibility with the RoboHive API dependencies I am using Ray version 1.6.0. The NumPy version on my system is 1.25.1. I am running inside a conda environment on Ubuntu 20.04.6 LTS through WSL2, with GPU acceleration set up on Windows 11.
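
For reference, here is roughly how I register the environment and launch the experiment. This is a simplified sketch: the gym environment id and the config values are placeholders, not my exact code.

```python
import gym
import robohive  # noqa: F401 -- importing robohive should register its gym environments
import ray
from ray import tune
from ray.tune.registry import register_env


def env_creator(env_config):
    # Placeholder id -- the actual RoboHive pick-and-place id I use differs.
    return gym.make("FrankaPickPlaceRandom-v0")


register_env("RoboHive_Pick_Place", env_creator)

ray.init()
tune.run(
    "A2C",  # the same error occurs with "PPO" and "SAC"
    config={
        "env": "RoboHive_Pick_Place",
        "framework": "torch",
        "num_workers": 1,
    },
    stop={"training_iteration": 10},
)
```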

What is the cause of this issue, and how could I resolve it?

Thank you very much in advance for your valuable help and support!

I am always at your disposal for any further queries or clarifications regarding my code and my setup.

Kind regards,

Christos Peridis