Hello,
I am new to RLlib and have a custom Gym environment with a custom observation space that is essentially a list of integers. (I could also have used MultiDiscrete for this, but according to the docs MultiDiscrete observations get one-hot encoded, which I don't want.)
So my environment's step function returns the usual obs, r, done, info tuple, with obs being a list. However, when I try to access input_dict["obs"] inside my model's forward(), Ray throws an error: it treats the whole tuple (obs, r, done, info) as the observation instead of just obs, and subsequently tries to convert it to a numpy array and then a torch tensor. This is with the ImpalaTrainer. The error looks as follows:
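For illustration, an observation is just a fixed-length list of token IDs; the concrete numbers and sizes below are made up:

```python
import random

# Hypothetical parameters; my real env gets these from a tokenizer vocabulary.
vocab_size = 30000
length = 4

# One observation: a plain Python list of ints in [0, vocab_size).
obs = [random.randint(0, vocab_size - 1) for _ in range(length)]
assert len(obs) == length
assert all(0 <= i < vocab_size for i in obs)
```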
ray.exceptions.RayTaskError(TypeError): ray::RolloutWorker.par_iter_next_batch() (pid=2562678, ip=10.244.28.80)
File "python/ray/_raylet.pyx", line 482, in ray._raylet.execute_task
File "python/ray/_raylet.pyx", line 436, in ray._raylet.execute_task.function_executor
File "/usr/local/lib/python3.6/dist-packages/ray/util/iter.py", line 1158, in par_iter_next_batch
batch.append(self.par_iter_next())
File "/usr/local/lib/python3.6/dist-packages/ray/util/iter.py", line 1152, in par_iter_next
return next(self.local_it)
File "/usr/local/lib/python3.6/dist-packages/ray/rllib/evaluation/rollout_worker.py", line 317, in gen_rollouts
yield self.sample()
File "/usr/local/lib/python3.6/dist-packages/ray/rllib/evaluation/rollout_worker.py", line 621, in sample
batches = [self.input_reader.next()]
File "/usr/local/lib/python3.6/dist-packages/ray/rllib/evaluation/sampler.py", line 94, in next
batches = [self.get_data()]
File "/usr/local/lib/python3.6/dist-packages/ray/rllib/evaluation/sampler.py", line 211, in get_data
item = next(self.rollout_provider)
File "/usr/local/lib/python3.6/dist-packages/ray/rllib/evaluation/sampler.py", line 623, in _env_runner
tf_sess=tf_sess,
File "/usr/local/lib/python3.6/dist-packages/ray/rllib/evaluation/sampler.py", line 1236, in _do_policy_eval
timestep=policy.global_timestep)
File "/usr/local/lib/python3.6/dist-packages/ray/rllib/policy/torch_policy.py", line 169, in compute_actions
input_dict, state_batches, seq_lens, explore, timestep)
File "/usr/local/lib/python3.6/dist-packages/ray/rllib/policy/torch_policy.py", line 249, in _compute_action_helper
seq_lens)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/pvc/bwinter-core/medical_rl/Medical-RL/model.py", line 38, in forward
encoded = self.encoder(input_ids = input_dict["obs"][0])
File "/usr/local/lib/python3.6/dist-packages/ray/rllib/utils/tracking_dict.py", line 30, in __getitem__
self.intercepted_values[key] = self.get_interceptor(value)
File "/usr/local/lib/python3.6/dist-packages/ray/rllib/utils/torch_ops.py", line 71, in convert_to_torch_tensor
return tree.map_structure(mapping, x)
File "/usr/local/lib/python3.6/dist-packages/tree/__init__.py", line 510, in map_structure
[func(*args) for args in zip(*map(flatten, structures))])
File "/usr/local/lib/python3.6/dist-packages/tree/__init__.py", line 510, in <listcomp>
[func(*args) for args in zip(*map(flatten, structures))])
File "/usr/local/lib/python3.6/dist-packages/ray/rllib/utils/torch_ops.py", line 65, in mapping
tensor = torch.from_numpy(np.asarray(item))
TypeError: can't convert np.ndarray of type numpy.object_. The only supported types are: float64, float32, float16, complex64, complex128, int64, int32, int16, int8, uint8, and bool.
(pid=2562678) /usr/local/lib/python3.6/dist-packages/numpy/core/_asarray.py:83: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray
(pid=2562678) return array(a, dtype, copy=False, order=order)
(pid=2562678) InputDict: {'obs': array([[list([101, 10470, 11331, 102]), 0, False, {}],
(pid=2562678) [list([101, 7107, 10535, 102]), 0, False, {}],
(pid=2562678) [list([101, 21500, 3917, 102]), 0, False, {}],
(pid=2562678) [list([101, 15423, 17223, 102]), 0, False, {}]], dtype=object), 'is_training': False, 'prev_actions': array([0, 0, 0, 0]), 'prev_rewards': array([0., 0., 0., 0.])}
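As far as I can tell, the object-dtype array in the InputDict above comes from numpy batching the ragged (obs, r, done, info) rows. A quick standalone check (with made-up token IDs) reproduces that dtype, which torch.from_numpy() then rejects:

```python
import numpy as np

# Rows shaped like the InputDict above: [obs_list, reward, done, info].
rows = [
    [[101, 10470, 11331, 102], 0, False, {}],
    [[101, 7107, 10535, 102], 0, False, {}],
]

# Ragged, mixed-type rows force an object-dtype array of shape (2, 4).
batch = np.array(rows, dtype=object)
print(batch.dtype)   # object
print(batch.shape)   # (2, 4)
# torch.from_numpy(np.asarray(batch)) raises the same TypeError as the traceback.
```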
This is my custom observation space:
import random

import numpy as np
from gym.spaces import Space


class CustomSpace(Space):
    def __init__(self, length, vocab_size):
        assert length > 0
        assert vocab_size > 0
        self.length = length
        self.vocab_size = vocab_size
        super(CustomSpace, self).__init__((), np.int64)

    def sample(self):
        # A fixed-length list of token IDs in [0, vocab_size).
        return [random.randint(0, self.vocab_size - 1) for _ in range(self.length)]

    def contains(self, x):
        # x should be a flat list of ints, matching what sample() produces.
        if not isinstance(x, list):
            return False
        for i in x:
            if not isinstance(i, int) or i < 0 or i >= self.vocab_size:
                return False
        return True
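For completeness, here is a stripped-down, dependency-free sketch of what my env's step() returns (the real env subclasses gym.Env and uses the space above; names and values here are simplified):

```python
import random


class MinimalEnv:
    # Stand-in for my real env, just to show the return shape of step().
    def __init__(self, length=4, vocab_size=30000):
        self.length = length
        self.vocab_size = vocab_size

    def _obs(self):
        # The observation: a plain Python list of ints.
        return [random.randint(0, self.vocab_size - 1) for _ in range(self.length)]

    def reset(self):
        return self._obs()

    def step(self, action):
        # The usual 4-tuple; only obs should end up in input_dict["obs"].
        return self._obs(), 0.0, False, {}
```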
Any help would be greatly appreciated.