Error: Custom observation space not treated correctly

Hello,

I am new to RLlib and have a custom Gym environment with a custom observation space that is essentially a list of integers. (I could also have used MultiDiscrete for this, but according to the docs MultiDiscrete observations get one-hot encoded, which I don't want.)
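
To illustrate the difference (this is just my understanding from the docs, so the exact encoding may be off):

import numpy as np
from gym.spaces import MultiDiscrete

# What I want the model to receive: the raw token ids, e.g.
raw_obs = [101, 47, 133, 102]

# My understanding is that with MultiDiscrete([vocab_size] * length), RLlib's
# preprocessor one-hot encodes each entry and concatenates them, so the model
# would instead see a flat vector of length length * vocab_size.
vocab_size, length = 200, 4
space = MultiDiscrete([vocab_size] * length)
assert space.contains(raw_obs)

one_hot = np.zeros(length * vocab_size, dtype=np.float32)
for i, token in enumerate(raw_obs):
  one_hot[i * vocab_size + token] = 1.0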

So my environment's step function returns obs, r, done, info as usual, with obs being a list. However, when trying to access input_dict[“obs”] inside my agent, Ray throws an error: Ray treats the whole tuple (obs, r, done, info) as the observation instead of just obs and then tries to convert it to a NumPy array and a torch tensor. This is with the IMPALATrainer. The error looks as follows:

ray.exceptions.RayTaskError(TypeError): ray::RolloutWorker.par_iter_next_batch() (pid=2562678, ip=10.244.28.80)
  File "python/ray/_raylet.pyx", line 482, in ray._raylet.execute_task
  File "python/ray/_raylet.pyx", line 436, in ray._raylet.execute_task.function_executor
  File "/usr/local/lib/python3.6/dist-packages/ray/util/iter.py", line 1158, in par_iter_next_batch
    batch.append(self.par_iter_next())
  File "/usr/local/lib/python3.6/dist-packages/ray/util/iter.py", line 1152, in par_iter_next
    return next(self.local_it)
  File "/usr/local/lib/python3.6/dist-packages/ray/rllib/evaluation/rollout_worker.py", line 317, in gen_rollouts
    yield self.sample()
  File "/usr/local/lib/python3.6/dist-packages/ray/rllib/evaluation/rollout_worker.py", line 621, in sample
    batches = [self.input_reader.next()]
  File "/usr/local/lib/python3.6/dist-packages/ray/rllib/evaluation/sampler.py", line 94, in next
    batches = [self.get_data()]
  File "/usr/local/lib/python3.6/dist-packages/ray/rllib/evaluation/sampler.py", line 211, in get_data
    item = next(self.rollout_provider)
  File "/usr/local/lib/python3.6/dist-packages/ray/rllib/evaluation/sampler.py", line 623, in _env_runner
    tf_sess=tf_sess,
  File "/usr/local/lib/python3.6/dist-packages/ray/rllib/evaluation/sampler.py", line 1236, in _do_policy_eval
    timestep=policy.global_timestep)
  File "/usr/local/lib/python3.6/dist-packages/ray/rllib/policy/torch_policy.py", line 169, in compute_actions
    input_dict, state_batches, seq_lens, explore, timestep)
  File "/usr/local/lib/python3.6/dist-packages/ray/rllib/policy/torch_policy.py", line 249, in _compute_action_helper
    seq_lens)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/pvc/bwinter-core/medical_rl/Medical-RL/model.py", line 38, in forward
    encoded = self.encoder(input_ids = input_dict["obs"][0])
  File "/usr/local/lib/python3.6/dist-packages/ray/rllib/utils/tracking_dict.py", line 30, in __getitem__
    self.intercepted_values[key] = self.get_interceptor(value)
  File "/usr/local/lib/python3.6/dist-packages/ray/rllib/utils/torch_ops.py", line 71, in convert_to_torch_tensor
    return tree.map_structure(mapping, x)
  File "/usr/local/lib/python3.6/dist-packages/tree/__init__.py", line 510, in map_structure
    [func(*args) for args in zip(*map(flatten, structures))])
  File "/usr/local/lib/python3.6/dist-packages/tree/__init__.py", line 510, in <listcomp>
    [func(*args) for args in zip(*map(flatten, structures))])
  File "/usr/local/lib/python3.6/dist-packages/ray/rllib/utils/torch_ops.py", line 65, in mapping
    tensor = torch.from_numpy(np.asarray(item))
TypeError: can't convert np.ndarray of type numpy.object_. The only supported types are: float64, float32, float16, complex64, complex128, int64, int32, int16, int8, uint8, and bool.
(pid=2562678) /usr/local/lib/python3.6/dist-packages/numpy/core/_asarray.py:83: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray
(pid=2562678)   return array(a, dtype, copy=False, order=order)
(pid=2562678) InputDict:  {'obs': array([[list([101, 10470, 11331, 102]), 0, False, {}],
(pid=2562678)        [list([101, 7107, 10535, 102]), 0, False, {}],
(pid=2562678)        [list([101, 21500, 3917, 102]), 0, False, {}],
(pid=2562678)        [list([101, 15423, 17223, 102]), 0, False, {}]], dtype=object), 'is_training': False, 'prev_actions': array([0, 0, 0, 0]), 'prev_rewards': array([0., 0., 0., 0.])}

This is my custom observation space:

import random

import numpy as np
from gym.spaces import Space


class CustomSpace(Space):
  def __init__(self, length, vocab_size):
    assert length > 0
    assert vocab_size > 0
    self.length = length
    self.vocab_size = vocab_size
    super(CustomSpace, self).__init__((), np.int64)

  def sample(self):
    return [random.randint(0, self.vocab_size - 1) for _ in range(self.length)]

  def contains(self, x):
    if isinstance(x[0], list):
      for i in x[0]:
        if not isinstance(i, int) or i < 0 or i >= self.vocab_size:
          return False
    else:
      return False
    return True

Any help would be greatly appreciated.

Can you share the custom environment? I do not see an obvious issue here.

Yes, I forgot to include that. Here is the environment:

import random

import gym
from gym import spaces


class DoctorSim(gym.Env):
  metadata = {'render.modes': ['human']}

  def __init__(self, max_episode_steps=100, observation_length=512, vocab_size=200):
    self.observation_space = CustomSpace(observation_length, vocab_size)
    self.action_space = spaces.Discrete(5)
    self.max_episode_steps = max_episode_steps
    self.current_step = 0

  def step(self, action):
    obs = [random.randint(0, self.observation_space.vocab_size - 1) for _ in range(self.observation_space.length)]
    r = -self.current_step
    self.current_step += 1
    done = False
    info = {}
    return obs, r, done, info

  def reset(self):
    self.current_step = 0
    return self.step(-1)

  def render(self, mode='human'):
    ...

  def close(self):
    ...

It's just rudimentary boilerplate for now, generating random observations so I can get everything working together.
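
For completeness, this is roughly how I am launching training (a sketch, not my exact script; "my_token_model" and MyModel are placeholders for the custom torch model in my model.py):

import ray
from ray.rllib.agents.impala import ImpalaTrainer
from ray.rllib.models import ModelCatalog
from ray.tune.registry import register_env

register_env("doctor_sim", lambda cfg: DoctorSim(**cfg))
ModelCatalog.register_custom_model("my_token_model", MyModel)  # MyModel: placeholder for my custom model class

ray.init()
trainer = ImpalaTrainer(env="doctor_sim", config={
  "framework": "torch",
  "num_gpus": 0,  # running on CPU while debugging
  "num_workers": 1,
  "num_envs_per_worker": 1,
  "rollout_fragment_length": 16,
  "env_config": {"observation_length": 128, "vocab_size": 200},
  "model": {"custom_model": "my_token_model"},
})
trainer.train()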

I also upgraded to Ray 1.4.1, and now the problem presents itself differently.
The input dict now looks like this:

SampleBatch(['obs', 'new_obs', 'actions', 'prev_actions', 'rewards', 'prev_rewards', 'dones', 'infos', 'eps_id', 'unroll_id', 'agent_index', 't'])

and input_dict[“obs”] is now a tensor. But the tensor contains only zeros instead of the actual data sent by the env, and it has shape [32], which doesn't make sense to me. With an observation length of 128, num_workers=1, num_envs_per_worker=1, and rollout_fragment_length=16, I would expect a tensor of shape (16, 128).
Could it maybe have something to do with the .view_requirements of my model? I have no idea how to specify it in this case and just left it at the default:

{"obs": ViewRequirement(shift=0)}

The environment is in my last reply, which is unfortunately still hidden because of the spam bot. There have been some additional developments, though: the dummy inputs look somewhat correct now after I upgraded to Ray 1.4.1, with obs being a tensor of shape (x, obs_length) (I think the x is still wrong, but that's a different problem), and everything works fine. However, once the actual training starts, it's back to being

(pid=10906) Input dict: SampleBatch(['obs'])
(pid=10906) Input obs: [[list([101, 22085, 8415, 22104, 20306, 2121, 6220, 1005, 1055, 24820, 2015, 5843, 3085, 4875, 4095, 102])
(pid=10906)   0 False {}]]

so it thinks the observation is all four env outputs (obs, r, done, info) together. How could the dummy inputs and the real inputs be that different?

I close this thread in shame, as I now realize the Gym environment's reset function is supposed to return only obs, not (obs, reward, done, info) like the step function. :expressionless:
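
For anyone who finds this later, the fix was just changing reset() to return only the observation:

  # inside DoctorSim:
  def reset(self):
    self.current_step = 0
    obs, _, _, _ = self.step(-1)  # step() returns (obs, r, done, info)
    return obs                    # reset() must return only the observation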

@BenjaminWinter not too much shame. This kind of thing happens to me all the time.