EnvCompatibility problem with the number of values returned by step

Hey community,

I have a problem with EnvCompatibility in RLlib.
To truncate my episodes after a certain number of steps, I put this in my __init__.py:

register(
    id="sar-v0.1",
    entry_point="sar.envs.sar_base:SarBase",
    max_episode_steps=300
)

In my environment, the step function returns self.obs, reward, done, False, info. Truncation works perfectly: once 300 steps are reached, the episode stops.
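For readers unfamiliar with how max_episode_steps leads to truncation, here is a minimal sketch of the idea. It is not Gymnasium's actual wrapper code: CountingEnv, StepLimit and run_episode are hypothetical names made up for illustration, and the sketch only assumes the five-value step format (obs, reward, terminated, truncated, info).

```python
class CountingEnv:
    """Hypothetical stand-in for an env that follows the 5-value step format."""

    def reset(self):
        return 0, {}

    def step(self, action):
        # obs, reward, terminated, truncated, info
        return 0, 0.0, False, False, {}


class StepLimit:
    """Simplified illustration of a time-limit wrapper (not the real Gymnasium class)."""

    def __init__(self, env, max_episode_steps):
        self.env = env
        self.max_episode_steps = max_episode_steps
        self.elapsed_steps = 0

    def reset(self):
        # A new episode starts, so the step counter starts over.
        self.elapsed_steps = 0
        return self.env.reset()

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        self.elapsed_steps += 1
        # Force truncation once the step budget is used up.
        if self.elapsed_steps >= self.max_episode_steps:
            truncated = True
        return obs, reward, terminated, truncated, info


def run_episode(env):
    """Step with a dummy action until the episode ends; return the step count."""
    env.reset()
    steps = 0
    while True:
        _, _, terminated, truncated, _ = env.step(0)
        steps += 1
        if terminated or truncated:
            return steps
```

The wrapper needs a separate truncated slot in the return value to signal a time-limit stop, which is why the environment underneath has to return five values rather than four.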

The problem is with the train function in RLlib, which apparently does an “EnvCompatibility” check and raises the following error:

2023-05-02 14:56:21,938	ERROR actor_manager.py:496 -- Ray error, taking actor 1 out of service. ray::RolloutWorker.apply() (pid=6410, ip=10.99.52.23, repr=<ray.rllib.evaluation.rollout_worker.RolloutWorker object at 0x7f525f69c820>)
  File "/data/users/dahmadoun/conda-envs/gym_env/lib/python3.10/site-packages/ray/rllib/utils/actor_manager.py", line 183, in apply
    raise e
  File "/data/users/dahmadoun/conda-envs/gym_env/lib/python3.10/site-packages/ray/rllib/utils/actor_manager.py", line 174, in apply
    return func(self, *args, **kwargs)
  File "/data/users/dahmadoun/conda-envs/gym_env/lib/python3.10/site-packages/ray/rllib/execution/rollout_ops.py", line 86, in <lambda>
    lambda w: w.sample(), local_worker=False, healthy_only=True
  File "/data/users/dahmadoun/conda-envs/gym_env/lib/python3.10/site-packages/ray/rllib/evaluation/rollout_worker.py", line 914, in sample
    batches = [self.input_reader.next()]
  File "/data/users/dahmadoun/conda-envs/gym_env/lib/python3.10/site-packages/ray/rllib/evaluation/sampler.py", line 92, in next
    batches = [self.get_data()]
  File "/data/users/dahmadoun/conda-envs/gym_env/lib/python3.10/site-packages/ray/rllib/evaluation/sampler.py", line 277, in get_data
    item = next(self._env_runner)
  File "/data/users/dahmadoun/conda-envs/gym_env/lib/python3.10/site-packages/ray/rllib/evaluation/env_runner_v2.py", line 323, in run
    outputs = self.step()
  File "/data/users/dahmadoun/conda-envs/gym_env/lib/python3.10/site-packages/ray/rllib/evaluation/env_runner_v2.py", line 379, in step
    self._base_env.send_actions(actions_to_send)
  File "/data/users/dahmadoun/conda-envs/gym_env/lib/python3.10/site-packages/ray/rllib/env/vector_env.py", line 464, in send_actions
    ) = self.vector_env.vector_step(action_vector)
  File "/data/users/dahmadoun/conda-envs/gym_env/lib/python3.10/site-packages/ray/rllib/env/vector_env.py", line 360, in vector_step
    raise e
  File "/data/users/dahmadoun/conda-envs/gym_env/lib/python3.10/site-packages/ray/rllib/env/vector_env.py", line 353, in vector_step
    results = self.envs[i].step(actions[i])
  File "/data/users/dahmadoun/conda-envs/gym_env/lib/python3.10/site-packages/gymnasium/wrappers/compatibility.py", line 107, in step
    obs, reward, done, info = self.env.step(action)
ValueError: too many values to unpack (expected 4)

Do you have any idea how I can tell the Trainer that my step function already follows the new Gymnasium API?

Thank you

Here are more details on what I understood about the problem.

To add max_episode_steps, and thus truncation through the time limit wrapper, I had to add a fifth element to the values returned by step. The RLlib trainer, however, assumes the environment uses the old API and expects step to return only 4 elements.
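The mismatch described above can be reproduced without RLlib or Gymnasium. The sketch below is only an illustration: NewApiEnv, OldApiAdapter and try_step are hypothetical names, and OldApiAdapter merely imitates the four-value unpacking visible in the last frame of the traceback.

```python
class NewApiEnv:
    """Hypothetical env whose step returns five values (new-style API)."""

    def step(self, action):
        # obs, reward, terminated, truncated, info
        return 0, 0.0, False, False, {}


class OldApiAdapter:
    """Hypothetical adapter that assumes the wrapped env uses the old 4-value API."""

    def __init__(self, env):
        self.env = env

    def step(self, action):
        # Same unpacking pattern as in the traceback: four targets only.
        obs, reward, done, info = self.env.step(action)
        return obs, reward, done, False, info


def try_step(env):
    """Return 'ok' if step succeeds, otherwise the ValueError message."""
    try:
        env.step(0)
        return "ok"
    except ValueError as exc:
        return str(exc)


if __name__ == "__main__":
    print(try_step(NewApiEnv()))                 # the bare env steps fine
    print(try_step(OldApiAdapter(NewApiEnv())))  # the adapter fails to unpack
```

So the error does not mean the environment is wrong; it means something between the trainer and the environment still assumes the old four-value format.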

@Douae_Ahmadoun, I will ping the RLlib guys here: @arturn

Hi! What version of RLlib are you using?
If you use Ray >= 2.3, RLlib should complain if and only if your env does not conform to the Gymnasium API.

Hi @arturn, thank you for your answer.

Indeed, I use RLlib 2.3, and my environment should be compatible with the new Gymnasium API. Here it is rather the opposite: my step returns 5 elements as in the new API, but the error indicates that only 4 were expected, as in the old Gym API.

I tried downgrading RLlib to the 2.2.0 version, but I had other errors on the build function.

I found the cause of my error.
This is the function I was using to register my environment and use it in RLlib.

def env_creator(env_config):
    env = SarBase()
    return EnvCompatibility(SarBase(env_config))

Since the environment is already compatible with the new API, wrapping it with EnvCompatibility was useless (even harmful): the wrapper expects the old API format (for example, 4 elements returned by step instead of 5).
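A possible shape for the corrected creator function is sketched below. The SarBase class here is a hypothetical stand-in so that the snippet runs on its own; in the real project it would be the class from sar.envs.sar_base, and its constructor signature is an assumption based on the call in the original function.

```python
class SarBase:
    """Hypothetical stand-in for the real SarBase; already returns five values."""

    def __init__(self, env_config=None):
        self.env_config = env_config or {}

    def reset(self, *, seed=None, options=None):
        return 0, {}

    def step(self, action):
        # obs, reward, terminated, truncated, info
        return 0, 0.0, False, False, {}


def env_creator(env_config):
    # Return the environment directly: no EnvCompatibility wrapper is needed,
    # because step already follows the new five-value format.
    return SarBase(env_config)


# With Ray installed, the creator would typically be registered like this
# (the name "sar" is an arbitrary choice for this sketch):
#   from ray.tune.registry import register_env
#   register_env("sar", env_creator)
```

EnvCompatibility remains useful only for environments that still return four values from step and have not been migrated.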

Thank you for your help @arturn and @Jules_Damji
