Bug when converting a Gym Robotics env to a multi-agent env with the make_multi_agent wrapper

  • High: It blocks me from completing my task.

Hi, I’m new to OpenAI Gym and RLlib, so my question may be dumb.
Recently I’ve been working on a multi-agent project and trying to convert the OpenAI Gym robotics environments (Fetch and HandManipulate) into multi-agent environments with the
make_multi_agent wrapper. I modified the simple example and here is my code:

import ray
from ray.rllib.agents.ddpg import DDPGTrainer
from ray.tune.registry import register_env

def env_creator(env_config):
    # Wrap the single-agent gym env into a MultiAgentEnv with 2 agent copies.
    ma_hand_cls = ray.rllib.env.multi_agent_env.make_multi_agent("HandManipulateBlock-v0")
    ma_hand = ma_hand_cls({"num_agents": 2})
    return ma_hand
register_env("ma_hand", env_creator)

# Configure the algorithm.
config = {
    # Environment (RLlib understands openAI gym registered strings).
    "env": "ma_hand",
    # Use 2 environment workers (aka "rollout workers") that collect samples
    # in parallel from their own environment clone(s).
    "num_workers": 2,
    # Change this to "framework: torch", if you are using PyTorch.
    # Also, use "framework: tf2" for tf2.x eager execution.
    "framework": "tf",
    "render_env": True,
    # Tweak the default model provided automatically by RLlib,
    # given the environment's observation- and action spaces.
    "model": {
        "fcnet_hiddens": [64, 64],
        "fcnet_activation": "relu",
    },
    # Set up a separate evaluation worker set for the
    # `trainer.evaluate()` call after training (see below).
    "evaluation_num_workers": 1,
    # Only for evaluation runs, render the env.
    "evaluation_config": {
        "render_env": True,
    },
    #"disable_env_checking": True,
}

# Create our RLlib Trainer.
trainer = DDPGTrainer(config=config)

# Run it for n training iterations. A training iteration includes
# parallel sample collection by the environment workers as well as
# loss calculation on the collected batch and a model update.
for _ in range(3):
    print(trainer.train())

# Evaluate the trained Trainer (and render each timestep to the shell's
# output).
trainer.evaluate()

When I try to create a trainer with the converted environment, it gives this error:
"ValueError: The observation collected from env.reset was not contained within your env's observation space. Its possible that there was a type mismatch (for example observations of np.float32 and a space of np.float64 observations), or that one of the sub-observations was out of bounds"
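From the message I guess there is a dtype mismatch between the observations returned by the robotics env and the declared observation space, but I’m not sure. Here is a rough workaround I’m considering (not confident it is the right fix): a small gym.ObservationWrapper that casts each entry of the Dict observation to its sub-space's dtype, and passing a creator lambda to make_multi_agent instead of the env id string. The class name CastObs and the registered name "ma_hand_cast" are just placeholders I made up.

import gym
import numpy as np
from ray.rllib.env.multi_agent_env import make_multi_agent
from ray.tune.registry import register_env

class CastObs(gym.ObservationWrapper):
    """Cast each entry of the Dict observation to the dtype declared by the
    corresponding sub-space (e.g. float64 -> float32), so that
    observation_space.contains(obs) passes RLlib's env check."""

    def observation(self, observation):
        return {
            key: np.asarray(observation[key], dtype=space.dtype)
            for key, space in self.observation_space.spaces.items()
        }

def env_creator(env_config):
    # make_multi_agent also accepts a callable that builds the (wrapped)
    # single-agent env, instead of a registered gym id string.
    ma_hand_cls = make_multi_agent(
        lambda cfg: CastObs(gym.make("HandManipulateBlock-v0"))
    )
    return ma_hand_cls({"num_agents": 2})

register_env("ma_hand_cast", env_creator)

I would then point the config's "env" at "ma_hand_cast" instead of "ma_hand". Does that sound like the right direction?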

I can bypass this error by setting "disable_env_checking": True in the config. After training, trainer.evaluate() can evaluate the trained policy, but rendering does not work (no render window pops up). Here is the output of trainer.evaluate():

Out[20]: 
{'evaluation': {'episode_reward_max': -100.0,
  'episode_reward_min': -100.0,
  'episode_reward_mean': -100.0,
  'episode_len_mean': 50.0,
  'episode_media': {},
  'episodes_this_iter': 10,
  'policy_reward_min': {},
  'policy_reward_max': {},
  'policy_reward_mean': {},
  'custom_metrics': {},
  'hist_stats': {'episode_reward': [-100.0,
    -100.0,
    -100.0,
    -100.0,
    -100.0,
    -100.0,
    -100.0,
    -100.0,
    -100.0,
    -100.0],
   'episode_lengths': [50, 50, 50, 50, 50, 50, 50, 50, 50, 50]},
  'sampler_perf': {'mean_raw_obs_processing_ms': 0.09725146188945352,
   'mean_inference_ms': 0.4013698258085879,
   'mean_action_processing_ms': 0.0842750191450595,
   'mean_env_wait_ms': 1.6527913525670825,
   'mean_env_render_ms': 0.04739675693169325},
  'off_policy_estimator': {},
  'timesteps_this_iter': 0}}
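Since no window pops up during trainer.evaluate(), I also thought about rendering manually. Below is a rough sketch of a hand-rolled evaluation loop that calls env.render() directly, assuming trainer.compute_single_action is available in my RLlib version and that the env built by make_multi_agent forwards render() to its sub-envs (otherwise something like env.agents[0].render() might be needed). Not sure this is the intended way:

# Rough sketch: manual rollout of the trained policy with explicit rendering.
# Assumes the `trainer` and `env_creator` defined above.
env = env_creator({})
obs = env.reset()
done = {"__all__": False}
while not done["__all__"]:
    # Compute one action per agent with the shared default policy.
    actions = {
        agent_id: trainer.compute_single_action(agent_obs)
        for agent_id, agent_obs in obs.items()
    }
    obs, rewards, done, infos = env.step(actions)
    env.render()  # may need env.agents[0].render() depending on the RLlib version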

Any idea how to solve this problem? Thanks so much!