Bug when converting Gym Robotics env to a multi-agent env with the make_multi_agent wrapper

  • High: It blocks me from completing my task.

Hi, I’m new to OpenAI Gym and RLlib, so my question may be dumb.
Recently I’ve been working on a multi-agent project and trying to convert the OpenAI Gym robotics environments (Fetch and HandManipulate) to multi-agent environments with the
make_multi_agent wrapper. I modified the simple example and here is my code:

import ray
from ray.rllib.agents.ddpg import DDPGTrainer
from ray.tune.registry import register_env

def env_creator(env_config):
    ma_hand_cls = ray.rllib.env.multi_agent_env.make_multi_agent("HandManipulateBlock-v0")
    ma_hand = ma_hand_cls({"num_agents": 2})
    return ma_hand 
register_env("ma_hand", env_creator)

# Configure the algorithm.
config = {
    # Environment (RLlib understands openAI gym registered strings).
    "env": "ma_hand",
    # Use 2 environment workers (aka "rollout workers") that parallelly
    # collect samples from their own environment clone(s).
    "num_workers": 2,
    # Change this to "framework: torch", if you are using PyTorch.
    # Also, use "framework: tf2" for tf2.x eager execution.
    "framework": "tf",
    "render_env": True,
    # Tweak the default model provided automatically by RLlib,
    # given the environment's observation- and action spaces.
    "model": {
        "fcnet_hiddens": [64, 64],
        "fcnet_activation": "relu",
    },
    # Set up a separate evaluation worker set for the
    # `trainer.evaluate()` call after training (see below).
    "evaluation_num_workers": 1,
    # Only for evaluation runs, render the env.
    "evaluation_config": {
        "render_env": True,
    },
    # "disable_env_checking": True,
}

# Create our RLlib Trainer.
trainer = DDPGTrainer(config=config)

# Run it for n training iterations. A training iteration includes
# parallel sample collection by the environment workers as well as
# loss calculation on the collected batch and a model update.
for _ in range(3):
    print(trainer.train())

# Evaluate the trained Trainer (and render each timestep to the shell's
# output).
trainer.evaluate()

When I try to create a trainer with the converted environment, it gives this error:
“ValueError: The observation collected from env.reset was not contained within your env’s observation space. It’s possible that there was a type mismatch (for example observations of np.float32 and a space of np.float64 observations), or that one of the sub-observations was out of bounds”

I can bypass this error by setting “disable_env_checking”: True in the config. After training, trainer.evaluate() can evaluate the trained policy, but rendering does not work (no render window pops up). Here is the output of trainer.evaluate():

{'evaluation': {'episode_reward_max': -100.0,
  'episode_reward_min': -100.0,
  'episode_reward_mean': -100.0,
  'episode_len_mean': 50.0,
  'episode_media': {},
  'episodes_this_iter': 10,
  'policy_reward_min': {},
  'policy_reward_max': {},
  'policy_reward_mean': {},
  'custom_metrics': {},
  'hist_stats': {'episode_reward': [-100.0, ...],
   'episode_lengths': [50, 50, 50, 50, 50, 50, 50, 50, 50, 50]},
  'sampler_perf': {'mean_raw_obs_processing_ms': 0.09725146188945352,
   'mean_inference_ms': 0.4013698258085879,
   'mean_action_processing_ms': 0.0842750191450595,
   'mean_env_wait_ms': 1.6527913525670825,
   'mean_env_render_ms': 0.04739675693169325},
  'off_policy_estimator': {},
  'timesteps_this_iter': 0}}

Any idea how to solve this problem? Thanks so much!