- Medium: It causes significant difficulty in completing my task, but I can work around it.
It prevents me from migrating from Ray 2.1.0 to 2.2.0.
I have a custom environment and wanted to migrate to Ray 2.2.0. However, when I started training an IMPALA agent in RLlib, I noticed that result["episode_reward_mean"] kept coming back as "nan".
After replacing my custom model and environment with the default fc model and "CartPole-v1" / "MountainCar-v0", it became clear that the cause of the issue had to lie with my custom environment. Hence, I tried to replicate the example from the documentation/examples found here.
However, this would not run. To debug, I reduced it to a minimal RLlib setup with a default model (tf) but kept getting these errors:
From ray/rllib/utils/pre_checks/env.py, line 68, in check_env:
ValueError: Env must be of one of the following supported types: BaseEnv, gym.Env, MultiAgentEnv, VectorEnv, RemoteBaseEnv, ExternalMultiAgentEnv, ExternalEnv, but instead is of type <class '__main__.SimpleCorridor'>.
or if setting disable_env_checking=True:
From ray/rllib/algorithms/algorithm_config.py, line 2182, in get_multi_agent_setup:
ValueError: observation_space not provided in PolicySpec for default_policy and env does not have an observation space OR no spaces received from other workers' env(s) OR no observation_space specified in config!
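Side note, in case it helps whoever reads this: if I understand the AlgorithmConfig API correctly, .environment() can also take the spaces explicitly. Below is a rough sketch of what I mean; the observation_space/action_space keyword arguments are my assumption about the signature, and SimpleCorridor is the class from the full script further down. It is not meant as the fix, just to show where the spaces could be declared.

import numpy as np
import ray.rllib.algorithms.impala as impala
from gymnasium.spaces import Box, Discrete

# Sketch only: declare the spaces directly on the config instead of relying on
# them being read off the env (SimpleCorridor as defined in the script below).
config = (
    impala.ImpalaConfig()
    .rollouts(num_rollout_workers=0)
    .environment(
        env=SimpleCorridor,
        env_config={"corridor_length": 5},
        observation_space=Box(0.0, 5.0, shape=(1,), dtype=np.float32),
        action_space=Discrete(2),
    )
)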
Versions:
Python: 3.9.16
Ray: 2.2.0
Tensorflow: 2.11.0
Gymnasium: 0.27.1
Gym: 0.23.1 (had to install this as well, since some RLlib files caused import errors without it)
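One thing I noticed while writing down the versions (my own quick check, not from the docs): with both packages installed, the old gym.Env and gymnasium.Env are unrelated base classes, so an env subclassing gymnasium.Env is not an instance of gym.Env, which may be relevant to the type-check error above:

import gym as legacy_gym  # 0.23.1
import gymnasium          # 0.27.1

# The two base classes are independent: a gymnasium.Env subclass is not a gym.Env.
print(issubclass(gymnasium.Env, legacy_gym.Env))  # False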
The code below should replicate the problem (comment the marked lines in and out to get the different errors).
Can anybody tell me what I’m doing wrong?
BR
Jorgen
import gymnasium as gym
from gymnasium.spaces import Discrete, Box
import numpy as np
import random

# from ray.rllib.env.env_context import EnvContext
# from ray.tune.registry import register_env
import ray.rllib.algorithms.impala as impala
from ray.tune.logger import pretty_print


class SimpleCorridor(gym.Env):
    """Example of a custom env in which you have to walk down a corridor.
    You can configure the length of the corridor via the env config."""

    # def __init__(self, config: EnvContext):
    def __init__(self, config):
        # super(SimpleCorridor, self).__init__()
        self.end_pos = config["corridor_length"]
        self.cur_pos = 0
        self.action_space = Discrete(2)
        self.observation_space = Box(0.0, self.end_pos, shape=(1,), dtype=np.float32)
        # Set the seed. This is only used for the final (reach goal) reward.
        self.reset(seed=config.worker_index * config.num_workers)

    def reset(self, *, seed=None, options=None):
        random.seed(seed)
        self.cur_pos = 0
        return [self.cur_pos], {}

    def step(self, action):
        assert action in [0, 1], action
        if action == 0 and self.cur_pos > 0:
            self.cur_pos -= 1
        elif action == 1:
            self.cur_pos += 1
        done = truncated = self.cur_pos >= self.end_pos
        # Produce a random reward when we reach the goal.
        return (
            [self.cur_pos],
            random.random() * 2 if done else -0.1,
            done,
            truncated,
            {},
        )


# def env_creator(env_config):
#     return SimpleCorridor(env_config)
# register_env("simple_corridor", env_creator)

algo = (
    impala.ImpalaConfig()
    .rollouts(num_rollout_workers=0)
    .resources(num_gpus=0)
    .environment(env=SimpleCorridor, env_config={"corridor_length": 5})
    # .environment(env="simple_corridor", env_config={"corridor_length": 5},
    #              # disable_env_checking=True
    # )
    .build()
)

for i in range(10):
    result = algo.train()
    print(pretty_print(result))
    if i % 5 == 0:
        checkpoint_dir = algo.save("./ray_22_test")
        print(f"Checkpoint saved in directory {checkpoint_dir}")