When I try to restore an incomplete tune job using this script:
"""Resume experiment script."""
# %% Imports
import os
import ray
from ray.rllib.models import ModelCatalog
from ray.tune import Tuner
from ray.tune.registry import register_env
from punchclock.nets.lstm_mask import MaskedLSTM
from punchclock.ray.build_env import buildEnv
# %% Register model and Env
ModelCatalog.register_custom_model("MaskedLSTM", MaskedLSTM)
register_env("ssa_env", buildEnv)
checkpoint_dir = "/home/user/ray_results/exp_name"
num_cpus = 20
num_workers = num_cpus - 1
ray.init(num_cpus=num_cpus, num_gpus=0)
os.environ["TUNE_MAX_PENDING_TRIALS_PG"] = str(num_workers)
tuner = Tuner.restore(
    trainable="PPO",
    path=checkpoint_dir,
    resume_errored=True,
    restart_errored=True,
)
tuner.fit()
I get the following error, which indicates that the environment is not recognized. But in the script above I register the environment with ray.tune.registry.register_env, so I don’t understand why this error occurs. There is also a second failure that I’m not sure is related to the environment error.
Failure # 1 (occurred at 2023-11-20_20-39-26)
The actor died because of an error raised in its creation task, ray::PPO.__init__() (pid=130077, ip=10.128.8.91, actor_id=1969b2ae33b28f9c1b1fa87701000000, repr=PPO)
File "/home/user/.conda/envs/punch/lib/python3.10/site-packages/ray/rllib/evaluation/worker_set.py", line 242, in _setup
self.add_workers(
File "/home/user/.conda/envs/punch/lib/python3.10/site-packages/ray/rllib/evaluation/worker_set.py", line 635, in add_workers
raise result.get()
File "/home/user/.conda/envs/punch/lib/python3.10/site-packages/ray/rllib/utils/actor_manager.py", line 488, in __fetch_result
result = ray.get(r)
ray.exceptions.RayActorError: The actor died because of an error raised in its creation task, ray::RolloutWorker.__init__() (pid=131032, ip=10.128.8.91, actor_id=73949f41d65591a253439f3e01000000, repr=<ray.rllib.evaluation.rollout_worker.RolloutWorker object at 0x2abf619836d0>)
File "/home/user/.conda/envs/punch/lib/python3.10/site-packages/gymnasium/envs/registration.py", line 569, in make
_check_version_exists(ns, name, version)
File "/home/user/.conda/envs/punch/lib/python3.10/site-packages/gymnasium/envs/registration.py", line 219, in _check_version_exists
_check_name_exists(ns, name)
File "/home/user/.conda/envs/punch/lib/python3.10/site-packages/gymnasium/envs/registration.py", line 197, in _check_name_exists
raise error.NameNotFound(
gymnasium.error.NameNotFound: Environment ssa_env doesn't exist.
During handling of the above exception, another exception occurred:
ray::RolloutWorker.__init__() (pid=131032, ip=10.128.8.91, actor_id=73949f41d65591a253439f3e01000000, repr=<ray.rllib.evaluation.rollout_worker.RolloutWorker object at 0x2abf619836d0>)
File "/home/user/.conda/envs/punch/lib/python3.10/site-packages/ray/rllib/evaluation/rollout_worker.py", line 609, in __init__
self.env = env_creator(copy.deepcopy(self.env_context))
File "/home/user/.conda/envs/punch/lib/python3.10/site-packages/ray/rllib/env/utils.py", line 178, in _gym_env_creator
raise EnvError(ERR_MSG_INVALID_ENV_DESCRIPTOR.format(env_descriptor))
ray.rllib.utils.error.EnvError: The env string you provided ('ssa_env') is:
a) Not a supported/installed environment.
b) Not a tune-registered environment creator.
c) Not a valid env class string.
Try one of the following:
a) For Atari support: `pip install gym[atari] autorom[accept-rom-license]`.
For VizDoom support: Install VizDoom
(https://github.com/mwydmuch/ViZDoom/blob/master/doc/Building.md) and
`pip install vizdoomgym`.
For PyBullet support: `pip install pybullet`.
b) To register your custom env, do `from ray import tune;
tune.register('[name]', lambda cfg: [return env obj from here using cfg])`.
Then in your config, do `config['env'] = [name]`.
c) Make sure you provide a fully qualified classpath, e.g.:
`ray.rllib.examples.env.repeat_after_me_env.RepeatAfterMeEnv`
During handling of the above exception, another exception occurred:
ray::PPO.__init__() (pid=130077, ip=10.128.8.91, actor_id=1969b2ae33b28f9c1b1fa87701000000, repr=PPO)
File "/home/user/.conda/envs/punch/lib/python3.10/site-packages/ray/rllib/algorithms/algorithm.py", line 475, in __init__
super().__init__(
File "/home/user/.conda/envs/punch/lib/python3.10/site-packages/ray/tune/trainable/trainable.py", line 170, in __init__
self.setup(copy.deepcopy(self.config))
File "/home/user/.conda/envs/punch/lib/python3.10/site-packages/ray/rllib/algorithms/algorithm.py", line 601, in setup
self.workers = WorkerSet(
File "/home/user/.conda/envs/punch/lib/python3.10/site-packages/ray/rllib/evaluation/worker_set.py", line 194, in __init__
raise e.args[0].args[2]
ray.rllib.utils.error.EnvError: The env string you provided ('ssa_env') is:
a) Not a supported/installed environment.
b) Not a tune-registered environment creator.
c) Not a valid env class string.
Try one of the following:
a) For Atari support: `pip install gym[atari] autorom[accept-rom-license]`.
For VizDoom support: Install VizDoom
(https://github.com/mwydmuch/ViZDoom/blob/master/doc/Building.md) and
`pip install vizdoomgym`.
For PyBullet support: `pip install pybullet`.
b) To register your custom env, do `from ray import tune;
tune.register('[name]', lambda cfg: [return env obj from here using cfg])`.
Then in your config, do `config['env'] = [name]`.
c) Make sure you provide a fully qualified classpath, e.g.:
`ray.rllib.examples.env.repeat_after_me_env.RepeatAfterMeEnv`
Failure # 2 (occurred at 2023-11-25_19-33-43)
e[36mray::PPO.train()e[39m (pid=217206, ip=10.128.8.112, actor_id=67791a70455479e668d7bad601000000, repr=PPO)
File "/home/user/.conda/envs/punch/lib/python3.10/site-packages/ray/tune/trainable/trainable.py", line 389, in train
raise skipped from exception_cause(skipped)
File "/home/user/.conda/envs/punch/lib/python3.10/site-packages/ray/tune/trainable/trainable.py", line 386, in train
result = self.step()
File "/home/user/.conda/envs/punch/lib/python3.10/site-packages/ray/rllib/algorithms/algorithm.py", line 832, in step
results = self._compile_iteration_results(
File "/home/user/.conda/envs/punch/lib/python3.10/site-packages/ray/rllib/algorithms/algorithm.py", line 3046, in _compile_iteration_results
results["sampler_results"] = summarize_episodes(
File "/home/user/.conda/envs/punch/lib/python3.10/site-packages/ray/rllib/evaluation/metrics.py", line 221, in summarize_episodes
filt = [v for v in v_list if not np.any(np.isnan(v))]
File "/home/user/.conda/envs/punch/lib/python3.10/site-packages/ray/rllib/evaluation/metrics.py", line 221, in <listcomp>
filt = [v for v in v_list if not np.any(np.isnan(v))]
TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
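That TypeError is what numpy raises when np.isnan meets a value it can’t coerce to a float, so it looks like something non-numeric ended up in the episode metrics. A minimal reproduction (the v_list contents are illustrative, not taken from my run):

```python
import numpy as np

# A non-numeric entry mixed into an otherwise numeric metrics list.
v_list = [3.5, float("nan"), "not-a-number"]

try:
    # Mirrors the failing list comprehension in rllib's metrics.py.
    filt = [v for v in v_list if not np.any(np.isnan(v))]
except TypeError as exc:
    print(type(exc).__name__)  # same "ufunc 'isnan' not supported" TypeError
```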
The custom environment works normally when I start a new tuning run; I only see this problem when restoring an experiment from a checkpoint.
If anyone has ideas on where to look, I’d much appreciate it.