Observation_space not provided in PolicySpec

I don't know why I am getting this error when running Tune with IMPALA on a single-agent custom env. If I run the trainer without Tune, it runs for a few minutes and then crashes.

ValueError: observation_space not provided in PolicySpec for default_policy and env does not have an observation space OR no spaces received from other workers’ env(s) OR no observation_space specified in config!

I am using Python 3.9, Windows 11 (on Linux I get the same error), Ray 1.13.0.
Hardware: AMD 16-core, 128 GB RAM, RTX 3060 12 GB.

from ray import tune
from ray.rllib.agents import impala

cfg = impala.DEFAULT_CONFIG.copy()
cfg["num_gpus"] = 1
cfg["num_workers"] = 10
cfg["num_envs_per_worker"] = 10
cfg["framework"] = "torch"
cfg["horizon"] = 750
cfg["model"] = {"fcnet_hiddens": [512, 512]}

tune.run("IMPALA", "run", config=cfg, verbose=1)

  • High: It blocks me from completing my task.

@evoyan you can add an "observation_space" in the __init__ method of your environment.
Here is a page on how to use wrappers: https://alexandervandekleut.github.io/gym-wrappers/

You can define the observation space with a Box; of course, you have to define the proper shape:

self.observation_space = Box(0, 2, (self.max_avail_actions,))
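
For example, a minimal sketch of a wrapper that attaches such a space to an env that does not declare one (ObsSpaceWrapper is a made-up name; the shape just mirrors the Box above):

import gym
from gym import spaces
import numpy as np

class ObsSpaceWrapper(gym.ObservationWrapper):
    def __init__(self, env):
        super().__init__(env)
        # declare the space RLlib looks for on the env
        self.observation_space = spaces.Box(
            low=0, high=2, shape=(env.max_avail_actions,), dtype=np.float32)

    def observation(self, obs):
        # cast raw observations so they match the declared dtype
        return np.asarray(obs, dtype=np.float32)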

I hope it will help You.

I already have an observation_space in init

self.observation_space = spaces.Box(low=-np.inf, high=np.inf, shape=(42410,), dtype=np.float32)

You can check if your observations have type np.float32
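
For example, a quick sanity check (just a sketch; substitute your own env class for MyEnv):

env = MyEnv()
obs = env.reset()
print(obs.dtype, obs.shape)                 # expect float32 and the declared shape
assert obs.dtype == np.float32
assert env.observation_space.contains(obs)  # shape/bounds match the declared space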

This is how I prepare the data for obs, I hope it’s ok

obs = np.concatenate((self.data1[self.index],
                      self.data2),
                     axis=0).astype(np.float32)

return obs

My environment inherits from gym.Env instead of gym.Wrapper; is that a problem?

You can add the observation space in your configuration:
https://docs.ray.io/en/latest/rllib/rllib-training.html#common-parameters
You can try the parameter:
"observation_space": None,

I have added cfg["observation_space"] = None but I get the same error (I assume that None is the default anyway).

Can you copy and paste the error log?

Here is a log. I tried with PPO and 1 worker / 1 env, but got the same error.

2022-06-14 18:45:05,932 ERROR syncer.py:147 -- Log sync requires rsync to be installed.
(PPOTrainer pid=30456) 2022-06-14 18:45:09,680  INFO ppo.py:414 -- In multi-agent mode, policies will be optimized sequentially by the multi-GPU optimizer. Consider setting simple_optimizer=True if this doesn't work for you.
(PPOTrainer pid=30456) 2022-06-14 18:45:09,680  INFO trainer.py:903 -- Current log_level is WARN. For more information, set 'log_level': 'INFO' / 'DEBUG' or use the -v and -vv flags.
== Status ==
Current time: 2022-06-14 18:45:13 (running for 00:00:07.80)
Memory usage on this node: 15.9/95.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 2.0/32 CPUs, 1.0/1 GPUs, 0.0/49.48 GiB heap, 0.0/24.74 GiB objects
Result logdir: C:\Users\m1\ray_results\run
Number of trials: 1/1 (1 RUNNING)


2022-06-14 18:45:13,587 ERROR trial_runner.py:886 -- Trial PPO_None_f6442_00000: Error processing event.
NoneType: None
== Status ==
Current time: 2022-06-14 18:45:13 (running for 00:00:07.80)
Memory usage on this node: 15.9/95.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 0/32 CPUs, 0/1 GPUs, 0.0/49.48 GiB heap, 0.0/24.74 GiB objects
Result logdir: C:\Users\m1\ray_results\run
Number of trials: 1/1 (1 ERROR)
Number of errored trials: 1
+----------------------+--------------+-----------------------------------------------------------------------------------+
| Trial name           |   # failures | error file                                                                        |
|----------------------+--------------+-----------------------------------------------------------------------------------|
| PPO_None_f6442_00000 |            1 | C:\Users\m1\ray_results\run\PPO_None_f6442_00000_0_2022-06-14_18-45-05\error.txt |
+----------------------+--------------+-----------------------------------------------------------------------------------+

2022-06-14 18:45:13,594 ERROR ray_trial_executor.py:107 -- An exception occurred when trying to stop the Ray actor:Traceback (most recent call last):
  File "D:\Proj\.env\lib\site-packages\ray\tune\ray_trial_executor.py", line 98, in post_stop_cleanup
    ray.get(future, timeout=0)
  File "D:\Proj\.env\lib\site-packages\ray\_private\client_mode_hook.py", line 105, in wrapper  
    return func(*args, **kwargs)
  File "D:\Proj\.env\lib\site-packages\ray\worker.py", line 1833, in get
    raise value
ray.exceptions.RayActorError: The actor died because of an error raised in its creation task, ray::PPOTrainer.__init__() (pid=30456, ip=127.0.0.1, repr=PPOTrainer)
  File "D:\Proj\.env\lib\site-packages\ray\util\tracing\tracing_helper.py", line 462, in _resume_span
    return method(self, *_args, **_kwargs)
  File "D:\Proj\.env\lib\site-packages\ray\rllib\agents\trainer.py", line 1074, in _init        
    raise NotImplementedError
NotImplementedError

During handling of the above exception, another exception occurred:

ray::PPOTrainer.__init__() (pid=30456, ip=127.0.0.1, repr=PPOTrainer)
  File "python\ray\_raylet.pyx", line 658, in ray._raylet.execute_task
  File "python\ray\_raylet.pyx", line 699, in ray._raylet.execute_task
  File "python\ray\_raylet.pyx", line 665, in ray._raylet.execute_task
  File "python\ray\_raylet.pyx", line 669, in ray._raylet.execute_task
  File "python\ray\_raylet.pyx", line 616, in ray._raylet.execute_task.function_executor
  File "D:\Proj\.env\lib\site-packages\ray\_private\function_manager.py", line 675, in actor_method_executor
    return method(__ray_actor, *args, **kwargs)
  File "D:\Proj\.env\lib\site-packages\ray\util\tracing\tracing_helper.py", line 462, in _resume_span
    return method(self, *_args, **_kwargs)
  File "D:\Proj\.env\lib\site-packages\ray\rllib\agents\trainer.py", line 870, in __init__      
    super().__init__(
  File "D:\Proj\.env\lib\site-packages\ray\tune\trainable.py", line 156, in __init__
    self.setup(copy.deepcopy(self.config))
  File "D:\Proj\.env\lib\site-packages\ray\util\tracing\tracing_helper.py", line 462, in _resume_span
    return method(self, *_args, **_kwargs)
  File "D:\Proj\.env\lib\site-packages\ray\rllib\agents\trainer.py", line 950, in setup
    self.workers = WorkerSet(
  File "D:\Proj\.env\lib\site-packages\ray\rllib\evaluation\worker_set.py", line 142, in __init__
    remote_spaces = ray.get(
  File "D:\Proj\.env\lib\site-packages\ray\_private\client_mode_hook.py", line 105, in wrapper  
    return func(*args, **kwargs)
  File "D:\Proj\.env\lib\site-packages\ray\worker.py", line 1833, in get
    raise value
ray.exceptions.RayActorError: The actor died because of an error raised in its creation task, ray::RolloutWorker.__init__() (pid=31592, ip=127.0.0.1, repr=<ray.rllib.evaluation.rollout_worker.RolloutWorker object at 0x000002E1FEE24B50>)
  File "python\ray\_raylet.pyx", line 665, in ray._raylet.execute_task
  File "python\ray\_raylet.pyx", line 669, in ray._raylet.execute_task
  File "python\ray\_raylet.pyx", line 616, in ray._raylet.execute_task.function_executor
  File "D:\Proj\.env\lib\site-packages\ray\_private\function_manager.py", line 675, in actor_method_executor
    return method(__ray_actor, *args, **kwargs)
  File "D:\Proj\.env\lib\site-packages\ray\util\tracing\tracing_helper.py", line 462, in _resume_span
    return method(self, *_args, **_kwargs)
  File "D:\Proj\.env\lib\site-packages\ray\rllib\evaluation\rollout_worker.py", line 573, in __init__
    self.policy_dict = _determine_spaces_for_multi_agent_dict(
  File "D:\Proj\.env\lib\site-packages\ray\rllib\evaluation\rollout_worker.py", line 1937, in _determine_spaces_for_multi_agent_dict
    raise ValueError(
ValueError: `observation_space` not provided in PolicySpec for default_policy and env does not have an observation space OR no spaces received from other workers' env(s) OR no `observation_space` specified in config!

(PPOTrainer pid=30456) 2022-06-14 18:45:13,583  ERROR worker.py:451 -- Exception raised in creation task: The actor died because of an error raised in its creation task, ray::PPOTrainer.__init__() (pid=30456, ip=127.0.0.1, repr=PPOTrainer)
(PPOTrainer pid=30456)   File "D:\Proj\.env\lib\site-packages\ray\util\tracing\tracing_helper.py", line 462, in _resume_span
(PPOTrainer pid=30456)     return method(self, *_args, **_kwargs)
(PPOTrainer pid=30456)   File "D:\Proj\.env\lib\site-packages\ray\rllib\agents\trainer.py", line 1074, in _init
(PPOTrainer pid=30456)     raise NotImplementedError
(PPOTrainer pid=30456) NotImplementedError
(PPOTrainer pid=30456)
(PPOTrainer pid=30456) During handling of the above exception, another exception occurred:
(PPOTrainer pid=30456)
(PPOTrainer pid=30456) ray::PPOTrainer.__init__() (pid=30456, ip=127.0.0.1, repr=PPOTrainer)
(PPOTrainer pid=30456)   File "python\ray\_raylet.pyx", line 658, in ray._raylet.execute_task
(PPOTrainer pid=30456)   File "python\ray\_raylet.pyx", line 699, in ray._raylet.execute_task
(PPOTrainer pid=30456)   File "python\ray\_raylet.pyx", line 665, in ray._raylet.execute_task
(PPOTrainer pid=30456)   File "python\ray\_raylet.pyx", line 669, in ray._raylet.execute_task
(PPOTrainer pid=30456)   File "python\ray\_raylet.pyx", line 616, in ray._raylet.execute_task.function_executor      
(PPOTrainer pid=30456)   File "D:\Proj\.env\lib\site-packages\ray\_private\function_manager.py", line 675, in actor_method_executor
(PPOTrainer pid=30456)     return method(__ray_actor, *args, **kwargs)
(PPOTrainer pid=30456)   File "D:\Proj\.env\lib\site-packages\ray\util\tracing\tracing_helper.py", line 462, in _resume_span
(PPOTrainer pid=30456)     return method(self, *_args, **_kwargs)
(PPOTrainer pid=30456)   File "D:\Proj\.env\lib\site-packages\ray\rllib\agents\trainer.py", line 870, in __init__
(PPOTrainer pid=30456)     super().__init__(
(PPOTrainer pid=30456)   File "D:\Proj\.env\lib\site-packages\ray\tune\trainable.py", line 156, in __init__
(PPOTrainer pid=30456)     self.setup(copy.deepcopy(self.config))
(PPOTrainer pid=30456)   File "D:\Proj\.env\lib\site-packages\ray\util\tracing\tracing_helper.py", line 462, in _resume_span
(PPOTrainer pid=30456)     return method(self, *_args, **_kwargs)
(PPOTrainer pid=30456)   File "D:\Proj\.env\lib\site-packages\ray\rllib\agents\trainer.py", line 950, in setup
(PPOTrainer pid=30456)     self.workers = WorkerSet(
(PPOTrainer pid=30456)   File "D:\Proj\.env\lib\site-packages\ray\rllib\evaluation\worker_set.py", line 142, in __init__
(PPOTrainer pid=30456)     remote_spaces = ray.get(
(PPOTrainer pid=30456)   File "D:\Proj\.env\lib\site-packages\ray\_private\client_mode_hook.py", line 105, in wrapper
(PPOTrainer pid=30456)     return func(*args, **kwargs)
(PPOTrainer pid=30456)   File "D:\Proj\.env\lib\site-packages\ray\worker.py", line 1833, in get 
(PPOTrainer pid=30456)     raise value
(PPOTrainer pid=30456) ray.exceptions.RayActorError: The actor died because of an error raised in its creation task, ray::RolloutWorker.__init__() (pid=31592, ip=127.0.0.1, repr=<ray.rllib.evaluation.rollout_worker.RolloutWorker object at 0x000002E1FEE24B50>)
(PPOTrainer pid=30456)   File "python\ray\_raylet.pyx", line 665, in ray._raylet.execute_task
(PPOTrainer pid=30456)   File "python\ray\_raylet.pyx", line 669, in ray._raylet.execute_task
(PPOTrainer pid=30456)   File "python\ray\_raylet.pyx", line 616, in ray._raylet.execute_task.function_executor
(PPOTrainer pid=30456)   File "D:\Proj\.env\lib\site-packages\ray\_private\function_manager.py", line 675, in actor_method_executor
(PPOTrainer pid=30456)     return method(__ray_actor, *args, **kwargs)
(PPOTrainer pid=30456)   File "D:\Proj\.env\lib\site-packages\ray\util\tracing\tracing_helper.py", line 462, in _resume_span
(PPOTrainer pid=30456)     return method(self, *_args, **_kwargs)
(PPOTrainer pid=30456)   File "D:\Proj\.env\lib\site-packages\ray\rllib\evaluation\rollout_worker.py", line 573, in __init__
(PPOTrainer pid=30456)     self.policy_dict = _determine_spaces_for_multi_agent_dict(
(PPOTrainer pid=30456)   File "D:\Proj\.env\lib\site-packages\ray\rllib\evaluation\rollout_worker.py", line 1937, in _determine_spaces_for_multi_agent_dict
(PPOTrainer pid=30456)     raise ValueError(
(PPOTrainer pid=30456) ValueError: `observation_space` not provided in PolicySpec for default_policy and env does not have an observation space OR no spaces received from other workers' env(s) OR no `observation_space` specified in config!
Traceback (most recent call last):
  File "D:\Proj\kub_IMPALA.py", line 93, in <module>
    result = tune.run("PPO", "run", config=cfg, verbose=1)
  File "D:\Proj\.env\lib\site-packages\ray\tune\tune.py", line 741, in run
    raise TuneError("Trials did not complete", incomplete_trials)
ray.tune.error.TuneError: ('Trials did not complete', [PPO_None_f6442_00000])
(RolloutWorker pid=31592) 2022-06-14 18:45:13,577       ERROR worker.py:451 -- Exception raised in creation task: The actor died because of an error raised in its creation task, ray::RolloutWorker.__init__() (pid=31592, ip=127.0.0.1, repr=<ray.rllib.evaluation.rollout_worker.RolloutWorker object at 0x000002E1FEE24B50>)
(RolloutWorker pid=31592)   File "python\ray\_raylet.pyx", line 665, in ray._raylet.execute_task
(RolloutWorker pid=31592)   File "python\ray\_raylet.pyx", line 669, in ray._raylet.execute_task
(RolloutWorker pid=31592)   File "python\ray\_raylet.pyx", line 616, in ray._raylet.execute_task.function_executor   
(RolloutWorker pid=31592)   File "D:\Proj\.env\lib\site-packages\ray\_private\function_manager.py", line 675, in actor_method_executor
(RolloutWorker pid=31592)     return method(__ray_actor, *args, **kwargs)
(RolloutWorker pid=31592)   File "D:\Proj\.env\lib\site-packages\ray\util\tracing\tracing_helper.py", line 462, in _resume_span
(RolloutWorker pid=31592)     return method(self, *_args, **_kwargs)
(RolloutWorker pid=31592)   File "D:\Proj\.env\lib\site-packages\ray\rllib\evaluation\rollout_worker.py", line 573, in __init__
(RolloutWorker pid=31592)     self.policy_dict = _determine_spaces_for_multi_agent_dict(
(RolloutWorker pid=31592)   File "D:\Proj\.env\lib\site-packages\ray\rllib\evaluation\rollout_worker.py", line 1937, in _determine_spaces_for_multi_agent_dict
(RolloutWorker pid=31592)     raise ValueError(
(RolloutWorker pid=31592) ValueError: `observation_space` not provided in PolicySpec for default_policy and env does not have an observation space OR no spaces received from other workers' env(s) OR no `observation_space` specified in config!
(pid=) 2022-06-14 18:45:14,113  INFO context.py:67 -- Exec'ing worker with command: "D:\Proj\.env\Scripts\python.exe" D:\Proj\.env\lib\site-packages\ray\workers/default_worker.py --node-ip-address=127.0.0.1 --node-manager-port=57762 --object-store-name=tcp://127.0.0.1:64785 --raylet-name=tcp://127.0.0.1:58844 --redis-address=None --storage=None --temp-dir=C:\Users\m1\AppData\Local\Temp\ray --metrics-agent-port=59156 --logging-rotate-bytes=536870912 --logging-rotate-backup-count=5 --gcs-address=127.0.0.1:59770 --redis-password=5241590000000000 --startup-token=32 --runtime-env-hash=185949076

What is strange is that the environment does work: with trainer = impala.ImpalaTrainer(env="my_env", config=cfg) and fewer workers I was able to get about 7 million episodes on ray 1.9.2. After upgrading (through ray 1.11.0 to 1.13.0) it started to crash sooner, and with Tune it does not seem to run at all.

Maybe try in the config:

cfg["observation_space"] = spaces.Box(low=-np.inf, high=np.inf, shape=(42410,), dtype=np.float32)

You can change np.inf to some finite value.

I get the same error with this config, and also with np.inf replaced by a finite value.

I have no new ideas now :neutral_face:

Here is an empty environment with which I get the same error

import random
import gym
from gym import spaces
import numpy as np


class MyEnv(gym.Env):
    def __init__(self, config=None):
        super(MyEnv, self).__init__()

        self.action_space = spaces.Box(
            low=-1, high=1, shape=(2,), dtype=np.float32)
        self.observation_space = spaces.Box(
            low=-np.inf, high=np.inf, shape=(42410,), dtype=np.float32)

    def _next_observation(self):
        obs = np.random.rand(42410)
        return obs

    def _take_action(self, action):
        self._reward = 1

    def step(self, action):
        self._reward = 0
        self._take_action(action)
        done = False
        obs = self._next_observation()
        return obs, self._reward, done, {}

    def reset(self):
        self._reward = 0
        self.total_reward = 0
        self.visualization = None
        return self._next_observation()

@evo11x,

There must be another detail of your setup that is causing the error. Here is a Colab link showing a very minimal example of the environment you provided running fine. Perhaps you could try adding your other customizations back one at a time and see which breaks it.
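
For reference, a minimal sketch of that kind of check with the MyEnv class above (assuming the ray 1.13 API; the stop condition and batch settings are only illustrative):

import ray
from ray import tune

ray.init()

cfg = {
    "env": MyEnv,          # the env class can be passed directly, no registration needed
    "num_workers": 1,
    "horizon": 10,
    "train_batch_size": 256,
    "rollout_fragment_length": 10,
}

tune.run("PPO", config=cfg, stop={"training_iteration": 1}, verbose=1)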


I changed to GPU; I don't get any error.

2022-06-14 18:09:51,271	INFO services.py:1476 -- View the Ray dashboard at http://127.0.0.1:8265
(PPOTrainer pid=431) 2022-06-14 18:10:04,122	INFO trainer.py:2333 -- Your framework setting is 'tf', meaning you are using static-graph mode. Set framework='tf2' to enable eager execution with tf2.x. You may also then want to set eager_tracing=True in order to reach similar execution speed as with static-graph mode.
(PPOTrainer pid=431) 2022-06-14 18:10:04,123	WARNING ppo.py:395 -- `train_batch_size` (256) cannot be achieved with your other settings (num_workers=1 num_envs_per_worker=1 rollout_fragment_length=10)! Auto-adjusting `rollout_fragment_length` to 256.
(PPOTrainer pid=431) 2022-06-14 18:10:04,123	INFO ppo.py:415 -- In multi-agent mode, policies will be optimized sequentially by the multi-GPU optimizer. Consider setting simple_optimizer=True if this doesn't work for you.
(PPOTrainer pid=431) 2022-06-14 18:10:04,123	INFO trainer.py:906 -- Current log_level is WARN. For more information, set 'log_level': 'INFO' / 'DEBUG' or use the -v and -vv flags.
(RolloutWorker pid=474) 2022-06-14 18:10:08,937	WARNING env.py:136 -- Your env doesn't have a .spec.max_episode_steps attribute. This is fine if you have set 'horizon' in your config dictionary, or `soft_horizon`. However, if you haven't, 'horizon' will default to infinity, and your environment will not be reset.
(PPOTrainer pid=431) 2022-06-14 18:10:16,952	INFO trainable.py:163 -- Trainable.setup took 12.831 seconds. If your trainable is slow to initialize, consider setting reuse_actors=True to reduce actor creation overheads.
(PPOTrainer pid=431) 2022-06-14 18:10:16,952	WARNING util.py:65 -- Install gputil for GPU system monitoring.

== Status ==
Current time: 2022-06-14 18:10:16 (running for 00:00:17.93)
Memory usage on this node: 2.8/12.7 GiB
Using FIFO scheduling algorithm.
Resources requested: 2.0/2 CPUs, 0/1 GPUs, 0.0/7.3 GiB heap, 0.0/3.65 GiB objects (0.0/1.0 accelerator_type:T4)
Result logdir: /root/ray_results/PPO
Number of trials: 1/1 (1 RUNNING)
Trial name 	status 	loc
PPO_MyEnv_33da2_00000	RUNNING 	172.28.0.2:431


(PPOTrainer pid=431) 2022-06-14 18:10:19,257	WARNING deprecation.py:47 -- DeprecationWarning: `slice` has been deprecated. Use `SampleBatch[start:stop]` instead. This will raise an error in the future!

== Status ==
Current time: 2022-06-14 18:10:22 (running for 00:00:22.97)
Memory usage on this node: 3.0/12.7 GiB
Using FIFO scheduling algorithm.
Resources requested: 2.0/2 CPUs, 0/1 GPUs, 0.0/7.3 GiB heap, 0.0/3.65 GiB objects (0.0/1.0 accelerator_type:T4)
Result logdir: /root/ray_results/PPO
Number of trials: 1/1 (1 RUNNING)
Trial name 	status 	loc
PPO_MyEnv_33da2_00000	RUNNING 	172.28.0.2:431


== Status ==
Current time: 2022-06-14 18:10:27 (running for 00:00:27.99)
Memory usage on this node: 3.0/12.7 GiB
Using FIFO scheduling algorithm.
Resources requested: 2.0/2 CPUs, 0/1 GPUs, 0.0/7.3 GiB heap, 0.0/3.65 GiB objects (0.0/1.0 accelerator_type:T4)
Result logdir: /root/ray_results/PPO
Number of trials: 1/1 (1 RUNNING)
Trial name 	status 	loc
PPO_MyEnv_33da2_00000	RUNNING 	172.28.0.2:431


== Status ==
Current time: 2022-06-14 18:10:32 (running for 00:00:33.04)
Memory usage on this node: 3.0/12.7 GiB
Using FIFO scheduling algorithm.
Resources requested: 2.0/2 CPUs, 0/1 GPUs, 0.0/7.3 GiB heap, 0.0/3.65 GiB objects (0.0/1.0 accelerator_type:T4)
Result logdir: /root/ray_results/PPO
Number of trials: 1/1 (1 RUNNING)
Trial name 	status 	loc
PPO_MyEnv_33da2_00000	RUNNING 	172.28.0.2:431


Result for PPO_MyEnv_33da2_00000:
  agent_timesteps_total: 256
  counters:
    num_agent_steps_sampled: 256
    num_agent_steps_trained: 256
    num_env_steps_sampled: 256
    num_env_steps_trained: 256
  custom_metrics: {}
  date: 2022-06-14_18-10-34
  done: true
  episode_len_mean: 10.0
  episode_media: {}
  episode_reward_max: 10.0
  episode_reward_mean: 10.0
  episode_reward_min: 10.0
  episodes_this_iter: 25
  episodes_total: 25
  experiment_id: 7d49449601d54b969a05203fde2f9b11
  hostname: dda9382a7eae
  info:
    learner:
      default_policy:
        custom_metrics: {}
        learner_stats:
          cur_kl_coeff: 0.20000000298023224
          cur_lr: 4.999999873689376e-05
          entropy: 2.761671781539917
          entropy_coeff: 0.0
          kl: 0.015223202295601368
          model: {}
          policy_loss: -0.1377173811197281
          total_loss: 7.826196193695068
          vf_explained_var: -8.736054041946772e-06
          vf_loss: 7.960869312286377
        num_agent_steps_trained: 128.0
    num_agent_steps_sampled: 256
    num_agent_steps_trained: 256
    num_env_steps_sampled: 256
    num_env_steps_trained: 256
  iterations_since_restore: 1
  node_ip: 172.28.0.2
  num_agent_steps_sampled: 256
  num_agent_steps_trained: 256
  num_env_steps_sampled: 256
  num_env_steps_sampled_this_iter: 256
  num_env_steps_trained: 256
  num_env_steps_trained_this_iter: 256
  num_healthy_workers: 1
  off_policy_estimator: {}
  perf:
    cpu_util_percent: 94.98076923076925
    ram_util_percent: 23.4
  pid: 431
  policy_reward_max: {}
  policy_reward_mean: {}
  policy_reward_min: {}
  sampler_perf:
    mean_action_processing_ms: 0.15440618017768118
    mean_env_render_ms: 0.0
    mean_env_wait_ms: 0.4018957976701195
    mean_inference_ms: 7.579902730563272
    mean_raw_obs_processing_ms: 0.4254426473773407
  sampler_results:
    custom_metrics: {}
    episode_len_mean: 10.0
    episode_media: {}
    episode_reward_max: 10.0
    episode_reward_mean: 10.0
    episode_reward_min: 10.0
    episodes_this_iter: 25
    hist_stats:
      episode_lengths:
      - 10
      - 10
      - 10
      - 10
      - 10
      - 10
      - 10
      - 10
      - 10
      - 10
      - 10
      - 10
      - 10
      - 10
      - 10
      - 10
      - 10
      - 10
      - 10
      - 10
      - 10
      - 10
      - 10
      - 10
      - 10
      episode_reward:
      - 10.0
      - 10.0
      - 10.0
      - 10.0
      - 10.0
      - 10.0
      - 10.0
      - 10.0
      - 10.0
      - 10.0
      - 10.0
      - 10.0
      - 10.0
      - 10.0
      - 10.0
      - 10.0
      - 10.0
      - 10.0
      - 10.0
      - 10.0
      - 10.0
      - 10.0
      - 10.0
      - 10.0
      - 10.0
    off_policy_estimator: {}
    policy_reward_max: {}
    policy_reward_mean: {}
    policy_reward_min: {}
    sampler_perf:
      mean_action_processing_ms: 0.15440618017768118
      mean_env_render_ms: 0.0
      mean_env_wait_ms: 0.4018957976701195
      mean_inference_ms: 7.579902730563272
      mean_raw_obs_processing_ms: 0.4254426473773407
  time_since_restore: 17.989404678344727
  time_this_iter_s: 17.989404678344727
  time_total_s: 17.989404678344727
  timers:
    learn_throughput: 16.417
    learn_time_ms: 15593.118
    load_throughput: 984181.324
    load_time_ms: 0.26
    training_iteration_time_ms: 17952.205
    update_time_ms: 71.665
  timestamp: 1655230234
  timesteps_since_restore: 0
  timesteps_total: 256
  training_iteration: 1
  trial_id: 33da2_00000
  warmup_time: 12.848384857177734
  

== Status ==
Current time: 2022-06-14 18:10:35 (running for 00:00:36.03)
Memory usage on this node: 3.1/12.7 GiB
Using FIFO scheduling algorithm.
Resources requested: 0/2 CPUs, 0/1 GPUs, 0.0/7.3 GiB heap, 0.0/3.65 GiB objects (0.0/1.0 accelerator_type:T4)
Result logdir: /root/ray_results/PPO
Number of trials: 1/1 (1 TERMINATED)
Trial name 	status 	loc 	iter	total time (s)	ts	reward	episode_reward_max	episode_reward_min	episode_len_mean
PPO_MyEnv_33da2_00000	TERMINATED	172.28.0.2:431	1	17.9894	256	10	10	10	10


2022-06-14 18:10:35,366	INFO tune.py:748 -- Total run time: 41.70 seconds (35.97 seconds for the tuning loop).


It seems that the problem was my env registration; now I set the env in the config without registering it.

This was the problem

def env_creator(env_config):
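    # note: gymconfig here is an external variable; the env_config that RLlib passes in is ignored (see the reply below)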
    return MyEnv(gymconfig)

register_env("my_env", env_creator)
trainer = ppo.APPOTrainer(env="my_env", config=cfg)

I also had installed Ray with ray[rllib], so I reinstalled it with ray[all].

Thank you for helping!

@evo11x,

You are welcome. The issue is that env_creator is not passing the correct argument to the environment. If you change it to the code below, it will run with the registered environment name. I updated the Colab to confirm.

def env_creator(env_config):
    return MyEnv(env_config)
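
For completeness, the full corrected flow with the registered name would look roughly like this (a sketch; the names mirror the snippets above):

from ray.tune.registry import register_env

def env_creator(env_config):
    # RLlib passes the trainer config's "env_config" dict into this callable
    return MyEnv(env_config)

register_env("my_env", env_creator)

cfg["env_config"] = {}   # whatever MyEnv expects to receive
trainer = ppo.APPOTrainer(env="my_env", config=cfg)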

I don’t see the difference between env_config and gymconfig. Looking into the initialization of MyEnv, MyEnv(env_config) does nothing with env_config anyway.
