### What happened + What you expected to happen
I'm trying to run the action_masking.py example with my custom environment, and I keep running into this error for both PPO and APPO. When I run it with the toy custom environment specified in the example, it works fine. I have been able to run Ray with my own custom environments by modifying the self_play_with_open_spiel example, so I don't think it's an environment issue.
```
File "/Users/elliottower/Library/Caches/pypoetry/virtualenvs/cathedral-rl-6lsEyOq--py3.9/lib/python3.9/site-packages/ray/rllib/algorithms/algorithm.py", line 2335, in get_auto_filled_metrics
auto_filled = super().get_auto_filled_metrics(
File "/Users/elliottower/Library/Caches/pypoetry/virtualenvs/cathedral-rl-6lsEyOq--py3.9/lib/python3.9/site-packages/ray/tune/trainable/trainable.py", line 267, in get_auto_filled_metrics
"warmup_time": self._warmup_time,
AttributeError: 'PPO' object has no attribute '_warmup_time'
```
The only difference between my code and the action_masking.py file is that I added lines right after ray.init() which define my environment and specify the action_space and observation_space explicitly (when I debug, I can see that env.action_space is a discrete gymnasium space and the observation space is a dict of observation and action mask).
```
def env_creator(args):
    return PettingZooEnv(cathedral_v0.env())

env = env_creator({})
register_env("cathedral", env_creator)

# main part: configure the ActionMaskEnv and ActionMaskModel
config = (
    ppo.PPOConfig()
    .environment(
        env="cathedral",
        action_space=env.action_space,
        observation_space=env.observation_space,
        env_config={
            "action_space": env.action_space,
            "observation_space": env.observation_space,
        },
    )
```
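For reference, this is the quick check I do in the debugger to confirm the spaces (I'm leaving out the exact shapes since they're specific to cathedral_v0):
```
print(env.action_space)       # a discrete gymnasium space
print(env.observation_space)  # a Dict space with 'observation' and 'action_mask' keys
```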
When I try running outside of local mode, I instead get an error that the rollout worker is unhealthy (another bug I've been hitting a lot and don't know how to fix; any help with that would be greatly appreciated as well).
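For reference, a sketch of extra debugging settings that can be layered on the config below to surface the worker's actual traceback (just a guess at the relevant knobs, not part of the original example):
```
# Hypothetical debugging additions (not in the reproduction script):
# a single rollout worker plus verbose logging, to expose the underlying
# error instead of only the "worker unhealthy" message.
config = (
    config
    .rollouts(num_rollout_workers=1)
    .debugging(log_level="DEBUG")
)
```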
I found a potential cause: my observation space has the keys 'action_mask' and 'observation' (as PettingZoo provides them), but both action mask models look for the key 'observations'.
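As a sanity check of that theory, here is a sketch of a wrapper that renames the key before RLlib sees it. This is only a sketch: the class name is mine, and I'm assuming the gymnasium-style reset/step signatures that Ray 2.3's PettingZooEnv uses, so treat it as untested.
```
from gymnasium.spaces import Dict as DictSpace
from ray.rllib.env import PettingZooEnv


class RenamedKeyPettingZooEnv(PettingZooEnv):
    """Hypothetical wrapper: exposes PettingZoo's 'observation' key as
    'observations' so RLlib's ActionMaskModel can find it."""

    def __init__(self, env):
        super().__init__(env)
        # Rebuild the observation space with the renamed key.
        self.observation_space = DictSpace(
            {
                "observations": self.observation_space["observation"],
                "action_mask": self.observation_space["action_mask"],
            }
        )

    @staticmethod
    def _rename(obs):
        # obs is a multi-agent dict: {agent_id: {"observation": ..., "action_mask": ...}}
        return {
            agent: {"observations": o["observation"], "action_mask": o["action_mask"]}
            for agent, o in obs.items()
        }

    def reset(self, *, seed=None, options=None):
        obs, infos = super().reset(seed=seed, options=options)
        return self._rename(obs), infos

    def step(self, action_dict):
        obs, rews, terms, truncs, infos = super().step(action_dict)
        return self._rename(obs), rews, terms, truncs, infos
```
If the key mismatch is the real problem, registering the env with this wrapper instead of plain PettingZooEnv should at least change the error; I haven't confirmed whether it is related to the _warmup_time error above.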
My run script depends on a custom repo I'm working on, but I'll put the script here anyway. The script is run from cathedral-rl/examples, which needs to be cloned from https://github.com/elliottower/cathedral-rl.
### Versions / Dependencies
```
python = ">=3.8, <3.12"
PettingZoo = "^1.22.3"
gymnasium = "^0.26.3"
pygame = "^2.1.3"
SuperSuit = "^3.7.1"
poetry = "^1.3.2"
ray = {extras = ["rllib"], version = "^2.3.0"}
numpy = "^1.24.2"
pandas = "^1.5.3"
pymunk = "^6.4.0"
tensorflow-probability = "^0.19.0"
protobuf = "3.19.6"
torch = "^1.13.1"
tensorflow = "^2.11.0"
```
### Reproduction script
```
"""Example showing how to use "action masking" in RLlib.
"Action masking" allows the agent to select actions based on the current
observation. This is useful in many practical scenarios, where different
actions are available in different time steps.
Blog post explaining action masking: https://boring-guy.sh/posts/masking-rl/
RLlib supports action masking, i.e., disallowing these actions based on the
observation, by slightly adjusting the environment and the model as shown in
this example.
Here, the ActionMaskEnv wraps an underlying environment (here, RandomEnv),
defining only a subset of all actions as valid based on the environment's
observations. If an invalid action is selected, the environment raises an error
- this must not happen!
The environment constructs Dict observations, where obs["observations"] holds
the original observations and obs["action_mask"] holds the valid actions.
To avoid selecting invalid actions, the ActionMaskModel is used. This model
takes the original observations, computes the logits of the corresponding
actions and then sets the logits of all invalid actions to zero, thus disabling
them. This only works with discrete actions.
---
Run this example with defaults (using Tune and action masking):
$ python action_masking.py
Then run again without action masking, which will likely lead to errors due to
invalid actions being selected (ValueError "Invalid action sent to env!"):
$ python action_masking.py --no-masking
Other options for running this example:
$ python action_masking.py --help
"""
import argparse
import os
from gymnasium.spaces import Box, Discrete
import ray
from ray import air, tune
from ray.rllib.algorithms import ppo
from ray.rllib.env import PettingZooEnv
from ray.rllib.examples.env.action_mask_env import ActionMaskEnv
from ray.rllib.examples.models.action_mask_model import (
    ActionMaskModel,
    TorchActionMaskModel,
)
from ray.tune import register_env
from ray.tune.logger import pretty_print
from cathedral_rl import cathedral_v0
def get_cli_args():
    """Create CLI parser and return parsed arguments"""
    parser = argparse.ArgumentParser()

    # example-specific args
    parser.add_argument(
        "--no-masking",
        action="store_true",
        help="Do NOT mask invalid actions. This will likely lead to errors.",
    )

    # general args
    parser.add_argument(
        "--run", type=str, default="APPO", help="The RLlib-registered algorithm to use."
    )
    parser.add_argument("--num-cpus", type=int, default=0)
    parser.add_argument(
        "--framework",
        choices=["tf", "tf2", "torch"],
        default="tf",
        help="The DL framework specifier.",
    )
    parser.add_argument("--eager-tracing", action="store_true")
    parser.add_argument(
        "--stop-iters", type=int, default=10, help="Number of iterations to train."
    )
    parser.add_argument(
        "--stop-timesteps",
        type=int,
        default=10000,
        help="Number of timesteps to train.",
    )
    parser.add_argument(
        "--stop-reward",
        type=float,
        default=80.0,
        help="Reward at which we stop training.",
    )
    parser.add_argument(
        "--no-tune",
        action="store_true",
        help="Run without Tune using a manual train loop instead. Here,"
        "there is no TensorBoard support.",
    )
    parser.add_argument(
        "--local-mode",
        action="store_true",
        help="Init Ray in local mode for easier debugging.",
    )
    args = parser.parse_args()

    print(f"Running with following CLI args: {args}")
    return args
if __name__ == "__main__":
    args = get_cli_args()

    ray.init(num_cpus=args.num_cpus or None, local_mode=args.local_mode)

    def env_creator(args):
        return PettingZooEnv(cathedral_v0.env())

    env = env_creator({})
    register_env("cathedral", env_creator)

    # main part: configure the ActionMaskEnv and ActionMaskModel
    config = (
        ppo.PPOConfig()
        .environment(
            env="cathedral",
            action_space=env.action_space,
            observation_space=env.observation_space,
            env_config={
                "action_space": env.action_space,
                "observation_space": env.observation_space,
            },
        )
        .training(
            # the ActionMaskModel retrieves the invalid actions and avoids them
            model={
                "custom_model": ActionMaskModel
                if args.framework != "torch"
                else TorchActionMaskModel,
                # disable action masking according to CLI
                "custom_model_config": {"no_masking": args.no_masking},
            },
        )
        .framework(args.framework, eager_tracing=args.eager_tracing)
        .resources(
            # Use GPUs iff `RLLIB_NUM_GPUS` env var set to > 0.
            num_gpus=int(os.environ.get("RLLIB_NUM_GPUS", "0"))
        )
    )

    stop = {
        "training_iteration": args.stop_iters,
        "timesteps_total": args.stop_timesteps,
        "episode_reward_mean": args.stop_reward,
    }

    # manual training loop (no Ray tune)
    if args.no_tune:
        if args.run not in {"APPO", "PPO"}:
            raise ValueError("This example only supports APPO and PPO.")
        algo = config.build()

        # run manual training loop and print results after each iteration
        for _ in range(args.stop_iters):
            result = algo.train()
            print(pretty_print(result))
            # stop training if the target train steps or reward are reached
            if (
                result["timesteps_total"] >= args.stop_timesteps
                or result["episode_reward_mean"] >= args.stop_reward
            ):
                break

        # manual test loop
        print("Finished training. Running manual test/inference loop.")
        # prepare environment with max 10 steps
        config["env_config"]["max_episode_len"] = 10
        env = ActionMaskEnv(config["env_config"])
        obs, info = env.reset()
        done = False
        # run one iteration until done
        print(f"ActionMaskEnv with {config['env_config']}")
        while not done:
            action = algo.compute_single_action(obs)
            next_obs, reward, done, truncated, _ = env.step(action)
            # observations contain original observations and the action mask
            # reward is random and irrelevant here and therefore not printed
            print(f"Obs: {obs}, Action: {action}")
            obs = next_obs

    # run with tune for auto trainer creation, stopping, TensorBoard, etc.
    else:
        tuner = tune.Tuner(
            args.run,
            param_space=config.to_dict(),
            run_config=air.RunConfig(stop=stop, verbose=2),
        )
        tuner.fit()

    print("Finished successfully without selecting invalid actions.")

    ray.shutdown()
```
### Issue Severity
High: It blocks me from completing my task.