AttributeError when trying to compute actions after training DreamerV3 on CartPole

  • High: It blocks me from completing my task.

So I trained CartPole-v1 with DreamerV3 using Tune in Colab on a free GPU:

from ray import tune
from ray.tune import Tuner
from ray.rllib.algorithms.dreamerv3 import DreamerV3Config
from ray.air import RunConfig, CheckpointConfig

# Define the configuration for DreamerV3 training.
config = (
    DreamerV3Config()
    .environment("CartPole-v1")
    .training(model_size="XS", training_ratio=1024)
    .resources(
        num_gpus=1,
        num_gpus_per_learner_worker=1,
        num_learner_workers=0,
    )
)

# Define the tuner
tuner = Tuner(
    "DreamerV3",
    run_config=RunConfig(
        stop={"training_iteration": 4000},
        checkpoint_config=CheckpointConfig(checkpoint_at_end=True)
    ),
    param_space=config,
)

# Run the tuner
result = tuner.fit()
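
As a side note, instead of hard-coding the results path below, the best checkpoint can be pulled straight from the Tuner output. A minimal sketch, assuming the standard episode_reward_mean metric reported by RLlib:

# Assumed sketch: locate the best trial's checkpoint in the ResultGrid.
best_result = result.get_best_result(metric="episode_reward_mean", mode="max")
print(best_result.checkpoint)  # Checkpoint pointing at the directory on disk.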

It trains in about an hour. Then I load the checkpoint:

from ray.rllib.algorithms.algorithm import Algorithm
algo = Algorithm.from_checkpoint("/root/ray_results/DreamerV3_2023-10-04_14-17-04/DreamerV3_CartPole-v1_b623a_00000_0_2023-10-04_14-17-12/checkpoint_000000")

Then I try to compute actions almost exactly as in the documentation example (Getting Started with RLlib — Ray 2.7.0):

# Note: `gymnasium` (not `gym`) will be **the** API supported by RLlib from Ray 2.3 on.
try:
    import gymnasium as gym

    gymnasium = True
except Exception:
    import gym

    gymnasium = False

from ray.rllib.algorithms.dreamerv3.dreamerv3 import DreamerV3Config

env_name = "CartPole-v1"
env = gym.make(env_name)
#algo = PPOConfig().environment(env_name).build()

episode_reward = 0
terminated = truncated = False

if gymnasium:
    obs, info = env.reset()
else:
    obs = env.reset()

while not terminated and not truncated:
    action = algo.compute_single_action(obs)
    if gymnasium:
        obs, reward, terminated, truncated, info = env.step(action)
    else:
        obs, reward, terminated, info = env.step(action)
    episode_reward += reward

But I get an error: AttributeError: 'DreamerV3EnvRunner' object has no attribute 'get_policy'
Is something still not ready with DreamerV3, or am I doing something wrong? Everything works fine with other algorithms.

I am having the same problem. See this PR for updated documentation about how to do it, and the issue where this is discussed.

Hello,
This is now documented at the end of the DreamerV3 README:

Running Action Inference after Training

To run action inference on a DreamerV3 Algorithm object, you can use this simple environment loop script.

Note the slight complexity caused by the fact that DreamerV3 a) uses a recurrent model, b) uses the new RLModule-based API stack (no Policy class), and c) outputs actions in a one-hot fashion for discrete action spaces.
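
For reference, here is a minimal sketch of such an environment loop. It assumes the TF implementation of DreamerV3 that ships with Ray 2.7, that the RLModule is reachable via algo.workers.local_worker().module, and that the batch/output keys are named "obs", "state_in", "state_out", "is_first", and "actions" (these may differ between Ray versions; the script linked from the README is authoritative):

import gymnasium as gym
import numpy as np
import tree  # pip install dm_tree

from ray.rllib.algorithms.algorithm import Algorithm
from ray.rllib.utils.framework import try_import_tf

tf1, tf, tfv = try_import_tf()

env = gym.make("CartPole-v1")
algo = Algorithm.from_checkpoint("<path-to-your-checkpoint>")

# b) No Policy class on the new API stack: the trained model is an RLModule
# held by the local EnvRunner (assumed attribute layout).
rl_module = algo.workers.local_worker().module

obs, info = env.reset()
# a) Recurrent model: fetch the initial state and carry it through the episode.
state = rl_module.get_initial_state()
# DreamerV3 must be told when a new episode starts (assumed "is_first" key).
is_first = 1.0
episode_return = 0.0

while True:
    # Add a batch dim (B=1) to all inputs.
    batch = {
        "obs": tf.convert_to_tensor(np.array([obs], np.float32)),
        "state_in": tree.map_structure(lambda s: tf.expand_dims(s, 0), state),
        "is_first": tf.convert_to_tensor([is_first]),
    }
    outs = rl_module.forward_inference(batch)
    # Carry the recurrent state over to the next timestep (drop the batch dim).
    state = tree.map_structure(lambda s: s[0], outs["state_out"])
    # c) Discrete actions come out one-hot: argmax recovers the integer action.
    action = int(np.argmax(outs["actions"][0]))
    obs, reward, terminated, truncated, info = env.step(action)
    episode_return += reward
    is_first = 0.0
    if terminated or truncated:
        break

print(f"Episode return: {episode_return}")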