Attribute error when trying to compute actions after training DreamerV3 on CartPole

  • High: It blocks me from completing my task.

I train CartPole-v1 with DreamerV3 using Tune in Colab on a free GPU:

from ray import tune
from ray.tune import Tuner
from ray.rllib.algorithms.dreamerv3 import DreamerV3Config
from ray.air import RunConfig, CheckpointConfig

# Define the configuration for DreamerV3 training
config = (
    DreamerV3Config()
    .environment("CartPole-v1")
    .training(model_size="XS", training_ratio=1024)
    .resources(num_gpus=1, num_gpus_per_learner_worker=1, num_learner_workers=0)
)

# Define the tuner
tuner = Tuner(
    "DreamerV3",
    run_config=RunConfig(
        stop={"training_iteration": 4000},
        checkpoint_config=CheckpointConfig(checkpoint_at_end=True)
    ),
    param_space=config,
)

# Run the tuner
result = tuner.fit()

It trains in about an hour. Then I load the checkpoint:

from ray.rllib.algorithms.algorithm import Algorithm
algo = Algorithm.from_checkpoint("/root/ray_results/DreamerV3_2023-10-04_14-17-04/DreamerV3_CartPole-v1_b623a_00000_0_2023-10-04_14-17-12/checkpoint_000000")
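(Side note: instead of hard-coding that path, the checkpoint can also be pulled from the ResultGrid that tuner.fit() returned — a minimal sketch, assuming the run reports the usual episode_reward_mean metric, which may differ on DreamerV3's new stack:)

# Hypothetical alternative: look up the best checkpoint programmatically
# from the `result` ResultGrid returned by tuner.fit() above.
best_result = result.get_best_result(metric="episode_reward_mean", mode="max")
algo = Algorithm.from_checkpoint(best_result.checkpoint)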

And try to compute actions almost exactly as in the documentation example (Getting Started with RLlib — Ray 2.7.0):

# Note: `gymnasium` (not `gym`) will be **the** API supported by RLlib from Ray 2.3 on.
try:
    import gymnasium as gym

    gymnasium = True
except Exception:
    import gym

    gymnasium = False

from ray.rllib.algorithms.dreamerv3.dreamerv3 import DreamerV3Config

env_name = "CartPole-v1"
env = gym.make(env_name)
#algo = PPOConfig().environment(env_name).build()

episode_reward = 0
terminated = truncated = False

if gymnasium:
    obs, info = env.reset()
else:
    obs = env.reset()

while not terminated and not truncated:
    action = algo.compute_single_action(obs)
    if gymnasium:
        obs, reward, terminated, truncated, info = env.step(action)
    else:
        obs, reward, terminated, info = env.step(action)
    episode_reward += reward

But I get an error: AttributeError: 'DreamerV3EnvRunner' object has no attribute 'get_policy'
Is something still not ready with DreamerV3, or am I doing something wrong? With other algorithms everything works fine.

I am having the same problem. See this PR for updated documentation on how to do it, and the issue where this is discussed.
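In the meantime, the approach discussed there is roughly: skip get_policy()/compute_single_action() entirely and query the checkpointed RLModule directly, since DreamerV3 runs only on the new RLModule/EnvRunner stack and its world model is recurrent. Below is a minimal sketch of that loop, assuming Ray 2.7's API — the .module attribute on the local EnvRunner, the "state_in"/"state_out"/"is_first" batch keys, and the one-hot action output are all assumptions to verify against your installed version:

import gymnasium as gym
import numpy as np
import tensorflow as tf
import tree  # dm_tree, installed with RLlib

from ray.rllib.algorithms.algorithm import Algorithm
from ray.rllib.policy.sample_batch import SampleBatch

checkpoint_path = "/root/ray_results/DreamerV3_2023-10-04_14-17-04/DreamerV3_CartPole-v1_b623a_00000_0_2023-10-04_14-17-12/checkpoint_000000"
algo = Algorithm.from_checkpoint(checkpoint_path)

# DreamerV3 has no Policy objects; pull the RLModule off the local EnvRunner
# (assumption: the runner exposes it as `.module` in Ray 2.7).
rl_module = algo.workers.local_worker().module

env = gym.make("CartPole-v1")
obs, info = env.reset()
terminated = truncated = False
episode_reward = 0.0

# The world model is recurrent: carry its state across steps (batched to
# size 1) and flag the first timestep of the episode.
state = tree.map_structure(
    lambda s: tf.expand_dims(s, 0), rl_module.get_initial_state()
)
is_first = 1.0

while not terminated and not truncated:
    batch = {
        SampleBatch.OBS: tf.expand_dims(tf.convert_to_tensor(obs), 0),
        "state_in": state,  # assumed key names; check your RLlib version
        "is_first": tf.convert_to_tensor([is_first], dtype=tf.float32),
    }
    outs = rl_module.forward_inference(batch)
    # Discrete actions come back one-hot encoded (assumption); take the argmax.
    action = int(np.argmax(outs[SampleBatch.ACTIONS].numpy()[0]))
    state = outs["state_out"]
    is_first = 0.0
    obs, reward, terminated, truncated, info = env.step(action)
    episode_reward += reward

print("episode reward:", episode_reward)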