- High: It blocks me from completing my task.
So I train CartPole-v1 with DreamerV3 using Tune in Colab with a free GPU:
from ray import tune
from ray.tune import Tuner
from ray.rllib.algorithms.dreamerv3 import DreamerV3Config
from ray.air import RunConfig, CheckpointConfig
# Define the configuration for DreamerV3 training
config = (
    DreamerV3Config()
    .environment("CartPole-v1")
    .training(model_size="XS", training_ratio=1024)
    .resources(num_gpus=1, num_gpus_per_learner_worker=1, num_learner_workers=0)
)
# Define the tuner
tuner = Tuner(
    "DreamerV3",
    run_config=RunConfig(
        stop={"training_iteration": 4000},
        checkpoint_config=CheckpointConfig(checkpoint_at_end=True),
    ),
    param_space=config,
)
# Run the tuner
result = tuner.fit()
It trains in about an hour. Then I load a checkpoint:
from ray.rllib.algorithms.algorithm import Algorithm
algo = Algorithm.from_checkpoint("/root/ray_results/DreamerV3_2023-10-04_14-17-04/DreamerV3_CartPole-v1_b623a_00000_0_2023-10-04_14-17-12/checkpoint_000000")
Then I try to compute actions almost exactly as in the documentation example (Getting Started with RLlib — Ray 2.7.0):
# Note: `gymnasium` (not `gym`) will be **the** API supported by RLlib from Ray 2.3 on.
try:
    import gymnasium as gym
    gymnasium = True
except Exception:
    import gym
    gymnasium = False

from ray.rllib.algorithms.dreamerv3.dreamerv3 import DreamerV3Config

env_name = "CartPole-v1"
env = gym.make(env_name)
# algo = PPOConfig().environment(env_name).build()

episode_reward = 0
terminated = truncated = False

if gymnasium:
    obs, info = env.reset()
else:
    obs = env.reset()

while not terminated and not truncated:
    action = algo.compute_single_action(obs)
    if gymnasium:
        obs, reward, terminated, truncated, info = env.step(action)
    else:
        obs, reward, terminated, info = env.step(action)
    episode_reward += reward
But I get an error: AttributeError: 'DreamerV3EnvRunner' object has no attribute 'get_policy'
Is something still not ready with DreamerV3, or am I doing something wrong? With other algorithms everything works fine.