I am having trouble initializing a single rollout worker using the code below.
import gymnasium as gym
from ray.rllib.algorithms import ppo
from ray.rllib.evaluation.rollout_worker import RolloutWorker

rollout_worker = RolloutWorker(
    env_creator=lambda _: gym.make("CartPole-v1"),
    default_policy_class=ppo.PPOTorchPolicy,
)
I run into this error:
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/priyam/projects/predator-prey/.venv/lib/python3.10/site-packages/ray/rllib/evaluation/rollout_worker.py", line 535, in __init__
    self._update_policy_map(policy_dict=self.policy_dict)
  File "/home/priyam/projects/predator-prey/.venv/lib/python3.10/site-packages/ray/rllib/evaluation/rollout_worker.py", line 1746, in _update_policy_map
    self._build_policy_map(
  File "/home/priyam/projects/predator-prey/.venv/lib/python3.10/site-packages/ray/rllib/evaluation/rollout_worker.py", line 1857, in _build_policy_map
    new_policy = create_policy_for_framework(
  File "/home/priyam/projects/predator-prey/.venv/lib/python3.10/site-packages/ray/rllib/utils/policy.py", line 141, in create_policy_for_framework
    return policy_class(observation_space, action_space, merged_config)
  File "/home/priyam/projects/predator-prey/.venv/lib/python3.10/site-packages/ray/rllib/algorithms/ppo/ppo_torch_policy.py", line 64, in __init__
    self._initialize_loss_from_dummy_batch()
  File "/home/priyam/projects/predator-prey/.venv/lib/python3.10/site-packages/ray/rllib/policy/policy.py", line 1430, in _initialize_loss_from_dummy_batch
    actions, state_outs, extra_outs = self.compute_actions_from_input_dict(
  File "/home/priyam/projects/predator-prey/.venv/lib/python3.10/site-packages/ray/rllib/policy/torch_policy_v2.py", line 572, in compute_actions_from_input_dict
    return self._compute_action_helper(
  File "/home/priyam/projects/predator-prey/.venv/lib/python3.10/site-packages/ray/rllib/utils/threading.py", line 24, in wrapper
    return func(self, *a, **k)
  File "/home/priyam/projects/predator-prey/.venv/lib/python3.10/site-packages/ray/rllib/policy/torch_policy_v2.py", line 1308, in _compute_action_helper
    actions, logp = self.exploration.get_exploration_action(
TypeError: cannot unpack non-iterable NoneType object
I get a similar error with APPOTorchPolicy.
Hi @pDalmia and welcome to the forum.
I can reproduce this error. The reason is that the self.exploration object in the policy only has an abstract get_exploration_action method (implemented with pass), so it returns None. To overcome this, you can provide the RolloutWorker with a PPOConfig object in which an exploration configuration is defined.
import gymnasium as gym
from ray.rllib.algorithms.ppo import PPOTorchPolicy, PPOConfig
from ray.rllib.evaluation.rollout_worker import RolloutWorker

rollout_worker = RolloutWorker(
    env_creator=lambda _: gym.make("CartPole-v1"),
    config=PPOConfig().environment(
        observation_space=gym.spaces.Box(-float("inf"), float("inf"), (4,)),
        action_space=gym.spaces.Discrete(2),
    ),
    default_policy_class=PPOTorchPolicy,
)
Thanks, I can create a rollout worker that way.
However, I was hoping to make some changes to the loss function of the PPO policy, and I see that the compute_gradients method still calls the loss function of PPOTorchPolicy even though the new API stack is enabled; my modified loss function in the Learner class is never called.
import gymnasium as gym
from ray.rllib.algorithms.ppo import PPOTorchPolicy, PPOConfig
from ray.rllib.evaluation.rollout_worker import RolloutWorker

env_name = "CartPole-v1"
env_creator = lambda config: gym.make(env_name)
env = env_creator({})

config = (
    PPOConfig()
    .experimental(_enable_new_api_stack=True)
    .environment(
        env=env_name,
        observation_space=env.observation_space,
        action_space=env.action_space,
    )
    .training(train_batch_size=32)
    .framework("torch")
    .rollouts(
        num_rollout_workers=1,
        create_env_on_local_worker=True,
    )
)

rollout_worker = RolloutWorker(
    env_creator=env_creator,
    config=config,
    default_policy_class=PPOTorchPolicy,
)

samples = rollout_worker.sample()
gradients = rollout_worker.compute_gradients(samples)
  File "/home/priyam/projects/predator-prey/.venv/lib/python3.10/site-packages/ray/rllib/policy/torch_policy_v2.py", line 924, in compute_gradients
    tower_outputs = self._multi_gpu_parallel_grad_calc([postprocessed_batch])
  File "/home/priyam/projects/predator-prey/.venv/lib/python3.10/site-packages/ray/rllib/policy/torch_policy_v2.py", line 1421, in _multi_gpu_parallel_grad_calc
    raise last_result[0] from last_result[1]
ValueError: 'NoneType' object is not callable
Traceback (most recent call last):
  File "/home/priyam/projects/predator-prey/.venv/lib/python3.10/site-packages/ray/rllib/policy/torch_policy_v2.py", line 1336, in _worker
    self.loss(model, self.dist_class, sample_batch)
  File "/home/priyam/projects/predator-prey/.venv/lib/python3.10/site-packages/ray/rllib/algorithms/ppo/ppo_torch_policy.py", line 84, in loss
    curr_action_dist = dist_class(logits, model)
TypeError: 'NoneType' object is not callable
Edit: I've changed the PPOLearner directly here, hence there are no explicit rl_module settings in the config.
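For what it's worth, since RolloutWorker.compute_gradients() routes through the policy's loss() (as the traceback above shows), one workaround is to customize the loss on the policy side instead of in the Learner: subclass PPOTorchPolicy and override loss(). The sketch below is only illustrative and assumes the classic API stack (no _enable_new_api_stack); MyPPOTorchPolicy and the placeholder loss modification are not from this thread.

import gymnasium as gym
from ray.rllib.algorithms.ppo import PPOConfig, PPOTorchPolicy
from ray.rllib.evaluation.rollout_worker import RolloutWorker

class MyPPOTorchPolicy(PPOTorchPolicy):
    # Hypothetical subclass that customizes the PPO loss.
    def loss(self, model, dist_class, train_batch):
        # Compute the stock PPO surrogate loss first ...
        base_loss = super().loss(model, dist_class, train_batch)
        # ... then modify it here (placeholder: returned unchanged).
        return base_loss

env_name = "CartPole-v1"
env = gym.make(env_name)

config = (
    PPOConfig()
    .environment(
        env=env_name,
        observation_space=env.observation_space,
        action_space=env.action_space,
    )
    .framework("torch")
)

rollout_worker = RolloutWorker(
    env_creator=lambda _: gym.make(env_name),
    config=config,
    default_policy_class=MyPPOTorchPolicy,
)

samples = rollout_worker.sample()
gradients = rollout_worker.compute_gradients(samples)

This stays on the old stack on purpose: as noted above, with the new API stack the loss lives in the Learner, which RolloutWorker.compute_gradients() does not go through.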