RLlib Rollout Worker Init

I am having trouble initializing a single rollout worker using the code below.

import gymnasium as gym
from ray.rllib.algorithms import ppo
from ray.rllib.evaluation.rollout_worker import RolloutWorker

rollout_worker = RolloutWorker(
    env_creator=lambda _: gym.make("CartPole-v1"),
    default_policy_class=ppo.PPOTorchPolicy,
)

I run into this error:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/priyam/projects/predator-prey/.venv/lib/python3.10/site-packages/ray/rllib/evaluation/rollout_worker.py", line 535, in __init__
    self._update_policy_map(policy_dict=self.policy_dict)
  File "/home/priyam/projects/predator-prey/.venv/lib/python3.10/site-packages/ray/rllib/evaluation/rollout_worker.py", line 1746, in _update_policy_map
    self._build_policy_map(
  File "/home/priyam/projects/predator-prey/.venv/lib/python3.10/site-packages/ray/rllib/evaluation/rollout_worker.py", line 1857, in _build_policy_map
    new_policy = create_policy_for_framework(
  File "/home/priyam/projects/predator-prey/.venv/lib/python3.10/site-packages/ray/rllib/utils/policy.py", line 141, in create_policy_for_framework
    return policy_class(observation_space, action_space, merged_config)
  File "/home/priyam/projects/predator-prey/.venv/lib/python3.10/site-packages/ray/rllib/algorithms/ppo/ppo_torch_policy.py", line 64, in __init__
    self._initialize_loss_from_dummy_batch()
  File "/home/priyam/projects/predator-prey/.venv/lib/python3.10/site-packages/ray/rllib/policy/policy.py", line 1430, in _initialize_loss_from_dummy_batch
    actions, state_outs, extra_outs = self.compute_actions_from_input_dict(
  File "/home/priyam/projects/predator-prey/.venv/lib/python3.10/site-packages/ray/rllib/policy/torch_policy_v2.py", line 572, in compute_actions_from_input_dict
    return self._compute_action_helper(
  File "/home/priyam/projects/predator-prey/.venv/lib/python3.10/site-packages/ray/rllib/utils/threading.py", line 24, in wrapper
    return func(self, *a, **k)
  File "/home/priyam/projects/predator-prey/.venv/lib/python3.10/site-packages/ray/rllib/policy/torch_policy_v2.py", line 1308, in _compute_action_helper
    actions, logp = self.exploration.get_exploration_action(
TypeError: cannot unpack non-iterable NoneType object

I get a similar error with APPOTorchPolicy.

Hi @pDalmia and welcome to the forum.

I can reproduce this error. The reason is that the self.exploration object in the policy only has the abstract get_exploration_action method (its body is just pass), so it returns None, which cannot be unpacked into (actions, logp). To overcome this, you can provide the RolloutWorker with a PPOConfig object in which an exploration configuration (and the observation and action spaces) is defined:

import gymnasium as gym

from ray.rllib.algorithms.ppo import PPOTorchPolicy, PPOConfig
from ray.rllib.evaluation.rollout_worker import RolloutWorker

rollout_worker = RolloutWorker(
    env_creator=lambda _: gym.make("CartPole-v1"),
    config=PPOConfig().environment(
        observation_space=gym.spaces.Box(-float("inf"), float("inf"), (4,)),
        action_space=gym.spaces.Discrete(2),
    ),
    default_policy_class=PPOTorchPolicy,
)
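
To quickly verify the worker, you can let it collect a batch. A minimal sketch (the print is only for illustration):

# Collect one rollout batch from the local worker.
batch = rollout_worker.sample()
# Number of environment steps contained in the sampled batch.
print(batch.count)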

Thanks. I can create a rollout worker that way.

However, I was hoping to make some changes to the loss function of the PPO policy. But I see that the compute_gradients method still calls the loss function in PPOTorchPolicy even though the new API stack is enabled (and my modified loss function in the Learner class is never called):

import gymnasium as gym
from ray.rllib.algorithms.ppo import PPOConfig, PPOTorchPolicy
from ray.rllib.evaluation.rollout_worker import RolloutWorker

env_name = "CartPole-v1"
env_creator = lambda config: gym.make(env_name)
env = env_creator({})
config = (
    PPOConfig()
    .experimental(_enable_new_api_stack=True)
    .environment(
        env=env_name,
        observation_space=env.observation_space,
        action_space=env.action_space,
    )
    .training(train_batch_size=32)
    .framework("torch")
    .rollouts(
        num_rollout_workers=1,
        create_env_on_local_worker=True,
    )
)
rollout_worker = RolloutWorker(
    env_creator=env_creator,
    config=config,
    default_policy_class=PPOTorchPolicy,
)
samples = rollout_worker.sample()
gradients = rollout_worker.compute_gradients(samples)

The compute_gradients call fails with:
  File "/home/priyam/projects/predator-prey/.venv/lib/python3.10/site-packages/ray/rllib/policy/torch_policy_v2.py", line 924, in compute_gradients
    tower_outputs = self._multi_gpu_parallel_grad_calc([postprocessed_batch])
  File "/home/priyam/projects/predator-prey/.venv/lib/python3.10/site-packages/ray/rllib/policy/torch_policy_v2.py", line 1421, in _multi_gpu_parallel_grad_calc
    raise last_result[0] from last_result[1]
ValueError: 'NoneType' object is not callable
Traceback (most recent call last):
  File "/home/priyam/projects/predator-prey/.venv/lib/python3.10/site-packages/ray/rllib/policy/torch_policy_v2.py", line 1336, in _worker
    self.loss(model, self.dist_class, sample_batch)
  File "/home/priyam/projects/predator-prey/.venv/lib/python3.10/site-packages/ray/rllib/algorithms/ppo/ppo_torch_policy.py", line 84, in loss
    curr_action_dist = dist_class(logits, model)
TypeError: 'NoneType' object is not callable

Edit: I’ve changed the PPOLearner directly here, hence there are no explicit rl_module settings in the config.
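
For reference, since RolloutWorker.compute_gradients goes through the policy's loss (as the traceback above shows), the kind of override that does get picked up in this workflow is a policy-side one. A minimal sketch, assuming the old-stack loss(model, dist_class, train_batch) signature from the traceback; the class name and the L2 penalty term are purely illustrative, not my actual modification:

from ray.rllib.algorithms.ppo import PPOTorchPolicy


class MyPPOTorchPolicy(PPOTorchPolicy):
    # This is the loss that RolloutWorker.compute_gradients invokes
    # on the (old-stack) policy.
    def loss(self, model, dist_class, train_batch):
        # Start from the standard PPO loss.
        base_loss = super().loss(model, dist_class, train_batch)
        # Illustrative extra term: a small L2 penalty on the model weights.
        extra = 1e-4 * sum(p.pow(2).sum() for p in model.parameters())
        # The base loss may be a single tensor or a list of tower losses.
        if isinstance(base_loss, (list, tuple)):
            return [l + extra for l in base_loss]
        return base_loss + extra

Passing MyPPOTorchPolicy as default_policy_class to the RolloutWorker would then route compute_gradients through the modified loss, though that sidesteps the Learner-based customization the new API stack is meant to provide.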