Ray 2.4.0 (RLlib): Completely lost with the documentation

Hi, I got this working a year or two back (with a different code base), but I have no idea how anymore. I am starting a new bare-bones version from scratch, and I'm not sure why it gives me the stack trace below:

Relevant INFO logs from RLlib:

2023-12-19 20:08:25,103	INFO policy.py:1287 -- Policy (worker=local) running on 1 GPUs.
2023-12-19 20:08:25,103	INFO tf_policy.py:171 -- Found 2 visible cuda devices.
2023-12-19 20:08:25,163	INFO dynamic_tf_policy_v2.py:710 -- Adding extra-action-fetch `action_prob` to view-reqs.
2023-12-19 20:08:25,164	INFO dynamic_tf_policy_v2.py:710 -- Adding extra-action-fetch `action_logp` to view-reqs.
2023-12-19 20:08:25,165	INFO dynamic_tf_policy_v2.py:710 -- Adding extra-action-fetch `action_dist_inputs` to view-reqs.
2023-12-19 20:08:25,166	INFO dynamic_tf_policy_v2.py:710 -- Adding extra-action-fetch `vf_preds` to view-reqs.
2023-12-19 20:08:25,166	INFO dynamic_tf_policy_v2.py:722 -- Testing `postprocess_trajectory` w/ dummy batch.
2023-12-19 20:08:26,625	INFO policy.py:1287 -- Policy (worker=local) running on 1 GPUs.
2023-12-19 20:08:26,625	INFO tf_policy.py:171 -- Found 2 visible cuda devices.
2023-12-19 20:08:27,858	INFO util.py:118 -- Using connectors:
2023-12-19 20:08:27,858	INFO util.py:119 --     AgentConnectorPipeline
        ObsPreprocessorConnector
        StateBufferConnector
        ViewRequirementAgentConnector
2023-12-19 20:08:27,858	INFO util.py:120 --     ActionConnectorPipeline
        ConvertToNumpyConnector
        NormalizeActionsConnector
        ImmutableActionsConnector
2023-12-19 20:08:27,859	INFO rollout_worker.py:2000 -- Built policy map: <PolicyMap lru-caching-capacity=100 policy-IDs=['my_ppo']>
2023-12-19 20:08:27,859	INFO rollout_worker.py:2001 -- Built preprocessor map: {'my_ppo': None}
2023-12-19 20:08:27,859	INFO rollout_worker.py:761 -- Built filter map: defaultdict(<class 'ray.rllib.utils.filter.NoFilter'>, {})

The relevant part of the error:

2023-12-19 20:08:27,978	ERROR actor_manager.py:508 -- Ray error, taking actor 1 out of service. ray::RolloutWorker.apply() (pid=32205, ip=192.168.0.42, repr=<ray.rllib.evaluation.rollout_worker.RolloutWorker object at 0x7f14cb3dba20>)
  File "/usr/local/lib/python3.6/dist-packages/ray/rllib/utils/actor_manager.py", line 185, in apply
    raise e
  File "/usr/local/lib/python3.6/dist-packages/ray/rllib/utils/actor_manager.py", line 176, in apply
    return func(self, *args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/ray/rllib/execution/rollout_ops.py", line 86, in <lambda>
    lambda w: w.sample(), local_worker=False, healthy_only=True
  File "/usr/local/lib/python3.6/dist-packages/ray/rllib/evaluation/rollout_worker.py", line 915, in sample
    batches = [self.input_reader.next()]
  File "/usr/local/lib/python3.6/dist-packages/ray/rllib/evaluation/sampler.py", line 92, in next
    batches = [self.get_data()]
  File "/usr/local/lib/python3.6/dist-packages/ray/rllib/evaluation/sampler.py", line 277, in get_data
    item = next(self._env_runner)
  File "/usr/local/lib/python3.6/dist-packages/ray/rllib/evaluation/env_runner_v2.py", line 323, in run
    outputs = self.step()
  File "/usr/local/lib/python3.6/dist-packages/ray/rllib/evaluation/env_runner_v2.py", line 354, in step
    infos=infos,
  File "/usr/local/lib/python3.6/dist-packages/ray/rllib/evaluation/env_runner_v2.py", line 515, in _process_observations
    policy_id: PolicyID = episode.policy_for(agent_id)
  File "/usr/local/lib/python3.6/dist-packages/ray/rllib/evaluation/episode_v2.py", line 123, in policy_for
    worker=self.worker,
TypeError: <lambda>() got an unexpected keyword argument 'worker'

Relevant config from my training.py file:

config = PPOConfig()\
        .python_environment()\
        .resources(
            num_gpus=1,
            # num_cpus_per_worker=1,
            # num_gpus_per_worker=0,
        )\
        .framework(
            framework='tf',
            eager_tracing=False,
        )\
        .environment(
            env='marlenv',
            env_config=exp_config,
            # observation_space=None,
            # action_space=None,
            # clip_rewards=None,
            # normalize_actions=True, # default is True, the value you used was False
            # clip_actions=False,
            # disable_env_checking=True,
        )\
        .rollouts(
            num_rollout_workers = 1,
            num_envs_per_worker = 1,
            # rollout_fragment_length = 400,
            # batch_mode = 'complete_episodes',
            # observation_filter = 'NoFilter',
        )\
        .training(
            # gamma=0.99,
            # lr=5e-05,
            train_batch_size=4000,
            # model=model,
            # lr_schedule=None,
            # use_critic=True,
            # use_gae=True,
            # lambda_=1.0,
            # kl_coeff=0.2,
            # sgd_minibatch_size=128,
            # num_sgd_iter=30,
            # shuffle_sequences=True,
            # vf_loss_coeff=1.0,
            # entropy_coeff=0.0,
            # entropy_coeff_schedule=None,
            # clip_param=0.3,
            # vf_clip_param=10,
            # grad_clip=None,
            # kl_target=0.01,
        )\
        .exploration(
            # explore=True,
            # exploration_config={'type': CustomExploration},
            # exploration_config={'type': 'StochasticSampling'}
        )\
        .multi_agent(
            policies = policies,
            # policy_map_capacity = 100,
            policy_mapping_fn = lambda agent_id: 'my_ppo',
            # policies_to_train = ['my_ppo'],
            # observation_fn = None,
            # count_steps_by = 'env_steps',
        )\
        .offline_data(
            # postprocess_inputs=False,
        )\
        .evaluation(
            # evaluation_interval = 10,
            # evaluation_duration = 10,
            # evaluation_duration_unit = 'episodes',
            # evaluation_parallel_to_training = False,
            # evaluation_config = {
            #    'explore': True,
            #    'exploration_config' : {'type': CustomExploration}
            #    'exploration_config' : {'type': 'StochasticSampling'}
            # },
            # evaluation_num_workers = 1,
            # always_attach_evaluation_results = True,
            # evaluation_sample_timeout_s=7200,
        )\
        .reporting(
            # keep_per_episode_custom_metrics = True, # default is False
            # metrics_episode_collection_timeout_s = 60.0,
            # metrics_num_episodes_for_smoothing = 100,
            # min_time_s_per_iteration = 300,
            # min_train_timesteps_per_iteration = 0,
            # min_sample_timesteps_per_iteration = 0,
        )\
        .debugging(
            log_level='INFO',
            # seed=42
        )

I am slowly adding options from the full config documentation, and I have reached a point where even uncommenting all of those values produces the same error. I am trying to create the most bare-bones version possible for my custom environment, but I don't even understand what the error is related to. Can anyone please help?

Okay, got it.

The line:

policy_mapping_fn = lambda agent_id: 'my_ppo',

has to be changed to

policy_mapping_fn = lambda agent_id, episode, worker, **kwargs: 'my_ppo',

because RLlib now calls the mapping function with extra arguments and passes worker as a keyword argument (that is the worker=self.worker call from episode_v2.py in the traceback), so the old single-argument lambda fails with the TypeError. I had to go through the source code to get an idea of which parameter was the problem, and then happened to stumble upon an example code block showing how it should be written (because the documentation isn't solid on this, I guess?): Environments — Ray 3.0.0.dev0
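
In case it helps anyone else, here is a minimal sketch of how the multi-agent piece ends up looking with the new signature. The PolicySpec-based policies dict is only an illustration (my real one is built elsewhere), and 'marlenv' is the same registered env name as in the config above:

from ray.rllib.algorithms.ppo import PPOConfig
from ray.rllib.policy.policy import PolicySpec

# Illustration only: a single shared PPO policy named 'my_ppo'.
policies = {'my_ppo': PolicySpec()}

# The mapping function has to accept the extra arguments RLlib passes in
# (episode, worker, plus any future keyword args), even if it ignores them.
def map_to_my_ppo(agent_id, episode, worker, **kwargs):
    return 'my_ppo'

config = (
    PPOConfig()
    .environment(env='marlenv')  # same env name as above; registration not shown
    .multi_agent(
        policies=policies,
        policy_mapping_fn=map_to_my_ppo,
    )
)

Using a plain def instead of a lambda also makes this kind of signature mismatch easier to spot next time.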
