Hi, I got this working a year or two ago (in a different code base), but I no longer remember how. I'm now starting a new bare-bones version from scratch, and I'm not sure why it gives me the stack trace below.
Relevant INFO logs from RLlib:
2023-12-19 20:08:25,103 INFO policy.py:1287 -- Policy (worker=local) running on 1 GPUs.
2023-12-19 20:08:25,103 INFO tf_policy.py:171 -- Found 2 visible cuda devices.
2023-12-19 20:08:25,163 INFO dynamic_tf_policy_v2.py:710 -- Adding extra-action-fetch `action_prob` to view-reqs.
2023-12-19 20:08:25,164 INFO dynamic_tf_policy_v2.py:710 -- Adding extra-action-fetch `action_logp` to view-reqs.
2023-12-19 20:08:25,165 INFO dynamic_tf_policy_v2.py:710 -- Adding extra-action-fetch `action_dist_inputs` to view-reqs.
2023-12-19 20:08:25,166 INFO dynamic_tf_policy_v2.py:710 -- Adding extra-action-fetch `vf_preds` to view-reqs.
2023-12-19 20:08:25,166 INFO dynamic_tf_policy_v2.py:722 -- Testing `postprocess_trajectory` w/ dummy batch.
2023-12-19 20:08:26,625 INFO policy.py:1287 -- Policy (worker=local) running on 1 GPUs.
2023-12-19 20:08:26,625 INFO tf_policy.py:171 -- Found 2 visible cuda devices.
2023-12-19 20:08:27,858 INFO util.py:118 -- Using connectors:
2023-12-19 20:08:27,858 INFO util.py:119 -- AgentConnectorPipeline
    ObsPreprocessorConnector
    StateBufferConnector
    ViewRequirementAgentConnector
2023-12-19 20:08:27,858 INFO util.py:120 -- ActionConnectorPipeline
    ConvertToNumpyConnector
    NormalizeActionsConnector
    ImmutableActionsConnector
2023-12-19 20:08:27,859 INFO rollout_worker.py:2000 -- Built policy map: <PolicyMap lru-caching-capacity=100 policy-IDs=['my_ppo']>
2023-12-19 20:08:27,859 INFO rollout_worker.py:2001 -- Built preprocessor map: {'my_ppo': None}
2023-12-19 20:08:27,859 INFO rollout_worker.py:761 -- Built filter map: defaultdict(<class 'ray.rllib.utils.filter.NoFilter'>, {})
Relevant part of the error trace:
2023-12-19 20:08:27,978 ERROR actor_manager.py:508 -- Ray error, taking actor 1 out of service. ray::RolloutWorker.apply() (pid=32205, ip=192.168.0.42, repr=<ray.rllib.evaluation.rollout_worker.RolloutWorker object at 0x7f14cb3dba20>)
File "/usr/local/lib/python3.6/dist-packages/ray/rllib/utils/actor_manager.py", line 185, in apply
raise e
File "/usr/local/lib/python3.6/dist-packages/ray/rllib/utils/actor_manager.py", line 176, in apply
return func(self, *args, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/ray/rllib/execution/rollout_ops.py", line 86, in <lambda>
lambda w: w.sample(), local_worker=False, healthy_only=True
File "/usr/local/lib/python3.6/dist-packages/ray/rllib/evaluation/rollout_worker.py", line 915, in sample
batches = [self.input_reader.next()]
File "/usr/local/lib/python3.6/dist-packages/ray/rllib/evaluation/sampler.py", line 92, in next
batches = [self.get_data()]
File "/usr/local/lib/python3.6/dist-packages/ray/rllib/evaluation/sampler.py", line 277, in get_data
item = next(self._env_runner)
File "/usr/local/lib/python3.6/dist-packages/ray/rllib/evaluation/env_runner_v2.py", line 323, in run
outputs = self.step()
File "/usr/local/lib/python3.6/dist-packages/ray/rllib/evaluation/env_runner_v2.py", line 354, in step
infos=infos,
File "/usr/local/lib/python3.6/dist-packages/ray/rllib/evaluation/env_runner_v2.py", line 515, in _process_observations
policy_id: PolicyID = episode.policy_for(agent_id)
File "/usr/local/lib/python3.6/dist-packages/ray/rllib/evaluation/episode_v2.py", line 123, in policy_for
worker=self.worker,
TypeError: <lambda>() got an unexpected keyword argument 'worker'
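Looking at the end of the trace, episode_v2.py seems to call my policy_mapping_fn with extra arguments, including worker= as a keyword, which my one-argument lambda can't accept. So my guess (based only on the traceback, I haven't confirmed it in the docs) is that the multi-agent API now expects a signature more like this:

def policy_mapping_fn(agent_id, episode=None, worker=None, **kwargs):
    # Guess from the trace: policy_for() appears to pass `episode` and
    # `worker` along with agent_id, so accept (and ignore) the extras.
    return 'my_ppo'

If that reading is right, I could swap this in for the lambda in my config below, but I'd like to understand whether the mapping-fn signature actually changed between versions.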
Relevant config from my training.py:
config = PPOConfig()\
    .python_environment()\
    .resources(
        num_gpus=1,
        # num_cpus_per_worker=1,
        # num_gpus_per_worker=0,
    )\
    .framework(
        framework='tf',
        eager_tracing=False,
    )\
    .environment(
        env='marlenv',
        env_config=exp_config,
        # observation_space=None,
        # action_space=None,
        # clip_rewards=None,
        # normalize_actions=True,  # default is True; I previously used False
        # clip_actions=False,
        # disable_env_checking=True,
    )\
    .rollouts(
        num_rollout_workers=1,
        num_envs_per_worker=1,
        # rollout_fragment_length=400,
        # batch_mode='complete_episodes',
        # observation_filter='NoFilter',
    )\
    .training(
        # gamma=0.99,
        # lr=5e-05,
        train_batch_size=4000,
        # model=model,
        # lr_schedule=None,
        # use_critic=True,
        # use_gae=True,
        # lambda_=1.0,
        # kl_coeff=0.2,
        # sgd_minibatch_size=128,
        # num_sgd_iter=30,
        # shuffle_sequences=True,
        # vf_loss_coeff=1.0,
        # entropy_coeff=0.0,
        # entropy_coeff_schedule=None,
        # clip_param=0.3,
        # vf_clip_param=10,
        # grad_clip=None,
        # kl_target=0.01,
    )\
    .exploration(
        # explore=True,
        # exploration_config={'type': CustomExploration},
        # exploration_config={'type': 'StochasticSampling'},
    )\
    .multi_agent(
        policies=policies,
        # policy_map_capacity=100,
        policy_mapping_fn=lambda agent_id: 'my_ppo',  # presumably the lambda from the TypeError above
        # policies_to_train=['my_ppo'],
        # observation_fn=None,
        # count_steps_by='env_steps',
    )\
    .offline_data(
        # postprocess_inputs=False,
    )\
    .evaluation(
        # evaluation_interval=10,
        # evaluation_duration=10,
        # evaluation_duration_unit='episodes',
        # evaluation_parallel_to_training=False,
        # evaluation_config={
        #     'explore': True,
        #     'exploration_config': {'type': CustomExploration},
        #     'exploration_config': {'type': 'StochasticSampling'},
        # },
        # evaluation_num_workers=1,
        # always_attach_evaluation_results=True,
        # evaluation_sample_timeout_s=7200,
    )\
    .reporting(
        # keep_per_episode_custom_metrics=True,  # default is False
        # metrics_episode_collection_timeout_s=60.0,
        # metrics_num_episodes_for_smoothing=100,
        # min_time_s_per_iteration=300,
        # min_train_timesteps_per_iteration=0,
        # min_sample_timesteps_per_iteration=0,
    )\
    .debugging(
        log_level='INFO',
        # seed=42
    )
I am slowly adding options from the full config documentation, trying to create the most bare-bones version possible for my custom environment, and I've reached a point where even uncommenting all of the values above produces the same error. Beyond my guess about the mapping-fn signature, I don't really understand what the error is related to. Can anyone please help?