Problem using truncated and terminated

  • High: It blocks me from completing my task.

In a multi-agent environment I implemented two functions to check whether the terminated or truncated state has been reached. If I am not mistaken, truncated should be true once the time limit is exceeded. When I run the code I get an error.

The following is a part of the error message:

ValueError: ('Batches sent to postprocessing must only contain steps from a single trajectory.', SampleBatch(2000: ['obs', 'new_obs', 'actions', 'prev_actions', 'rewards', 'prev_rewards', 'terminateds', 'truncateds', 'infos', 'eps_id', 'unroll_id', 'agent_index', 't', 'vf_preds', 'action_dist_inputs', 'action_prob', 'action_logp']))

2023-09-25 22:01:19,239	ERROR tune.py:1139 -- Trials did not complete: [PPO_CustomRl3_45c2c_00000]

The code is shown below:

def _computeTruncated(self):
    # Truncate every drone once the episode's step limit is exceeded.
    truncated = {i: self.step_counter > self.EPISODE_LEN_STEP
                 for i in range(self.NUM_DRONES)}
    truncated["__all__"] = all(truncated.values())
    return truncated


def _computeTerminated(self):
    # No termination condition here: every drone (and "__all__") stays False.
    done = {i: False for i in range(self.NUM_DRONES)}
    done["__all__"] = False
    return done

Hi @sAz-G,

I am not sure this is the problem, since you did not provide the full error, but one invariant the code enforces is that an agent ID may not appear in any of the dictionaries step() returns once a previous step has marked it terminated or truncated.

I have not checked this since the RL Modules rewrite, but it used to be the case and I would expect it still is.
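To make that invariant concrete, here is a minimal toy sketch (not your environment and not RLlib code; the class and its parameters are made up for illustration): an agent receives its final entries on the step where it finishes, and on every later step it must be absent from all of the dicts that step() returns.

```python
# Toy multi-agent env sketch illustrating the "drop finished agents" invariant.
class ToyMultiAgentEnv:
    def __init__(self, num_agents=2, episode_len=3):
        self.agents = set(range(num_agents))  # agents still alive
        self.episode_len = episode_len
        self.t = 0

    def step(self, action_dict):
        self.t += 1
        truncated_now = self.t >= self.episode_len

        # Build every dict only for agents that are still alive this step.
        obs = {i: 0.0 for i in self.agents}
        rewards = {i: 1.0 for i in self.agents}
        terminateds = {i: False for i in self.agents}
        truncateds = {i: truncated_now for i in self.agents}
        terminateds["__all__"] = False
        truncateds["__all__"] = truncated_now

        # Agents that just finished get their final entries on this step,
        # then are removed so they never appear in a later step's dicts.
        if truncated_now:
            self.agents = set()

        return obs, rewards, terminateds, truncateds, {}
```

The key point is the last block: after the step where an agent's truncated (or terminated) flag is True, the agent is removed from `self.agents`, so subsequent calls build dicts without it.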

Here is the full error message

Failure # 1 (occurred at 2023-09-25_22-01-19)
ray::PPO.train() (pid=17388, ip=127.0.0.1, actor_id=f0ed7415c5fb370c5e5eda2101000000, repr=PPO)
  File "python\ray\_raylet.pyx", line 1616, in ray._raylet.execute_task
  File "python\ray\_raylet.pyx", line 1556, in ray._raylet.execute_task.function_executor
  File "C:\Users\sAz\anaconda3\envs\drones_2\lib\site-packages\ray\_private\function_manager.py", line 726, in actor_method_executor
    return method(__ray_actor, *args, **kwargs)
  File "C:\Users\sAz\anaconda3\envs\drones_2\lib\site-packages\ray\util\tracing\tracing_helper.py", line 467, in _resume_span
    return method(self, *_args, **_kwargs)
  File "C:\Users\sAz\anaconda3\envs\drones_2\lib\site-packages\ray\tune\trainable\trainable.py", line 400, in train
    raise skipped from exception_cause(skipped)
  File "C:\Users\sAz\anaconda3\envs\drones_2\lib\site-packages\ray\tune\trainable\trainable.py", line 397, in train
    result = self.step()
  File "C:\Users\sAz\anaconda3\envs\drones_2\lib\site-packages\ray\util\tracing\tracing_helper.py", line 467, in _resume_span
    return method(self, *_args, **_kwargs)
  File "C:\Users\sAz\anaconda3\envs\drones_2\lib\site-packages\ray\rllib\algorithms\algorithm.py", line 853, in step
    results, train_iter_ctx = self._run_one_training_iteration()
  File "C:\Users\sAz\anaconda3\envs\drones_2\lib\site-packages\ray\util\tracing\tracing_helper.py", line 467, in _resume_span
    return method(self, *_args, **_kwargs)
  File "C:\Users\sAz\anaconda3\envs\drones_2\lib\site-packages\ray\rllib\algorithms\algorithm.py", line 2838, in _run_one_training_iteration
    results = self.training_step()
  File "C:\Users\sAz\anaconda3\envs\drones_2\lib\site-packages\ray\util\tracing\tracing_helper.py", line 467, in _resume_span
    return method(self, *_args, **_kwargs)
  File "C:\Users\sAz\anaconda3\envs\drones_2\lib\site-packages\ray\rllib\algorithms\ppo\ppo.py", line 429, in training_step
    train_batch = synchronous_parallel_sample(
  File "C:\Users\sAz\anaconda3\envs\drones_2\lib\site-packages\ray\rllib\execution\rollout_ops.py", line 85, in synchronous_parallel_sample
    sample_batches = worker_set.foreach_worker(
  File "C:\Users\sAz\anaconda3\envs\drones_2\lib\site-packages\ray\rllib\evaluation\worker_set.py", line 680, in foreach_worker
    handle_remote_call_result_errors(remote_results, self._ignore_worker_failures)
  File "C:\Users\sAz\anaconda3\envs\drones_2\lib\site-packages\ray\rllib\evaluation\worker_set.py", line 76, in handle_remote_call_result_errors
    raise r.get()
ray.exceptions.RayTaskError(ValueError): ray::RolloutWorker.apply() (pid=16772, ip=127.0.0.1, actor_id=6a89e20468fb4de004ea714701000000, repr=<ray.rllib.evaluation.rollout_worker.RolloutWorker object at 0x0000013F0FFE89A0>)
  File "python\ray\_raylet.pyx", line 1616, in ray._raylet.execute_task
  File "python\ray\_raylet.pyx", line 1556, in ray._raylet.execute_task.function_executor
  File "C:\Users\sAz\anaconda3\envs\drones_2\lib\site-packages\ray\_private\function_manager.py", line 726, in actor_method_executor
    return method(__ray_actor, *args, **kwargs)
  File "C:\Users\sAz\anaconda3\envs\drones_2\lib\site-packages\ray\util\tracing\tracing_helper.py", line 467, in _resume_span
    return method(self, *_args, **_kwargs)
  File "C:\Users\sAz\anaconda3\envs\drones_2\lib\site-packages\ray\rllib\utils\actor_manager.py", line 185, in apply
    raise e
  File "C:\Users\sAz\anaconda3\envs\drones_2\lib\site-packages\ray\rllib\utils\actor_manager.py", line 176, in apply
    return func(self, *args, **kwargs)
  File "C:\Users\sAz\anaconda3\envs\drones_2\lib\site-packages\ray\rllib\execution\rollout_ops.py", line 86, in <lambda>
    lambda w: w.sample(), local_worker=False, healthy_only=True
  File "C:\Users\sAz\anaconda3\envs\drones_2\lib\site-packages\ray\util\tracing\tracing_helper.py", line 467, in _resume_span
    return method(self, *_args, **_kwargs)
  File "C:\Users\sAz\anaconda3\envs\drones_2\lib\site-packages\ray\rllib\evaluation\rollout_worker.py", line 696, in sample
    batches = [self.input_reader.next()]
  File "C:\Users\sAz\anaconda3\envs\drones_2\lib\site-packages\ray\rllib\evaluation\sampler.py", line 92, in next
    batches = [self.get_data()]
  File "C:\Users\sAz\anaconda3\envs\drones_2\lib\site-packages\ray\rllib\evaluation\sampler.py", line 277, in get_data
    item = next(self._env_runner)
  File "C:\Users\sAz\anaconda3\envs\drones_2\lib\site-packages\ray\rllib\evaluation\env_runner_v2.py", line 344, in run
    outputs = self.step()
  File "C:\Users\sAz\anaconda3\envs\drones_2\lib\site-packages\ray\rllib\evaluation\env_runner_v2.py", line 370, in step
    active_envs, to_eval, outputs = self._process_observations(
  File "C:\Users\sAz\anaconda3\envs\drones_2\lib\site-packages\ray\rllib\evaluation\env_runner_v2.py", line 699, in _process_observations
    sample_batch = self._try_build_truncated_episode_multi_agent_batch(
  File "C:\Users\sAz\anaconda3\envs\drones_2\lib\site-packages\ray\rllib\evaluation\env_runner_v2.py", line 1000, in _try_build_truncated_episode_multi_agent_batch
    episode.postprocess_episode(batch_builder=batch_builder, is_done=False)
  File "C:\Users\sAz\anaconda3\envs\drones_2\lib\site-packages\ray\rllib\evaluation\episode_v2.py", line 303, in postprocess_episode
    raise ValueError(
ValueError: ('Batches sent to postprocessing must only contain steps from a single trajectory.', SampleBatch(2000: ['obs', 'new_obs', 'actions', 'prev_actions', 'rewards', 'prev_rewards', 'terminateds', 'truncateds', 'infos', 'eps_id', 'unroll_id', 'agent_index', 't', 'vf_preds', 'action_dist_inputs', 'action_prob', 'action_logp'])).

one invariant the code enforces is that an agent ID may not appear in any of the dictionaries step() returns once a previous step has marked it terminated or truncated.

Does this mean that I have to check the dictionaries in this case?

I checked again, and an agent ID is not contained in the dictionaries once it is in a terminated or truncated state.

I am still not sure what to do in this case. Does it make sense to insert the ID with random information? Or maybe I should insert the last information manually?