I have a problem setting the truncated and terminated values. I have implemented two functions, _computeTruncated and _computeTerminated, to check whether an agent has reached the truncated or terminated state, respectively. In the first implementation below, an agent's truncated value is set to True if the agent has crossed the boundary of the given region (checked by the function is_out_of_bounds) or if the time limit is exceeded; otherwise it is False. An agent's terminated value is set to True only if the agent has reached its target position.
The experiment fails and the following error is thrown:
ValueError: ('Batches sent to postprocessing must only contain steps from a single trajectory.', SampleBatch(2000: ['obs', 'new_obs', 'actions', 'prev_actions', 'rewards', 'prev_rewards', 'terminateds', 'truncateds', 'infos', 'eps_id', 'unroll_id', 'agent_index', 't', 'vf_preds', 'action_dist_inputs', 'action_prob', 'action_logp']))
2023-09-25 11:49:13,160 WARNING tune.py:1122 -- Trial Runner checkpointing failed: Sync process failed: GetFileInfo() yielded path 'C:/Users/sAz/ray_results/PPO', which is outside base dir 'C:\Users\sAz\ray_results\PPO' (this is only a part of the message).
The implementation is given below.
def _computeTruncated(self):
    all_val = False
    truncated = {i: False for i in range(self.NUM_DRONES)}
    for q in range(self.NUM_DRONES):
        pos_q = self.get_quad_pos(q)[0:3]
        truncated[q] = self.is_out_of_bounds(pos_q) or (self.step_counter > self.EPISODE_LEN_STEP)
        all_val = all_val and truncated[q]
    truncated["__all__"] = all_val
    return truncated

def _computeTerminated(self):
    arrived_dist = .33
    bool_val = False
    done = {i: bool_val for i in range(self.NUM_DRONES)}
    all_val = True if bool_val is True else False
    for q in range(self.NUM_DRONES):
        pos_q = self.get_quad_pos(q)[0:3]
        targ_q = self.get_target_position(q)
        dist_q = np.linalg.norm(pos_q-targ_q)
        done[q] = (dist_q <= arrived_dist)
        all_val = all_val and done[q]
    done["__all__"] = all_val
    return done
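
One detail worth noting in the code above: all_val starts as False in both functions, so the expression "all_val and ..." can never become True and the "__all__" entry stays False even when every drone is flagged. Whether this is what triggers the error is not confirmed here. For comparison, below is a minimal standalone sketch of how the per-drone flags and the "__all__" key could be aggregated with all(). It is an illustration under assumptions, not the environment's code: compute_truncated, compute_terminated, outside_box and their parameters are hypothetical stand-ins, and positions are passed in as plain tuples instead of being read from self.

```python
import math


def compute_truncated(positions, out_of_bounds, step_counter, episode_len_step):
    """Per-drone truncation flags plus the "__all__" key (hypothetical helper)."""
    time_up = step_counter > episode_len_step
    truncated = {
        q: bool(out_of_bounds(pos)) or time_up
        for q, pos in enumerate(positions)
    }
    # "__all__" is True only once every drone is truncated.
    truncated["__all__"] = all(truncated.values())
    return truncated


def compute_terminated(positions, targets, arrived_dist=0.33):
    """Per-drone termination flags plus the "__all__" key (hypothetical helper)."""
    terminated = {
        q: math.dist(pos, targ) <= arrived_dist
        for q, (pos, targ) in enumerate(zip(positions, targets))
    }
    # "__all__" is True only once every drone has reached its target.
    terminated["__all__"] = all(terminated.values())
    return terminated


# Example with two drones; the bounding box of +/-10 is an assumed value.
positions = [(0.0, 0.0, 1.0), (5.0, 0.0, 1.0)]
targets = [(0.0, 0.0, 1.2), (0.0, 0.0, 1.0)]


def outside_box(pos):
    return any(abs(c) > 10.0 for c in pos)


terminated = compute_terminated(positions, targets)
early = compute_truncated(positions, outside_box, step_counter=5, episode_len_step=100)
late = compute_truncated(positions, outside_box, step_counter=101, episode_len_step=100)
```

In this example only drone 0 is within arrived_dist of its target, so terminated["__all__"] stays False, while late["__all__"] becomes True because the time limit applies to every drone at once.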
In another implementation (given below), I set the terminated value to True when the time limit is exceeded, and the truncated value is always False. This implementation runs.
def _computeTruncated(self):
    all_val = False
    truncated = {i: False for i in range(self.NUM_DRONES)}
    truncated["__all__"] = all_val
    return truncated

def _computeTerminated(self):
    bool_val = False or (self.step_counter > self.EPISODE_LEN_STEP)
    done = {i: bool_val for i in range(self.NUM_DRONES)}
    all_val = True if bool_val is True else False
    done["__all__"] = all_val
    return done
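
A possible difference between the two versions, stated as an assumption and not a confirmed diagnosis: in the second version all drones are flagged on the same step as "__all__", so no drone keeps producing steps after its own flag is True. In the first version a single drone can be flagged while the episode continues for the others. Multi-agent environments commonly stop returning entries for agents that have already finished. The sketch below shows one way to do that bookkeeping. The helper names drop_finished and update_finished are hypothetical, and how they would be wired into step() depends on the environment.

```python
def drop_finished(per_agent, finished):
    """Keep "__all__" and the entries of agents that had not finished before this step."""
    return {
        key: value
        for key, value in per_agent.items()
        if key == "__all__" or key not in finished
    }


def update_finished(finished, terminated, truncated):
    """Return the set of agents that are done after this step."""
    newly_done = {
        key
        for key in terminated
        if key != "__all__" and (terminated[key] or truncated.get(key, False))
    }
    return finished | newly_done


# Example: drone 0 reaches its target on step 1, drone 1 keeps flying.
finished = set()

terminated_1 = {0: True, 1: False, "__all__": False}
truncated_1 = {0: False, 1: False, "__all__": False}
reported_1 = drop_finished(terminated_1, finished)  # drone 0 still reports its last step
finished = update_finished(finished, terminated_1, truncated_1)

terminated_2 = {0: True, 1: False, "__all__": False}
reported_2 = drop_finished(terminated_2, finished)  # drone 0 no longer appears
```

The same filter would be applied to the observation, reward, and info dictionaries so that a finished drone contributes no further steps to its trajectory.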