How does RLlib distinguish terminated from truncated episodes in a server-client configuration?

How severely does this issue affect your experience of using Ray?

  • Medium: It makes it significantly harder to complete my task, but I can work around it.

I am following the CartPole server-client example to build my task. On the client side, the relevant calls are as follows:
from ray.rllib.env.policy_client import PolicyClient

client = PolicyClient("http://localhost:9900")  # address of the policy server
obs, info = env.reset()
eid = client.start_episode()
while True:
    action = client.get_action(eid, obs)
    obs, reward, terminated, truncated, info = env.step(action)
    client.log_returns(eid, reward, info=info)
    if terminated or truncated:
        client.end_episode(eid, obs)
        break
It seems the terminated/truncated values returned by env.step are never passed to the client API, so how does the server side know whether the episode ended by termination or by truncation?
I think the handling of terminated vs. truncated should be quite different.
For Q-learning-type algorithms: when terminated, q_target = reward; but when truncated, q_target = reward + q_of_next_state (gamma ignored for simplicity).
Is my understanding of terminated/truncated right? How does the server distinguish terminated from truncated?
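In pseudocode, this is the difference I mean (just my own sketch of standard Q-learning, not RLlib internals):

# My own sketch of standard Q-learning targets -- not RLlib code.
def td_target(reward, gamma, next_q_max, terminated):
    if terminated:
        # True terminal state: there is no future value to bootstrap from.
        return reward
    # Truncated (e.g. time limit) or ongoing: the episode could have continued,
    # so we still bootstrap from the next state's value estimate.
    return reward + gamma * next_q_max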

@tiankaidong great question! If you look at the Unity3D example it becomes a bit clearer: we need to pass the multiagent_done_dict argument to the log_returns() call.
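Roughly what the Unity3D client does with it (from memory, so the exact variable names and dict contents may differ):

# Illustrative sketch based on the Unity3D client example; agent ids are placeholders.
obs, rewards, terminateds, truncateds, infos = env.step(actions)
client.log_returns(
    episode_id,
    rewards,                            # per-agent rewards in the multi-agent case
    infos,
    multiagent_done_dict=terminateds,   # per-agent done flags
)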

Thanks a lot for your reply. I looked at the Unity3D example and the source code of log_returns(), but I am still confused.
In the Unity3D example, multiagent_done_dict is set to the terminateds dict. But for some customized environments, both terminateds and truncateds may need to be fed back, and I didn't find any docs demonstrating how to appropriately define multiagent_done_dict.
Also, I searched for multiagent_done_dict in the code but didn't find where it is used on the policy_server side for training. How is it received and used for the target calculation?
Could you please explain this in more detail?
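For example, this is purely my own hypothetical of the kind of call I would like to make (I could not find anything like it in the docs):

# Hypothetical call -- I don't know whether log_returns() accepts truncation info at all.
client.log_returns(
    eid,
    reward,
    info=info,
    multiagent_done_dict={"agent_0": terminated},
    # ...but where would `truncated` go, so the server can bootstrap correctly?
)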