How does RLlib distinguish terminated/truncated situations in a server-client configuration?

tiankaidong · September 11, 2023, 8:09am

How severe does this issue affect your experience of using Ray?

Medium: It contributes to significant difficulty to complete my task, but I can work around it.

I am following the CartPole server-client example to build my task. On the client side, the relevant commands are as follows:
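import gymnasium as gym
from ray.rllib.env.policy_client import PolicyClient

env = gym.make("CartPole-v1")
# Server address is an assumption; use the one your policy server listens on.
client = PolicyClient("http://localhost:9900", inference_mode="remote")

obs, info = env.reset()
eid = client.start_episode(training_enabled=True)
while True:
    action = client.get_action(eid, obs)
    obs, reward, terminated, truncated, info = env.step(action)
    client.log_returns(eid, reward, info=info)
    if terminated or truncated:
        client.end_episode(eid, obs)
        break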
It seems the terminated/truncated values returned by env.step() are never passed to the client API. How does the server side know whether the episode ended by termination or by truncation?
I think the semantics of terminated and truncated are quite different.
For Q-learning-type algorithms, the target for a terminated step is q_target = reward, but for a truncated step it is q_target = reward + Q(next_state) (ignoring gamma).
Is my understanding of terminated/truncated right? How does the server distinguish between them?
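The difference described above can be made concrete with a one-step Q-learning target. This is a minimal sketch, not RLlib's implementation: td_target is a hypothetical helper, it uses the standard max over next-state Q-values, and it exposes the discount gamma explicitly (defaulting to 1.0 to match the "gamma ignored" formula). The Q-values are made up.

```python
from typing import Sequence


def td_target(reward: float, next_q_values: Sequence[float],
              terminated: bool, gamma: float = 1.0) -> float:
    """One-step Q-learning target for a single transition (hypothetical helper).

    Only `terminated` cuts off bootstrapping. A truncated (e.g. time-limited)
    step still bootstraps from max_a' Q(s', a'), because the next state
    still has future value.
    """
    if terminated:
        return reward
    return reward + gamma * max(next_q_values)


next_q = [2.0, 3.0]  # made-up Q(s', a') values for two actions

# Real termination: nothing to bootstrap from.
print(td_target(1.0, next_q, terminated=True))   # 1.0
# Truncation: pass terminated=False so the target still bootstraps.
print(td_target(1.0, next_q, terminated=False))  # 4.0
# Collapsing both flags into one "done" loses the bootstrap for truncated steps.
terminated, truncated = False, True
print(td_target(1.0, next_q, terminated=terminated or truncated))  # 1.0
```

If only one boolean per step reaches the learner, the two cases cannot be told apart in this calculation.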

Lars_Simon_Zehnder · September 13, 2023, 8:28am

@tiankaidong great question! If you look into the Unity3D example, it becomes a bit clearer: we need to pass multiagent_done_dict to the log_returns() function.
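A sketch of the call pattern the Unity3D example uses, as described in this thread: the per-agent terminateds dict is passed as multiagent_done_dict. To keep it runnable without a policy server, RecordingClient is a hypothetical stand-in that only records what log_returns() receives; with a real PolicyClient the log_returns() call would look the same. log_multiagent_step and the agent IDs are also illustrative names.

```python
from typing import Any, Dict, List, Optional


class RecordingClient:
    """Hypothetical stand-in for PolicyClient that only records log_returns() calls."""

    def __init__(self) -> None:
        self.logged: List[Dict[str, Any]] = []

    def log_returns(self, episode_id: str, reward: Dict[str, float],
                    info: Optional[Dict[str, Any]] = None,
                    multiagent_done_dict: Optional[Dict[str, bool]] = None) -> None:
        self.logged.append({
            "episode_id": episode_id,
            "reward": reward,
            "info": info,
            "multiagent_done_dict": multiagent_done_dict,
        })


def log_multiagent_step(client, eid: str, rewards: Dict[str, float],
                        terminateds: Dict[str, bool], truncateds: Dict[str, bool],
                        infos: Dict[str, Any]) -> bool:
    """Log one multi-agent step; return True if the client should end the episode."""
    # Per-agent terminated flags are the only done signal handed to log_returns();
    # truncateds is not part of this call.
    client.log_returns(eid, rewards, info=infos, multiagent_done_dict=terminateds)
    # truncateds only affects the client-side decision to call end_episode().
    return terminateds["__all__"] or truncateds["__all__"]


client = RecordingClient()
should_end = log_multiagent_step(
    client, "episode-1",
    rewards={"agent_0": 1.0, "agent_1": 0.5},
    terminateds={"agent_0": True, "agent_1": False, "__all__": False},
    truncateds={"agent_0": False, "agent_1": False, "__all__": True},
    infos={"agent_0": {}, "agent_1": {}},
)
print(should_end)                                # True
print(client.logged[0]["multiagent_done_dict"])  # {'agent_0': True, 'agent_1': False, '__all__': False}
```

The sketch only shows what the client sends; it does not show how the server interprets multiagent_done_dict.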

tiankaidong · September 14, 2023, 6:56am

Thanks a lot for your reply. I looked at the Unity3D example and the source code of log_returns(), but I am still confused.
In the Unity3D example, multiagent_done_dict is set to terminateds. But for some custom environments, both terminateds and truncateds may need to be reported back. I didn't find any docs showing how to define multiagent_done_dict appropriately.
Also, I searched for multiagent_done_dict in the code but didn't find where the policy server side uses it for training. How are these flags received and used in the reward/target calculation?
Could you please explain this in more detail?

Related topics
Multi-agent truncateds vs terminateds (RLlib, September 25, 2023)
Setting terminated and truncated at episode end (Configure Algorithm, Training, Evaluation, Scaling, August 24, 2023)
'client.end_episode()' don't make any difference (RLlib, July 26, 2022)
How to Handle Agent Death In MultiAgent Scenarios (RLlib, April 22, 2024)
Delayed Learning Due To Long Episode Lengths (RLlib, September 10, 2021)