How severe does this issue affect your experience of using Ray?
- Medium: It contributes to significant difficulty to complete my task, but I can work around it.
I am following the cartpole server-client example to build my task. On the client side, the relevant key commands are as follows:
    client = PolicyClient()
    eid = client.start_episode()
    while True:  # step loop; env setup and reset omitted
        action = client.get_action(eid, obs)
        obs, reward, terminated, truncated, info = env.step(action)
        client.log_returns(eid, reward, info=info)
        if terminated or truncated:
            client.end_episode(eid, obs)
It seems the terminated/truncated values returned by env.step are never passed to the client API. How does the server side know whether the episode ended by truncation or by termination?
I think terminated and truncated need to be handled quite differently.
For Q-learning-type algorithms, when the episode is terminated: q_ground_truth = reward, but when it is truncated: q_ground_truth = reward + q_next_step_status (gamma ignored).
Is my understanding of terminated/truncated right? How does the server distinguish between the two?
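To make the distinction concrete, here is a minimal sketch of the target computation I have in mind. This is my own illustration, not RLlib code: `q_target`, `next_q_max`, and the default `gamma` are hypothetical names and values I chose for the example.

```python
def q_target(reward, next_q_max, terminated, gamma=0.99):
    """One-step Q-learning target (hypothetical helper, not an RLlib API).

    reward:      reward received for the transition
    next_q_max:  max over actions of Q(next_state, a), assumed to be given
    terminated:  True only if the environment reached a true terminal state
    """
    if terminated:
        # True terminal state: no future return, so the target is the reward.
        return reward
    # Not terminated (this includes truncation, e.g. a time limit):
    # the next state still has value, so bootstrap from it.
    return reward + gamma * next_q_max


# Terminated episode: target is just the reward.
target_terminated = q_target(1.0, 10.0, terminated=True, gamma=0.9)

# Truncated episode: terminated is False, so the target bootstraps.
target_truncated = q_target(1.0, 10.0, terminated=False, gamma=0.9)
```

If the server only learns that the episode ended, without knowing which flag was set, it cannot choose between these two branches, which is the core of my question.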