My Ray program stops learning when using distributed compute

Hi Mannyv,

Lars already thanked you for your extremely detailed answer here: Read Tune console output from Simple Q - #5 by mannyv. But I would like to thank you again, it is extremely clarifying. Funny how I struggle with almost the same concepts as @Lars_Simon_Zehnder :).

Regarding complete_episodes: there is no specific reason I need it. I was afraid the agent would only learn the first rollout_fragment_length steps of each episode perfectly and never see anything beyond them. Such an implementation would of course make little sense.

Is this how it works: after a worker has sampled rollout_fragment_length steps and starts a new fragment, does it continue the episode where the previous fragment stopped? So in this example, would the second fragment start/continue at step 5? If so, I can just as well use truncate_episodes.

So I tried config_simple["batch_mode"] = "truncate_episodes". I can confirm this actually improves the learning in my case (quite a lot)! I don't exactly understand why it makes such a difference, though. Do you have any thoughts on it?
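For completeness, this is roughly what the change looks like in my config (a minimal sketch; the surrounding values are illustrative placeholders, not my exact settings):

```python
# Sketch of the batch_mode change in the config dict (illustrative values only).
config_simple = {
    "num_workers": 2,                      # distributed rollout workers
    "rollout_fragment_length": 5,          # steps each worker collects per fragment
    # "batch_mode": "complete_episodes",   # what I used before
    "batch_mode": "truncate_episodes",     # fragments may now cut episodes mid-way
    "train_batch_size": 200,               # fragments are concatenated up to this size
}
```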