[RLlib] timesteps_total gets reset every time 'num_healthy_workers' goes down

Hi!

Ray: v1.0.1
TensorFlow: 2.0
Python: 3.6.9
Ubuntu: 16.04
Head node VM: 64 cores, 504 GB memory

In my TensorBoard plots, every time a worker is blacklisted, the timesteps_total value gets reset, so I get zigzag curves when plotting against the number of training steps. Here is the step-based TensorBoard plot for num_healthy_workers:
[TensorBoard screenshot: num_healthy_workers vs. training step]

Plotting training against wall-clock time, however, gives me linear curves. Do you happen to know what might cause timesteps_total to be reset every time num_healthy_workers decreases?

Below is a list showing how timesteps_total gets reset between training iterations (a minimal sketch of where these values come from follows the list):
Training iteration: 1, Timesteps total: 12000
Training iteration: 2, Timesteps total: 24000
Training iteration: 3, Timesteps total: 36000
Training iteration: 4, Timesteps total: 48000
Training iteration: 5, Timesteps total: 60000
Training iteration: 6, Timesteps total: 72000
Training iteration: 7, Timesteps total: 84000
Training iteration: 8, Timesteps total: 96000
Training iteration: 9, Timesteps total: 108000
Training iteration: 10, Timesteps total: 120000
Training iteration: 11, Timesteps total: 11600

Training iteration: 12, Timesteps total: 23200
Training iteration: 13, Timesteps total: 34800
Training iteration: 14, Timesteps total: 46400
Training iteration: 15, Timesteps total: 11400
Training iteration: 16, Timesteps total: 22800
Training iteration: 17, Timesteps total: 34200
Training iteration: 18, Timesteps total: 45600
Training iteration: 19, Timesteps total: 57000
Training iteration: 20, Timesteps total: 68400
Training iteration: 21, Timesteps total: 11200

Training iteration: 22, Timesteps total: 22400
Training iteration: 23, Timesteps total: 33600
Training iteration: 24, Timesteps total: 44800
Training iteration: 25, Timesteps total: 56000
Training iteration: 26, Timesteps total: 67200
Training iteration: 27, Timesteps total: 78400

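For reference, here is a minimal sketch of where these values come from, i.e. the `training_iteration` and `timesteps_total` keys of the result dict returned by `train()` (the PPO trainer, env, and worker count below are just placeholders, not my actual setup):

```python
# Minimal sketch (placeholder trainer/env/worker count): print the iteration
# and cumulative timestep counters from RLlib's train() result dict.
import ray
from ray.rllib.agents.ppo import PPOTrainer

ray.init()

trainer = PPOTrainer(env="CartPole-v0", config={"num_workers": 4})

for _ in range(30):
    result = trainer.train()
    print("Training iteration: {}, Timesteps total: {}".format(
        result["training_iteration"], result["timesteps_total"]))
```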
Many thanks!

Hey Raluca, thanks for posting this. I'm thinking it's simply Tune resetting the timesteps to 0 once it has to restart failed workers. Could you ask this as a Ray Tune question?
My thinking is that Tune should pick up the last timestep reached whenever it recreates a failed worker from a checkpoint (and I do remember it doing that correctly, but this could be a different issue here).
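For reference, a rough sketch of the kind of Tune setup I have in mind (the trainable, env, and checkpoint/failure values below are just placeholders): with periodic checkpoints enabled, a trial that Tune restarts after a failure should be restored from its last checkpoint, including the timestep counter, instead of starting again at 0.

```python
# Rough sketch (placeholder values): enable periodic checkpointing and let Tune
# restore a failed trial from its last checkpoint rather than from scratch.
import ray
from ray import tune

ray.init()

tune.run(
    "PPO",                               # placeholder trainable
    config={
        "env": "CartPole-v0",            # placeholder env
        "num_workers": 4,                # placeholder worker count
        "ignore_worker_failures": True,  # keep training when a rollout worker dies
    },
    stop={"timesteps_total": 1000000},
    checkpoint_freq=10,   # write a checkpoint every 10 training iterations
    max_failures=3,       # on trial failure, Tune restores from the last checkpoint
)
```

If the counter still resets with a setup like that, it would point to the restored state not being carried over correctly.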