Hi there,
I’ve been playing around with the trainers’ evaluation routine, and I’m having difficulty getting the rollout workers to run a consistent number of evaluation episodes each time.
For example, I set evaluation_num_episodes to 300 with 3 workers, so the evaluate routine ends up calling worker.sample.remote() 300 times.
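For reference, the evaluation-related part of my setup looks roughly like this (just a sketch — the key names follow RLlib’s Trainer config, and everything I don’t show here is left at its default):

```python
# Sketch of the evaluation-related settings described above
# (RLlib-style Trainer config keys; all other keys at defaults).
eval_config = {
    "evaluation_num_episodes": 300,  # target episodes per evaluation run
    "evaluation_num_workers": 3,     # parallel rollout workers doing the sampling
}
```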
With the default config I get a highly variable number of episodes per evaluation (len is the mean episode_len):
episodes: 445, len: 11.3
episodes: 582, len: 7.5
episodes: 442, len: 11.2
episodes: 450, len: 10.5
episodes: 429, len: 11.1
episodes: 442, len: 10.8
episodes: 551, len: 8.3
episodes: 409, len: 11.5
episodes: 479, len: 10.0
episodes: 404, len: 12.3
episodes: 450, len: 10.8
episodes: 383, len: 13.0
episodes: 401, len: 12.4
episodes: 374, len: 13.8
episodes: 411, len: 11.7
episodes: 370, len: 13.5
episodes: 381, len: 13.4
episodes: 350, len: 14.7
episodes: 324, len: 16.3
episodes: 339, len: 15.8
With rollout_fragment_length set to 1 I get better results:
episodes: 310, len: 11.2
episodes: 294, len: 7.3
episodes: 302, len: 11.7
episodes: 300, len: 11.0
episodes: 300, len: 11.2
episodes: 295, len: 10.9
episodes: 305, len: 8.5
episodes: 299, len: 12.3
episodes: 301, len: 10.5
episodes: 302, len: 12.6
episodes: 295, len: 11.1
episodes: 302, len: 13.9
episodes: 300, len: 12.4
episodes: 298, len: 12.4
episodes: 303, len: 12.2
episodes: 299, len: 13.8
episodes: 300, len: 14.1
episodes: 300, len: 14.4
episodes: 296, len: 16.3
episodes: 304, len: 15.4
However, it’s still somewhat inconsistent, which is frustrating, and sometimes the count now falls below the specified number of episodes, which I think is worse. The batch mode didn’t seem to have any effect on the numbers.
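For completeness, the two overrides I experimented with look like this (again only a sketch using RLlib config keys; "complete_episodes" is the batch mode I tried):

```python
# Overrides I experimented with on top of the default config
# (RLlib-style keys; the batch_mode change had no visible effect for me).
overrides = {
    "rollout_fragment_length": 1,       # sample a single step per remote call
    "batch_mode": "complete_episodes",  # vs. the default "truncate_episodes"
}
```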
Is there any better way to sample evaluations, or have I just missed a configuration option?
Cheers,
Rory