Why done is almost always False?

Hi all,
I am training Rainbow agent for CartPole-v0 with ray.tune.
While the agent learns to keep the pole stable, my done curve in Tensorboard is always False. Please see this image. Does anyone know what is the problem?

I expect to have done=True when an episode terminates. But, seems this is not the case.

Thanks!

@deepgravity,

That done is telling you whether the tune trial (the rl training) is done not the state of episodes in the environment.

Hi @mannyv,
Thanks for your response.
So in my example, done is always 0. What does it mean?
And would you please clarify what you meant by the rl training?

When you call tune.run you are asking tune to run some trials of something. In this case you ares asking tune to use rllib to train an rl problem. It looks like you only have 1 trial.

You have only asked tune to train one thing. You could have done multiple by asking it to for example train with two different learning rates to compare performance or you could have set num_samples>1. Then it would run several different identical configurations so you could look at variability.

In those cases you have multiple trials some of which might be running at the same time and others which might be queued until others finish. For example you ask for 10 trials and you only have 8cpusand use 1 cpu per trial. You would have to wait for 2 trials to finish before 9 and 10 start.

The tune value you are looking at is telling you if the trial is still training or if it has finished.

Many thanks @mannyv for the detailed explanation.

“It looks like you only have 1 trial.”
Yes, I only have one trial now.

So, trial in Ray refers to the number of configs. If I use tune for hyperparameter tunning for learning rate, let’s say for only two values lr1, lr2, then I have 2 trials?

1 Like

Yes, that is correct!

1 Like