Logging discrete action distribution during training and logging text

How severe does this issue affect your experience of using Ray?

  • Low: It annoys or frustrates me for a moment.

Hello,

I have been thinking about how to log which discrete actions my agents are taking during training. I would like to log this in some way (preferably in TensorBoard) to get a quick glimpse of the behavior, since neither visual nor real-time inspection is really possible in the environment I'm working with. I tried using a histogram for this with continuous actions, but it is trickier now: a histogram is hard to interpret when there are potentially many different discrete actions.

TensorBoard supports logging text in what looks like a straightforward way: https://www.tensorflow.org/tensorboard/text_summaries. However, the Ray loggers do not allow text: https://github.com/ray-project/ray/blob/master/python/ray/tune/logger/tensorboardx.py#L207. Is there a particular reason for this, perhaps because it is difficult to merge the logging of multiple workers? If text logging were possible, I think I could hack something together to display the action distribution in text form and still follow how the action selection develops during training.
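For reference, tensorboardX itself already exposes `add_text()`, so writing such a summary by hand seems straightforward. A minimal sketch of what I mean (the log directory and the counts are made up for illustration):

```python
from tensorboardX import SummaryWriter

writer = SummaryWriter(logdir="./tb_text_demo")  # made-up directory

# Textual summary of an action distribution at some training iteration.
action_counts = {0: 120, 1: 35, 2: 7}  # made-up counts
summary = ", ".join(f"action {a}: {n}" for a, n in action_counts.items())
writer.add_text("action_distribution", summary, global_step=10)
writer.close()
```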

Beyond that, I would also like to log the actions selected within an episode, to see which actions are taken at the beginning versus the end. Perhaps some further hacking of the text logging would make this possible. Of course, I could log this separately and process the log with a separate script, but I am raising the question here in case anyone has encountered this before.

Thank you for any help,

Hi @zalador ,

Metrics that RLlib itself provides are aggregated, and for most of them we simply compute a mean/min/max. If you want to access the actions themselves and plot a distribution, there is no easy setting you can “just switch on”.

Nonetheless, you can modify the training_step() method to do some hacking.
All collected batches pass through that method, so you can grab the actions from there.
Furthermore, training_step() also returns a result dict.
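Very roughly, the idea could look like the following sketch. This is only illustrative: names like `synchronous_parallel_sample` and `self.workers` are from recent Ray 2.x and may differ in your version, a single-agent setup is assumed, and drawing an extra batch is just the simplest way to show the point.

```python
import numpy as np
from ray.rllib.algorithms.ppo import PPO
from ray.rllib.execution.rollout_ops import synchronous_parallel_sample
from ray.rllib.policy.sample_batch import SampleBatch


class PPOWithActionStats(PPO):
    def training_step(self):
        # Let PPO do its normal sampling + update and keep its result dict.
        results = super().training_step()

        # Draw a small extra batch purely to inspect the current action
        # choices (in practice you would rather reuse the batch the
        # algorithm already collected inside training_step()).
        batch = synchronous_parallel_sample(
            worker_set=self.workers, max_env_steps=200
        )
        actions = np.asarray(batch[SampleBatch.ACTIONS]).astype(int)
        counts = np.bincount(actions)

        # Extra keys in the result dict end up in the Tune/TensorBoard logs.
        for action, count in enumerate(counts):
            results[f"action_freq/{action}"] = count / max(len(actions), 1)
        return results
```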

If I were you, I'd take control of logging the actions in text format right there.
You can also pass a custom logger to Tune (one that wraps the TensorBoard logger, perhaps?) if that's something you want to put effort into.
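Such a custom logger could be a Tune LoggerCallback that writes text summaries with tensorboardX next to the regular event files. A sketch, assuming a key like "action_counts" was put into the result dict (that key is made up here):

```python
from ray.tune.logger import LoggerCallback
from tensorboardX import SummaryWriter


class ActionTextLogger(LoggerCallback):
    """Writes the action distribution as a TensorBoard text summary."""

    def __init__(self):
        self._writers = {}

    def log_trial_start(self, trial):
        # One extra event-file writer per trial, next to the normal logs.
        self._writers[trial] = SummaryWriter(logdir=trial.logdir)

    def log_trial_result(self, iteration, trial, result):
        counts = result.get("action_counts")  # assumed key, see above
        if counts is not None:
            text = ", ".join(f"a{i}: {c}" for i, c in enumerate(counts))
            self._writers[trial].add_text("action_distribution", text, iteration)

    def log_trial_end(self, trial, failed=False):
        writer = self._writers.pop(trial, None)
        if writer is not None:
            writer.close()
```

You would then pass it via `tune.run(..., callbacks=[ActionTextLogger()])`.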

Thank you for your answer. I think saving and extracting actions with callbacks such as on_episode_step() and on_episode_end() should also be doable, right? Especially since I can put pretty much anything into the episode's user data / info dict. It should then be simple to save the actions to a persistent file on disk during training. Perhaps someone who has encountered this before will see this and give input, but for now that will be my approach.
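Something along these lines is what I have in mind (a sketch based on the Ray 2.x callback API and the old Episode class; the output path is just an example, and each rollout worker would write its own file):

```python
import json
from ray.rllib.algorithms.callbacks import DefaultCallbacks


class ActionLoggingCallbacks(DefaultCallbacks):
    def on_episode_step(self, *, worker, base_env, policies=None, episode, **kwargs):
        # last_action_for() gives the most recent action of the (single) agent.
        episode.user_data.setdefault("actions", []).append(
            int(episode.last_action_for())
        )

    def on_episode_end(self, *, worker, base_env, policies, episode, **kwargs):
        actions = episode.user_data.get("actions", [])
        # Persist the full per-episode action sequence for offline analysis.
        with open("/tmp/actions.jsonl", "a") as f:  # example path only
            f.write(json.dumps(
                {"episode_id": episode.episode_id, "actions": actions}) + "\n")
        # Also expose the per-episode actions in the result dict (hist_stats).
        episode.hist_data["actions"] = actions
```

Registering it should just be a matter of the usual callbacks config, e.g. `config.callbacks(ActionLoggingCallbacks)` (or `"callbacks": ActionLoggingCallbacks` in the old config dict).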

Again thank you for the help.