Use Policy_Trainer with TensorBoard

This is purely mechanical: Tune prints its status every couple of seconds. If training takes longer, you get more prints (identical in most cases); if it is faster, you get fewer. See this answer to my question for more information.
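
To make the timing effect concrete, here is a minimal sketch of a clock-driven reporter. It is not Tune's actual code: the function name `simulate_status_prints` and the 5-second interval are assumptions chosen purely for illustration. It returns the number of completed iterations that each periodic status print would show.

```python
def simulate_status_prints(iteration_seconds, total_seconds, report_interval=5.0):
    """Return the completed-iteration count shown at each periodic print.

    Hypothetical model: a status print fires every `report_interval`
    seconds (assumed value, not Tune's real setting), independent of
    how long one training iteration takes.
    """
    shown = []
    t = report_interval
    while t <= total_seconds:
        # Iterations finished by time t; the print shows the latest result.
        shown.append(int(t // iteration_seconds))
        t += report_interval
    return shown


# Slow iterations (12 s each): consecutive prints repeat the same result.
slow = simulate_status_prints(iteration_seconds=12, total_seconds=30)

# Fast iterations (1 s each): far fewer prints than iterations.
fast = simulate_status_prints(iteration_seconds=1, total_seconds=30)

print(slow)
print(fast)
```

In this model the slow run shows each result more than once, while the fast run completes 30 iterations but produces only 6 prints, which matches the behaviour described above.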