Tensorboard stops working for no apparent reason. Could you help narrow down the issue?

How severe does this issue affect your experience of using Ray?

  • Medium: It contributes to significant difficulty to complete my task, but I can work around it.

Hello,

I’ve been using RLlib (through the MARLlib package, in case that context helps) and have been saving TensorBoard logs and visualizing them with TensorBoard. I have used this pipeline for running experiments and visualizing the results for months. All logs have always been saved in the same folder, named ‘exp_results’, and the command I use to start TensorBoard is always the same:

tensorboard --logdir=./exp_results

The virtual env for the project has always been the same, and I didn’t consciously make any changes, either by adding new packages or by modifying existing ones.

Yesterday I tried to run TensorBoard as usual and it suddenly just doesn’t work. I haven’t changed anything in the virtual env, and I am trying to access the very same log files I’ve been visualizing in TensorBoard for weeks, so it’s not as if the log files have an issue. The directory is correct, because the exact same command worked before, and also running

tensorboard --inspect --logdir=./exp_results

shows that TensorBoard is indeed able to find logs in that location.

The terminal doesn’t show any error; its output is exactly the same as when things were running well. However, I found one error when I opened the JavaScript console of the Safari browser on my MacBook. The error read:

Failed to load resource: the network connection was lost

I tried different ports using the ‘--port=xxxx’ option (e.g. 6007). Either the result stayed the same as before (a completely blank white browser page that either loads forever or doesn’t load), or the TensorBoard GUI actually loaded but with no plots (as shown in the other image). I tried every other Stack Overflow and GitHub issue suggestion I could find, and at best I would get the GUI to load but without plots.

With those issues ruled out, do you have any suggestions on what could be the issue here?

The following are things I’ve tried that didn’t solve the issue.

  • Changing ports using the ‘--port=xxxx’ option on the command line.
  • Closing open ports manually via the ‘Ports’ tab (in the panel region) in VS Code.
  • Killing the existing TensorBoard process, as suggested in “python 3.x - Tensorboard Error: No dashboards are active for current data set - Stack Overflow”. Note that the Stack Overflow answer is in the context of a Jupyter notebook, which I am not using, but the suggestion to kill the process still seemed useful.
  • Adding the ‘--host localhost’ option on the command line.
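
One browser-independent check that might help narrow this down is to test whether anything is accepting connections on the TensorBoard port at all. The following is only a minimal sketch using the Python standard library; the helper name `port_is_open` is made up for illustration, and 6006 is assumed to be the port in use (TensorBoard's default when no port option is given).

```python
import socket


def port_is_open(port, host="127.0.0.1", timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout.

    `port_is_open` is a hypothetical helper for this post, not part of
    TensorBoard or Ray.
    """
    try:
        # create_connection raises OSError (refused, unreachable, timed out)
        # if nothing accepts the connection.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling `port_is_open(6006)` on the remote machine and again on the MacBook should indicate whether the TensorBoard process is listening and whether the forwarded port reaches it. A `True` result only means that something accepted the connection, not that it is TensorBoard.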

Tensorboard version is 2.14.0.
Ray version is 1.8.0.
The machine is being remotely accessed using SSH via VS Code on a MacBook.
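
Because the browser runs on the MacBook while TensorBoard runs on the remote machine, it may help to query the TensorBoard backend directly, without Safari, from both sides of the SSH connection. The sketch below assumes that TensorBoard serves the list of runs as JSON under the `data/runs` route, as described in its HTTP API notes; I have not verified this for version 2.14.0. The helper `fetch_runs` and the default URL are illustrative only.

```python
import http.client
import json
import urllib.request


def fetch_runs(base_url="http://localhost:6006", timeout=5.0):
    """Return the parsed JSON from <base_url>/data/runs, or None on failure.

    `fetch_runs` is a hypothetical helper; the `data/runs` route is an
    assumption based on TensorBoard's HTTP API notes.
    """
    url = base_url.rstrip("/") + "/data/runs"
    # An empty ProxyHandler bypasses any system proxy, so the request goes
    # straight to the local or forwarded port.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as resp:
            return json.loads(resp.read().decode("utf-8"))
    except (OSError, ValueError, http.client.HTTPException) as exc:
        # OSError covers URLError, refused connections and timeouts;
        # ValueError covers a body that is not valid JSON.
        print(f"request to {url} failed: {exc}")
        return None
```

If `fetch_runs()` returns run names on the remote machine but fails or hangs on the MacBook, the forwarded connection would be the more likely culprit. If it returns an empty list on the remote machine as well, the backend itself is not picking up the runs.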