I am trying to figure out why I am not getting TensorBoard event files from my own code. I got stuck debugging it, so I tried one of the examples.
I'm running the rock-paper-scissors multi-agent example from the documentation. I expect TensorBoard event files to be produced, but they don't exist, even though the usual progress.csv and result.json do:
➜ PG ls
PG_RockPaperScissors_767ac_00000_0_2021-10-12_19-59-25 experiment_state-2021-10-12_19-59-25.json
basic-variant-state-2021-10-12_19-59-25.json
➜ PG ls PG_RockPaperScissors_767ac_00000_0_2021-10-12_19-59-25
params.json params.pkl progress.csv result.json
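To make sure I'm not just missing them in a subfolder, I also searched the trial directory for the usual event-file name pattern (events.out.tfevents.*, if I remember the naming correctly); nothing turns up:
find PG_RockPaperScissors_767ac_00000_0_2021-10-12_19-59-25 -name "events.out.tfevents.*"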
In rock_paper_scissors_multiagent.py, the calls on lines 172, 175, and 178 do generate TensorBoard event files, but the call on line 169 does not. The listing above shows the files in its output folder. Is that the correct behavior? Its progress.csv has 151 rows, and timesteps_total reaches 60,000.
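For context, line 169 is the run_same_policy() call (it matches the "run_same_policy: ok." line at the end of the log below). Paraphrased from the Ray 1.5.2 example source, so details may be slightly off, it boils down to a plain tune.run():
from ray import tune

def run_same_policy(args, stop):
    # RockPaperScissors is the env class defined earlier in the same example file.
    config = {
        "env": RockPaperScissors,
        "framework": args.framework,
    }
    # A plain tune.run() like this normally attaches the default loggers, so
    # JSON, CSV, and TensorBoard (TBX) output should all land in the trial folder.
    return tune.run("PG", config=config, stop=stop, verbose=1)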
Runtime environment: macOS, Python 3.8.10, TensorFlow 2.6.0, Ray 1.5.2, freshly installed in a pyenv virtualenv.
RLlib was installed by:
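(something like this; I didn't record the exact command)
pip install "ray[rllib]==1.5.2" tensorflow==2.6.0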
I didn't save the log of the original run. Here is a fresh run (still no TensorBoard event file):
2021-10-12 22:02:53,828 INFO services.py:1245 -- View the Ray dashboard at http://127.0.0.1:8265
== Status ==
Memory usage on this node: 17.6/32.0 GiB
Using FIFO scheduling algorithm.
Resources requested: 0/16 CPUs, 0/0 GPUs, 0.0/9.05 GiB heap, 0.0/4.52 GiB objects
Result logdir: /Users/rick.lan/ray_results/PG
Number of trials: 1/1 (1 PENDING)
(pid=39584) 2021-10-12 22:02:59,963 INFO trainer.py:706 -- Tip: set framework=tfe or the --eager flag to enable TensorFlow eager execution
(pid=39584) 2021-10-12 22:02:59,963 INFO trainer.py:718 -- Current log_level is WARN. For more information, set 'log_level': 'INFO' / 'DEBUG' or use the -v and -vv flags.
(pid=39584) 2021-10-12 22:03:00,892 WARNING util.py:55 -- Install gputil for GPU system monitoring.
== Status ==
Memory usage on this node: 17.6/32.0 GiB
Using FIFO scheduling algorithm.
Resources requested: 1.0/16 CPUs, 0/0 GPUs, 0.0/9.05 GiB heap, 0.0/4.52 GiB objects
Result logdir: /Users/rick.lan/ray_results/PG
Number of trials: 1/1 (1 RUNNING)
== Status ==
Memory usage on this node: 17.6/32.0 GiB
Using FIFO scheduling algorithm.
Resources requested: 1.0/16 CPUs, 0/0 GPUs, 0.0/9.05 GiB heap, 0.0/4.52 GiB objects (0.0/1.0 CPU_group_cabc3d4b400ba4b0e56329cb32b99ad3, 0.0/1.0 CPU_group_0_cabc3d4b400ba4b0e56329cb32b99ad3)
Result logdir: /Users/rick.lan/ray_results/PG
Number of trials: 1/1 (1 RUNNING)
[... same status block repeated twice more ...]
== Status ==
Memory usage on this node: 17.5/32.0 GiB
Using FIFO scheduling algorithm.
Resources requested: 0/16 CPUs, 0/0 GPUs, 0.0/9.05 GiB heap, 0.0/4.52 GiB objects (0.0/1.0 CPU_group_cabc3d4b400ba4b0e56329cb32b99ad3, 0.0/1.0 CPU_group_0_cabc3d4b400ba4b0e56329cb32b99ad3)
Result logdir: /Users/rick.lan/ray_results/PG
Number of trials: 1/1 (1 TERMINATED)
2021-10-12 22:03:21,358 INFO tune.py:550 -- Total run time: 26.04 seconds (25.84 seconds for the tuning loop).
run_same_policy: ok.
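Is there something I need to pass to tune.run() to get the TensorBoard logger back? My guess, based on the Ray 1.5.2 API (where DEFAULT_LOGGERS is the (JsonLogger, CSVLogger, TBXLogger) tuple), would be to pass the loggers explicitly; the config/stop values below are just placeholders to make the snippet self-contained:
from ray import tune
from ray.tune.logger import DEFAULT_LOGGERS  # (JsonLogger, CSVLogger, TBXLogger) in Ray 1.5.2

# Placeholder config/stop; in the example these are built inside run_same_policy.
config = {"env": "CartPole-v0", "framework": "tf"}
stop = {"timesteps_total": 10000}

# Explicitly pass the default loggers, which include the TBX (TensorBoard) one,
# to rule out the logger being silently dropped somewhere.
tune.run("PG", config=config, stop=stop, loggers=DEFAULT_LOGGERS)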