Tune.run ignores loggers=None and fails on TBXLogger

Hi,

tune.run completely ignores the loggers argument when I disable it (loggers=None). Here is my tune.run call:

    analysis = tune.run(
        run_search_distributed_tune,
        # loggers=[CSVLogger, JsonLogger],
        loggers=None,
        name="loanrecommender_hyperparam_search_with_ray_tune",
        # scheduler=sched,
        metric="mean_accuracy",
        mode="max",
        stop={
            "mean_accuracy": 0.99,
            "training_iteration": hyper_file['iterations']
        },
        num_samples=hyper_file['iterations'],
        resources_per_trial={
            "memory": 28500 * 1024 * 1024,
            "gpu": 0.25
        },
        config=search_space,
        local_dir=getRootDir(),
        # keep_checkpoints_num=2,   # Keep only the best checkpoint
        checkpoint_score_attr='mean_accuracy',  # Metric used to compare checkpoints
        verbose=1
    )

and the exception is:

    Connecting to ray cluster
    2021-02-18 15:46:34,212 INFO worker.py:656 -- Connecting to existing Ray cluster at address: 10.150.147.159:6379
    Traceback (most recent call last):
      File "C:\Users\dm57337.conda\envs\py38tf\lib\site-packages\tensorboardX\record_writer.py", line 58, in open_file
        factory = REGISTERED_FACTORIES[prefix]
    KeyError: 'C'

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "C:/dev/clientactivity/find_best_model.py", line 186, in <module>
        analysis = tune.run(
      File "C:\Users\dm57337.conda\envs\py38tf\lib\site-packages\ray\tune\tune.py", line 419, in run
        runner.step()
      File "C:\Users\dm57337.conda\envs\py38tf\lib\site-packages\ray\tune\trial_runner.py", line 355, in step
        self._callbacks.on_trial_start(
      File "C:\Users\dm57337.conda\envs\py38tf\lib\site-packages\ray\tune\callback.py", line 180, in on_trial_start
        callback.on_trial_start(**info)
      File "C:\Users\dm57337.conda\envs\py38tf\lib\site-packages\ray\tune\logger.py", line 428, in on_trial_start
        self.log_trial_start(trial)
      File "C:\Users\dm57337.conda\envs\py38tf\lib\site-packages\ray\tune\logger.py", line 636, in log_trial_start
        self._trial_writer[trial] = self._summary_writer_cls(
      File "C:\Users\dm57337.conda\envs\py38tf\lib\site-packages\tensorboardX\writer.py", line 275, in __init__
        self._get_file_writer()
      File "C:\Users\dm57337.conda\envs\py38tf\lib\site-packages\tensorboardX\writer.py", line 323, in _get_file_writer
        self.file_writer = FileWriter(logdir=self.logdir,
      File "C:\Users\dm57337.conda\envs\py38tf\lib\site-packages\tensorboardX\writer.py", line 94, in __init__
        self.event_writer = EventFileWriter(
      File "C:\Users\dm57337.conda\envs\py38tf\lib\site-packages\tensorboardX\event_file_writer.py", line 106, in __init__
        self._ev_writer = EventsWriter(os.path.join(
      File "C:\Users\dm57337.conda\envs\py38tf\lib\site-packages\tensorboardX\event_file_writer.py", line 43, in __init__
        self._py_recordio_writer = RecordWriter(self._file_name)
      File "C:\Users\dm57337.conda\envs\py38tf\lib\site-packages\tensorboardX\record_writer.py", line 176, in __init__
        self._writer = open_file(path)
      File "C:\Users\dm57337.conda\envs\py38tf\lib\site-packages\tensorboardX\record_writer.py", line 61, in open_file
        return open(path, 'wb')
    FileNotFoundError: [Errno 2] No such file or directory: 'C:\dev\clientactivity\loanrecommender_hyperparam_search_with_ray_tune\run_search_distributed_tune_b9fe7_00000_0_activation=tanh,batch_size=512,dot_product=False,dropout=0.55332,include_branch_concat_i_2021-02-18_15-46-41\events.out.tfevents.1613656001.TLVCMEW001410'

Actually, there are 2 problems over here:

  1. tune.run ignores setting 'loggers=None'
  2. TBXLogger fails due to parsing my local path, when it supports only S3 & Google Cloud Storage:
    https://github.com/lanpa/tensorboardX/blob/34d1616c035faaa0f3f7c9d19cb8bb4425f19939/tensorboardX/record_writer.py#L99
    https://github.com/lanpa/tensorboardX/blob/34d1616c035faaa0f3f7c9d19cb8bb4425f19939/tensorboardX/record_writer.py#L114
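For readers following the traceback: the two frames in record_writer.py (line 58 raising KeyError, then line 61 calling open) suggest a lookup-then-fallback pattern. Below is a rough sketch of that pattern, not the tensorboardX source. The function name open_file_sketch and the empty registry are assumptions made for illustration. If the pattern holds, the KeyError for 'C' is expected for any local Windows path, and the real failure is the plain open() call.

```python
# Stand-in registry (assumption): the real one maps prefixes to cloud writers.
REGISTERED_FACTORIES = {}


def open_file_sketch(path):
    """Try a registered factory for the 'prefix:' part, else open a local file."""
    try:
        prefix = path.split(":")[0]
        factory = REGISTERED_FACTORIES[prefix]  # KeyError for "C" on "C:\..."
        return factory.open(path)
    except KeyError:
        # Local paths land here, so the KeyError by itself is harmless.
        # Any error raised from here on comes from the ordinary open() call.
        return open(path, "wb")
```

In this sketch, a path whose parent directory cannot be created or opened still fails with FileNotFoundError, which matches the last line of the traceback.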

Hey @diman82, Tune automatically adds default loggers as it relies on CSV and JSON logging for the experiment analysis object.

You can prevent Tune from adding these loggers with an environment variable: just set TUNE_DISABLE_AUTO_CALLBACK_LOGGERS=1

https://docs.ray.io/en/master/tune/user-guide.html#environment-variables

(Note that you should probably pass the CSVLogger and JsonLogger in that case)

Does this help?

Yes, that got me past the error.
But why do I get it in the first place? Why is TBXLogger called by default if it only supports cloud-based factories?
Also, setting 'loggers=[CSVLogger, JsonLogger]' doesn't change anything: TBXLogger is still being used.

Yes, CSV, JSON and TBX loggers are always added as many users rely on tensorboard to track their training process. However, setting TUNE_DISABLE_AUTO_CALLBACK_LOGGERS=1 disables all loggers, so in that case the CSV and JSON loggers should be passed.
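A minimal sketch of that setup, assuming the Ray version discussed in this thread (early 2021), where tune.run still accepts a loggers argument and CSVLogger and JsonLogger live in ray.tune.logger. The wrapper name run_with_basic_loggers is hypothetical. The variable is set in the driver process before tune.run is called.

```python
import os

# Disable Tune's automatically added loggers (CSV, JSON and TBX).
os.environ["TUNE_DISABLE_AUTO_CALLBACK_LOGGERS"] = "1"


def run_with_basic_loggers(trainable, **run_kwargs):
    """Hypothetical wrapper: run Tune with only the CSV and JSON loggers."""
    # Imported lazily so the environment variable is set before Ray is used.
    from ray import tune
    from ray.tune.logger import CSVLogger, JsonLogger

    # Pass the two loggers back explicitly, since the analysis object
    # relies on the CSV and JSON output.
    return tune.run(trainable, loggers=[CSVLogger, JsonLogger], **run_kwargs)
```

With this in place, the original call would become run_with_basic_loggers(run_search_distributed_tune, name=..., config=search_space, ...) with the remaining keyword arguments unchanged.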

I haven't seen the error you posted before, so I assume it works for most users. The error you're seeing might be specific to Windows, but I don't have access to a Windows machine to test this.

Would you advise me to open an issue on GitHub?

Yes, please open one for the tensorboard logging issue! Thanks!


I fix that with

The problem is due to path length limitations on Windows. You can enable long paths by following the instructions on this page: https://www.howtogeek.com/266621/how-to-make-windows-10-accept-file-paths-over-260-characters/
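To check whether a given log path runs into this limit, something like the following could be used. It is only a sketch: the helper name exceeds_max_path is made up, and it assumes the classic 260-character MAX_PATH limit (which counts the terminating NUL) rather than querying the system's long-path setting.

```python
MAX_PATH = 260  # classic Windows limit, counting the terminating NUL


def exceeds_max_path(path, limit=MAX_PATH):
    """Return True if `path` is too long for the classic Windows limit."""
    # The limit includes the terminating NUL, so at most limit - 1 characters fit.
    return len(path) >= limit


# Illustrative paths only (not taken from the thread).
print(exceeds_max_path("C:\\dev\\run\\events.out.tfevents"))  # False
print(exceeds_max_path("C:\\" + "a" * 300))  # True
```

Trial directory names that embed many hyperparameters, as in the traceback above, make it easy to cross this limit even when the base directory is short.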


Bro, thanks! That's the real issue on Windows.