RayTrainReportCallback error when using PyTorch Lightning

How severe does this issue affect your experience of using Ray?

  • High: It blocks me from completing my task.

Hi all,

I’m trying Ray Tune for hyperparameter search with PyTorch Lightning.

I’ve followed the tutorial on integrating Ray Tune, but I’m facing an error that I have no clue about.

When I set up the PyTorch Lightning Trainer like below (as suggested in the tutorial):

    trainer = Trainer(
        num_sanity_val_steps=0,
        devices="auto",
        accelerator="auto",
        strategy=RayDDPStrategy(find_unused_parameters=True),
        callbacks=[RayTrainReportCallback()],
        plugins=[RayLightningEnvironment()],
        enable_progress_bar=False,
    )

I get the following error:

  File "/work/chs/ss_lightning/src/utils/tuner.py", line 107, in train_tune
    ray_trainer = TorchTrainer(train_func(args), run_config=run_config)
  File "/work/chs/ss_lightning/src/utils/tuner.py", line 60, in train_func
    callbacks=[RayTrainReportCallback()],
  File "/work/downloads/anaconda3/envs/ss/lib/python3.9/site-packages/ray/train/lightning/_lightning_utils.py", line 220, in __init__
    self.tmpdir_prefix = os.path.join(tempfile.gettempdir(), self.trial_name)
  File "/work/downloads/anaconda3/envs/ss/lib/python3.9/posixpath.py", line 90, in join
    genericpath._check_arg_types('join', a, *p)
  File "/work/downloads/anaconda3/envs/ss/lib/python3.9/genericpath.py", line 152, in _check_arg_types
    raise TypeError(f'{funcname}() argument must be str, bytes, or '
TypeError: join() argument must be str, bytes, or os.PathLike object, not 'NoneType'

It seems like RayTrainReportCallback() is the problem, but I’m not sure…

I’ve tried Ray 2.7.0 and also the nightlies.

What could be the problem?

Can you share the script you are running?

Is there any update about this issue? I’ve got the same error while working through the tutorial.

Actually, you don’t need the script; you can find it at the documentation link below.

https://docs.ray.io/en/latest/tune/examples/tune-pytorch-lightning.html

I got the same error too.

@serena.hyesun The error in this script is that train_func(args) calls the training function immediately, when the function itself should be passed to the TorchTrainer.

-ray_trainer = TorchTrainer(train_func(args), run_config=run_config)
+ray_trainer = TorchTrainer(train_func, train_loop_config=args, run_config=run_config)

Instead, pass the config for train_func via the train_loop_config argument.
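
For context, here’s a minimal sketch of the corrected call (the ScalingConfig values are placeholders, not from the original script):

    from ray.train import ScalingConfig
    from ray.train.torch import TorchTrainer

    # Pass the function itself; Ray Train invokes it on each worker,
    # where train.get_context() has an active session.
    ray_trainer = TorchTrainer(
        train_func,                # not train_func(args)
        train_loop_config=args,    # forwarded as the `config` argument of train_func
        scaling_config=ScalingConfig(num_workers=2, use_gpu=False),  # placeholder values
        run_config=run_config,
    )
    result = ray_trainer.fit()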

@brkbarutcu I do run into an issue when running the example on Lightning 2.1.0, but it’s a different one from what the OP faced:

Expected one of: PrecisionPlugin, CheckpointIO, ClusterEnviroment, LayerSync, or Strategy.

Is this also what you get?

The issue for me was that ClusterEnvironment was imported from the old pytorch_lightning package while everything else used the new lightning.pytorch package. Make sure you only have one or the other installed: pip install lightning rather than pip install pytorch_lightning.
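
Concretely, a consistent import block would look something like this (a sketch assuming only the lightning distribution is installed; Ray’s integration exports these names from ray.train.lightning):

    # All Lightning imports come from the same distribution (`pip install lightning`);
    # mixing them with `pytorch_lightning` imports trips Lightning's plugin type check.
    import lightning.pytorch as pl
    from ray.train.lightning import (
        RayDDPStrategy,
        RayLightningEnvironment,
        RayTrainReportCallback,
        prepare_trainer,
    )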

@Ikkyu_Choi Which of the errors did you encounter?

You are right @justinvyu, in my environment both the lightning.pytorch and pytorch_lightning packages are installed. However, when I delete the pytorch-lightning package, lots of Ray Train files raise errors that pytorch_lightning is missing. So I changed my code to use only pytorch_lightning, and I am getting the same error as previously mentioned.

Here is my output using only pytorch_lightning:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
g:\My Drive\01_Technical\00_Soilmodel\20231014\ray_train.py in line 76
     61 config = {
     62     'Scalarlim': 100,
     63     'num_lstm_layers': 2,
   (...)
     66     'activation_fun': nn.ReLU()
     67 }
     68 # Define a TorchTrainer without hyper-parameters for Tuner
     69 # ray_trainer = TorchTrainer(
     70 #     train_func,
   (...)
     74 # )
     75 # result = ray_trainer.fit()
---> 76 train_func(config=config)

g:\My Drive\01_Technical\00_Soilmodel\20231014\ray_train.py in line 46, in train_func(config)
     30 Train_dataloader, num_input, num_output, Val_dataloader = env.gen_dataset(
     31     batch_size=batch_size, shuffle=False, split_data=True)
     33 litmodel = annlight.Litmodule(input_size=num_input,
     34                               out_size=num_output,
     35                               loss_fn=nn.HuberLoss(),
   (...)
     39                               activation_fun=config['activation_fun']
     40                               )
     42 trainer = pl.Trainer(
     43     devices="auto",
     44     accelerator="auto",
     45     strategy=RayDDPStrategy(),
---> 46     callbacks=[RayTrainReportCallback()],
     47     plugins=[RayLightningEnvironment()],
     48     enable_progress_bar=True,
     49 )
     51 trainer = prepare_trainer(trainer)
     52 trainer.fit(litmodel,
     53             train_dataloaders=Train_dataloader,
     54             val_dataloaders=Val_dataloader,
     55             )

File c:\Users\BurakNebilBarutcu\anaconda3\envs\tez\Lib\site-packages\ray\train\lightning\_lightning_utils.py:220, in RayTrainReportCallback.__init__(self)
    218 self.trial_name = train.get_context().get_trial_name()
    219 self.local_rank = train.get_context().get_local_rank()
--> 220 self.tmpdir_prefix = os.path.join(tempfile.gettempdir(), self.trial_name)
    221 if os.path.isdir(self.tmpdir_prefix) and self.local_rank == 0:
    222     shutil.rmtree(self.tmpdir_prefix)

File <frozen ntpath>:147, in join(path, *paths)

File <frozen genericpath>:152, in _check_arg_types(funcname, *args)

TypeError: join() argument must be str, bytes, or os.PathLike object, not 'NoneType'

@brkbarutcu Ah, my bad, I was running with master, which includes this fix that removes the dependency on pytorch_lightning. This will be packaged in Ray 2.8.

The issue you’re facing now is the same as in the original post: you shouldn’t run this function outside the scope of the TorchTrainer, since the provided Lightning callbacks/plugins use train.get_context() methods, which assume you’re executing within a TorchTrainer.fit() / Tuner.fit() call.
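
That matches both tracebacks above. Roughly, the failure mode looks like this (a sketch inferred from the tracebacks, not a guaranteed API contract):

    from ray import train

    # With no TorchTrainer/Tuner session running, no trial is attached:
    trial_name = train.get_context().get_trial_name()
    print(trial_name)  # None
    # RayTrainReportCallback.__init__ then calls
    # os.path.join(tempfile.gettempdir(), None), which raises the TypeError above.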

You should use the commented-out TorchTrainer code above instead of calling train_func(config=config) directly.