RayTrainReportCallback error when using PyTorch Lightning

How severe does this issue affect your experience of using Ray?

  • High: It blocks me from completing my task.

Hi all,

I’m trying Ray Tune for hyperparameter search with PyTorch Lightning.

I’ve followed the tutorial on integrating Ray Tune, but I’m facing an error that I have no clue about.

When I set up the PyTorch Lightning Trainer like below (as suggested in the tutorial):

    trainer = Trainer(
        num_sanity_val_steps=0,
        devices="auto",
        accelerator="auto",
        strategy=RayDDPStrategy(find_unused_parameters=True),
        callbacks=[RayTrainReportCallback()],
        plugins=[RayLightningEnvironment()],
        enable_progress_bar=False,
    )

I get the following error:

  File "/work/chs/ss_lightning/src/utils/tuner.py", line 107, in train_tune
    ray_trainer = TorchTrainer(train_func(args), run_config=run_config)
  File "/work/chs/ss_lightning/src/utils/tuner.py", line 60, in train_func
    callbacks=[RayTrainReportCallback()],
  File "/work/downloads/anaconda3/envs/ss/lib/python3.9/site-packages/ray/train/lightning/_lightning_utils.py", line 220, in __init__
    self.tmpdir_prefix = os.path.join(tempfile.gettempdir(), self.trial_name)
  File "/work/downloads/anaconda3/envs/ss/lib/python3.9/posixpath.py", line 90, in join
    genericpath._check_arg_types('join', a, *p)
  File "/work/downloads/anaconda3/envs/ss/lib/python3.9/genericpath.py", line 152, in _check_arg_types
    raise TypeError(f'{funcname}() argument must be str, bytes, or '
TypeError: join() argument must be str, bytes, or os.PathLike object, not 'NoneType'

It seems like RayTrainReportCallback() is the problem, but I’m not sure…

I’ve tried Ray 2.7.0 and also the nightlies.

What could be the problem?

Can you share the script you are running?

Is there any update about this issue? I’ve got the same error while working through the tutorial.

Actually, you don’t need the script; you can find it at the documentation link below.

https://docs.ray.io/en/latest/tune/examples/tune-pytorch-lightning.html

I got the same error too.

@serena.hyesun The error in this script is that train_func(args) calls the training function immediately, when the function itself should be passed to the TorchTrainer.

-ray_trainer = TorchTrainer(train_func(args), run_config=run_config)
+ray_trainer = TorchTrainer(train_func, train_loop_config=args, run_config=run_config)

Instead, pass the config for train_func via the train_loop_config argument.
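
For context, here’s a minimal sketch of the corrected call (the ScalingConfig values are placeholders, not from the original script):

    from ray.train import ScalingConfig
    from ray.train.torch import TorchTrainer

    # Pass the function itself; Ray Train invokes it on each worker,
    # where train.get_context() has an active session.
    ray_trainer = TorchTrainer(
        train_func,                # not train_func(args)
        train_loop_config=args,    # forwarded as the `config` argument of train_func
        scaling_config=ScalingConfig(num_workers=2, use_gpu=False),  # placeholder values
        run_config=run_config,
    )
    result = ray_trainer.fit()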

@brkbarutcu I do run into an issue when running the example on Lightning 2.1.0, but it’s a different one from what the OP faced:

Expected one of: PrecisionPlugin, CheckpointIO, ClusterEnviroment, LayerSync, or Strategy.

Is this also what you get?

The issue for me was that ClusterEnvironment was imported from the old pytorch_lightning package while everything else used the new lightning.pytorch package. Make sure you only have one or the other installed: pip install lightning rather than pip install pytorch_lightning.
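
Concretely, a consistent import block would look something like this (a sketch assuming only the lightning distribution is installed; Ray’s integration exports these names from ray.train.lightning):

    # All Lightning imports come from the same distribution (`pip install lightning`);
    # mixing them with `pytorch_lightning` imports trips Lightning's plugin type check.
    import lightning.pytorch as pl
    from ray.train.lightning import (
        RayDDPStrategy,
        RayLightningEnvironment,
        RayTrainReportCallback,
        prepare_trainer,
    )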

@Ikkyu_Choi Which of the errors did you encounter?

You are right @justinvyu, in my environment both the lightning.pytorch and pytorch_lightning packages are installed. However, when I delete the pytorch-lightning package, lots of Ray Train files raise errors that pytorch_lightning is missing. So I changed my code to use only pytorch_lightning, and I am getting the same error as previously mentioned.

Here is my output using only pytorch_lightning:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
g:\My Drive\01_Technical\00_Soilmodel\20231014\ray_train.py in line 76
     61 config = {
     62     'Scalarlim': 100,
     63     'num_lstm_layers': 2,
   (...)
     66     'activation_fun': nn.ReLU()
     67 }
     68 # Define a TorchTrainer without hyper-parameters for Tuner
     69 # ray_trainer = TorchTrainer(
     70 #     train_func,
   (...)
     74 # )
     75 # result = ray_trainer.fit()
---> 76 train_func(config=config)

g:\My Drive\01_Technical\00_Soilmodel\20231014\ray_train.py in line 46, in train_func(config)
     30 Train_dataloader, num_input, num_output, Val_dataloader = env.gen_dataset(
     31     batch_size=batch_size, shuffle=False, split_data=True)
     33 litmodel = annlight.Litmodule(input_size=num_input,
     34                               out_size=num_output,
     35                               loss_fn=nn.HuberLoss(),
   (...)
     39                               activation_fun=config['activation_fun']
     40                               )
     42 trainer = pl.Trainer(
     43     devices="auto",
     44     accelerator="auto",
     45     strategy=RayDDPStrategy(),
---> 46     callbacks=[RayTrainReportCallback()],
     47     plugins=[RayLightningEnvironment()],
     48     enable_progress_bar=True,
     49 )
     51 trainer = prepare_trainer(trainer)
     52 trainer.fit(litmodel,
     53             train_dataloaders=Train_dataloader,
     54             val_dataloaders=Val_dataloader,
     55             )

File c:\Users\BurakNebilBarutcu\anaconda3\envs\tez\Lib\site-packages\ray\train\lightning\_lightning_utils.py:220, in RayTrainReportCallback.__init__(self)
    218 self.trial_name = train.get_context().get_trial_name()
    219 self.local_rank = train.get_context().get_local_rank()
--> 220 self.tmpdir_prefix = os.path.join(tempfile.gettempdir(), self.trial_name)
    221 if os.path.isdir(self.tmpdir_prefix) and self.local_rank == 0:
    222     shutil.rmtree(self.tmpdir_prefix)

File <frozen ntpath>:147, in join(path, *paths)

File <frozen genericpath>:152, in _check_arg_types(funcname, *args)

TypeError: join() argument must be str, bytes, or os.PathLike object, not 'NoneType'

@brkbarutcu Ah, my bad, I was running with master, which includes this fix that removes the dependency on pytorch_lightning. This will be packaged in Ray 2.8.

The issue you’re facing now is the same as in the original post: you shouldn’t run this function outside the scope of the TorchTrainer, since the provided Lightning callbacks/plugins use train.get_context() methods, which assume you’re executing within a TorchTrainer.fit() / Tuner.fit() call.
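
That matches both tracebacks above. Roughly, the failure mode looks like this (a sketch inferred from the tracebacks, not a guaranteed API contract):

    from ray import train

    # With no TorchTrainer/Tuner session running, no trial is attached:
    trial_name = train.get_context().get_trial_name()
    print(trial_name)  # None
    # RayTrainReportCallback.__init__ then calls
    # os.path.join(tempfile.gettempdir(), None), which raises the TypeError above.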

You should use the commented-out TorchTrainer code above instead of calling train_func(config=config) directly.