Debugging the MNIST tutorial

Hi everyone,

I want to start using Ray Tune for hyperparameter optimization of my DL models.
As a first step, I simply copied and pasted the MNIST tutorial.

It works fine on my MacBook Air M1 (although a few things were missing in the code, such as arguments to the Accuracy metric).

However, I can't get it to work on my PC. There are two main errors:

  • RuntimeError("Distributed package doesn't have NCCL " "built in")

  • Caught sync error: Sync process failed: GetFileInfo() yielded path 'C:/Users/APU/ray_results/tune_mnist_asha/LightningTrainer_0a336_00002_2_layer_1_size=64,layer_2_size=128,lr=0.0003_2023-08-16_17-03-21/error.pkl', which is outside base dir 'C:\Users\APU\ray_results\tune_mnist_asha\LightningTrainer_0a336_00002_2_layer_1_size=64,layer_2_size=128,lr=0.0003_2023-08-16_17-03-21'. Retrying after sleeping for 1.0 seconds…

I tried setting the backend to "gloo" in my environment variables as a fix for the first problem, but it didn't solve anything.
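
To make the first problem concrete, here is a minimal sketch of the kind of backend check involved. It assumes plain PyTorch; `choose_backend` is a hypothetical helper, not something from Ray or the tutorial, and it only reports which backend looks usable rather than configuring Ray itself.

```python
import sys


def choose_backend(platform: str = sys.platform) -> str:
    """Hypothetical helper: return "nccl" only when it looks usable, else "gloo".

    Assumption: NCCL is treated as unavailable on Windows ("win32") and
    macOS ("darwin"), so those platforms always fall back to gloo.
    """
    if platform not in ("win32", "darwin"):
        try:
            import torch.distributed as dist

            # is_available() is checked first because the other helpers
            # are only defined when the distributed package is built in.
            if dist.is_available() and dist.is_nccl_available():
                return "nccl"
        except ImportError:
            # PyTorch is not installed; fall through to the gloo default.
            pass
    return "gloo"


if __name__ == "__main__":
    print(choose_backend())
```

How the chosen value has to be passed on to Ray or Lightning depends on the versions installed, so that part is deliberately left out.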
As for the second one, I can't see where in the code that path comes from, so it is hard to know what to change.
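
One thing that stands out in the second message is that the file path uses forward slashes while the base dir uses backslashes. The sketch below only illustrates how such a mismatch can make a plain string comparison fail; it is a guess at the cause, not a confirmed description of what Ray does internally. The paths are shortened, with `trial_dir` as a placeholder for the real trial folder name, and both helper functions are hypothetical.

```python
from pathlib import PureWindowsPath

# Shortened stand-ins for the two paths in the error message.
base_dir = r"C:\Users\APU\ray_results\tune_mnist_asha\trial_dir"
file_path = "C:/Users/APU/ray_results/tune_mnist_asha/trial_dir/error.pkl"


def is_inside_naive(path: str, base: str) -> bool:
    """Plain string prefix check: sensitive to the slash style."""
    return path.startswith(base)


def is_inside_normalized(path: str, base: str) -> bool:
    """Parse both as Windows paths first, so / and \\ are equivalent."""
    return PureWindowsPath(base) in PureWindowsPath(path).parents


print(is_inside_naive(file_path, base_dir))       # False
print(is_inside_normalized(file_path, base_dir))  # True
```

`PureWindowsPath` works on any operating system, so this can be run on the Mac as well.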

I find Ray quite daunting: it produces a lot of output, so starting from a tutorial that barely works is difficult.

Could anyone help me with this, please?

Many thanks in advance

Cheers

Antoine