Ray Tune changes the behaviour of my train function

Hi,

When I run my training function directly, everything works perfectly and I get the desired behaviour. When I pass the same training function through Ray Tune, I get the following error:

(pid=55480)   File "/Users/paulvalsecchi/PycharmProjects/pythonProject/NCDE GAN code/Solver.py", line 160, in I
(pid=55480)     du_x = x.grad[:, ::step, :]
(pid=55480) TypeError: 'NoneType' object is not subscriptable

This is strange, because when I just run my train function, x.grad is a tensor that I can subscript exactly the way I have done here.

I am using a variety of packages, including signatory, which I suspect might interfere with Ray Tune, but I don’t understand why I get the desired result when I run train(config) instead of

analysis = tune.run(
    train,
    num_samples=200,
    scheduler=ASHAScheduler(metric="Loss", mode="min", grace_period=10, max_t=200, reduction_factor=4),
    config=config,
    verbose=2)

which gives me the error shown above.
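For what it’s worth, even a single trial with the exact config that works manually, something like the sketch below, should isolate whether the scheduler or the sampling plays any role here.

from ray import tune

# Sanity check: one trial, no scheduler, fixed config. If this already fails,
# the difference comes from how Tune executes the function (in a separate
# worker process), not from the sampled hyperparameter values.
analysis = tune.run(train, config=config, num_samples=1, verbose=2)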

Any help would be greatly appreciated.

I faced a similar issue in the past; for me it was caused by the order of the arguments in the loss method. Could you share a simplified body of your loss function?

Indeed, the error occurs in the loss function. The loss function is fairly long, but the part that I believe is causing the error is:

def I(y_output_u, y_output_v, xv, yv, tv, x, y, t):
    y_output_u.retain_grad()
    # Backpropagate a tensor of ones through the network output so that the
    # gradients with respect to the inputs are populated.
    y_output_u.backward(torch.ones_like(y_output_u), retain_graph=True)
    du_x = x.grad[:, ::step, :]  # under Ray Tune, x.grad is None here

y_output_u is the output of the net, which takes in a transformed version of x. The transformation of x happens outside the train function, but the gradients should still work, no?
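(For context, my understanding is that .grad is only populated for leaf tensors with requires_grad=True, and only if the tensor is actually part of the graph that produced y_output_u, so a quick diagnostic right before the backward call should show whether x loses one of those properties when Tune runs the function. A minimal sketch:)

# Diagnostic only: why might x.grad come back as None?
print(x.is_leaf, x.requires_grad, x.grad_fn)
if not x.is_leaf:
    x.retain_grad()  # non-leaf tensors keep .grad only after retain_grad()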

I believe so. I would suggest checking that all the parameters you receive in the loss function are exactly what you expect; in my case, I had the hyperparameters and the configuration mixed up in the order of the kwargs.

I am fairly certain that that is not the case. I have gone through my code once more to check, but as I mentioned above, I get the correct behaviour when I run train(config).

It’s hard to debug this without proper context. Can you share the part of your code where you use the config argument?

@kai I use the config argument as follows:

    n1 = config['n1']
    n2 = config['n2']

    u_net = NeuralRDE(
        3, logsig_dim, config['u_hidden_dim'], 1,
        hidden_hidden_dim=config['u_hidden_hidden_dim'],
        num_layers=config['u_layers'],
        return_sequences=True,
    ).to(device)
    v_net = discriminator(config).to(device)

    optimizer_u = torch.optim.Adam(u_net.parameters(), lr=config['u_rate'])
    optimizer_v = torch.optim.Adam(v_net.parameters(), lr=config['v_rate'])

I have partially resolved the issue by moving some of the functions that train calls inside it. This fixes the error, but I do not understand why that should make a difference.

Yes, that’s odd; it shouldn’t interfere with the function. One thing you might want to check is whether the printed config argument differs when Tune runs it. Subtle differences could be NumPy arrays instead of lists, different shapes, or iterable dicts.
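Something like this at the top of the trainable makes the comparison easy (just a sketch, keep your own keys):

def train(config):
    # Compare this output between a manual train(config) call and a Tune run.
    # Watch for type differences, e.g. numpy.float64 vs float, arrays vs lists.
    for key, value in config.items():
        print(key, type(value), value)
    # ... rest of the training code unchanged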

I’m still lacking context to evaluate this. Can you share your Tune search space (the config you pass to tune.run()) and a manual config that works when you pass it to your training function?

Or, if possible, your complete training code? A stripped-down version would also work.
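For example, something along these lines (illustrative values only, using the keys from your snippet) is usually enough to spot a mismatch:

from ray import tune

# Hypothetical search space -- replace with your real one.
config = {
    'n1': tune.choice([64, 128]),
    'n2': tune.choice([64, 128]),
    'u_hidden_dim': tune.choice([16, 32]),
    'u_hidden_hidden_dim': tune.choice([16, 32]),
    'u_layers': tune.choice([2, 3]),
    'u_rate': tune.loguniform(1e-4, 1e-2),
    'v_rate': tune.loguniform(1e-4, 1e-2),
}

# Hypothetical manual config that works with a direct train(config) call.
manual_config = {'n1': 128, 'n2': 128, 'u_hidden_dim': 32,
                 'u_hidden_hidden_dim': 32, 'u_layers': 3,
                 'u_rate': 1e-3, 'v_rate': 1e-3}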