PyTorch AMP Support

cc @sven1977

PyTorch automatic mixed precision (AMP) can provide a 2-3x speedup on modern GPUs. I think this is low-hanging fruit for RLlib.

It is very simple to implement. Instead of

```python
model(input_dict, ...)
```

it becomes

```python
with torch.cuda.amp.autocast():
    model(input_dict, ...)
```
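
As a quick standalone illustration (toy example, not RLlib code): inside the autocast context, eligible ops such as the linear layer's matmul run in float16, while the parameters themselves stay float32.

```python
import torch

net = torch.nn.Linear(8, 8).cuda()
x = torch.randn(4, 8, device="cuda")

with torch.cuda.amp.autocast():
    y = net(x)

print(net.weight.dtype)  # torch.float32 -- parameters keep full precision
print(y.dtype)           # torch.float16 -- the forward ran in mixed precision
```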

During the loss update, a `GradScaler` handles the gradient under/overflow that can arise from the reduced precision:

```python
loss.backward()
optimizer.step()
```

becomes

```python
scaler = torch.cuda.amp.GradScaler()  # created once, outside the training loop

scaler.scale(loss).backward()  # scale the loss so small fp16 gradients don't underflow
scaler.step(optimizer)         # unscales the gradients; skips the step on inf/NaN
scaler.update()                # adjusts the scale factor for the next iteration
```
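
Put together, a minimal sketch of a full AMP training step (generic `model`/`optimizer`/`loss_fn` stand-ins below, not RLlib's actual policy classes):

```python
import torch

# Generic stand-ins for illustration only.
model = torch.nn.Linear(16, 4).cuda()
optimizer = torch.optim.Adam(model.parameters())
loss_fn = torch.nn.MSELoss()

# The scaler is created once and reused across iterations.
scaler = torch.cuda.amp.GradScaler()

# Dummy data so the example is runnable on its own.
loader = [(torch.randn(32, 16).cuda(), torch.randn(32, 4).cuda()) for _ in range(10)]

for batch, target in loader:
    optimizer.zero_grad()
    # Forward pass and loss computation run in mixed precision.
    with torch.cuda.amp.autocast():
        output = model(batch)
        loss = loss_fn(output, target)
    # Scale the loss so small fp16 gradients don't underflow to zero.
    scaler.scale(loss).backward()
    # step() unscales the gradients and skips the update if any are inf/NaN.
    scaler.step(optimizer)
    # Adjust the scale factor for the next iteration.
    scaler.update()
```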

Would it be possible to expose this through the Trainer config? I see it’s already implemented for RaySGD.
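
Something along these lines would be convenient (purely a sketch; `use_amp` is an invented key for illustration, not an existing RLlib option):

```python
# Hypothetical sketch only: "use_amp" does not exist in RLlib today.
config = {
    "framework": "torch",
    "num_gpus": 1,
    "use_amp": True,  # would wrap forward passes in autocast and apply a GradScaler
                      # around the loss/optimizer step
}
```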