Hey,
After training completes, we run into an issue with the checkpoint manager.
We are using Ray version 2.0.
Kindly give suggestions to resolve this.
Hi @siva14guru, do you have a minimal script to reproduce what you're seeing, specifically what you had for SyncConfig?
I didn't specify a SyncConfig. We are trying to migrate from 1.12.1 to 2.0.0.
Where and how do I specify the sync config?
We are using TorchTrainer
trainer = TorchTrainer(
    train_func,
    train_loop_config={"lr": 1e-3, "batch_size": 64, "epochs": 4},
    scaling_config=ScalingConfig(num_workers=num_workers, use_gpu=use_gpu),
)
like this
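On the "where and how" question, here is a minimal sketch, assuming the Ray 2.0 AIR API, where a SyncConfig is passed to the trainer through RunConfig. The bucket path, the worker count, and the empty train_func are placeholders I made up for illustration, and this may not be the cause of the checkpoint manager issue.

```python
from ray.air.config import RunConfig, ScalingConfig
from ray.train.torch import TorchTrainer
from ray.tune import SyncConfig


def train_func(config):
    # Placeholder training loop; replace with your own.
    pass


trainer = TorchTrainer(
    train_func,
    train_loop_config={"lr": 1e-3, "batch_size": 64, "epochs": 4},
    # Assumption: placeholder values for the worker count and GPU flag.
    scaling_config=ScalingConfig(num_workers=2, use_gpu=False),
    run_config=RunConfig(
        # Assumption: hypothetical bucket; use your own storage URI.
        sync_config=SyncConfig(upload_dir="s3://my-bucket/ray-results"),
    ),
)
```

If no run_config is given, the trainer falls back to the defaults, so this only matters if you want checkpoints and results synced to a specific location.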