Currently, I have a deep reinforcement learning setup where a PyTorch model is trained with DistributedDataParallel and the training data comes from interacting with a simulator. These simulators run inside child processes spawned by the distributed ranks.
I would like to use Ray Tune to tune the hyperparameters, but I'm having difficulty using tune.report. Specifically, I want a child process to be able to connect to the training instance created in a parent process.
Is there a way to report metrics from a child process to an ancestor trainer?
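
To make this concrete, here is a stripped-down sketch of the kind of workaround I imagine: the child pushes metric dicts through a multiprocessing.Queue, and the parent (which owns the Tune session) calls tune.report on its behalf. Names like simulator_worker and episode_reward are placeholders, not my real code, and the tune.report / tune.run calls follow the older function-trainable API, so the exact signatures may differ across Ray versions:

```python
import multiprocessing as mp

from ray import tune


def simulator_worker(queue):
    # Child process: interact with the simulator and push metrics.
    for step in range(10):
        episode_reward = float(step)  # stand-in for a real rollout result
        queue.put({"episode_reward": episode_reward})
    queue.put(None)  # sentinel: no more metrics


def trainable(config):
    # Parent process: owns the Tune session, so tune.report works here.
    queue = mp.Queue()
    worker = mp.Process(target=simulator_worker, args=(queue,))
    worker.start()
    # Drain the queue and forward each metric dict to Tune
    # on the child's behalf.
    while (item := queue.get()) is not None:
        tune.report(**item)
    worker.join()


if __name__ == "__main__":
    tune.run(trainable)
```

In my real setup there is an extra layer on top of this, since the DDP ranks are themselves children of the trainable, which is what makes the plumbing awkward. Is something like this the intended approach, or is there a built-in way for a child process to reach the Tune session directly?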