I am trying to train a basic PyTorch CNN model to 'denoise' the MNIST dataset. However, I'm noticing that most of the time my CPU is barely being used while RAM usage keeps fluctuating. I suspect the issue is that it keeps ending processes to start new ones, but the time to create a new process is awfully slow.
Is there anything I can do to speed up that creation time?
That sounds similar. How would I go about getting the nightly release properly? I had installed ray[tune] with a plain `pip install` … I'd appreciate it if you could point me in the right direction to try out the fix!
So that didn't work. However, I reached out to the Ray Slack community and they got it fixed for me. The command that worked for me was: `pip install --user -U https://s3-us-west-2.amazonaws.com/ray-wheels/latest/ray-1.1.0.dev0-cp38-cp38-win_amd64.whl`
So I finally had some time to try running Ray Tune with the nightly version, and it's even worse now. It still takes a very long time to start, and roughly at the point where it would begin its first trial the program just exits with code 3. (I'm running this through PyCharm, Python version 3.8.5.)
However, simply running the program from cmd seemed to work, sort of. The main issue still persists: it spends most of its time seemingly preparing workers to run rather than actually running them…
I am still very much a beginner with all of this. Here are some of the logged messages that may help you help me understand why it's taking so long:
If I'm somehow mismanaging the memory or something, please let me know how to speed it up! (And a possible fix for PyCharm not being able to run the code while cmd works fine would be appreciated too…)
I see. It would be great if you could create an issue on Ray's GitHub page with more details. For example: what's your setup? What's your workload? How slow is the process startup? Etc.
If those could be merged somehow, that'd be perfect. They are related but not strictly about the same thing, so I don't know if that's a valid move. However, I can't respond for another 15 hours (in that specific thread) because I'm a new member, so further discussion is on hold for now.
It turned out that tune.with_parameters was actually bugged: it was passing my whole dataset along with the function through Redis, causing a massive slowdown. Instead, we had to refactor that out and use ray.put and ray.get to pass my dataset. So if anyone has this issue at the moment:
TL;DR
tune.with_parameters is bugged, do not use it. Use ray.put and ray.get to pass your dataset instead!
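For anyone who finds this later, here's a minimal sketch of the workaround. It assumes the older Ray 1.x Tune function API (tune.run / tune.report), loads MNIST via torchvision just as an example, and leaves the actual training loop as a placeholder, so adapt it to your own setup:

```python
import ray
from ray import tune
import torch
from torchvision import datasets, transforms

ray.init()

# Load the dataset once on the driver (torchvision MNIST as an example).
mnist = datasets.MNIST("./data", train=True, download=True,
                       transform=transforms.ToTensor())

# Put the dataset into the Ray object store once. Each trial then only
# captures the small ObjectRef instead of the full dataset being
# serialized with the training function.
dataset_ref = ray.put(mnist)

def train_denoiser(config):
    # Retrieve the shared dataset from the object store inside the worker.
    dataset = ray.get(dataset_ref)
    loader = torch.utils.data.DataLoader(dataset,
                                         batch_size=config["batch_size"])
    # ... build the CNN, add noise to the inputs, train, and compute a loss ...
    tune.report(loss=0.0)  # placeholder metric for the sketch

tune.run(
    train_denoiser,
    config={
        "batch_size": tune.choice([32, 64]),
        "lr": tune.loguniform(1e-4, 1e-2),
    },
    num_samples=4,
)
```

The key point is that ray.put stores the dataset in the object store once, and each trial calls ray.get on the captured reference, so the full dataset no longer gets shipped through Redis with every trial.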