I would like to use my own execution platform: I want to write the code that starts a pod/thread/process and executes the training function.
I tried implementing a custom TrialExecutor, but I get this message:
"A Tune session already exists in the current process. If you are using ray.init(local_mode=True), you must set ray.init(…, num_cpus=1, num_gpus=1) to limit available concurrency."
One trial gets stuck in Running status and the rest stay in Pending.
Is implementing a TrialExecutor the right way to go about it?
Hi @Golan_Hallel, generally it is. However, over time the Tune execution code (the trial runner) has become quite tightly coupled with the RayTrialExecutor. We’re looking into refactoring this soon.
We do not currently test custom executors, as this is an edge use case. However, the general interface (starting/stopping trials, etc.) should still be valid, so you should be able to proceed.
It’s hard to help with the problem you observed without access to your code; if you’re happy to share parts of your implementation, we might be able to help. What you see seems to indicate that you’re running multiple concurrent Tune sessions in the same process, for instance because of multi-threading, but it’s hard to tell just from the output.
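To illustrate why running trials as threads could trigger this kind of error, here is a toy model, assuming a session object that (as the error message suggests) allows only one active instance per process. The names `init_session`, `shutdown_session`, `run_trial`, `run_in_threads` and `run_in_processes` are hypothetical and are not Ray Tune APIs. The same "trial" is run in threads, which share one process-global session and collide, and in separate processes started with the POSIX `fork` start method (an assumption: Linux/macOS), where each child gets its own copy of the global.

```python
import multiprocessing as mp
import threading

# Hypothetical model of a per-process session singleton. This is NOT Ray
# Tune's code; it only mimics the rule "one active session per process".
_session = None
_session_lock = threading.Lock()


def init_session(trial_id):
    """Claim the process-global session, or fail if another trial holds it."""
    global _session
    with _session_lock:
        if _session is not None:
            raise RuntimeError("A session already exists in the current process.")
        _session = trial_id


def shutdown_session():
    global _session
    with _session_lock:
        _session = None


def run_trial(trial_id, barrier=None):
    """Pretend to train; optionally wait so concurrent trials overlap in time."""
    owned = False
    try:
        init_session(trial_id)
        owned = True
        return "ok"
    except RuntimeError as exc:
        return f"error: {exc}"
    finally:
        if barrier is not None:
            barrier.wait()  # no trial releases the session until all have tried
        if owned:
            shutdown_session()


def run_in_threads(n):
    """All trials share one process, hence one session: only one succeeds."""
    results = [None] * n
    barrier = threading.Barrier(n)

    def worker(i):
        results[i] = run_trial(i, barrier)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


def _process_worker(trial_id, queue):
    queue.put((trial_id, run_trial(trial_id)))


def run_in_processes(n):
    """One process per trial: each child has its own session global."""
    ctx = mp.get_context("fork")  # assumption: POSIX platform
    queue = ctx.Queue()
    procs = [ctx.Process(target=_process_worker, args=(i, queue)) for i in range(n)]
    for p in procs:
        p.start()
    results = dict(queue.get() for _ in range(n))  # drain before joining
    for p in procs:
        p.join()
    return [results[i] for i in range(n)]


if __name__ == "__main__":
    print(run_in_threads(3))    # exactly one "ok", the rest report the error
    print(run_in_processes(3))  # every trial gets its own session
```

If your executor starts trials as threads inside one worker process, giving each trial its own process (or pod) is one way to keep such a per-process session from being shared, though whether that is the cause in your case depends on your implementation.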