How can I get the `gpu_id` assigned to the trial using the `trial_id`?

Hello, Ray!

I’m performing HPO using Tune with concurrent trials. Ray Tune automatically detects GPUs and assigns them to each trial. How can I get the `gpu_id` assigned to a trial using its `trial_id`?

Hey @marload, glad to see you here! Big fan of your work on Hyperopt :slight_smile:

Can you describe what you’re trying to achieve?

Inside each training function, you can retrieve the assigned GPU IDs by calling `ray.get_gpu_ids()`.
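For example, a minimal sketch (the trainable and resource request here are placeholders, not from this thread):

```python
import ray
from ray import tune


def trainable(config):
    # Inside a trial, ray.get_gpu_ids() returns the IDs of the GPUs
    # Ray has reserved for this worker.
    gpu_ids = ray.get_gpu_ids()
    print(f"Trial running on GPU(s): {gpu_ids}")
    # ... training loop ...


tune.run(trainable, resources_per_trial={"gpu": 1})
```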

Hi, Richard! @rliaw

Our company plans to actively utilize the Ray ecosystem.
I am building our on-premise research GPU cluster on Ray. One of the goals I’m trying to achieve: while Tune is running, if an external request arrives specifying a GPU ID, the trial assigned to that GPU should stop immediately, and no trial should run on that GPU until another request re-enables it. Which approach do you think would be good?

FYI, I am also a really huge fan of Ray and Anyscale. :grinning_face_with_smiling_eyes:

Thank you for your kind words!

Hmm, this sounds a bit tricky. Can I ask you a bit more about the business use case for why you want to do this?

As for implementation, I think one possibility is to do this entirely within the trainable. That is, every time a request with a GPU ID arrives on a node, a small server records that notification in a file.

In the trainable function/class that you are using (`tune.run(trainable)`), you would run a separate thread that checks that “notification file”, and pauses execution and releases GPU memory while the file is marked.

Does this make sense? It is a bit hacky, but I think it should meet your requirements. You can use something like https://github.com/Stonesjtu/pytorch_memlab#courtesy to release GPU memory.
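A rough sketch of that pattern, assuming a hypothetical notification-file path and a `num_steps` config key (neither is part of any Ray API):

```python
import os
import threading
import time

from ray import tune

# Hypothetical path your tiny server writes to; the file's presence
# means "pause trials using this node's GPU".
NOTIFICATION_FILE = "/tmp/gpu_disabled"


def trainable(config):
    pause_event = threading.Event()

    def watch_notification_file():
        # Poll the notification file and toggle the pause flag.
        while True:
            if os.path.exists(NOTIFICATION_FILE):
                pause_event.set()
            else:
                pause_event.clear()
            time.sleep(1)

    threading.Thread(target=watch_notification_file, daemon=True).start()

    for step in range(config["num_steps"]):
        while pause_event.is_set():
            # Release GPU memory here first, e.g. torch.cuda.empty_cache()
            # or pytorch_memlab's courtesy context, then wait.
            time.sleep(1)
        # ... one training step ...
        tune.report(step=step)
```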

@rliaw
As you suggested, I solved this problem by creating a tiny server.

I’m working on gradually migrating the company’s existing ML research process to the Ray ecosystem, and this issue was the first task. :grinning_face_with_smiling_eyes:

It’s not an issue-related question, but where should I ask questions about Anyscale’s solutions?

Awesome :slight_smile: I’ll reach out via Slack DM to chat about Anyscale!