How severely does this issue affect your experience of using Ray?
- High: It blocks me from completing my task.
Setting: I have created a hierarchical multi-agent environment based on the Python MultiAgentEnv class. The environment uses GPU resources to run complex calculations implemented in a C++ library.
What is working: When I run the env on its own with simulated action dicts, it runs fine.
Problem: When I train with Tune, the C++ library no longer detects my CUDA device.
Question: How does Ray RLlib/Tune affect GPU availability when a cluster is started? Would using an ExternalEnv solve the problem?
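For anyone debugging the same thing, a minimal check like this (just a sketch, meant to be dropped into the env's `__init__`, which runs inside the RLlib rollout-worker process) shows what Ray actually exposes to that process:

```python
import os
import ray

# Add inside the env's __init__. CUDA_VISIBLE_DEVICES is rewritten per worker
# by Ray based on the resources that worker requested, so it can differ from
# what you see in the driver process.
print("CUDA_VISIBLE_DEVICES:", os.environ.get("CUDA_VISIBLE_DEVICES"))
# ray.get_gpu_ids() only works inside a Ray worker; it lists the GPU ids
# Ray has assigned to this worker.
print("GPU ids assigned by Ray:", ray.get_gpu_ids())
```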
I'm having a similar issue; I opened a question too. It seems that RLlib sets CUDA_VISIBLE_DEVICES = 0 when the env is initialized. This makes sense, as only the requested resources are exposed to the worker.
However (it sounds like this is your case, too), there seems to be a problem when specifying fractional resources, which should be the natural solution. Curious to hear if you've resolved this!
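For concreteness, this is the kind of fractional request I mean (classic dict-style RLlib config; the algorithm, env name, and fractions are illustrative placeholders, not a verified fix):

```python
from ray import tune

tune.run(
    "PPO",
    config={
        "env": "my_hierarchical_env",  # placeholder for your registered env
        # Fraction of a GPU reserved for the trainer process.
        "num_gpus": 0.5,
        # Fraction of a GPU per rollout worker, so the env process that
        # constructs the C++ library still has a visible CUDA device.
        "num_gpus_per_worker": 0.5,
    },
)
```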
@Aidan_McLaughlin I'm not completely sure what solved the issue in the end, but with the newer Ray versions I didn't have any issues with this. We had a ray tracer built with NVIDIA OptiX in our env. We have since moved to a different solution, but our env still uses the GPU. We are currently on 3.0.0.dev0 and everything works fine. What version are you using?