Specifying memory requirements for RLlib algorithms in Ray Tune

How severe does this issue affect your experience of using Ray?

  • Medium: It contributes to significant difficulty to complete my task, but I can work around it.

I am running some experiments with DQN, which needs a lot of memory for its replay buffer, and I'd like to schedule multiple experiments on the same Ray cluster. By default, Ray only considers the number of CPU cores requested when scheduling trials, so it ends up starting more concurrent trials than I have memory for. Is there a way to tell Ray how much memory each trial will use?
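For context, here is a rough back-of-the-envelope estimate of the buffer footprint (illustrative numbers only, assuming Atari-style uint8 observations; my real config differs, and this ignores any compression or frame sharing the buffer might do):

```python
# Rough replay-buffer sizing estimate (illustrative, not my actual config).
capacity = 1_000_000             # transitions held in the buffer
obs_bytes = 84 * 84 * 4          # one stacked uint8 observation: 28,224 bytes
# Each transition stores obs + next_obs plus small action/reward/done fields
# (assumed ~16 bytes here for the scalar fields).
per_transition = 2 * obs_bytes + 16
total_bytes = capacity * per_transition
print(f"~{total_bytes / 1e9:.1f} GB for the replay buffer alone")
```

So a single trial can easily dominate a node's RAM, which is why CPU-only scheduling overcommits.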

I tried setting `resources_per_trial={"cpu": 1, "memory": 12000000}` in `tune.run()`, but I get the error:

> Resources for ... have been automatically set to ... by its 'default_resource_request()' method. Please clear the 'resources_per_trial' option.
I take this to mean that you’re not supposed to override the resource request from the RLlib algorithm, but then how do you manage memory for DQN algorithms?
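For reproducibility, here is a minimal sketch of what I'm running (config values are illustrative; `"memory"` is in bytes, per Ray's resource convention):

```python
# The resource spec I'm trying to pass per trial ("memory" in bytes).
RESOURCES = {"cpu": 1, "memory": 12_000_000}

if __name__ == "__main__":
    from ray import tune

    tune.run(
        "DQN",
        config={"env": "CartPole-v1"},  # placeholder env, not my real one
        # This line triggers the error above: RLlib's DQN already supplies
        # its own resource request via default_resource_request().
        resources_per_trial=RESOURCES,
    )
```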