Memory exhaustion problem when using Dataset with RLlib

How severe does this issue affect your experience of using Ray?

  • High: It blocks me to complete my task.

I have a moderately sized dataset saved in a number of pickle files, which I load into a Dataset using the .from_items() function. I use a static dataset because generating the environment state on the fly takes a long time. Every agent has access to the same Dataset instance and randomly samples a portion of the main dataset with .random_samples(fraction), so that the dataset is distributed evenly among the agent workers.
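As a minimal, self-contained sketch of the loading step described above (the file names and record layout here are made up for illustration; in practice the pickles hold pre-generated environment states, and the resulting list would back a shared `ray.data.from_items()` Dataset):

```python
import glob
import os
import pickle
import tempfile

# Create a few sample pickle files so the sketch runs standalone;
# these stand in for the pre-generated environment-state files.
tmpdir = tempfile.mkdtemp()
for i in range(3):
    with open(os.path.join(tmpdir, f"part_{i}.pkl"), "wb") as f:
        pickle.dump([{"state": i * 10 + j} for j in range(4)], f)

# Load every pickle file into one flat list of records.
records = []
for path in sorted(glob.glob(os.path.join(tmpdir, "*.pkl"))):
    with open(path, "rb") as f:
        records.extend(pickle.load(f))

# With Ray, this list would then back the shared Dataset, e.g.:
#   ds = ray.data.from_items(records)
print(len(records))
```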

Whenever I run out of samples in the random 'partition', I call .random_samples(fraction) again and work on a new set of samples. This avoids every agent carrying a copy of the whole dataset while training across multiple workers. The problem is that with every call to .random_samples(fraction), the memory occupied by the process increases considerably. The two plots below show the increase in memory utilization (right) coinciding with the spikes in CPU utilization (left) whenever new random samples are drawn from the dataset during training.
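The resample-on-exhaustion pattern can be sketched in plain Python as follows. `random_partition` is a hypothetical stand-in for the Dataset sampling call, so the sketch runs without Ray; the point is the control flow, where each re-draw should release the previous partition (the reported bug is that memory grows instead):

```python
import random

def random_partition(dataset, fraction, rng):
    """Stand-in for the Dataset's fractional sampling: return a new
    random subset holding roughly `fraction` of the records."""
    k = max(1, int(len(dataset) * fraction))
    return rng.sample(dataset, k)

dataset = list(range(1000))   # the shared, static dataset
rng = random.Random(0)
fraction = 0.1                # each partition holds ~10% of the data

partition = iter(random_partition(dataset, fraction, rng))
consumed = 0
for _ in range(350):          # training steps, one sample each
    try:
        sample = next(partition)
    except StopIteration:
        # Partition exhausted: draw a fresh random fraction.
        # The old partition should become garbage here.
        partition = iter(random_partition(dataset, fraction, rng))
        sample = next(partition)
    consumed += 1
```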

This behaviour eventually fills the memory, making it impossible to finish the experiment (the machine I am using has 500 GB of RAM). Any idea how to deal with this?

NOTE: I am using Ray 1.13.0. Should I consider upgrading my library version?

UPDATE: I have tried Ray 2.0.0 and I see the same issue.

Can you please log a GitHub issue with a reproduction script? That would be great :slight_smile:

Thanks for reporting this!