[Core] How to resolve RayOutOfMemoryError in Python for the ray package?

I am getting the error below when a heavy operation runs in the application.

Error:

ray.memory_monitor.RayOutOfMemoryError

It's using 3.81/4.0 GB of memory. Can anyone help me resolve this issue?

Do I need to increase the memory? If yes, how do I do that?

Below is the initializer for Ray in the Python application:

import ray
ray.init(ignore_reinit_error=True)
ray.cluster_resources()["CPU"]

Can anyone guide me on how to resolve the RayOutOfMemoryError issue?

StackOverflow link

Hi @Dravid, thanks for asking the question! First, I'd recommend taking a look at the memory management section of the Ray documentation (Memory Management, Ray v2.0.0.dev0).

The error happens when you use more than 95% of your machine's memory. Running at such high memory usage is generally not recommended because it can cause many unexpected bugs.

There are usually two possible reasons for high memory usage.
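As a quick sanity check of the numbers in the question, here is a minimal sketch of the arithmetic. The 95% figure is taken from the reply above; the function names and the constant are made up for illustration and are not part of Ray's API.

```python
# Threshold quoted in this thread; treated here as a fixed constant (assumption).
MEMORY_ERROR_THRESHOLD = 0.95


def memory_fraction(used_gb: float, total_gb: float) -> float:
    """Return the fraction of total memory currently in use."""
    if total_gb <= 0:
        raise ValueError("total_gb must be positive")
    return used_gb / total_gb


def exceeds_threshold(used_gb: float, total_gb: float,
                      threshold: float = MEMORY_ERROR_THRESHOLD) -> bool:
    """True if usage is above the threshold at which the error is raised."""
    return memory_fraction(used_gb, total_gb) > threshold


if __name__ == "__main__":
    # Values reported in the question: 3.81 GB used out of 4.0 GB.
    frac = memory_fraction(3.81, 4.0)
    print(f"{frac:.2%} used, over threshold: {exceeds_threshold(3.81, 4.0)}")
```

With 3.81 GB used out of 4.0 GB, usage is just above 95%, which is consistent with the error being raised.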

  1. Ray's object store is using a lot of memory. You can check this from the Ray dashboard (go to localhost:8265). In this case, some of your objects may not be garbage-collected from the object store because you still hold references to them. You can use the ray memory command to debug this scenario.
  2. Your Ray application itself simply uses a lot of memory. To check this, look at the processes named ray:: in htop and observe the memory usage of each one.
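If you would rather do the second check from Python than from htop, something like the following sketch may help. It assumes the third-party psutil package is installed and that Ray worker processes expose a title starting with ray:: (as they appear in htop). The helper names are hypothetical, not part of Ray.

```python
from typing import Iterable, List, Tuple

# (pid, process title, resident memory in bytes)
ProcInfo = Tuple[int, str, int]


def ray_worker_memory(procs: Iterable[ProcInfo],
                      prefix: str = "ray::") -> List[ProcInfo]:
    """Keep only processes whose title starts with `prefix`, largest first."""
    workers = [p for p in procs if p[1].startswith(prefix)]
    return sorted(workers, key=lambda p: p[2], reverse=True)


def iter_processes() -> Iterable[ProcInfo]:
    """Yield (pid, title, rss) for every process we are allowed to inspect."""
    import psutil  # third-party; imported lazily so the filter above works without it

    for proc in psutil.process_iter(["pid", "name", "cmdline", "memory_info"]):
        info = proc.info
        mem = info["memory_info"]
        if mem is None:  # access denied or process already gone
            continue
        cmdline = info["cmdline"] or []
        # Assumption: the ray:: title shows up as the first command-line entry.
        title = cmdline[0] if cmdline else (info["name"] or "")
        yield info["pid"], title, mem.rss


if __name__ == "__main__":
    for pid, title, rss in ray_worker_memory(iter_processes()):
        print(f"{pid:>8}  {rss / 1024 ** 2:10.1f} MiB  {title}")
```

Sorting by resident memory puts the heaviest worker first, which is usually the one worth looking at.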

@sangcho
Thank you for your quick reply.
We are using a Flask REST API and modin.pandas to serve a large amount of data. In this case, can we use the ray.shutdown() function to reset Ray's memory before sending the response?

Hmm, that could work, but calling ray.shutdown() for every request is not a common pattern, and it is not recommended (the standard practice is to call ray.init(address='auto') in the Flask process).

@sangcho could you please explain how ray.init(address='auto') will help in this case?

The most common pattern is to run ray.init(address='auto') once, when your process (the Flask server) starts. I am not offering this as a solution to your question; I am saying that the solution you suggested is not a common pattern. Running ray.init(address='auto') creates a new job in the cluster, and ray.shutdown() kills that job. So creating and killing a job for every request could cause unexpected problems, especially because it is an uncommon pattern.