It looks like the memory issue is probably due to too many Agent actors running in parallel. We’re actively working on this type of problem for v2.1 and v2.2, but for now I think the best thing to try is running fewer agents in parallel. There are two ways you can do this:
- Pass fewer `num_cpus` to `ray.init`, like `ray.init(num_cpus=8)`, even though you have 16 vCPUs available.
- (suggested) Modify your actor definitions to request more CPUs. You can do this by modifying this line.
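
Here's a rough sketch of the second option, in case it helps. The `Agent` class, its `step` method, and the helper names `max_parallel_actors` and `run_agents` are all made up for illustration and are not your actual code; only `ray.init(num_cpus=...)` and `@ray.remote(num_cpus=...)` are real Ray calls. The idea is that with 16 CPUs and 2 CPUs requested per actor, at most 8 actors can be scheduled at once, which is the same cap you'd get from `ray.init(num_cpus=8)` with 1 CPU per actor.

```python
def max_parallel_actors(total_cpus: int, cpus_per_actor: int) -> int:
    """Upper bound on simultaneously scheduled actors, from CPU resources alone."""
    return total_cpus // cpus_per_actor


def run_agents(total_cpus: int = 16, cpus_per_actor: int = 2) -> list:
    # Imported here so the arithmetic helper above can be used without Ray installed.
    import ray

    # Tell Ray how many logical CPUs it may hand out.
    ray.init(num_cpus=total_cpus)

    # Hypothetical stand-in for the real Agent actor. Each instance reserves
    # `cpus_per_actor` CPUs for its lifetime, so fewer instances fit at once.
    @ray.remote(num_cpus=cpus_per_actor)
    class Agent:
        def step(self) -> str:
            return "done"

    # Create only as many actors as the CPU budget allows; any extra actors
    # would stay pending until resources are released.
    n = max_parallel_actors(total_cpus, cpus_per_actor)
    agents = [Agent.remote() for _ in range(n)]
    results = ray.get([agent.step.remote() for agent in agents])

    ray.shutdown()
    return results
```

Calling `run_agents()` with the defaults should start 8 actors. Raising `cpus_per_actor` is the knob to turn if memory is still tight, since it lowers the number of actors alive at the same time without changing what `ray.init` sees.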