Does anyone have experience successfully setting the open-file-descriptor ulimit for `ray start` when running on EC2? In my cluster YAML, `head_start_ray_commands` looks like this:
```yaml
head_start_ray_commands:
    - ray stop
    - ulimit -n 65536; ray start --head --port=6379 --object-manager-port=8076 --autoscaling-config=~/ray_bootstrap_config.yaml
```
But I'm not sure the 65536 limit is actually honored, because the default limit on EC2 instances is 8192, and running `ulimit -n 65536` by hand gives this error:

```
bash: ulimit: open files: cannot modify limit: Operation not permitted
```

To verify that the Ray workers indeed have a ulimit of 8192 (instead of 65536), I ran this snippet:
```python
import resource

import ray


def get_limit():
    return resource.getrlimit(resource.RLIMIT_NOFILE)


ray.init(address='auto')
f = ray.remote(get_limit)
result = ray.get(f.remote())
print(result)  # (soft limit, hard limit)
# Result was (8192, 8192) on an r5.2xlarge instance
```
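Since both the soft and the hard limit come back as 8192, the `ulimit -n 65536` in `head_start_ray_commands` cannot succeed: an unprivileged process may raise its soft limit only up to its current hard limit, and raising the hard limit itself requires root. A minimal local sketch of that behavior (no Ray involved):

```python
import resource

soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print("before:", soft, hard)

# Raising the soft limit up to the current hard limit needs no privileges.
resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
print("after: ", resource.getrlimit(resource.RLIMIT_NOFILE))

# Raising the hard limit beyond its current value is what
# `ulimit -n 65536` attempts here, and that requires root
# (CAP_SYS_RESOURCE) -- hence the "Operation not permitted"
# error from bash.
```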
To increase the ulimit, I tried running

```
sudo bash -c "echo $USER hard nofile 65536 >> /etc/security/limits.conf"
```

as recommended in the docs. This does increase the limit, but only after a `sudo reboot` of the EC2 instance (logging out and back in does not update the limit). I'm afraid adding the limit update to my setup scripts is not useful, since the instance won't be restarted before the `head_start_ray_commands` are run.
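Concretely, the setup-script approach I was considering would look something like this in the cluster YAML (sketch only; as noted above, the new limit would only take effect after a reboot, which never happens before the Ray commands run):

```yaml
setup_commands:
    # Raises the hard limit for $USER, but only for sessions started
    # after a reboot -- too late for head_start_ray_commands.
    - sudo bash -c "echo $USER hard nofile 65536 >> /etc/security/limits.conf"
```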
Is there a way to reliably set the ulimit on EC2 instances in the autoscaler yaml?