Has anyone had experience successfully setting ulimits for open file descriptors for `ray start` when running on EC2? In my cluster YAML,
`head_start_ray_commands` looks like this:

```yaml
head_start_ray_commands:
    - ray stop
    - ulimit -n 65536; ray start --head --port=6379 --object-manager-port=8076 --autoscaling-config=~/ray_bootstrap_config.yaml
```
But I’m not sure the 65536 limit is actually honored, because the default limit on EC2 instances is 8192, and running `ulimit -n 65536` manually gives this error:

```
bash: ulimit: open files: cannot modify limit: Operation not permitted
```

To verify that the Ray workers indeed have a ulimit of 8192 (instead of 65536), I ran this snippet:
```python
import resource

import ray

def get_limit():
    return resource.getrlimit(resource.RLIMIT_NOFILE)

ray.init(address='auto')
f = ray.remote(get_limit)
result = ray.get(f.remote())
print(result)  # (soft limit, hard limit)
# Result was (8192, 8192) on an r5.2xlarge instance
```
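For context, my understanding is that an unprivileged process can raise its *soft* limit up to the hard limit, but raising the *hard* limit requires root (CAP_SYS_RESOURCE), which would explain the "Operation not permitted" error above. A minimal sketch of that behavior (assuming the default 8192 hard limit; on Linux, CPython surfaces the EPERM as a ValueError):

```python
import resource

soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(soft, hard)  # e.g. (8192, 8192) on the default EC2 setup

# Raising the soft limit up to the existing hard limit is allowed
# without privileges:
resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))

# Raising the hard limit beyond its current value is not:
try:
    resource.setrlimit(resource.RLIMIT_NOFILE, (65536, 65536))
except ValueError as e:
    print("cannot raise hard limit without privileges:", e)
```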
To increase the ulimit, I tried running

```
sudo bash -c "echo $USER hard nofile 65536 >> /etc/security/limits.conf"
```

as recommended in the docs. This does increase the limit, but only after a `sudo reboot` of the EC2 instance (logging out and back in does not update the limit). I’m afraid adding the limits.conf update to my setup scripts is not useful, since the instance won’t be restarted before the `head_start_ray_commands` are run.
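If I do get the hard limit raised (via limits.conf or otherwise), my fallback idea is to bump the soft limit from inside the worker processes themselves, since raising soft up to hard needs no privileges. A rough, untested sketch (the `raise_soft_limit` helper is my own, and it only affects the worker process that runs it, not the raylet or object store):

```python
import resource

import ray

def raise_soft_limit():
    """Raise this process's soft NOFILE limit up to its hard limit."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft < hard:
        resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)

ray.init(address='auto')
# Bump the limit inside a worker and report the resulting (soft, hard) pair.
print(ray.get(ray.remote(raise_soft_limit).remote()))
```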
Is there a way to reliably set the ulimit on EC2 instances in the autoscaler YAML?