The actor or task with ID ... cannot be scheduled right now

Hello, I recently updated Ray to the latest daily build to try the new deployment API, and I can no longer start any tasks using either the deployment API or the old-style API.

Windows, empty new conda env:

pip install https://s3-us-west-2.amazonaws.com/ray-wheels/latest/ray-2.0.0.dev0-cp38-cp38-win_amd64.whl
pip install "ray[serve]" # don’t think this matters, tested without too

Then, running the code from the quick-start, hello.deploy() gets stuck:

(pid=12012) 2021-05-08 15:52:36,800     INFO backend_state.py:747 -- Adding 1 replicas to backend 'Counter'.
2021-05-08 15:52:51,472 WARNING worker.py:1103 -- The actor or task with ID ffffffffffffffffaa26b4ec855fcde08ebed3c101000000 cannot be scheduled right now. It requires {CPU_group_4918ead13edfb4cdf5100dd95213288e: 1.000000} for placement, but this node only has remaining {31.000000/32.000000 CPU, 20.301649 GiB/20.301649 GiB memory, 2.000000/2.000000 GPU, 10.150825 GiB/10.150825 GiB object_store_memory, 0.980000/1.000000 node:192.168.1.212, 1.000000/1.000000 CPU_group_4918ead13edfb4cdf5100dd95213288e, 1.000000/1.000000 CPU_group_0_4918ead13edfb4cdf5100dd95213288e}
. In total there are 0 pending tasks and 1 pending actors on this node. This is likely due to all cluster resources being claimed by actors. To resolve the issue, consider creating fewer actors or increase the resources available to this Ray cluster. You can ignore this message if this Ray cluster is expected to auto-scale.
(pid=12012) 2021-05-08 15:53:06,843     WARNING backend_state.py:276 -- Replica 'Counter#TMDTrR' for backend 'Counter' has taken more than 30s to start up. This may be caused by waiting for the cluster to auto-scale or because the backend constructor is slow. Resources required: {'CPU': 1}, resources available: {'CPU': 31.0}.
(pid=12012) 2021-05-08 15:53:36,903     WARNING backend_state.py:276 -- Replica 'Counter#TMDTrR' for backend 'Counter' has taken more than 60s to start up. This may be caused by waiting for the cluster to auto-scale or because the backend constructor is slow. Resources required: {'CPU': 1}, resources available: {'CPU': 31.0}.
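
One way to sanity-check the warning is to compare the resources it says are required against the ones it says remain. The sketch below does that with the values copied from the log above; the parsing helper and its regular expression are illustrative assumptions, not part of Ray.

```python
import re

# Values copied verbatim from the scheduler warning in the log above.
required = {"CPU_group_4918ead13edfb4cdf5100dd95213288e": 1.0}
remaining_text = (
    "31.000000/32.000000 CPU, 20.301649 GiB/20.301649 GiB memory, "
    "2.000000/2.000000 GPU, 10.150825 GiB/10.150825 GiB object_store_memory, "
    "0.980000/1.000000 node:192.168.1.212, "
    "1.000000/1.000000 CPU_group_4918ead13edfb4cdf5100dd95213288e, "
    "1.000000/1.000000 CPU_group_0_4918ead13edfb4cdf5100dd95213288e"
)


def parse_remaining(text):
    """Map each resource name to its available amount ('available/total name')."""
    resources = {}
    for entry in text.split(", "):
        match = re.match(r"([\d.]+)(?: GiB)?/([\d.]+)(?: GiB)? (\S+)$", entry.strip())
        if match:
            resources[match.group(3)] = float(match.group(1))
    return resources


def unmet(required, available):
    """Return the required resources that the available amounts cannot cover."""
    return {
        name: amount
        for name, amount in required.items()
        if available.get(name, 0.0) < amount
    }


available = parse_remaining(remaining_text)
shortfall = unmet(required, available)
print(shortfall)  # an empty dict means the reported resources cover the request
```

By this check the requested placement-group CPU is reported as fully available, which suggests the hang is not a genuine resource shortage, whatever the warning text says.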

Any suggestions on how to debug this issue? RAY_BACKEND_LOG_LEVEL=debug is not giving me any extra output.

And just to confirm this is Windows related, the latest daily for Linux/Python 3.7 works in Colab.

Seems like the commit [Core] Add "shim process" setup_worker.py that calls "conda activate"… (ray-project/ray@b08b2c5) is the culprit. I created an issue for it: [serve] Broken by https://github.com/ray-project/ray/commit/b08b2c5103c634c680de31b237b2bfcceb9bc150 on Win (Issue #15703, ray-project/ray).

Thanks so much for creating the issue and finding the offending commit! We’ll try to get this fixed very soon and improve our testing to catch things like this in the future. In the very short term, you can try not running Ray Serve in a conda environment, or you can try this patch: serve-conda-win.diff · GitHub. Dependency management using conda will still be broken but everything else should still work.

No worries, Archit. I rolled back to a pre-breakage CL, but thank you for the diff! (I am also super excited about conda support in general :slight_smile:)

It looks like CI detected the breakage ([Core] Add "shim process" setup_worker.py that calls "conda activate" for runtime_env (#15361) · ray-project/ray@b08b2c5 · GitHub) but I am not sure if you block on this.

Yeah, we don’t block on this because there are some known flaky tests. In this case the test that failed was unrelated, and the test for runtime_env was reported as passing, but it turns out the Windows CI is not run in a conda environment, so the runtime_env conda tests were actually being skipped.
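
To illustrate how a skipped test can look like a passing one, here is a minimal sketch using Python's standard unittest module. It is not Ray's actual test code: the test case, the helper names, and the use of the CONDA_DEFAULT_ENV variable to detect an activated conda environment are all assumptions made for the example.

```python
import os
import unittest


def detect_conda():
    """Assumption: `conda activate` sets CONDA_DEFAULT_ENV in the environment."""
    return "CONDA_DEFAULT_ENV" in os.environ


def make_test_case(in_conda):
    """Build a hypothetical conda-dependent test that is skipped outside conda."""

    class CondaRuntimeEnvTest(unittest.TestCase):
        @unittest.skipUnless(in_conda, "requires an activated conda environment")
        def test_conda_runtime_env(self):
            # Stands in for a real regression: fails whenever it actually runs.
            self.fail("conda runtime_env is broken")

    return CondaRuntimeEnvTest


def run(in_conda):
    """Run the test case and return the collected unittest.TestResult."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(make_test_case(in_conda))
    result = unittest.TestResult()
    suite.run(result)
    return result


outside = run(in_conda=False)  # like a CI job that is not inside a conda env
inside = run(in_conda=True)    # like a CI job that is inside a conda env

print("outside conda: successful =", outside.wasSuccessful(),
      "skipped =", len(outside.skipped))
print("inside conda:  successful =", inside.wasSuccessful(),
      "failures =", len(inside.failures))
```

Outside conda the run counts as successful even though the only test was skipped, so the regression only surfaces when the job runs inside a conda environment. Calling run(in_conda=detect_conda()) applies the same check to the current machine.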