How severely does this issue affect your experience of using Ray?
- High: It blocks me from completing my task.
Hi Team,
I have a problem using Ray Serve for my API deployment.
My computer has 8 physical CPU cores and 16 logical processors.
When I assign 4 replicas with
@serve.deployment(num_replicas=4)
the dashboard shows 6 processes running (2 of them are used by Ray itself).
But when I test with Postman, Ray processes only 2 requests in parallel. For example, if I send 3 requests at the same time, only 2 processes are busy while the other 2 sit idle.
When I change num_replicas to 6, only 3 processes are busy and the other 3 are idle. It seems that I can only use half of the CPUs that I configure.
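To make the observation reproducible outside Postman, here is a minimal sketch that estimates effective parallelism from timing alone. It times one request on its own, then a burst sent together, and compares the two. The names `estimate_parallelism` and `call_endpoint` and the URL are my own placeholders, not anything from Ray, and the estimate only makes sense if every request does roughly the same amount of work.

```python
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def estimate_parallelism(send_request, num_requests):
    """Estimate how many requests the server handled at once.

    Times one request alone, then a burst of num_requests sent together,
    and returns num_requests * single_time / burst_time.
    """
    start = time.perf_counter()
    send_request()
    single_time = time.perf_counter() - start

    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=num_requests) as pool:
        # list() forces every request to finish before the timer stops.
        list(pool.map(lambda _: send_request(), range(num_requests)))
    burst_time = time.perf_counter() - start

    return num_requests * single_time / burst_time


def call_endpoint(url="http://127.0.0.1:8000/"):
    """Hypothetical GET request; replace the URL with the real route."""
    with urllib.request.urlopen(url) as response:
        response.read()
```

Calling `estimate_parallelism(call_endpoint, 4)` against the running deployment should give a value near 4 if all 4 replicas work at once, and a value near 2 would match what I see in Postman.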
Is there any mistake in my config or code?
PS: I set OMP_NUM_THREADS to match the desired parallelism, but it doesn't help.
@serve.deployment(num_replicas=4, ray_actor_options={"num_cpus": 1, "num_gpus": 0})
# @serve.deployment
@serve.ingress(app)
class CutOptimize:
    def __init__(self):
        os.environ["OMP_NUM_THREADS"] = "4"
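One variation that may be worth trying: assigning `os.environ` inside `__init__` only takes effect after the replica process has already started, and a numerical library imported earlier may have read the variable by then. The sketch below passes the variable through `runtime_env` in `ray_actor_options` instead, so it is present when the replica process starts. The helper name `build_actor_options` is hypothetical, and I am not claiming this explains the halved parallelism, since OMP_NUM_THREADS controls threads inside one process rather than how many requests run at once.

```python
def build_actor_options(num_cpus, omp_threads):
    """Return a ray_actor_options dict that sets OMP_NUM_THREADS via runtime_env."""
    return {
        "num_cpus": num_cpus,
        "num_gpus": 0,
        # Environment variables must be strings.
        "runtime_env": {"env_vars": {"OMP_NUM_THREADS": str(omp_threads)}},
    }


# Intended use (not executed here; needs Ray Serve and the FastAPI `app`):
#   @serve.deployment(num_replicas=4, ray_actor_options=build_actor_options(1, 4))
#   @serve.ingress(app)
#   class CutOptimize:
#       ...
```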