Here’s my ray cluster set up and use case:
- ray cluster is running on K8s
- This cluster has a remote python function called ‘calc’
- In a running pod, I run a Java app invoking ‘calc’ 100 times and waiting for their completion.
- The code snippet in Java is like below:
fun rpcCollect(jobIds: List<String>) {
Ray.init()
val tasks = batches.map {
....
Ray.task(PyFunction.of("pricer", "calc", String::class.java), data).remote()
}
....
val results = Ray.get(tasks).map {...) }
...
}
- The jvm args for Ray.init is something like this:
export JAVA_OPTS=’-Dray.address=ray_head:6379 -Dray.job.code-search-path=…’, in which ray_head is the ip of the head node of my cluster.
When I run my java app, what I observed from Ray Dashboard is that all tasks are scheduled to one node, which my java app runs on. However, instead of calling ‘calc’ from Java, if I invoke it from python, tasks are sent to different nodes as expected.
Any suggestions?