Confused about CoreWorker vs. worker

How severely does this issue affect your experience of using Ray?

  • None: Just asking a question out of curiosity
  • Low: It annoys or frustrates me for a moment.
  • Medium: It makes completing my task significantly harder, but I can work around it.

According to the Ray paper, a Ray node may contain a driver, workers, or actors. I ran the following program and checked the log files.

```python
import time

import click
import numpy as np
import ray

sleep_time = 30
cpus_per_task = 1


def test_max_running_tasks(num_tasks):
    @ray.remote(num_cpus=cpus_per_task)
    def task(a, b):
        # Multiply element-wise, then hold the CPU for `sleep_time` seconds.
        r = np.multiply(a, b)
        time.sleep(sleep_time)
        return r

    # size=(1000, 1000) draws a 1000x1000 matrix; passing the tuple positionally
    # sets the mean (loc) instead and returns a 2-element array.
    refs = [
        task.remote(np.random.normal(size=(1000, 1000)), np.random.normal(size=(1000, 1000)))
        for _ in range(num_tasks)
    ]

    # ray.wait returns one ready reference per call by default; fetch each result.
    while refs:
        done, refs = ray.wait(refs)
        ray.get(done)


@click.command()
@click.option("--num-tasks", required=True, type=int, help="Number of tasks to launch.")
@click.option("--num-cpus", required=True, type=int, help="Number of CPUs.")
def test(num_tasks, num_cpus):
    ray.init(address="auto")
    print(ray.available_resources())
    print(num_cpus)
    # Wait until all `num_cpus` CPUs in the cluster are available.
    while int(ray.available_resources().get("CPU", 0)) != num_cpus:
        time.sleep(0.1)
    start_time = time.time()
    test_max_running_tasks(num_tasks)
    end_time = time.time()
    # Wait until every CPU has been released again.
    while int(ray.available_resources().get("CPU", 0)) != num_cpus:
        time.sleep(0.1)

    rate = num_tasks / (end_time - start_time - sleep_time)
    print(
        f"Success! Started {num_tasks} tasks in {end_time - start_time}s. "
        f"({rate} tasks/s)"
    )


if __name__ == "__main__":
    test()
```

I ran this program on two nodes. On the head node I ran `ray start --block --num-cpus=0 --head --node-ip-address=xxxxx --port=6379`, and on the other node `ray start --address='xxxx:6379' --block --num-cpus=32`.
Then I checked the log files and found 37 files whose names start with ‘python-core-worker’. How does this happen?
I want to know when core workers are started, and by whom.

Hey @xyzyx,

CoreWorker here refers to the C++ part of Ray Core; it encapsulates the code that runs in C++. The python-core-worker-* log is where the CoreWorker writes its logs.
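To see what those files correspond to on a given node, a small stdlib-only sketch like the one below can list them. It assumes Ray's default log directory `/tmp/ray/session_latest/logs` (different if you start Ray with a custom temp dir) and assumes file names of the form `python-core-worker-<worker_id>_<pid>.log`; if a name does not match that pattern, its PID is reported as `None` rather than guessed. The helper name `core_worker_logs` is hypothetical, not part of Ray.

```python
import re
from pathlib import Path

# Assumption: Ray's default per-node log directory; adjust for a custom temp dir.
DEFAULT_LOG_DIR = "/tmp/ray/session_latest/logs"

# Assumption: names look like python-core-worker-<hex worker id>_<pid>.log.
_NAME = re.compile(r"^python-core-worker-(?P<worker_id>[0-9a-f]+)_(?P<pid>\d+)\.log$")


def core_worker_logs(log_dir=DEFAULT_LOG_DIR):
    """Hypothetical helper: map each python-core-worker log name to its PID (or None)."""
    result = {}
    for path in sorted(Path(log_dir).glob("python-core-worker-*.log")):
        match = _NAME.match(path.name)
        result[path.name] = int(match.group("pid")) if match else None
    return result


if __name__ == "__main__":
    logs = core_worker_logs()
    pids = {pid for pid in logs.values() if pid is not None}
    print(f"{len(logs)} core-worker log files, {len(pids)} distinct PIDs")
```

Run it on each node separately, since every node keeps its own log directory.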

A worker is more of a logical concept in Ray; each worker corresponds to an OS process.

> I get 37 files named starting with ‘python-core-worker’. How does this happen?

If you are on the nightly build, I think `ray list workers` is a better way to list all the workers that were started, and `ray summary tasks` would be interesting as well. I am not sure why there are 37 files in your case; new workers may have been started while your program was running.

> I want to know when and who will start a core worker.

The Raylet is responsible for starting CoreWorker instances, each of which maps to the logical concept of a worker.
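As a rough intuition for why the number of core-worker log files need not equal the number of tasks, here is a deliberately simplified toy model. It is not Ray's actual Raylet or worker-pool code; `ToyWorkerPool` and its methods are hypothetical. It only captures one idea: a task goes to an idle worker if one exists, otherwise a new worker process is started (while a CPU slot is free), and finished workers are kept for reuse. Under that assumption, the count of started workers (roughly, one log file each) tracks peak concurrency, not total tasks. The real Raylet does more than this (for example, it can start workers ahead of demand or replace workers that exit), so the model does not by itself explain the exact count of 37.

```python
from collections import deque


class ToyWorkerPool:
    """Hypothetical, simplified model of on-demand worker startup (not Ray's code)."""

    def __init__(self, num_cpus):
        self.num_cpus = num_cpus
        self.idle = deque()  # worker ids waiting for work
        self.busy = set()    # worker ids currently running a task
        self.started = 0     # workers ever started (~ one core-worker log each)

    def submit(self):
        """Schedule one task; return the worker id, or None if no CPU slot is free."""
        if len(self.busy) >= self.num_cpus:
            return None
        if self.idle:
            worker = self.idle.popleft()  # reuse an existing process
        else:
            self.started += 1             # "start" a new worker process
            worker = self.started
        self.busy.add(worker)
        return worker

    def finish(self, worker):
        """Mark the worker's task as done and return it to the idle pool."""
        self.busy.remove(worker)
        self.idle.append(worker)


if __name__ == "__main__":
    pool = ToyWorkerPool(num_cpus=32)
    for _ in range(3):  # three waves of 32 tasks each
        workers = [pool.submit() for _ in range(32)]
        for w in workers:
            pool.finish(w)
    print(f"96 tasks ran on {pool.started} started workers")
```

With 32 CPU slots, three waves of 32 tasks reuse the same 32 workers instead of starting 96.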

I would actually recommend reading the Ray architecture whitepaper (https://docs.google.com/document/d/1lAy0Owi-vPz2jEqBSaHNQcy2IBSDEHyXNOQZlGuj93c/preview) instead of the original Ray paper, which no longer matches the current Ray project in many respects.

Thanks!
I have read the architecture document and it is helpful. It is dated September 2020. Is it out of date?

We will be releasing another white paper in a month with the Ray 2.0 release. Stay tuned for the update!

(AFAIK, the changes to the core architecture are not huge, so the 2020 whitepaper is still a great reference.)
