[Core] Why is the process id None?

Process id shows as None inside some of the methods executed by ray.util.sgd.data.dataset.

Dataset(
    <Parallel Iterator>,
    batch_size=batch_size,
    max_concurrency=1,
    download_func=lambda row: sample_method(row))

For example, in the above snippet, inside sample_method, the process id is None when I try to print the following.

print(f"task_id: {ray.get_runtime_context().task_id}")

@Alex @sangcho any idea why task_id would be None here?

Task id is not equivalent to the process id. Use os.getpid() to get the process id.

Also, the task id is None if there isn’t one (e.g., the code is running on a driver or in an actor). I don’t know the internal details of the dataset API, but it is highly likely that the API is called on a driver or an actor (try ray.get_runtime_context().get() to check this).
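Something like the following could be dropped into sample_method to check both things at once. This is only a rough sketch: describe_worker is a made-up helper name, and the Ray import is made optional purely so the snippet also runs where Ray is not installed. The exact keys in the dict returned by get() may vary between Ray versions.

```python
import os

try:
    import ray  # optional here so the sketch still runs without Ray installed
except ImportError:
    ray = None


def describe_worker():
    """Return the OS process id plus whatever Ray reports about the current context.

    Hypothetical helper for debugging; not part of the Ray API.
    """
    info = {"pid": os.getpid(), "ray_context": None}
    if ray is not None and ray.is_initialized():
        # get() returns a dict describing the current context. On a driver or
        # inside an actor there may be no task id in it.
        info["ray_context"] = ray.get_runtime_context().get()
    return info


print(describe_worker())
```

Calling describe_worker() from inside sample_method should show the real process id and, through the context dict, whether the code is running in a task, an actor, or on the driver.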

^ agree with everything Sang said. I’ll add that with dataset/parallel iterators, max_concurrency=1 will use the existing actor, while max_concurrency > 1 is needed to spin up tasks instead.