Task id is not equivalent to the process id. Use os.getpid() to get the process id.
Also, the task id is None if there isn’t one (e.g., the code is running on a driver or in an actor). I don’t know the internal details of the dataset API, but it is highly likely that the API is being called on a driver or an actor (try ray.get_runtime_context().get() to check this).
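A minimal sketch of that check, in case it helps. It assumes the dict returned by ray.get_runtime_context().get() only carries "task_id" / "actor_id" when they apply (or sets them to None otherwise); the helper names describe_context and describe_current_worker are made up for illustration and are not part of Ray.

```python
import os


def describe_context(ctx):
    """Classify a dict shaped like ray.get_runtime_context().get().

    Assumption: "actor_id" and "task_id" are missing (or None) when the
    code is not running inside an actor or a task, respectively.
    """
    if ctx.get("actor_id") is not None:
        kind = "actor"
    elif ctx.get("task_id") is not None:
        kind = "task"
    else:
        kind = "driver"
    return {
        "kind": kind,
        "task_id": ctx.get("task_id"),  # may be None, see above
        "pid": os.getpid(),             # the process id, unrelated to the task id
    }


def describe_current_worker():
    """Hypothetical convenience wrapper; requires Ray to be installed."""
    import ray  # imported lazily so describe_context works without Ray

    return describe_context(ray.get_runtime_context().get())
```

Calling describe_current_worker() from inside the function you pass to the dataset API should show whether it runs on the driver, in an actor, or in a task, along with the pid of that process.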
^ Agree with everything Sang said. I’ll add that with dataset/parallel iterators, max_concurrency=1 will run on the existing actor, while max_concurrency > 1 is needed to spin up separate tasks instead.