Efficient access to a large dictionary needed by each worker

How severely does this issue affect your experience of using Ray?

  • Medium: It contributes significant difficulty to completing my task, but I can work around it.
  • High: It blocks me from completing my task.

Each worker in our cluster needs access to a large dictionary. We have tried to do this in two different ways:

  1. Load the dictionary in each worker and use it in each worker's processing.
  2. Start an actor on each node and have each worker access the actor on its own node, doing a remote get for each dictionary value from the dictionary loaded during the actor's init.

The downside of 1) is that we use a lot of memory on each box, since every worker loads its own copy of the dictionary. The downside of 2) is that, in our benchmarking, it is roughly 1000x slower than 1).
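For concreteness, approach 2 is roughly the pattern sketched below (the helper, actor, and task names are illustrative, and per-node actor placement, e.g. via custom resources, is omitted for brevity):

```python
import ray

def load_dict_from_s3():
    # Stand-in for the real JSON-from-S3 load of the ~10 GB dictionary.
    return {f"key-{i}": i for i in range(1000)}

@ray.remote
class DictServer:
    """One actor per node; loads the dictionary once in __init__."""
    def __init__(self):
        self.big_dict = load_dict_from_s3()

    def lookup(self, key):
        return self.big_dict.get(key)

@ray.remote
def worker_task(server, keys):
    # Each lookup is a remote method call plus a ray.get; that per-call
    # overhead dominates when repeated millions of times per worker.
    return [ray.get(server.lookup.remote(k)) for k in keys]

if __name__ == "__main__":
    ray.init()
    server = DictServer.remote()
    print(ray.get(worker_task.remote(server, ["key-1", "key-42"])))
```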

Some cluster details: we have tried this on Ray 1.2 (our prod cluster) and on 1.11 (latest), with no real difference. We have also tried both Python 3.8 and Python 3.9. We are running on a Linux Ray cluster of 20 nodes and launching 50 workers per node in the test. The dictionary is roughly 10 GB loaded in memory, and it is created by a JSON load from S3. The dictionary used in the worker processes does not need to be mutable, if that is relevant. As another datapoint, each worker does this dictionary value lookup potentially millions of times.

It seems that the dictionary access is either not going through shared memory on each box, or the way it does so involves a lot of overhead? Any insights would be appreciated.

Any ideas?

Thanks,
Luke

@virtualluke is your dictionary immutable? If there is a frozen dict that could be mapped to a contiguous chunk of memory, we can get it working. I'm not aware of any such frozen dict data structure at the moment, but it doesn't seem hard to implement one.

If it’s mutable, it’s much more challenging.

My dictionary needs only read access in the workers, so it could be immutable in this case. I am not aware of any standard frozen dictionary, but if there were one I would use it.

thanks,
Luke

Wrapping something like this in Python sounds like a good way forward, perhaps?

One approach here is that you could use a sorted PyArrow table or NumPy array and use binary search (e.g., the bisect module) to do the lookups. This would work fine on a zero-copy mapping of the underlying table, and I think binary search should give you pretty competitive performance with a hash table.

In principle this is also possible for a frozen hash table, but we don’t have any zero-copy support for those data structures out of the box. You could manually rig one up by hashing values into an existing zero-copy structure like a PyArrow table / NumPy array, though.
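A minimal sketch of the sorted-array idea, assuming integer keys and a NumPy layout (the key/value construction below is just a stand-in for your real data):

```python
import numpy as np
import ray

ray.init()

# Stand-in data: a sorted key array and a parallel value array.
keys = np.arange(0, 2_000_000, 2, dtype=np.int64)   # sorted, even keys only
values = keys * 10

# ray.put stores each array once per node in shared memory; tasks on the
# same node receive zero-copy, read-only views rather than private copies.
keys_ref = ray.put(keys)
values_ref = ray.put(values)

@ray.remote
def lookup_many(k, v, queries):
    # k and v arrive as zero-copy NumPy views backed by the object store.
    q = np.asarray(queries, dtype=np.int64)
    idx = np.minimum(np.searchsorted(k, q), len(k) - 1)  # binary search
    hit = k[idx] == q
    return np.where(hit, v[idx], -1)  # -1 marks keys that are absent

print(ray.get(lookup_many.remote(keys_ref, values_ref, [4, 5, 123456])))
# expected: 40, -1 (5 is absent), 1234560
```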

Storing the dict in an lmdb database might be an option. Basically, create the database from the dict, store it in S3, then rsync it to each node in your cluster.

Once it’s local, the cost of retrieving a key-value pair from an lmdb database is microseconds.
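A rough sketch of that approach using the lmdb Python package (the path and the JSON serialization of values here are just illustrative choices):

```python
import json
import lmdb  # pip install lmdb

DB_PATH = "/tmp/big_dict.lmdb"  # illustrative; in practice, rsync this to each node

def build_db(big_dict, path=DB_PATH, map_size=20 * 1024**3):
    # One-time build: write every key/value pair into the memory-mapped database.
    env = lmdb.open(path, map_size=map_size)
    with env.begin(write=True) as txn:
        for key, value in big_dict.items():
            txn.put(key.encode("utf-8"), json.dumps(value).encode("utf-8"))
    env.close()

def open_readonly(path=DB_PATH):
    # readonly + lock=False is suitable for many reader processes on one node.
    return lmdb.open(path, readonly=True, lock=False)

def lookup(env, key):
    # Each get is a memory-mapped read inside a read transaction.
    with env.begin() as txn:
        raw = txn.get(key.encode("utf-8"))
    return None if raw is None else json.loads(raw)

if __name__ == "__main__":
    build_db({"alpha": 1, "beta": [2, 3]})
    env = open_readonly()
    print(lookup(env, "alpha"), lookup(env, "missing"))  # 1 None
```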

As noted above, the dictionary lookup through an actor was way too slow. Is the per-call actor overhead so much greater than lmdb's (for example) that a dictionary lookup per actor call is slower than an lmdb read? This is an option we may explore.

Having a zero-copy immutable hashmap would be better. For some time it has struck me as odd that there isn’t a numpy (or Arrow or similar) hashmap serving the entire Python community.

Thanks for the ideas/thoughts.