Efficient access to a large dictionary needed by each worker

How severely does this issue affect your experience of using Ray?

  • Medium: It contributes significant difficulty to completing my task, but I can work around it.
  • High: It blocks me from completing my task.

Each worker in our cluster needs access to a large dictionary. We have tried to do this in two different ways:

  1. Load the dictionary in each worker and use it in each worker's processing.
  2. Start an actor on each node and have each worker access the actor on its own node, doing a remote get for each dictionary value from the dictionary loaded during the actor's init.

The downside of 1) is that we use a lot of memory on each box, since every worker loads its own copy of the dictionary. The downside of 2) is that, in our benchmarking, it is roughly 1000x slower than 1).
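For concreteness, approach 2 is roughly the pattern sketched below (the helper, actor, and task names are illustrative, and per-node actor placement, e.g. via custom resources, is omitted for brevity):

```python
import ray

def load_dict_from_s3():
    # Stand-in for the real JSON-from-S3 load of the ~10 GB dictionary.
    return {f"key-{i}": i for i in range(1000)}

@ray.remote
class DictServer:
    """One actor per node; loads the dictionary once in __init__."""
    def __init__(self):
        self.big_dict = load_dict_from_s3()

    def lookup(self, key):
        return self.big_dict.get(key)

@ray.remote
def worker_task(server, keys):
    # Each lookup is a remote method call plus a ray.get; that per-call
    # overhead dominates when repeated millions of times per worker.
    return [ray.get(server.lookup.remote(k)) for k in keys]

if __name__ == "__main__":
    ray.init()
    server = DictServer.remote()
    print(ray.get(worker_task.remote(server, ["key-1", "key-42"])))
```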

Some cluster details: we have tried this on Ray 1.2 (our prod cluster) and on 1.11 (latest), with no real difference. We have also tried both Python 3.8 and Python 3.9. We are running on a Linux Ray cluster of 20 nodes and launching 50 workers per node in the test. The dictionary is roughly 10 GB loaded in memory, and it is created by a JSON load from S3. The dictionary used in the worker processes does not need to be mutable, if that is relevant. As another datapoint, each worker does this dictionary value lookup potentially millions of times.

It seems that the dictionary access is either not going through shared memory on each box, or the way it does so involves a lot of overhead? Any insights would be appreciated.

Any ideas?

Thanks,
Luke

@virtualluke is your dictionary immutable? If there is a frozen dict that could be mapped to a contiguous chunk of memory, we can get it working. I'm not aware of any such frozen dict data structure at the moment, but it doesn't seem hard to implement one.

If it’s mutable, it’s much more challenging.

My dictionary needs only read access in the workers, so it could be immutable in this case. I am not aware of any standard frozen dictionary, but if there were one I would use it.

thanks,
Luke

Wrapping something like this in Python sounds like a good way forward, perhaps?

One approach here is that you could use a sorted PyArrow table or NumPy array and use binary search (e.g., the bisect module) to do the lookups. This would work fine on a zero-copy mapping of the underlying table, and I think binary search should give you pretty competitive performance with a hash table.

In principle this is also possible for a frozen hash table, but we don’t have any zero-copy support for those data structures out of the box. You could manually rig one up by hashing values into an existing zero-copy structure like a PyArrow table / NumPy array, though.
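A minimal sketch of the sorted-array idea, assuming integer keys and a NumPy layout (the key/value construction below is just a stand-in for your real data):

```python
import numpy as np
import ray

ray.init()

# Stand-in data: a sorted key array and a parallel value array.
keys = np.arange(0, 2_000_000, 2, dtype=np.int64)   # sorted, even keys only
values = keys * 10

# ray.put stores each array once per node in shared memory; tasks on the
# same node receive zero-copy, read-only views rather than private copies.
keys_ref = ray.put(keys)
values_ref = ray.put(values)

@ray.remote
def lookup_many(k, v, queries):
    # k and v arrive as zero-copy NumPy views backed by the object store.
    q = np.asarray(queries, dtype=np.int64)
    idx = np.minimum(np.searchsorted(k, q), len(k) - 1)  # binary search
    hit = k[idx] == q
    return np.where(hit, v[idx], -1)  # -1 marks keys that are absent

print(ray.get(lookup_many.remote(keys_ref, values_ref, [4, 5, 123456])))
# expected: 40, -1 (5 is absent), 1234560
```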

Storing the dict in an lmdb database might be an option. Basically, create the database from the dict, store it in S3, then rsync it to each node in your cluster.

Once it’s local, the cost of retrieving a key-value pair from an lmdb database is microseconds.
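A rough sketch of that approach using the lmdb Python package (the path and the JSON serialization of values here are just illustrative choices):

```python
import json
import lmdb  # pip install lmdb

DB_PATH = "/tmp/big_dict.lmdb"  # illustrative; in practice, rsync this to each node

def build_db(big_dict, path=DB_PATH, map_size=20 * 1024**3):
    # One-time build: write every key/value pair into the memory-mapped database.
    env = lmdb.open(path, map_size=map_size)
    with env.begin(write=True) as txn:
        for key, value in big_dict.items():
            txn.put(key.encode("utf-8"), json.dumps(value).encode("utf-8"))
    env.close()

def open_readonly(path=DB_PATH):
    # readonly + lock=False is suitable for many reader processes on one node.
    return lmdb.open(path, readonly=True, lock=False)

def lookup(env, key):
    # Each get is a memory-mapped read inside a read transaction.
    with env.begin() as txn:
        raw = txn.get(key.encode("utf-8"))
    return None if raw is None else json.loads(raw)

if __name__ == "__main__":
    build_db({"alpha": 1, "beta": [2, 3]})
    env = open_readonly()
    print(lookup(env, "alpha"), lookup(env, "missing"))  # 1 None
```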

As noted above, the dictionary lookup through an actor was way too slow. Is the per-call actor overhead so much greater than lmdb's (for example) that a dictionary lookup per actor call is slower than an lmdb read? This is an option we may explore.

Having a zero-copy immutable hashmap would be better. For some time it has struck me as odd that there isn’t a numpy (or Arrow or similar) hashmap serving the entire Python community.

Thanks for the ideas/thoughts.