I’m looking at using Ray as a possible solution to a massive HPC problem, and would welcome feedback and experiences.
My problem involves fragmenting molecules and generating parent-child relationships. For each molecule I need to calculate its children (this is relatively fast) and store that information so it can be output at the end of the process. Each child also needs to be processed, but it might have already been processed, so we first need to check for that. The molecules can be easily split into a large number of shards. Conceptually, Ray looks very well suited to this, perhaps with one actor holding the child info for each molecule, and one actor per shard holding the object references to the fragmented molecules.
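To make the idea concrete, here is a minimal single-process sketch of the shard-and-dedup logic I have in mind. The `fragment` function is a hypothetical stand-in for the real chemistry (it just drops one character at a time); in Ray, each per-shard dict would instead live inside its own actor so that lookups and inserts for a shard are serialized without a global lock:

```python
import hashlib

NUM_SHARDS = 16  # illustrative; in Ray this would be the number of shard actors

def shard_of(molecule: str) -> int:
    # Stable hash so every worker maps a given molecule to the same shard.
    return int(hashlib.sha1(molecule.encode()).hexdigest(), 16) % NUM_SHARDS

# One dict per shard: molecule -> list of children.
# In the Ray version, each dict would be the state of one shard actor.
shards = [dict() for _ in range(NUM_SHARDS)]

def fragment(molecule: str) -> list:
    # Hypothetical placeholder for the real fragmentation step:
    # "children" are the molecule with one character removed.
    if len(molecule) < 2:
        return []
    return [molecule[:i] + molecule[i + 1:] for i in range(len(molecule))]

def process(molecule: str, work_queue: list) -> None:
    shard = shards[shard_of(molecule)]
    if molecule in shard:         # already processed -> skip (the dedup check)
        return
    children = fragment(molecule)
    shard[molecule] = children    # record the parent -> children relationship
    work_queue.extend(children)   # children still need processing themselves

# Drive the fragmentation from a seed molecule.
queue = ["CCO"]
while queue:
    process(queue.pop(), queue)

total_distinct = sum(len(s) for s in shards)
```

The point of hashing to a shard is that the "has this child already been processed?" question is always answered by exactly one owner, which is the property that should let the duplicate check scale out across shard actors.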
I do have access to significant computing power (1000 CPU cores, several TB of RAM distributed across multiple servers), but I’m nervous about how this would perform at scale. I will end up with tens of billions of distinct molecules and hundreds of billions of total molecules (including duplicates).
Does anyone have experience operating at this sort of scale?