How severely does this issue affect your experience of using Ray?
- Medium: It contributes to significant difficulty to complete my task, but I can work around it.
Hello, I have a question about how Ray handles cases where the number of requested actors exceeds the number of available CPU cores. I am relatively new to multiprocessing and Ray, so I partly hope that the answer to my question is not too obvious.
To give you some background information: I am implementing an Ensemble Kalman Filter algorithm in Python that uses a simulator instead of a classic state-space model. The simulator is initialized from a deck, which also controls it during operation (the simulator is third-party software that I did not write and whose behavior I cannot influence). For the filter, I need several of these simulators running at the same time with different initial conditions. Apparently, each simulator requires its own deck, but copying the one deck I have access to in order to produce multiple decks is not an option. So I did some research and came up with multiprocessing as a possible solution. This way, I can use different cores for the different simulator objects, and since each process uses its own, separate memory, I should be able to operate multiple independent simulator objects simultaneously even though only one ‘physical’ version of the deck exists (this is at least my assumption after spending an afternoon with concurrent processing).
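To make that assumption concrete, here is a minimal sketch using only Python's standard `multiprocessing` module rather than Ray. `DECK`, `run_member`, and `run_ensemble` are hypothetical names I made up for illustration, and the dictionary is only a stand-in for the real deck. The sketch assumes a POSIX system where the "fork" start method is available, and it says nothing about how Ray itself schedules actors.

```python
import multiprocessing as mp
import os

# Hypothetical stand-in for the single deck: one module-level object.
DECK = {"state": 0}


def run_member(member_id, n_steps, results):
    """Mutate this process's own copy of DECK and report its final state."""
    for _ in range(n_steps):
        DECK["state"] += member_id
    results.put((member_id, DECK["state"], os.getpid()))


def run_ensemble(n_ens=4, n_steps=3):
    """Start n_ens processes; return their final states and the parent's state."""
    # Assumption: the "fork" start method exists (Linux and other POSIX systems).
    ctx = mp.get_context("fork")
    results = ctx.Queue()
    procs = [
        ctx.Process(target=run_member, args=(i + 1, n_steps, results))
        for i in range(n_ens)
    ]
    for proc in procs:
        proc.start()
    # Collect one result per process before joining.
    collected = [results.get(timeout=30) for _ in procs]
    for proc in procs:
        proc.join()
    final_states = {member_id: state for member_id, state, _pid in collected}
    # The parent's DECK is untouched because every child changed its own copy.
    return final_states, DECK["state"]


if __name__ == "__main__":
    states, parent_state = run_ensemble()
    print(states)
    print(parent_state)
```

Each process ends up with a different value in its own copy of `DECK`, and the parent's copy stays unchanged. `n_ens` can be larger than the number of cores here, since the operating system simply time-slices the processes while each keeps its own memory.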
I decided to use Ray’s actors to implement the multiprocessing tasks, which worked better than I had expected. To my surprise, the filtering results seemed reasonable even when the number of required Ray actors exceeded the number of available CPU cores, which brings me to my question: How does Ray handle situations in which the number of required actors exceeds the number of available CPU cores with respect to memory usage and, ultimately, the global and local variable space the different actors have access to? To perform the filtering task correctly, each simulator object needs to maintain its local variable space during the algorithm’s run time, in particular the state of its deck. Given the simulation results, I assume that this is the case. However, to evaluate the filter results correctly, I need to be certain that the local variable space remains the same for each actor. Maybe a similar question has been asked before, but I couldn’t find an answer.
In case it is helpful, here is a small, representative code example of what I am doing:
Each simulator is controlled and managed by an object which is also an actor.
import ray

@ray.remote
class SimManager:
    def __init__(self, init_values):
        # some_values and init_simulator are placeholders for the real code.
        self.some_attributes = some_values
        self.simulator = init_simulator(init_values)

    def simulation_step(self):
        # Code to perform a single simulation step.
        return simulation_result
An object of another class creates the requested number of SimManager objects (determined by n_ens), stores them, and performs a simulation step with all of them if required.
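class EnsembleManager:
    def __init__(self, n_ens):
        # some_values and create_init_values are placeholders for the real code.
        self.some_attributes = some_values
        ensemble_init_values = create_init_values()
        # One SimManager actor per set of initial values.
        self.sim_manager_list = [SimManager.remote(init_values) for init_values in ensemble_init_values]

    def ensemble_simulation_step(self):
        ensemble_results = list()
        # Submit one step per actor; .remote() returns an object reference immediately.
        for sim_manager in self.sim_manager_list:
            ensemble_results += [sim_manager.simulation_step.remote()]
        # Block until all actors have finished their step.
        ensemble_results = ray.get(ensemble_results)
        return ensemble_results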
Judging by the simulation time, the SimManager objects don’t seem to operate in parallel, but this is a minor issue at the moment. More importantly, I need to know whether each SimManager in fact has its own simulator object with its own deck, or whether they may get mixed up if n_ens > num_cpus.
Thanks for your help!