Loop equivalent in Ray

What is the equivalent of the following loop in Ray?

Here is an example of my loop:

# by zipped 
df_temp2 = []

for fund_name, country_name in zip(funds.name,funds.country):
    try: 
        temp_prices = investpy.get_fund_historical_data(fund= fund_name,
                                       country   = country_name,
                                       from_date = from_date,
                                       to_date   = to_date)
        temp_prices["fund"] = fund_name 
        df_temp2.append(temp_prices)
    except Exception:  # skip funds whose download fails
        pass

# collect results and clean     
df_funds2 = pd.concat(df_temp2)

This is what I tried:


@ray.remote
def download_mutual_funds_prices(mutual_fund_name:str, country:str,start_date:str, end_date:str):
    
    """
    This is a wrapper function around `investpy.get_fund_historical_data`
    from the investpy library. 
    See this link: https://investpy.readthedocs.io/ 
    ...
    
    Args:
      mutual_fund_name (str): Name of the mutual fund.
      country (str): Country of the mutual fund.
      start_date (str): First date of the price history, in '%Y-%m-%d' format.
      end_date (str): Last date of the price history, in '%Y-%m-%d' format.

    Returns:
      pandas.DataFrame: Historical prices with an added `fund` column.
    """

    # convert to date objects 
    start_date = dt.datetime.strptime(start_date, '%Y-%m-%d').date()
    end_date   = dt.datetime.strptime(end_date, '%Y-%m-%d').date()
    
    # set correct date format for investpy library 
    start_date = start_date.strftime("%d/%m/%Y") 
    end_date = end_date.strftime("%d/%m/%Y") 
    
    # download the data from investpy 
    fund_prices = investpy.get_fund_historical_data(fund= mutual_fund_name,
                                       country   = country,
                                       from_date = start_date,
                                       to_date   = end_date)
    
    # add a column with funds name 
    fund_prices["fund"] = mutual_fund_name 

    
    return fund_prices




# loop over funds

ray.init(ignore_reinit_error=True, num_cpus=num_cpus)

@timebudget
def looping_function(operation, start_date, end_date, input):
    temp = ray.get([operation.remote(item1, item2, start_date, end_date) for item1, item2 in input])
    time.sleep(5)
    return temp

myfunds_data = looping_function(download_mutual_funds_prices, start_date, end_date, fund_zipped)
ray.shutdown()

But I don’t get exactly the same behaviour.

Hi @msh855,

You need to move the ray.get out of looping_function. Also, not sure why you have a sleep there.

You need to concatenate all of the individual dataframes returned by looping_function into one dataframe like you do in your reference implementation.
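As a rough illustration of that last step, here is a minimal sketch using pandas only. The two small dataframes and their `Close` column are made-up stand-ins for what each remote call would return, not real investpy output.

```python
import pandas as pd

# Hypothetical stand-ins for the per-fund dataframes returned by the remote calls.
frames = [
    pd.DataFrame({"Close": [10.0, 10.5], "fund": "Fund A"}),
    pd.DataFrame({"Close": [20.0], "fund": "Fund B"}),
]

# Stack the per-fund dataframes row-wise into one dataframe,
# as the reference loop does with pd.concat(df_temp2).
df_funds = pd.concat(frames)
print(df_funds)
```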

Thanks mannyv,

I am not quite sure I understood what I need to do exactly. Are you basically saying that I don’t need the looping_function and just do:

myfunds_data=ray.get([download_mutual_funds_prices.remote(item1, item2, start_date,end_date) for item1,item2 in fund_zipped])

Can you please post the code if it is not too difficult?

I put time.sleep() there while I was checking a connection error and forgot to remove it, so please ignore it.

Hi @msh855,

Yes, you understood correctly: you do not need the looping function. Your code snippet for retrieving the data is correct as it stands. myfunds_data will be a list of dataframes, one per fund, so the last step is to concatenate them all together as you did before.