What am I doing wrong? (PB2) - Reusing same parameters

  • Medium: It contributes to significant difficulty to complete my task, but I can work around it.

Hi all, I'm new to the Ray library and have made a lot of progress, but I feel as if I am missing something fundamental in my implementation of PB2 for tuning the hyperparameters of an algorithmic trading strategy (using the Jesse library).

The issue I am facing is that the same time periods are evaluated on each training iteration, but the hyperparameters are only perturbed/exploited/explored for some of the trials. For the other trials, effectively the same test is repeated with the same set of hyperparameters.

Can anyone provide any insight into what I'm doing wrong here? Am I using the wrong tool for the job? Any advice greatly appreciated.

So far this resembles what I have implemented:

import ray
from ray.air import session
from ray.air.checkpoint import Checkpoint


def trainable(params):
    step = 1

    # Algorithm expects ints only so preprocess params into int
    params['p1']  = int(params['p1'])
    # ...  repeat for ~10 other hyperparameters

    # Checkpoint Loading...
    checkpoint = session.get_checkpoint()
    if checkpoint:
        state = checkpoint.to_dict()
        step = state['step'] + 1

    # Tune terminates function trainables once they return, so keep looping and
    # reporting; the scheduler/stopper decides when the trial actually stops.
    while True:
        sharpes = []
        profits = []

        # Call the external `backtest` function to evaluate the hyperparameters
        # over a range of different time periods (~5), then aggregate the results.
        for candles_ref in test_candles_refs:
            sharpe = -10    # in case the backtest fails to complete,
            profit = -1000  # initialise the results to low values
            result = backtest(ray.get(config_ref),   # defines the strategy to use
                              ray.get(routes_ref),   # defines the candle time period and currency pair
                              [],
                              ray.get(candles_ref),  # which candles to use for this sample
                              hyperparameters=params)  # the hyperparameters to use for the strategy
            try:
                sharpe = result['metrics']['smart_sharpe']
                profit = result['metrics']['net_profit']
            except KeyError:
                pass
            sharpes.append(sharpe)
            profits.append(profit)

        sum_sharpe = sum(sharpes)
        sum_profit = sum(profits)

        checkpoint_dict = {
            'step': step,
            'avg_smart_sharpe': sum_sharpe / len(sharpes),
            'sum_profit': sum_profit,
            'sharpes': sharpes,
            'profits': profits,
        }

        # Report the performance after it has been tested for each time period.
        checkpoint_dict['done'] = sum_sharpe / len(sharpes) > 10
        checkpoint = Checkpoint.from_dict(checkpoint_dict)
        session.report(checkpoint_dict, checkpoint=checkpoint)
        step += 1

Hi @l-j-g,

The trainable looks good. It would be more interesting to see how you instantiate PB2 and the Tuner. Can you share that part of the code?

Also, which behavior do you expect from PB2?

Generally, here is what happens in PB2 in a nutshell:

  • Say you’re running 8 trials. Then 8 random hyperparameter configurations are sampled.
  • The trials report their results to PB2.
  • Every perturbation_interval steps, PB2 checks which trials performed best and which performed worst.
  • The worst 25% of trials (so 2 trials here) are terminated; their slots instead exploit the two best trials.
  • “Exploit” means they copy the best trials and restart from their latest checkpoint. But it doesn’t make sense to just train the same parameters twice, so some of the parameters are perturbed as well.
  • In PB2’s case, Bayesian optimization is used to choose the perturbed values.
  • The other trials continue training with their own hyperparameters.
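The selection step above can be sketched in plain Python. This is a toy illustration only: `pb2_step`, the quantile handling, and the scores are made up, and the Bayesian-optimization perturbation of the copied config is elided entirely.

```python
def pb2_step(scores, quantile=0.25):
    """Toy version of one PB2 exploit step: the bottom `quantile` of
    trials (by reported score) each clone a top-quantile trial; the
    remaining trials continue unchanged. Real PB2 also perturbs the
    cloned hyperparameters via Bayesian optimization (omitted here)."""
    n = len(scores)
    k = max(1, int(n * quantile))
    ranked = sorted(range(n), key=lambda i: scores[i])  # worst first
    bottom, top = ranked[:k], ranked[-k:]
    # Map each bottom trial to the top trial it will exploit.
    return {worst: best for worst, best in zip(bottom, top)}

# 8 trials with these reported metrics:
scores = [0.1, 0.9, 0.4, 0.8, 0.2, 0.7, 0.3, 0.6]
print(pb2_step(scores))  # → {0: 3, 4: 1}: the 2 worst trials exploit the 2 best
```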

Does that help?
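For reference, a minimal PB2 + Tuner setup looks roughly like this. The metric name, bounds, `perturbation_interval`, and `p1` are illustrative only and would need adapting to your ~10 real hyperparameters; `trainable` is assumed to be the function from your post.

```python
from ray import tune
from ray.tune.schedulers.pb2 import PB2  # requires GPy and scikit-learn

pb2 = PB2(
    time_attr="training_iteration",
    metric="avg_smart_sharpe",      # must match a key reported by the trainable
    mode="max",
    perturbation_interval=5,        # exploit/explore every 5 reported steps
    quantile_fraction=0.25,         # bottom 25% of trials copy the top 25%
    hyperparam_bounds={
        "p1": [10, 200],            # PB2 needs continuous [min, max] bounds
    },
)

tuner = tune.Tuner(
    trainable,                      # the function trainable from the post above
    tune_config=tune.TuneConfig(scheduler=pb2, num_samples=8),
    param_space={"p1": tune.uniform(10, 200)},
)
results = tuner.fit()
```

Note that PB2 mutates only the keys listed in `hyperparam_bounds`; any parameter missing from it will never be perturbed, which can produce exactly the "same hyperparameters every iteration" symptom you describe.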