Getting the policy network in on_trial_result

I am running hyperparameter tuning for very long runs and I would like to evaluate each trial's best checkpoint on a custom testbench. Currently, I have to wait until the end of the hyperparameter sweep, and only then can I get the best policy by episode_reward_mean. I would like to do this for each trial instead.

analysis = tune.run(
    PPOTrainer,
    ...
)

config = analysis.best_config
config["explore"] = False
agent = PPOTrainer(
    env="compiler_gym",
    config=config
)
agent.restore(analysis.best_checkpoint)
policy = agent.get_policy().model
eval_policy(policy)

This is what I am trying to do.

class MyCallback(Callback):
    def on_trial_result(self, iteration, trials, trial, result, **info):
        policy = trial.checkpoint...???? <<<<<<<<<<<<< How to extract policy network from trial?
        eval_policy(policy)

analysis = tune.run(
    PPOTrainer,
    ...,
    callbacks=[MyCallback()]
)

Thanks!

Extracting the policy network itself is not easy. We have recognized that, when serving a policy, one usually only wants to extract the network plus a small amount of additional code that turns the network outputs into an action. We are working on a solution; these efforts are taking shape, for example, in the Connectors API, which is included in our nightly builds / docs.

Your weights can be accessed like this: algorithm.get_policy(DEFAULT_POLICY_ID).get_state()["weights"]. You can of course save them in an extra callback and do whatever you want with them 🙂
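
As a minimal sketch (not from the original reply), this could look roughly like the following inside an RLlib DefaultCallbacks, which runs inside each trial, unlike the Tune Callback that only sees trial metadata and results. It assumes a recent Ray version where on_train_result receives the algorithm keyword argument (older versions pass trainer=), and eval_policy is your own evaluation function:

from ray.rllib.algorithms.callbacks import DefaultCallbacks
from ray.rllib.policy.sample_batch import DEFAULT_POLICY_ID

class WeightsEvalCallbacks(DefaultCallbacks):
    def on_train_result(self, *, algorithm, result, **kwargs):
        # The default (single-agent) policy of this trial's algorithm.
        policy = algorithm.get_policy(DEFAULT_POLICY_ID)
        # get_state() holds the network weights under "weights";
        # save them here or evaluate the underlying model directly.
        weights = policy.get_state()["weights"]
        eval_policy(policy.model)  # eval_policy: your own testbench function

You would register this class through the RLlib config, e.g. config={"callbacks": WeightsEvalCallbacks, ...}, rather than through the callbacks= argument of tune.run, which only accepts Tune callbacks.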


How do I import the algorithm?

For example:

from ray.rllib.algorithms.dqn import DQN

In this case, DQN is the algorithm class. It inherits from Algorithm and ultimately from tune.Trainable.
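
As a minimal sketch (environment name, stop condition, and metric settings are illustrative), the imported class can be passed straight to tune.run as the Trainable and later instantiated to restore a checkpoint, just like PPOTrainer above:

from ray import tune
from ray.rllib.algorithms.dqn import DQN

# Run the sweep with the Algorithm class as the Trainable.
analysis = tune.run(
    DQN,
    config={"env": "CartPole-v1"},
    stop={"training_iteration": 10},
    metric="episode_reward_mean",
    mode="max",
)

# Re-create the algorithm and load the best checkpoint from the sweep.
agent = DQN(config=analysis.best_config)
agent.restore(analysis.best_checkpoint)
weights = agent.get_policy().get_state()["weights"]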
