Getting the policy network in on_trial_result

I am running hyperparameter tuning for very long runs and I would like to evaluate each trial's best checkpoint on a custom testbench. Currently, I have to wait until the end of the hyperparameter sweep, and only then can I get the best policy by episode_reward_mean. I would like to do this for each trial instead.

analysis = tune.run(
    PPOTrainer,
    ...
)

config = analysis.best_config
config["explore"] = False
agent = PPOTrainer(
    env="compiler_gym",
    config=config
)
agent.restore(analysis.best_checkpoint)
policy = agent.get_policy().model
eval_policy(policy)

This is what I am trying to do.

class MyCallback(Callback):
    def on_trial_result(self, iteration, trials, trial, result, **info):
        policy = trial.checkpoint...???? <<<<<<<<<<<<< How to extract policy network from trial?
        eval_policy(policy)

analysis = tune.run(
    PPOTrainer,
    ...,
    callbacks=[MyCallback()]
)

Thanks!

Extracting the policy network itself is not easy. We have recognized that, when serving a policy, one usually only wants to extract the network plus a small amount of additional code that turns the network outputs into an action. We are working on a solution; these efforts are taking shape, for example, in the Connectors API, which is included in our nightly builds / docs.

Your weights can be accessed like this: algorithm.get_policy(DEFAULT_POLICY_ID).get_state()["weights"]. You can of course save them in an extra callback and do whatever you want with them 🙂
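
As a minimal sketch (not from the original reply), this could look roughly like the following inside an RLlib DefaultCallbacks, which runs inside each trial, unlike the Tune Callback that only sees trial metadata and results. It assumes a recent Ray version where on_train_result receives the algorithm keyword argument (older versions pass trainer=), and eval_policy is your own evaluation function:

from ray.rllib.algorithms.callbacks import DefaultCallbacks
from ray.rllib.policy.sample_batch import DEFAULT_POLICY_ID

class WeightsEvalCallbacks(DefaultCallbacks):
    def on_train_result(self, *, algorithm, result, **kwargs):
        # The default (single-agent) policy of this trial's algorithm.
        policy = algorithm.get_policy(DEFAULT_POLICY_ID)
        # get_state() holds the network weights under "weights";
        # save them here or evaluate the underlying model directly.
        weights = policy.get_state()["weights"]
        eval_policy(policy.model)  # eval_policy: your own testbench function

You would register this class through the RLlib config, e.g. config={"callbacks": WeightsEvalCallbacks, ...}, rather than through the callbacks= argument of tune.run, which only accepts Tune callbacks.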


How do I import the algorithm?

For example:

from ray.rllib.algorithms.dqn import DQN

In this case, DQN is the algorithm class. It inherits from Algorithm and ultimately from tune.Trainable.
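
As a minimal sketch (environment name, stop condition, and metric settings are illustrative), the imported class can be passed straight to tune.run as the Trainable and later instantiated to restore a checkpoint, just like PPOTrainer above:

from ray import tune
from ray.rllib.algorithms.dqn import DQN

# Run the sweep with the Algorithm class as the Trainable.
analysis = tune.run(
    DQN,
    config={"env": "CartPole-v1"},
    stop={"training_iteration": 10},
    metric="episode_reward_mean",
    mode="max",
)

# Re-create the algorithm and load the best checkpoint from the sweep.
agent = DQN(config=analysis.best_config)
agent.restore(analysis.best_checkpoint)
weights = agent.get_policy().get_state()["weights"]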
