How to use PPOTorchPolicy.with_updates in Ray 1.9+?

How severely does this issue affect your experience of using Ray?

  • High: It blocks me from completing my task.

I am trying to upgrade from Ray 1.8.0 to Ray 1.9.2 (and later hopefully to the latest version) in our code base.
After the upgrade, I have multiple tests failing because we had been using PPOTorchPolicy.with_updates(...), which seems to be gone in 1.9.2. For example, we used to have:

CentralPPOPolicy = PPOTorchPolicy.with_updates(
    name="CentralPPOPolicy",
    postprocess_fn=postprocess_central_ppo,
    loss_fn=central_ppo_surrogate_loss,
    stats_fn=sgd_and_other_stats,
)

In Ray 1.9.2 this fails with AttributeError: type object 'PPOTorchPolicy' has no attribute 'with_updates'. How can I achieve the same behavior for the above with Ray 1.9.2+?

I feel like there should be a simple fix/approach for this, but at the moment the upgrade is blocked on it. Thanks for any help!

Hi @stefanbschneider,

what happens if you use build_policy_class() in 1.9.2 to first build the PPOTorchPolicy and then execute your code?

from ray.rllib.policy.policy_template import build_policy_class

PPOTorchPolicy = build_policy_class("PPOTorchPolicy", framework="torch")
CentralPPOPolicy = PPOTorchPolicy.with_updates(
    name="CentralPPOPolicy",
    postprocess_fn=postprocess_central_ppo,
    loss_fn=central_ppo_surrogate_loss,
    stats_fn=sgd_and_other_stats,
)

Hey @stefanbschneider,

I know that we have started to deprecate build_trainer. Using build_policy_class has long been recommended, but maybe this will get deprecated, too?

Have you tried simply subclassing the policy and overriding the functions that you are passing?
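
For example, a minimal, untested sketch against the class-based PPOTorchPolicy in 1.9.x (central_ppo_surrogate_loss, postprocess_central_ppo, and sgd_and_other_stats are your existing functions, which already take the policy as their first argument):

from ray.rllib.agents.ppo.ppo_torch_policy import PPOTorchPolicy

class CentralPPOPolicy(PPOTorchPolicy):
    # The old template's loss_fn becomes the loss() method; `self` takes
    # the place of the explicit `policy` argument.
    def loss(self, model, dist_class, train_batch):
        return central_ppo_surrogate_loss(self, model, dist_class, train_batch)

    # postprocess_fn becomes postprocess_trajectory().
    def postprocess_trajectory(self, sample_batch, other_agent_batches=None, episode=None):
        return postprocess_central_ppo(self, sample_batch, other_agent_batches, episode)

    # stats_fn maps to extra_grad_info() on torch policies.
    def extra_grad_info(self, train_batch):
        return sgd_and_other_stats(self, train_batch)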

Best


Hi @Lars_Simon_Zehnder and @arturn, thanks for your quick support!

As a quick fix, I went with @Lars_Simon_Zehnder's suggestion, which seems to work fine for now. I just had to define the custom functions directly in the build_policy_class() call:

PPOTorchPolicy = build_policy_class(
    "PPOTorchPolicy",
    framework="torch",
    loss_fn=central_ppo_surrogate_loss,
    postprocess_fn=postprocess_central_ppo,
    stats_fn=sgd_and_other_stats,
)
CentralPPOPolicy = PPOTorchPolicy.with_updates(name="CentralPPOPolicy")
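
In case it helps others: the resulting class can then be plugged into a trainer through the multiagent config, roughly like this (a sketch; "my_multi_agent_env" and the policy ID are placeholders):

from ray.rllib.agents.ppo import PPOTrainer
from ray.rllib.policy.policy import PolicySpec

config = {
    "framework": "torch",
    "multiagent": {
        # All agents map to the single central policy.
        "policies": {"central_ppo": PolicySpec(policy_class=CentralPPOPolicy)},
        "policy_mapping_fn": lambda agent_id, **kwargs: "central_ppo",
    },
}
trainer = PPOTrainer(config=config, env="my_multi_agent_env")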

Thanks again!


Hi @arturn, I saw that build_torch_policy() is deprecated in favor of subclassing directly: ray/torch_policy_template.py at master · ray-project/ray · GitHub

But surprisingly, it's not deprecated for TensorFlow: ray/tf_policy_template.py at master · ray-project/ray · GitHub

Do you know why and what the best practice for building TF Policies is at the moment?


Thanks for the timely question.
You are right, we are deprecating the builder pattern for Trainers and Policies, and in general prefer simple subclassing everywhere.
I will hopefully migrate all the policy classes in the next couple of weeks, including both TF and Torch policies.
You should be able to simply subclass PPOTorchPolicy for your use case.
Thanks again.


Hi @gjoliver, thanks for the quick response and for clarifying the roadmap.

Yes, subclassing PPOTorchPolicy works perfectly fine. I am now wondering about DQNTFPolicy: can I subclass that as well, or what is the current best practice there?

Yeah, please do.
It shouldn't cause any problems, but let us know if you run into anything.
I am hopefully going to migrate everything very soon.
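
A minimal sketch of what that could look like (untested; assumes overriding postprocess_trajectory on the template-generated TF class behaves the same way as on the torch policies):

from ray.rllib.agents.dqn.dqn_tf_policy import DQNTFPolicy

class MyDQNTFPolicy(DQNTFPolicy):
    # Keep DQN's built-in postprocessing (e.g. n-step), then apply custom
    # changes on top of the returned batch.
    def postprocess_trajectory(self, sample_batch, other_agent_batches=None, episode=None):
        batch = super().postprocess_trajectory(sample_batch, other_agent_batches, episode)
        # ... custom modifications to `batch` go here ...
        return batch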
