How severely does this issue affect your experience of using Ray?
- Low: It annoys or frustrates me for a moment.
I’m trying to use Ray for a research project with a custom environment. In my previous prototype experiments, I’ve found it useful to scale the advantage function after gathering a rollout. For every state of my environment, I have an estimate of the maximum value that can be obtained from this state. Is there a way to scale the advantage function by this value before the agent trains?
To make things more concrete, the advantage is usually computed as,
A_0 = r_0 + g * r_1 + g^2 * r_2 + ...
and this advantage is used in algorithms like PPO during optimization. Suppose I know that the maximum return at each time step is M_0, M_1, ...
I would like to scale the advantage as A_0/M_0, A_1/M_1, ...
before the optimization step. Is there a way to achieve this?
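For concreteness, here is a rough sketch of what I imagine doing with RLlib's `on_postprocess_trajectory` callback hook (I'm not sure this is the right place to do it, and I'm assuming a recent Ray version where `DefaultCallbacks` lives under `ray.rllib.algorithms.callbacks`). The `max_return_for` helper is my own placeholder for the per-state estimate M_t, not anything from RLlib:

```python
import numpy as np
from ray.rllib.algorithms.callbacks import DefaultCallbacks
from ray.rllib.evaluation.postprocessing import Postprocessing


def max_return_for(obs):
    """Placeholder for my own per-state estimate M_t of the maximum
    achievable return from `obs`; in my setup this comes from the
    environment, it is not part of RLlib."""
    return 1.0


class ScaleAdvantages(DefaultCallbacks):
    """Divide each advantage A_t by my estimate M_t before the train step."""

    def on_postprocess_trajectory(
        self, *, worker, episode, agent_id, policy_id,
        policies, postprocessed_batch, original_batches, **kwargs
    ):
        # Compute M_t for every observation in the postprocessed batch.
        m = np.asarray([max_return_for(o) for o in postprocessed_batch["obs"]])
        # Rescale the advantages in place (guarding against division by zero).
        postprocessed_batch[Postprocessing.ADVANTAGES] = (
            postprocessed_batch[Postprocessing.ADVANTAGES] / np.maximum(m, 1e-8)
        )
```

I would then register this with `"callbacks": ScaleAdvantages` in the PPO config (or `config.callbacks(ScaleAdvantages)` with the newer config API). Does this hook actually run after GAE postprocessing but before PPO's optimization step, or is there a better-supported way to achieve this?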