Hey, I am new to Ray and working on my first project. The idea is to reproduce the following Matlab example:
Basically, a PI controller is tuned with RL; the parameters Kp and Ki are part of the policy and shall be extracted after training. I am trying to define a custom model that has only two neurons without bias for the policy, mapping the (2,) observation space to the (1,) action space.
The documentation regarding custom models was of great help. However, I am struggling to understand how forward() and value_function() work and at which points they are called.
In the example above they also use the two-neuron policy, a large value network to stabilize the learning process, and the TD3 algorithm. Maybe an example is available that could provide some help?
Thanks in advance
They are called on several occasions, which also depend on your framework.
I’ll explain for torch to keep it simple here:
On every timestep, a RolloutWorker gets an observation and preprocesses it before feeding it into forward(). It captures the output and builds an action distribution from which an action is sampled. On the learner thread, the sampled batch is then used to call forward() again, but also get_policy_output(). These are all a little special for TD3, because it inherits from DDPG. To get a better understanding, you should first understand how DDPG works. It's all simpler for, say, an ordinary policy gradient algorithm.
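To make the forward()/value_function() interplay concrete, here is a minimal sketch in plain PyTorch. It is not a full RLlib TorchModelV2 subclass (the class name and the 64-unit value branch are my own choices, not from the docs), but it mirrors the typical pattern: forward() maps the preprocessed observation to action-distribution inputs, and value_function() reuses the features cached by the most recent forward() call.

```python
import torch
import torch.nn as nn

class TwoNeuronPolicySketch(nn.Module):
    """Sketch of a custom-model layout; a real RLlib model would also
    subclass TorchModelV2 and take (obs_space, action_space, ...)."""

    def __init__(self, obs_dim=2, num_outputs=1):
        super().__init__()
        # Two weights, no bias: these play the role of Kp and Ki.
        self.policy = nn.Linear(obs_dim, num_outputs, bias=False)
        # A larger, separate value branch (as in the Matlab example) to
        # stabilize learning; it does not affect the final controller.
        self.value_branch = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, 1)
        )
        self._last_obs = None

    def forward(self, input_dict, state, seq_lens):
        # RLlib passes a dict; the preprocessed observation sits under "obs".
        obs = input_dict["obs"].float()
        self._last_obs = obs  # cached for the next value_function() call
        return self.policy(obs), state

    def value_function(self):
        # Called after forward(); operates on the cached observation.
        return self.value_branch(self._last_obs).squeeze(-1)
```

After training, the two learned weights in `self.policy` are exactly the controller gains you want to extract.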
There is no example involving such a classic PI-controller application. But all neural networks are non-linear function approximators, so ultimately they can serve to approximate any PI(D) controller that you throw at them.
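In your special case the approximation is even exact: assuming the observation is (error e, integrated error), a bias-free linear layer with two weights is precisely the PI law u = Kp·e + Ki·∫e, so the gains can be read straight off the learned weights. A sketch (the weight values here are made up for illustration):

```python
import numpy as np

# Hypothetical learned weights of the two-neuron, bias-free policy.
weights = np.array([1.8, 0.35])
Kp, Ki = weights  # first weight acts on e, second on the integrated error

def pi_action(e, e_int):
    # Identical to the two-neuron policy's forward pass:
    # u = Kp * e + Ki * integral(e)
    return Kp * e + Ki * e_int
```

In RLlib you would read these weights out of the trained policy (e.g. via the policy's model state dict for torch) rather than hard-coding them.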