How does Ray compute actions in the DDPG algorithm?

Hi,
I want to know how DDPG computes its actions.
Where in the source code should I look for the action computation?
And can I apply two types of exploration to the Actor network's outputs, as in the figure below?

Hi,

As I understand it, DDPG is mostly used for continuous actions, whereas Gaussian sampling and epsilon-greedy are methods for selecting discrete actions.
For discrete actions, you could look at algorithms like DQN.

Instead, DDPG typically adds random noise to its continuous actions for exploration. By default, it uses stateful Ornstein Uhlenbeck noise, but you can configure that in the DDPG config:

    # === Exploration ===
    "exploration_config": {
        # DDPG uses OrnsteinUhlenbeck (stateful) noise to be added to NN-output
        # actions (after a possible pure random phase of n timesteps).
        "type": "OrnsteinUhlenbeckNoise",
        # For how many timesteps should we return completely random actions,
        # before we start adding (scaled) noise?
        "random_timesteps": 1000,
        # The OU-base scaling factor to always apply to action-added noise.
        "ou_base_scale": 0.1,
        # The OU theta param.
        "ou_theta": 0.15,
        # The OU sigma param.
        "ou_sigma": 0.2,
        # The initial noise scaling factor.
        "initial_scale": 1.0,
        # The final noise scaling factor.
        "final_scale": 1.0,
        # Timesteps over which to anneal scale (from initial to final values).
        "scale_timesteps": 10000,
    },
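
To make the "stateful" part concrete, here is a minimal sketch of an Ornstein-Uhlenbeck process in its textbook form (not RLlib's actual implementation; the parameter names just mirror the config above):

import numpy as np

# Minimal OU-noise sketch (textbook form, not RLlib's code).
# The noise state x persists across steps, which is what makes it
# "stateful", unlike i.i.d. Gaussian noise.
class OUNoise:
    def __init__(self, action_dim, ou_theta=0.15, ou_sigma=0.2, ou_base_scale=0.1):
        self.theta = ou_theta
        self.sigma = ou_sigma
        self.base_scale = ou_base_scale
        self.x = np.zeros(action_dim)

    def sample(self):
        # x_{t+1} = x_t + theta * (0 - x_t) + sigma * N(0, 1)
        self.x += self.theta * (0.0 - self.x) + self.sigma * np.random.randn(*self.x.shape)
        return self.base_scale * self.x

noise = OUNoise(action_dim=1)
deterministic_action = np.array([0.3])             # pretend this is the actor output
exploratory_action = deterministic_action + noise.sample()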

If you want to look into the DDPG code, the implementation is here: ray/rllib/agents/ddpg at master · ray-project/ray · GitHub
I believe the computed actions are output here (for TF): ray/ddpg_tf_model.py at master · ray-project/ray · GitHub
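
And a small usage sketch, assuming the old ray.rllib.agents API that the links above point to (method names can differ slightly between Ray versions; compute_single_action was compute_action in older releases):

import gym
import ray
from ray.rllib.agents.ddpg import DDPGTrainer

ray.init()
trainer = DDPGTrainer(
    env="Pendulum-v1",  # "Pendulum-v0" on older gym versions
    config={
        "num_workers": 0,
        # Partial override; the remaining OU parameters keep their defaults.
        "exploration_config": {
            "type": "OrnsteinUhlenbeckNoise",
            "random_timesteps": 1000,
        },
    },
)

obs = gym.make("Pendulum-v1").reset()
# With exploration: the configured OU noise is added to the actor output.
noisy_action = trainer.compute_single_action(obs, explore=True)
# Without exploration: the deterministic actor output.
greedy_action = trainer.compute_single_action(obs, explore=False)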

Thanks for the reply @stefanbschneider
I know how to use the DDPG model with exploration.

My question has two intentions.
First, the action space in my MDP model has two components: lane change and acceleration control. The lane change is a discrete action and the acceleration control is a continuous action, so my action space is a mixed discrete-continuous action space.
Second, I want to apply OU/Gaussian noise to the acceleration control policy (actor output) and epsilon-greedy to the lane change policy (actor output).
Is it possible?

  • I have to use a deterministic policy, so I cannot use stochastic-policy algorithms like SAC or PPO.
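
For reference, the action space I have in mind looks roughly like this (bounds and sizes are just placeholders):

import gym

# Rough illustration of the mixed discrete-continuous action space:
mixed_action_space = gym.spaces.Tuple((
    gym.spaces.Discrete(3),                            # lane change: left / keep / right
    gym.spaces.Box(low=-3.0, high=3.0, shape=(1,)),    # acceleration command
))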

Hi @Xim_Lee,

As you can see at the link below, DDPG does not support discrete action spaces.

RLlib, as far as I know, does not support multiple action distributions for the same policy.

You could think about splitting your environment into a multi-agent environment with two policies: one for the continuous actions (DDPG) and one for the discrete actions (DQN?). A rough sketch is below the link.

https://docs.ray.io/en/master/rllib-algorithms.html#available-algorithms-overview
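
Here is a rough sketch of what such a split could look like; the agent ids, spaces, and placeholder dynamics/reward are made up. You would then map "accel_agent" to a DDPG-trained policy and "lane_agent" to a DQN-trained policy, e.g. along the lines of RLlib's two-trainer-workflow example.

import numpy as np
import gym
from ray.rllib.env.multi_agent_env import MultiAgentEnv


class MixedControlEnv(MultiAgentEnv):
    """Sketch: one agent id per action type so each can use its own policy."""

    def __init__(self, config=None):
        self.lane_space = gym.spaces.Discrete(3)              # lane change: left / keep / right
        self.accel_space = gym.spaces.Box(-3.0, 3.0, (1,))    # acceleration command
        self.observation_space = gym.spaces.Box(-np.inf, np.inf, (4,))

    def reset(self):
        obs = self.observation_space.sample()
        # Both "agents" observe the same underlying vehicle state.
        return {"lane_agent": obs, "accel_agent": obs}

    def step(self, action_dict):
        lane = action_dict["lane_agent"]        # discrete, e.g. trained by DQN
        accel = action_dict["accel_agent"]      # continuous, e.g. trained by DDPG
        # Placeholder dynamics and reward; a real env would use lane and accel here.
        obs = self.observation_space.sample()
        reward = 0.0
        obs_dict = {"lane_agent": obs, "accel_agent": obs}
        rew_dict = {"lane_agent": reward, "accel_agent": reward}
        done_dict = {"__all__": False}
        return obs_dict, rew_dict, done_dict, {}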

Thanks for the advice @mannyv
I will look into splitting my environment that way.

What if you use DDPG for continuous actions and then just postprocess and discretize the part of the actions that you want to be discrete (e.g., inside your environment before applying them)?

Not sure if this will break the learning somehow, but I think it should work.
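
Something along these lines inside your env's step(), where the names and thresholds are just placeholders:

# Hypothetical post-processing: DDPG outputs a 2-D continuous action;
# the first dimension is thresholded into a lane-change decision, the
# second is used directly as acceleration.
def split_action(continuous_action):
    lane_raw, accel = float(continuous_action[0]), float(continuous_action[1])
    if lane_raw < -0.33:
        lane_change = -1   # change left
    elif lane_raw > 0.33:
        lane_change = 1    # change right
    else:
        lane_change = 0    # keep lane
    return lane_change, accel

# e.g. inside MyEnv.step():
#   lane_change, accel = split_action(action)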

Thanks for the reply @stefanbschneider
I tried that (post-processing the actions), but it did not train the agent properly.