How does Ray compute actions in the DDPG algorithm?

I want to know how it computes actions.
Where in the source code should I look for the action computation?
And can I apply two types of exploration to the Actor network's outputs, as shown in the figure below?


As I understand it, DDPG is mostly used for continuous actions, whereas Gaussian sampling and epsilon-greedy are methods for selecting discrete actions.
For discrete actions, you could look at algorithms like DQN.

Instead, DDPG typically adds random noise to its continuous actions for exploration. By default, it uses stateful Ornstein-Uhlenbeck noise, but you can configure that in the DDPG config:

# === Exploration ===
"exploration_config": {
    # DDPG uses OrnsteinUhlenbeck (stateful) noise to be added to NN-output
    # actions (after a possible pure random phase of n timesteps).
    "type": "OrnsteinUhlenbeckNoise",
    # For how many timesteps should we return completely random actions,
    # before we start adding (scaled) noise?
    "random_timesteps": 1000,
    # The OU-base scaling factor to always apply to action-added noise.
    "ou_base_scale": 0.1,
    # The OU theta param.
    "ou_theta": 0.15,
    # The OU sigma param.
    "ou_sigma": 0.2,
    # The initial noise scaling factor.
    "initial_scale": 1.0,
    # The final noise scaling factor.
    "final_scale": 1.0,
    # Timesteps over which to anneal scale (from initial to final values).
    "scale_timesteps": 10000,
},

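The Ornstein-Uhlenbeck process that this config parameterizes is easy to sketch in plain NumPy. This is an illustrative toy, not RLlib's actual `OrnsteinUhlenbeckNoise` class; the parameter names just mirror the config above:

```python
import numpy as np

class OUNoiseSketch:
    """Toy stateful Ornstein-Uhlenbeck noise (illustrative, not RLlib's class)."""

    def __init__(self, action_dim, theta=0.15, sigma=0.2, dt=1.0, seed=0):
        self.theta = theta                 # mean-reversion rate ("ou_theta")
        self.sigma = sigma                 # noise volatility ("ou_sigma")
        self.dt = dt
        self.state = np.zeros(action_dim)  # the process is stateful across steps
        self.rng = np.random.default_rng(seed)

    def sample(self):
        # dx = theta * (mu - x) * dt + sigma * sqrt(dt) * N(0, 1), with mu = 0.
        dx = (self.theta * -self.state * self.dt
              + self.sigma * np.sqrt(self.dt)
              * self.rng.standard_normal(self.state.shape))
        self.state = self.state + dx
        return self.state

# Exploration step: add scaled noise to the deterministic actor output.
noise = OUNoiseSketch(action_dim=1)
deterministic_action = np.array([0.5])
explored_action = deterministic_action + 0.1 * noise.sample()  # 0.1 ~ "ou_base_scale"
```

Because the noise is mean-reverting and stateful, consecutive exploration offsets are temporally correlated, which tends to produce smoother exploration in physical control tasks than independent Gaussian noise.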
If you want to look into the code of DDPG, the implementation is here: ray/rllib/agents/ddpg at master · ray-project/ray · GitHub
I believe the output of the computed actions is here (for TF): ray/ at master · ray-project/ray · GitHub


Thanks for the reply @stefanbschneider
I know how to use the DDPG model with exploration.

My question has two intentions.
First, the action space in my MDP model has two components: lane change and acceleration control. Lane change is a discrete action and acceleration control is a continuous action, so my action space is a mixed discrete-continuous action space.
Second, I want to apply OU/Gaussian noise to the acceleration-control policy (actor output) and epsilon-greedy to the lane-change policy (actor output).
Is that possible?

  • I have to use a deterministic policy, so I cannot use stochastic-policy algorithms like SAC or PPO.
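Mechanically, the combination described above could be sketched like this. All names, shapes, and defaults here are my own assumptions for illustration, not an RLlib API: epsilon-greedy on a discrete lane-change head, Gaussian noise on a continuous acceleration head:

```python
import numpy as np

rng = np.random.default_rng(0)

def explore_mixed(actor_output, epsilon=0.1, gauss_sigma=0.2, n_lane_actions=3):
    """Hypothetical mixed exploration for a two-headed actor output."""
    lane_logits = actor_output["lane_logits"]  # shape: (n_lane_actions,), assumed
    accel = actor_output["accel"]              # scalar continuous action, assumed

    # Epsilon-greedy on the discrete lane-change head.
    if rng.random() < epsilon:
        lane_action = int(rng.integers(n_lane_actions))
    else:
        lane_action = int(np.argmax(lane_logits))

    # Gaussian noise on the continuous acceleration, clipped to [-1, 1].
    accel_action = float(np.clip(accel + rng.normal(0.0, gauss_sigma), -1.0, 1.0))
    return lane_action, accel_action

lane, accel = explore_mixed(
    {"lane_logits": np.array([0.1, 2.0, -0.5]), "accel": 0.3},
    epsilon=0.0,  # with epsilon=0.0 the lane action is simply the argmax
)
```

This only shows what the exploration step itself would look like; as discussed below, the harder part is that a single DDPG policy does not natively emit such a mixed action space.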

Hi @Xim_Lee,

As you can see at the link below, DDPG does not support discrete action spaces.

RLlib, as far as I know, does not support multiple action distributions for the same policy.

You could think about splitting your environment into a multi-agent environment with two policies: one for the continuous actions (DDPG) and one for the discrete actions (DQN?).
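To make the split concrete, here is a minimal sketch of just the per-agent policy mapping such a setup would use. The agent and policy names are hypothetical, and a real setup would also need a MultiAgentEnv and the corresponding multi-agent config in RLlib:

```python
def policy_mapping_fn(agent_id):
    """Route each (hypothetical) agent to its own policy."""
    if agent_id == "accel_agent":
        return "ddpg_policy"  # handles the continuous acceleration action
    elif agent_id == "lane_agent":
        return "dqn_policy"   # handles the discrete lane-change action
    raise ValueError(f"unknown agent: {agent_id}")
```

Each policy then gets its own exploration config (OU/Gaussian noise for the DDPG policy, epsilon-greedy for the DQN policy), which sidesteps the single-policy limitation.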


Thanks for the advice @mannyv
I will consider splitting my environment that way.

What if you use DDPG for continuous actions and then just postprocess and discretize the part of the actions that you want to be discrete (e.g., inside your environment before applying them)?

Not sure if this will break the learning somehow, but I think it should work.
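A minimal sketch of such a discretization step, done inside the environment before applying the action (the bin layout and function name are my own, not from RLlib):

```python
def discretize_lane_action(continuous_action, n_bins=3):
    """Map a continuous value in [-1, 1] to one of n_bins discrete lane actions
    (e.g. 0 = left, 1 = keep, 2 = right). Illustrative sketch only."""
    x = max(-1.0, min(1.0, float(continuous_action)))
    # Split [-1, 1] into n_bins equal-width bins.
    bin_idx = min(int((x + 1.0) / 2.0 * n_bins), n_bins - 1)
    return bin_idx

# Inside env.step(), before applying the action, one might do:
# lane = discretize_lane_action(action[0]); accel = action[1]
```

One caveat with this approach: the gradient-based actor never sees the discretization, so the critic evaluates the continuous action while the environment executes a snapped version of it, which may explain training difficulties.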


Thanks for the reply @stefanbschneider
I tried that (post-processing), but it does not train the agent properly.