# How does Ray compute actions in the DDPG algorithm?

Hi,
I want to know how DDPG computes its actions. Where in the source code should I look for the action computation?
Also, can I apply two types of exploration to the Actor network's outputs, as in the figure below?

Hi,

as I understand it, DDPG is mostly used for continuous actions, whereas Gaussian sampling and epsilon-greedy are methods for selecting discrete actions.
For discrete actions, you could look at algorithms like DQN.

Instead, DDPG typically adds random noise to its continuous actions for exploration. By default, it uses stateful Ornstein Uhlenbeck noise, but you can configure that in the DDPG config:

```
# === Exploration ===
"exploration_config": {
    # DDPG uses OrnsteinUhlenbeck (stateful) noise to be added to NN-output
    # actions (after a possible pure random phase of n timesteps).
    "type": "OrnsteinUhlenbeckNoise",
    # For how many timesteps should we return completely random actions,
    # before we start adding (scaled) noise?
    "random_timesteps": 1000,
    # The OU-base scaling factor to always apply to action-added noise.
    "ou_base_scale": 0.1,
    # The OU theta param.
    "ou_theta": 0.15,
    # The OU sigma param.
    "ou_sigma": 0.2,
    # The initial noise scaling factor.
    "initial_scale": 1.0,
    # The final noise scaling factor.
    "final_scale": 1.0,
    # Timesteps over which to anneal scale (from initial to final values).
    "scale_timesteps": 10000,
},
```
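To give an intuition for what that config controls, here is a minimal, self-contained sketch of stateful Ornstein-Uhlenbeck noise being added to a deterministic actor output. It mirrors the config keys above (`ou_theta`, `ou_sigma`, `ou_base_scale`) but is an illustration only, not RLlib's actual implementation:

```python
import random


class OrnsteinUhlenbeckNoise:
    """Illustrative stateful OU process: dx = theta * (mu - x) + sigma * N(0, 1).

    Parameter names mirror the RLlib config keys above; this is a sketch,
    not RLlib's code.
    """

    def __init__(self, ou_theta=0.15, ou_sigma=0.2, ou_base_scale=0.1, mu=0.0):
        self.theta = ou_theta
        self.sigma = ou_sigma
        self.base_scale = ou_base_scale
        self.mu = mu
        self.state = 0.0  # stateful: successive samples are correlated

    def sample(self):
        # Drift back toward mu, plus Gaussian diffusion.
        dx = self.theta * (self.mu - self.state) + self.sigma * random.gauss(0.0, 1.0)
        self.state += dx
        return self.base_scale * self.state


noise = OrnsteinUhlenbeckNoise()
actor_output = 0.5          # deterministic continuous action from the actor
noisy_action = actor_output + noise.sample()
```

Because the process is stateful and mean-reverting, consecutive noise samples are temporally correlated, which tends to produce smoother exploration trajectories than independent Gaussian noise.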

If you want to look into the DDPG code, the implementation is here: ray/rllib/agents/ddpg at master · ray-project/ray · GitHub
I believe the computed actions are output here (for TF): ray/ddpg_tf_model.py at master · ray-project/ray · GitHub


I know how to use the DDPG model with exploration.

My question has two parts.
First, the action space in my MDP model has two components: lane change and acceleration control. Lane change is a discrete action and acceleration control is a continuous action, so my action space is a mixed discrete-continuous one.
Second, I want to apply OU/Gaussian noise to the acceleration-control policy (actor output) and epsilon-greedy to the lane-change policy (actor output).
Is this possible?
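Outside of RLlib, the combined exploration scheme described above can be sketched in a few lines: epsilon-greedy on the discrete lane-change head and Gaussian noise on the continuous acceleration head. This is purely illustrative (`explore_hybrid_action` and its arguments are hypothetical names), and as noted in the replies, RLlib's DDPG does not support such a mixed action space out of the box:

```python
import random


def explore_hybrid_action(lane_logits, acceleration, epsilon=0.1, sigma=0.2):
    """Hypothetical hybrid exploration: epsilon-greedy over the discrete
    lane-change head, additive Gaussian noise on the continuous
    acceleration head."""
    # Epsilon-greedy for the discrete lane-change action.
    if random.random() < epsilon:
        lane_change = random.randrange(len(lane_logits))
    else:
        lane_change = max(range(len(lane_logits)), key=lambda i: lane_logits[i])
    # Gaussian noise for the continuous acceleration action.
    noisy_acceleration = acceleration + random.gauss(0.0, sigma)
    return lane_change, noisy_acceleration
```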

• I have to use a deterministic policy, so I cannot use stochastic-policy algorithms like SAC or PPO.

Hi @Xim_Lee,

As you can see at the link below, DDPG does not support discrete action spaces.

RLlib, as far as I know, does not support multiple action distributions for the same policy.

You could consider splitting your environment into a multiagent environment with two policies: one for the continuous actions (DDPG) and one for the discrete actions (DQN?).
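Such a split might be configured roughly as below. This is only a sketch of the multiagent config shape: the observation/action space variables and agent IDs are placeholders, and since one trainer cannot mix DDPG and DQN losses, you would typically run one trainer per algorithm, each with `policies_to_train` restricted to its own policy:

```python
# Sketch of a multiagent config; accel_obs_space, accel_action_space,
# lane_obs_space, lane_action_space, and the agent IDs are placeholders.
config = {
    "multiagent": {
        "policies": {
            # Continuous acceleration control (trained by a DDPG trainer).
            "accel_policy": (None, accel_obs_space, accel_action_space, {}),
            # Discrete lane-change control (trained by a DQN trainer).
            "lane_policy": (None, lane_obs_space, lane_action_space, {}),
        },
        "policy_mapping_fn": lambda agent_id: (
            "accel_policy" if agent_id == "accel_agent" else "lane_policy"
        ),
    },
}
```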

https://docs.ray.io/en/master/rllib-algorithms.html#available-algorithms-overview
