What is the difference between `log_action` and `get_action` and when to use them?

What is the difference between the two functions log_action and get_action, and when should you use them?

Also, how do you calculate the reward value to pass to the log_returns function if you are using log_action, since it doesn't return the action that the RL Agent took?

Hi @aviskarkc10,

these two functions are used in external environments, i.e. environments that pull actions from the agent. Because of this, it is important that the environment can provide the agent with information about actions that have been taken that do not necessarily come from the agent. Say you have a control environment where you have to take an action every 100 ms: if action inference takes longer, the environment might take several steps without querying the agent. In that case you would still want to inform the agent about the actions you have taken.

log_action is used to record these off-policy actions. get_action returns (and logs) the current on-policy action to execute in the environment.

The reward comes from the environment, and since you pass the off-policy action to log_action, you already know the action and have usually done an environment step with it before.
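For illustration, here is a minimal sketch of how the two calls differ on the client side, using RLlib's PolicyClient; the server address, the CartPole environment, and the sampled stand-in for the external action are assumptions for the example, not from this thread:

```python
# Minimal sketch contrasting get_action and log_action with RLlib's
# PolicyClient. Assumes a policy server is already running at localhost:9900
# and uses CartPole purely as a stand-in environment.
import gym
from ray.rllib.env.policy_client import PolicyClient

env = gym.make("CartPole-v0")
client = PolicyClient("http://localhost:9900", inference_mode="remote")

obs = env.reset()
episode_id = client.start_episode(training_enabled=True)

# On-policy: ask the agent what to do (this call also logs the action).
action = client.get_action(episode_id, obs)
obs, reward, done, info = env.step(action)
client.log_returns(episode_id, reward)

# Off-policy: the action came from somewhere else (a controller, a human,
# a repeated action while inference was too slow), so just log it.
external_action = env.action_space.sample()  # stand-in for the external action
client.log_action(episode_id, obs, external_action)
obs, reward, done, info = env.step(external_action)
client.log_returns(episode_id, reward)

client.end_episode(episode_id, obs)
```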


Hi @kai,

Thanks for the information.

So this is really a use-case specific question. Consider a RL Agent that learns by imitating the actions of a user/human. So we would:

1. start an episode
2. call `log_action`, log an action based on the action of the user
3. call `log_returns`, log the reward
4. end the episode 

Now I am stuck on the third step: how would I calculate the reward to pass to the log_returns function?

The action of the user has to be passed into the environment at some point. This usually happens through a custom step function that calculates the reward.

If you can share parts of your code (specifically the control code and the environment), we might be able to better help you with this.
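For illustration, the pattern described above could look roughly like the sketch below. The environment, its reward function, and the way the human action is obtained are all hypothetical placeholders, not code from this thread:

```python
# Hypothetical sketch: the human's action goes through a custom step() that
# computes the reward, and that reward is what gets passed to log_returns.
import gym
import numpy as np
from ray.rllib.env.policy_client import PolicyClient


class HumanDrivenEnv(gym.Env):
    """Toy environment whose reward function scores whatever action is applied."""

    def __init__(self, config=None):
        self.observation_space = gym.spaces.Box(-1.0, 1.0, shape=(4,))
        self.action_space = gym.spaces.Discrete(2)
        self._state = np.zeros(4, dtype=np.float32)

    def reset(self):
        self._state = np.zeros(4, dtype=np.float32)
        return self._state

    def step(self, action):
        # Apply the (human-chosen) action and compute the reward here.
        self._state = np.clip(self._state + (action - 0.5), -1.0, 1.0).astype(np.float32)
        reward = float(action == 1)  # placeholder reward function
        done = False
        return self._state, reward, done, {}


env = HumanDrivenEnv()
client = PolicyClient("http://localhost:9900", inference_mode="remote")

obs = env.reset()
episode_id = client.start_episode(training_enabled=True)

human_action = env.action_space.sample()           # stand-in for the user's action
client.log_action(episode_id, obs, human_action)   # log the off-policy action
obs, reward, done, info = env.step(human_action)   # reward comes from step()
client.log_returns(episode_id, reward)

client.end_episode(episode_id, obs)
```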

So our log_action function would call the step function that is in our Agent definition?

Here is what I have right now:

# this is the server file
# Define the RL Agent

import gym


class RLAgent(gym.Env):
    def __init__(self, **kwargs):
        # initialize the agent
        pass

So based on what you are saying, I would also add a step function to the Agent definition that calculates the reward, and our log_action function would call this step function?

Hi @aviskarkc10,

Check out this example. It should have most of the components you need. Let us know if you need more information.

With this approach you will also need to run a server.
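For reference, the server side of that setup looks roughly like the sketch below. The exact config keys and the PolicyServerInput constructor have varied across Ray versions, so treat this as an outline rather than a drop-in script:

```python
# Rough outline of a policy server, modeled on RLlib's cartpole_server
# example. Config keys and constructor signatures differ between Ray
# versions; verify against the version you are running.
import gym
import ray
from ray.rllib.agents.ppo import PPOTrainer
from ray.rllib.env.policy_server_input import PolicyServerInput

SERVER_ADDRESS = "localhost"
SERVER_PORT = 9900

if __name__ == "__main__":
    ray.init()
    trainer = PPOTrainer(
        env=None,  # no local env: experiences arrive from PolicyClients
        config={
            # Read experiences from the HTTP server instead of sampling locally.
            "input": lambda ioctx: PolicyServerInput(ioctx, SERVER_ADDRESS, SERVER_PORT),
            "num_workers": 0,
            "input_evaluation": [],
            # The trainer still needs to know the external env's spaces.
            "observation_space": gym.spaces.Box(-10.0, 10.0, (4,)),
            "action_space": gym.spaces.Discrete(2),
        },
    )
    while True:
        print(trainer.train())
```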

@aviskarkc10,

An alternative approach that might be closer to what you want to do is in the documentation here:

https://docs.ray.io/en/master/rllib-offline.html?highlight=offline#example-converting-external-experiences-to-batch-format
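That doc example essentially builds offline batches by hand, roughly like this simplified sketch (the real example also records action probabilities, and the module paths have moved between Ray versions):

```python
# Sketch of converting externally collected experiences into RLlib's offline
# JSON batch format, along the lines of the linked doc example.
import gym
from ray.rllib.evaluation.sample_batch_builder import SampleBatchBuilder
from ray.rllib.offline.json_writer import JsonWriter

env = gym.make("CartPole-v0")
batch_builder = SampleBatchBuilder()
writer = JsonWriter("/tmp/demo-out")

for eps_id in range(10):
    obs = env.reset()
    t = 0
    done = False
    while not done:
        action = env.action_space.sample()  # stand-in for the human/demo action
        new_obs, reward, done, info = env.step(action)
        batch_builder.add_values(
            t=t,
            eps_id=eps_id,
            agent_index=0,
            obs=obs,
            actions=action,
            rewards=reward,
            dones=done,
            infos=info,
            new_obs=new_obs,
        )
        obs = new_obs
        t += 1
    # Write one episode per JSON batch.
    writer.write(batch_builder.build_and_reset())
```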

Thanks. I finally got the entire picture


@kai, @mannyv, @sven1977 and so on:
Please correct me if I'm wrong, but log_action also does a call to the NN, I guess to calculate V(s) for the observation sent along with the logged off-policy action.
Also, the logged off-policy action will be stored in a sample batch for later training, right?

Hi @klausk55

log_action is part of the ExternalEnv / PolicyServer/PolicyClient API and it is intended for offline collection. It does not call the NN. get_action does, though.

@sven1977,

The PolicyClient's log_action calls _update_local_policy, but I don't think it actually uses it. I think it can be removed.


Hey @mannyv, thanks for digging into this. I think we should leave this call to _update_local_policy() (only done for inference_mode="local") inside log_action.
It just makes sure that we stick to the update interval with which the local policy is synced from the server. It could be that we have not called log_action or get_action in a long time and want to make sure we have the latest version of the weights. Maybe the policy was trained further on the server from other clients' data, and maybe we would want to do something with those weights even without calling log_action/get_action.
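For reference, the update interval mentioned here is the update_interval argument of PolicyClient when running with local inference. A short sketch, with an assumed server address:

```python
# Sketch: with inference_mode="local" the client keeps its own copy of the
# policy and re-syncs weights from the server at most every `update_interval`
# seconds; the _update_local_policy() call inside log_action/get_action is
# what enforces that interval.
from ray.rllib.env.policy_client import PolicyClient

client = PolicyClient(
    "http://localhost:9900",
    inference_mode="local",   # run the NN locally on the client
    update_interval=10.0,     # seconds between weight syncs from the server
)
```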


Hey @mannyv,

Sorry mate, but I'm almost sure that log_action also does a forward call to the NN.
Inside the function _env_runner in sampler.py we first poll data from the env, and log_action puts its data (obs and off-policy action) into the queue. Some lines of code later, I guess in _do_policy_eval, a forward call to the NN should happen. I tried and tested this with some simple printouts to the console (e.g. a printout in the forward function of the model).

@mannyv and @sven1977, I'm still not quite sure how such an 'offline/off-policy-collected' sample from log_action is processed by RLlib.
Suppose we use the PPO algorithm: is such a sample also stored in a batch and treated like any other default on-policy-collected sample? (Thinking of a case like learning from demonstrations.)

Thanks in advance!

Hi @klausk55,

I was talking about the client side (PolicyClient). I do not think that calling log_action on the client side causes it to interact with the neural network. The PolicyClient does have a RolloutWorker within it, so, for example, if on_sample_end made a call to the NN, then so would the PolicyClient.

Looking at the code it does not look like the receiving side (PolicyServer) uses the neural network when it receives a log_action message.

If the PolicyServer was feeding data to a Trainer, for example PPO as you mention, then yes: during on_sample_end (when it computes the advantage / GAE) it would interact with the NN, and of course during learn_on_batch it would also interact with the neural network and update its weights. If you added a callback that interacted with the NN between environment steps, then yes, that would also occur, but I think only on the server side, not the client side.

WRT your PPO question, from the perspective of the Trainer this is just another input source, so it would treat it the same way as it would treat data coming from a local environment.

Hey @mannyv,

I guess we are both right, since IMO it depends on the framework setting.
With framework "tf" there is no forward call to the model, whereas with framework "tf2" (eager) there is a forward call to the model in the RolloutWorker on the client side (inference_mode="local").
You might check out this slightly modified cartpole_server/client example and play around with the framework settings.