Not able to locate RLlib train function code

Below is a snapshot of my code (I made a custom Gym environment):

import ray
from ray.rllib.agents.ppo import PPOTrainer, DEFAULT_CONFIG
from ray.tune.logger import pretty_print

ray.init()

config = DEFAULT_CONFIG.copy()
# "fss-v1" is my custom Gym environment (registered beforehand).
agent = PPOTrainer(config, env="fss-v1")

for _ in range(1):
    print("Entered _ :", _)
    result = agent.train()

print(pretty_print(result))
ray.shutdown()

Since my Gym environment is custom, I would like to make a few changes to how Ray selects actions (currently I am guessing it uses the sample() function). To do so, I tried to find the location of the train function that connects to the Gym environment and calls the action selection and step functions, but I am not able to find it.

Can anyone please help me?

Hi @Archana_R,
It would probably be easier to ask what you want to change. RLlib is a large library with lots of abstraction, so looking at the train function is not likely to be useful. RLlib has lots of configuration options and callback hooks that can be used to customize most aspects of the process.

Whether the action is a stochastic or deterministic sample depends on the configuration option "explore".
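For example (a minimal sketch; "explore" is a top-level trainer config option, and compute_action accepts a per-call explore flag):

config = DEFAULT_CONFIG.copy()
config["explore"] = True   # sample stochastically from the action distribution
# config["explore"] = False  # always take the deterministic (greedy) action

# Exploration can also be switched off for a single query:
action = agent.compute_action(obs, explore=False)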

Basically, I have 144 actions (MultiDiscrete(12, 12)) and not all of them are legal. I would like to filter out the illegal actions early on, so that the agent only considers the legal ones and optimises the solution over those.

My project is on job scheduling, so my actions are [workstation, job] pairs. Since not all workstations can work with all jobs, the actions need to be filtered based on qualification.

I understand that action masking means making this change on the neural-network side, but the agent keeps on selecting illegal actions.
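Roughly, what I have in mind on the environment side is something like this (a sketch with illustrative names; since qualification is per (workstation, job) pair, one option is to flatten MultiDiscrete(12, 12) into Discrete(144) so that a single mask covers every pair):

import numpy as np
from gym import spaces

# Observations carry a 144-entry mask marking the legal pairs (1 = legal).
observation_space = spaces.Dict({
    "observations": spaces.Box(-1.0, 1.0, shape=(10,), dtype=np.float32),
    "action_mask": spaces.Box(0.0, 1.0, shape=(144,), dtype=np.float32),
})
action_space = spaces.Discrete(144)

def build_mask(qualified):
    # qualified: 12x12 boolean array, True where a workstation can run a job.
    return qualified.astype(np.float32).reshape(144)

def decode(action):
    # Map a flat action index back to a (workstation, job) pair.
    return action // 12, action % 12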

Hi @Archana_R,

What kind of algorithm are you using? If you are using DQN, then action masking is not straightforward. If you are using a PG algorithm like A2C or PPO, then you want to do action masking like this:
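In outline, the model reads an "action_mask" entry from the observation and adds log(mask) to the logits, so illegal actions end up with (near) zero probability. A minimal sketch assuming PyTorch and a Dict observation space (not the exact linked example):

import torch
import torch.nn as nn
from ray.rllib.models.torch.torch_modelv2 import TorchModelV2
from ray.rllib.models.torch.fcnet import FullyConnectedNetwork

class MaskedActionsModel(TorchModelV2, nn.Module):
    def __init__(self, obs_space, action_space, num_outputs, model_config, name):
        TorchModelV2.__init__(self, obs_space, action_space, num_outputs,
                              model_config, name)
        nn.Module.__init__(self)
        # The inner network sees only the real observations, not the mask.
        self.inner = FullyConnectedNetwork(
            obs_space.original_space["observations"], action_space,
            num_outputs, model_config, name + "_inner")

    def forward(self, input_dict, state, seq_lens):
        mask = input_dict["obs"]["action_mask"]  # 1 = legal, 0 = illegal
        logits, _ = self.inner({"obs": input_dict["obs"]["observations"]})
        # log(0) = -inf pushes illegal logits down; clamp to avoid NaNs.
        inf_mask = torch.clamp(torch.log(mask), min=-1e10)
        return logits + inf_mask, state

    def value_function(self):
        return self.inner.value_function()

The model is then registered with ModelCatalog.register_custom_model and selected via config["model"]["custom_model"].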

As for the action selection, that will be a combination of an exploration algorithm, which you can find here:

and an action distribution, which you can find here:
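For example, the exploration algorithm is selected through the "exploration_config" key (a sketch; StochasticSampling is the default for PG algorithms, epsilon-greedy for DQN):

config["exploration_config"] = {"type": "StochasticSampling"}
# DQN-family algorithms default to epsilon-greedy instead:
# config["exploration_config"] = {"type": "EpsilonGreedy",
#                                 "initial_epsilon": 1.0,
#                                 "final_epsilon": 0.02}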

Thank you for this. But in the files you have shared, these are empty functions. Can you please help me locate where the action that is sent to the environment gets sampled?

What do you mean by "they are empty functions"? Those are the functions that are called to convert the logits returned by your model into actions that are passed into your environment.

In particular, you probably want to look at the sample or deterministic_sample methods in the Categorical action_distribution.
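Conceptually, those two methods do something like this (a simplified sketch using torch, not RLlib's exact code):

import torch

logits = torch.tensor([[2.0, 0.5, -1.0]])  # output of your model

# sample(): draw from the softmax distribution (used when exploring).
action = torch.distributions.Categorical(logits=logits).sample()

# deterministic_sample(): just take the argmax (exploration off).
greedy_action = torch.argmax(logits, dim=-1)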

How do we carry it out for the DQN algorithm? Do you have any code snippets for the same?