How to integrate a custom FIFO policy?

How severely does this issue affect your experience of using Ray?

  • None: Just asking a question out of curiosity

Hi folks!

I’m not sure how to integrate a custom FIFO policy for benchmarking purposes.
I thought of writing a custom policy class that implements the FIFO logic in compute_actions() to compute the next action. Assuming this is the right approach, how do I “register” my custom FIFO policy with the Trainer? Do I have to use the "multiagent" config key even though my setting is single-agent only?
For example,

"multiagent": {
    "policies": {"default_policy": PolicySpec(FIFO, obs_space, action_space, {})},
    "policy_mapping_fn": lambda agent_id: "default_policy",
    "policies_to_train": [],
}

Update:
It seems that "default_policy" is always reserved for the selected Trainer class (e.g. PPOTrainer, PGTrainer, and so on) and is treated as a “policy to train” whether I want that or not. If I’m not mistaken, this means I can only include a heuristic, non-learned policy alongside a learned (default) policy, but cannot use a heuristic policy on its own.
I guess I would have to implement a custom Trainer class to be able to run a heuristic, non-learned policy standalone. What do you think?
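
For the “alongside a learned policy” case, a config along these lines should work (just a sketch: FIFO is the heuristic policy class defined further down, and "my_multi_agent_env" as well as the agent IDs are placeholders for my own setup):

from ray.rllib.agents.ppo import PPOTrainer
from ray.rllib.policy.policy import PolicySpec

config = {
    "env": "my_multi_agent_env",  # placeholder for your own environment
    "multiagent": {
        "policies": {
            # None as the policy class -> the Trainer's default (learned) policy;
            # None for the spaces -> taken from the environment (or pass them explicitly).
            "learned": PolicySpec(None, None, None, {}),
            "fifo": PolicySpec(FIFO, None, None, {}),
        },
        "policy_mapping_fn": (
            lambda agent_id, *args, **kwargs:
                "learned" if agent_id == "agent_0" else "fifo"
        ),
        # Only the learned policy gets updated; the FIFO heuristic stays fixed.
        "policies_to_train": ["learned"],
    },
}
trainer = PPOTrainer(config=config)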

I guess I found the answer with the help of this RLlib example :+1:

from typing import Type

from ray.rllib.agents.trainer import Trainer
from ray.rllib.policy.policy import Policy
from ray.rllib.utils.typing import ModelWeights, TrainerConfigDict


class FIFO(Policy):
    """FIFO policy"""
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.model = None  # heuristic policy: no neural network model
        self.exploration = self._create_exploration()

    def compute_actions(self,
                        obs_batch,
                        state_batches=None,
                        prev_action_batch=None,
                        prev_reward_batch=None,
                        info_batch=None,
                        episodes=None,
                        **kwargs):
        # TODO: Compute one action per observation in obs_batch according to
        # the FIFO logic (transport orders are served in arrival order).
        # The Policy API expects a tuple of (actions, state_outs, extra_fetches).
        actions = ...  # FIFO actions, left open here on purpose
        return actions, [], {}

    def learn_on_batch(self, samples):
        # Heuristic policy: nothing to learn.
        return {}  # return stats

    def get_weights(self) -> ModelWeights:
        """No weights to save."""
        return {}
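
    # Added counterpart to get_weights(): a no-op set_weights(), so that
    # weight syncing to rollout workers does not hit the NotImplementedError
    # raised by the base Policy class.
    def set_weights(self, weights: ModelWeights) -> None:
        """No weights to restore."""
        pass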


class FIFOTrainer(Trainer):
    def get_default_policy_class(
        self, config: TrainerConfigDict
    ) -> Type[Policy]:
        # default policy class for this Trainer is FIFO
        return FIFO
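
To run it, something like this should work (rough sketch; "my_transport_env" and MyTransportEnv are placeholders for whatever environment the FIFO heuristic is benchmarked on):

import ray
from ray.tune.registry import register_env

ray.init()
register_env("my_transport_env", lambda env_config: MyTransportEnv(env_config))

trainer = FIFOTrainer(config={"env": "my_transport_env", "num_workers": 0})
for _ in range(3):
    results = trainer.train()  # rollouts with the FIFO policy; learn_on_batch() is a no-op
    print(results["episode_reward_mean"])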