How severe does this issue affect your experience of using Ray?
- None: Just asking a question out of curiosity
Hi folks!
I’m not sure how to integrate a custom FIFO policy for benchmarking purposes.
I thought of using a custom policy class that implements the FIFO logic inside compute_actions() to pick the next action. Assuming this is the right approach, how do I “register” my custom FIFO policy with the Trainer? Do I have to use the "multiagent" config key even though my setup is single-agent only?
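For reference, this is roughly the policy class I have in mind. The actual FIFO decision rule is only a placeholder here (I'm assuming it can be derived from the observation alone), and the helper `_fifo_action` is just something I made up for the sketch:

```python
from ray.rllib.policy.policy import Policy


class FIFO(Policy):
    """Heuristic (non-learned) policy that always serves the 'oldest' item."""

    def compute_actions(self,
                        obs_batch,
                        state_batches=None,
                        prev_action_batch=None,
                        prev_reward_batch=None,
                        info_batch=None,
                        episodes=None,
                        **kwargs):
        # Placeholder: replace with real FIFO logic derived from each observation.
        actions = [self._fifo_action(obs) for obs in obs_batch]
        # Return (actions, RNN state outs, extra info).
        return actions, [], {}

    def _fifo_action(self, obs):
        # Hypothetical helper: e.g. index of the queue entry that has waited longest.
        return 0

    def learn_on_batch(self, samples):
        # Nothing to learn for a heuristic policy.
        return {}

    def get_weights(self):
        return {}

    def set_weights(self, weights):
        pass
```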
For registering it, I currently have something like this:
"multiagent": {
"policies": {"default_policy": PolicySpec(FIFO, obs_space, action_space, {})},
"policy_mapping_fn": lambda agent_id: "default_policy",
"policies_to_train": [],
}
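For completeness, I am plugging this into the Trainer roughly as follows. PGTrainer and CartPole-v0 are just stand-ins for my actual setup, and FIFO is the heuristic policy class sketched above:

```python
import gym
import ray
from ray.rllib.agents.pg import PGTrainer
from ray.rllib.policy.policy import PolicySpec

ray.init()

# Spaces taken from the (stand-in) env.
env = gym.make("CartPole-v0")
obs_space, action_space = env.observation_space, env.action_space

config = {
    "multiagent": {
        "policies": {
            "default_policy": PolicySpec(FIFO, obs_space, action_space, {}),
        },
        "policy_mapping_fn": lambda agent_id: "default_policy",
        "policies_to_train": [],
    },
}

trainer = PGTrainer(env="CartPole-v0", config=config)
print(trainer.train())
```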
Update:
It seems that "default_policy" is always reserved for the selected Trainer class (e.g. PPOTrainer, PGTrainer, and so on) and is treated as a “policy to train” whether I want that or not. If I’m not wrong, this means I can only use a heuristic, non-learned policy alongside a learned (default) policy, but cannot employ a heuristic policy on its own.
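To illustrate what I mean: a setup like the following does pair the heuristic policy with a learned one (the policy IDs "learned" / "fifo" and the agent-ID check are hypothetical; the pattern mirrors RLlib's heuristic-vs-learned multi-agent examples), but it never runs the heuristic policy alone:

```python
config = {
    "multiagent": {
        "policies": {
            # Learned policy: class left as None so the Trainer's default is used.
            "learned": PolicySpec(None, obs_space, action_space, {}),
            # Heuristic FIFO policy from above, excluded from training.
            "fifo": PolicySpec(FIFO, obs_space, action_space, {}),
        },
        "policy_mapping_fn": lambda agent_id: "fifo" if agent_id == "fifo_agent" else "learned",
        # Only the learned policy is updated.
        "policies_to_train": ["learned"],
    },
}
```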
I guess I would have to implement a custom Trainer class to be able to employ a heuristic, non-learned policy on its own. What do you guys think?