Where does the `custom_action_dist` parameter go now? The config dict has changed in Ray 2.0 from the old examples on the website.
To give more background, these are all the steps I performed:
1. Import the `Simplex` action space from RLlib and use it in the environment's `__init__` as `self.action_space`:
   `from ray.rllib.utils.spaces.simplex import Simplex`
2. Import the Dirichlet action distribution from RLlib:
   `from ray.rllib.models.torch.torch_action_dist import TorchDirichlet as Dirichlet`
3. Register the new action distribution:
   `from ray.rllib.models import ModelCatalog`
   `ModelCatalog.register_custom_action_dist("Dirichlet", Dirichlet)`
4. Pass `custom_action_dist` to the trainer. This is the part I don't know how to do (when training with Tune), since the config dict has changed in Ray 2.0 from the examples on the website; see the sketch below.
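To make the question concrete, this is roughly what I am attempting with the new Ray 2.0 `AlgorithmConfig` API. This is a sketch, not working code; `my_simplex_env` is a placeholder for my registered environment:

```python
from ray import air, tune
from ray.rllib.algorithms.ppo import PPOConfig

# My guess: "custom_action_dist" still lives in the model sub-config, now set
# through the Ray 2.0 AlgorithmConfig builder instead of a plain dict.
config = (
    PPOConfig()
    .environment(env="my_simplex_env")  # placeholder for my registered env
    .framework("torch")
    .training(model={"custom_action_dist": "Dirichlet"})
)

tune.Tuner(
    "PPO",
    param_space=config.to_dict(),
    run_config=air.RunConfig(stop={"training_iteration": 10}),
).fit()
```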
Hello @mannyv. Thank you very much for your pointer, but I think something else is going on. I am getting this error, which is usually a catch-all (a red herring) for some other error elsewhere:
```
AttributeError: 'PPO' object has no attribute '_warmup_time'
```
The error above is misleading; I believe the real issue is the one below. RLlib is trying to calculate the KL divergence and is calling the Dirichlet class for it. I am not sure whether I am doing the steps correctly and importing the right things:
File "/usr/local/lib/python3.9/dist-packages/ray/rllib/models/torch/torch_action_dist.py", line 643, in kl
return self.dist.kl_divergence(other.dist)
AttributeError: 'Dirichlet' object has no attribute 'kl_divergence'
I see in the official implementation here of the Dirichlet class that the existing method is called `kl`, not `kl_divergence`. To me, the official code here is missing a line.
To me, this is a bug: either the KL divergence computation is incorrect and should be amended as I propose, or, as I do in my code right now, the overridden `kl` method should simply be deleted so that the computation is inherited from the parent class.
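Concretely, since I cannot un-define the inherited method from outside the library, my current workaround is equivalent to deleting it: a small subclass (the name `PatchedDirichlet` is mine, purely illustrative) whose `kl` computes what the generic parent logic would:

```python
import torch
from ray.rllib.models import ModelCatalog
from ray.rllib.models.torch.torch_action_dist import TorchDirichlet


class PatchedDirichlet(TorchDirichlet):
    """Illustrative workaround class (the name is mine, not RLlib's)."""

    def kl(self, other):
        # Bypass TorchDirichlet's broken `self.dist.kl_divergence(...)` call
        # and use torch's generic dispatch, which has a registered
        # closed-form KL for (Dirichlet, Dirichlet) pairs.
        return torch.distributions.kl.kl_divergence(self.dist, other.dist)


# Register the patched distribution under the same name as before.
ModelCatalog.register_custom_action_dist("Dirichlet", PatchedDirichlet)
```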
Hey @Username1, you are right; thanks for bringing up the bug. I have just made a PR to fix this issue. `TorchDirichlet` is not something we have good test coverage for.
The fix basically inherits the default `kl` computation logic from the parent class, which is indeed what you suggested.
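For anyone hitting this before the fix lands: the default parent logic only needs torch's generic KL dispatch, which already handles Dirichlet pairs out of the box. A standalone sketch to verify:

```python
import torch
from torch.distributions import Dirichlet
from torch.distributions.kl import kl_divergence

# torch registers a closed-form KL for (Dirichlet, Dirichlet) pairs, so the
# generic kl_divergence dispatch works without any custom method.
p = Dirichlet(torch.tensor([1.0, 2.0, 3.0]))
q = Dirichlet(torch.tensor([2.0, 2.0, 2.0]))
print(kl_divergence(p, q))  # a scalar tensor
```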