Where does the `custom_action_dist` parameter go now? The config dict has changed in Ray 2.0 from the old examples on the website.
To give more background, these are all the steps I performed:
1. Import the `Simplex` action space from RLlib and use it in the environment's `__init__` as `self.action_space`:
   `from ray.rllib.utils.spaces.simplex import Simplex`
2. Import the Dirichlet action distribution from RLlib:
   `from ray.rllib.models.torch.torch_action_dist import TorchDirichlet as Dirichlet`
3. Register the new action distribution:
   `from ray.rllib.models import ModelCatalog`
   `ModelCatalog.register_custom_action_dist("Dirichlet", Dirichlet)`
4. Pass `custom_action_dist` to the trainer. This is the part I don't know how to do (when training with Tune), since the config dict has changed in Ray 2.0 from the examples on the website; see the sketch below.
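To make the question concrete, this is roughly what I am attempting with the new Ray 2.0 `AlgorithmConfig` API. This is a sketch, not working code; `my_simplex_env` is a placeholder for my registered environment:

```python
from ray import air, tune
from ray.rllib.algorithms.ppo import PPOConfig

# My guess: "custom_action_dist" still lives in the model sub-config, now set
# through the Ray 2.0 AlgorithmConfig builder instead of a plain dict.
config = (
    PPOConfig()
    .environment(env="my_simplex_env")  # placeholder for my registered env
    .framework("torch")
    .training(model={"custom_action_dist": "Dirichlet"})
)

tune.Tuner(
    "PPO",
    param_space=config.to_dict(),
    run_config=air.RunConfig(stop={"training_iteration": 10}),
).fit()
```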
Hello @mannyv. Thank you very much for your pointer, but I think something else is going on. I am getting this error, which is usually a catch-all (a red herring) for some other error elsewhere:
```
AttributeError: 'PPO' object has no attribute '_warmup_time'
```
The error above is misleading; I believe the real issue is the one below. RLlib is trying to calculate the KL divergence and is calling the Dirichlet class for it. I am not sure whether I am doing the steps correctly and importing the right things:
File "/usr/local/lib/python3.9/dist-packages/ray/rllib/models/torch/torch_action_dist.py", line 643, in kl
return self.dist.kl_divergence(other.dist)
AttributeError: 'Dirichlet' object has no attribute 'kl_divergence'
I see in the official implementation here of the Dirichlet class that the existing method is called `kl`, not `kl_divergence`. To me, the official code here is missing a line.
To me, this is a bug: either the KL divergence computation is incorrect and should be amended as I propose, or, as I do in my code right now, the overridden `kl` method should simply be deleted so that the computation is inherited from the parent class.
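Concretely, since I cannot un-define the inherited method from outside the library, my current workaround is equivalent to deleting it: a small subclass (the name `PatchedDirichlet` is mine, purely illustrative) whose `kl` computes what the generic parent logic would:

```python
import torch
from ray.rllib.models import ModelCatalog
from ray.rllib.models.torch.torch_action_dist import TorchDirichlet


class PatchedDirichlet(TorchDirichlet):
    """Illustrative workaround class (the name is mine, not RLlib's)."""

    def kl(self, other):
        # Bypass TorchDirichlet's broken `self.dist.kl_divergence(...)` call
        # and use torch's generic dispatch, which has a registered
        # closed-form KL for (Dirichlet, Dirichlet) pairs.
        return torch.distributions.kl.kl_divergence(self.dist, other.dist)


# Register the patched distribution under the same name as before.
ModelCatalog.register_custom_action_dist("Dirichlet", PatchedDirichlet)
```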
Hey @Username1, you are right; thanks for bringing up the bug. I have just made a PR to fix this issue. `TorchDirichlet` is not something we have good test coverage for.
The fix basically inherits the default `kl` computation logic from the parent class, which is indeed what you suggested.
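For anyone hitting this before the fix lands: the default parent logic only needs torch's generic KL dispatch, which already handles Dirichlet pairs out of the box. A standalone sketch to verify:

```python
import torch
from torch.distributions import Dirichlet
from torch.distributions.kl import kl_divergence

# torch registers a closed-form KL for (Dirichlet, Dirichlet) pairs, so the
# generic kl_divergence dispatch works without any custom method.
p = Dirichlet(torch.tensor([1.0, 2.0, 3.0]))
q = Dirichlet(torch.tensor([2.0, 2.0, 2.0]))
print(kl_divergence(p, q))  # a scalar tensor
```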