paketto
November 18, 2021, 5:18pm
1
Hi there!
We know that for a Box (continuous) action space, the default action distribution is DiagGaussian (a probability distribution).
However, I want to use TorchDeterministic, an action distribution that returns the input values directly.
This is the code (the TorchDeterministic class from RLlib's torch_action_dist.py):

class TorchDeterministic(TorchDistributionWrapper):
    """Action distribution that returns the input values directly.

    This is similar to DiagGaussian with standard deviation zero (thus only
    requiring the "mean" values as NN output).
    """

    @override(ActionDistribution)
    def deterministic_sample(self) -> TensorType:
        return self.inputs

    @override(TorchDistributionWrapper)
    def sampled_action_logp(self) -> TensorType:
        return torch.zeros((self.inputs.size()[0], ), dtype=torch.float32)

    @override(TorchDistributionWrapper)
    def sample(self) -> TensorType:
        return self.deterministic_sample()

    @staticmethod
    @override(ActionDistribution)
    def required_model_output_shape(
            action_space: gym.Space,
            model_config: ModelConfigDict) -> Union[int, np.ndarray]:
        return np.prod(action_space.shape)
With the proper imports, I copied and pasted the contents of this class into a file named custom_action_dist.py.
I imported it with:
from custom_action_dist import TorchDeterministic
registered my custom action distribution with:
ModelCatalog.register_custom_action_dist("my_custom_action_dist", TorchDeterministic)
and in config I specified:
"custom_action_dist": "my_custom_action_dist".
However, I’m getting the following error:
"File "/home/28140/DRL/lib/python3.8/site-packages/ray/rllib/models/torch/torch_action_dist.py", line 38, in logp
return self.dist.log_prob(actions)
AttributeError: 'TorchDeterministic' object has no attribute 'dist'"
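Judging by the traceback, the failing call is in the base wrapper's logp, which assumes self.dist has been set by the distribution class (roughly):

@override(ActionDistribution)
def logp(self, actions: TensorType) -> TensorType:
    return self.dist.log_prob(actions)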
It seems that I must specify a probability distribution.
Can somebody tell me which one I should use?
Thank you, and I look forward to your reply!
mannyv
November 18, 2021, 10:13pm
2
Hi @paketto ,
If I am understanding your request correctly, I think you can just set this config option:
config["explore"] = False
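Something like this (sketch; the rest of the config is just an example):

config = {
    "framework": "torch",
    # Keep the default DiagGaussian action distribution, but always take
    # its deterministic sample (the mean) instead of sampling from it.
    "explore": False,
}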
paketto
November 18, 2021, 10:35pm
3
I set the option as you suggested, but there's no change; it still gives the same error.
mannyv
November 18, 2021, 10:58pm
4
Hi @paketto ,
I meant that you would use that option together with the default DiagGaussian distribution. With exploration disabled, the policy just returns the mean value of the distribution instead of sampling from it.
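That is, if I remember the source correctly, TorchDiagGaussian does something along these lines, so the deterministic sample is exactly the mean produced by your model (sketch, not the exact code):

class TorchDiagGaussian(TorchDistributionWrapper):
    def __init__(self, inputs, model):
        super().__init__(inputs, model)
        # The model output is split into a mean half and a log-std half.
        mean, log_std = torch.chunk(self.inputs, 2, dim=1)
        self.dist = torch.distributions.normal.Normal(mean, torch.exp(log_std))

    @override(ActionDistribution)
    def deterministic_sample(self):
        # With "explore": False, this is what gets returned: the mean.
        self.last_sample = self.dist.mean
        return self.last_sample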
paketto
November 19, 2021, 10:04am
5
Hi @mannyv !
Thank you for your insightful reply.
Although you might be right, I don't think this approach fits my use case.
Here is my situation.
I'm using A2C for an asset (re)allocation problem.
The actor outputs the weights (percentages of the portfolio) of the assets I will hold in the next period.
I want to use (Torch)Deterministic so that the actor's output becomes my action directly.
Is that possible?
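One direction I'm considering is a small subclass of TorchDeterministic that stubs out the log-prob related methods so the missing self.dist is never touched (untested sketch; I'm not sure it's the right approach):

import torch

from ray.rllib.models.action_dist import ActionDistribution
from ray.rllib.models.torch.torch_action_dist import (TorchDeterministic,
                                                       TorchDistributionWrapper)
from ray.rllib.utils.annotations import override
from ray.rllib.utils.typing import TensorType


class MyDeterministicDist(TorchDeterministic):
    """Returns the model output directly as the action and reports zero
    log-prob / entropy / KL, since there is no real distribution behind it."""

    @override(TorchDistributionWrapper)
    def logp(self, actions: TensorType) -> TensorType:
        # No underlying torch distribution to query, so report zeros.
        return torch.zeros((self.inputs.size()[0], ), dtype=torch.float32)

    @override(ActionDistribution)
    def entropy(self) -> TensorType:
        return torch.zeros((self.inputs.size()[0], ), dtype=torch.float32)

    @override(ActionDistribution)
    def kl(self, other: ActionDistribution) -> TensorType:
        return torch.zeros((self.inputs.size()[0], ), dtype=torch.float32)

I would then register this subclass instead of TorchDeterministic itself, e.g. ModelCatalog.register_custom_action_dist("my_custom_action_dist", MyDeterministicDist).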