For box action spaces, are the logits in the form
[mean1, std1, mean2, std2, …]
or
[mean1, mean2, …, std1, std2, …]
?
Hi, can you give a bit more context on what you’re trying to build here and what Ray library you’re using? Thanks
Hi. It has been some time, so you may have found the answer already, but here it is anyway:
It's the second approach: all the means come first, followed by all the stds.
E.g.
output = torch.concat((means, stds), dim=-1)
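In case it helps on the consuming side, here is a minimal sketch of undoing that concatenation in plain Python. The function name `split_gaussian_logits` and the sample values are made up for illustration, and it assumes a flat output vector of even length. Whether the second half holds standard deviations or log standard deviations depends on the action distribution in use, so that is worth checking before sampling from it.

```python
def split_gaussian_logits(logits):
    """Split [mean1, ..., meanN, std1, ..., stdN] into (means, stds).

    Hypothetical helper: assumes the first half of the vector holds the
    means and the second half holds the std-related values.
    """
    if len(logits) % 2 != 0:
        raise ValueError("expected an even number of logits")
    half = len(logits) // 2
    return list(logits[:half]), list(logits[half:])


# Made-up model output for a 3-dimensional Box action space.
logits = [0.1, -0.4, 0.7, 0.5, 0.2, 0.9]
means, stds = split_gaussian_logits(logits)
```

With this layout, action dimension i uses means[i] and stds[i], rather than the adjacent pair logits[2*i] and logits[2*i + 1].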
@christina I am taking the torch model files produced by a Ray train job, converting them to ONNX files, and building a Rust-based inference program.
Thanks!