CRR multidimensional continuous action space

Hello,

I wanted to ask whether the CRR algorithm is supposed to be able to handle multidimensional continuous action spaces in custom environments. When I try to use one, the CRRTorchPolicy class returns a 2-dimensional actor loss, which results in the following error:

RLTests/venv/lib/python3.8/site-packages/torch/autograd/__init__.py", line 88, in _make_grads
    raise RuntimeError("grad can be implicitly created only for scalar outputs")
RuntimeError: grad can be implicitly created only for scalar outputs
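
For context, the same failure can be reproduced in plain torch, without RLlib (a minimal sketch): calling backward() on a non-scalar tensor raises exactly this RuntimeError, while reducing the tensor to a scalar first works.

import torch

x = torch.rand(2, requires_grad=True)
loss = x ** 2  # shape (2,): a non-scalar, per-dimension "loss"

try:
    loss.backward()  # raises the RuntimeError above
except RuntimeError as err:
    print(err)

loss.mean().backward()  # averaging to a scalar first works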

This is a small script to reproduce the error:

import gymnasium as gym
from ray.rllib.algorithms.crr import CRRConfig
from ray.tune import register_env


class EnvTest(gym.Env):
    def __init__(self):
        self.observation_space = gym.spaces.Discrete(n=1)
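        # The 2-dimensional continuous action space is what triggers the error.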
        self.action_space = gym.spaces.Box(low=0, high=1, shape=(2,))

    def step(self, action):
        return 0, 0, False, False, {}

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        return 0, {}


register_env(
    "test",
    lambda x: gym.wrappers.TimeLimit(EnvTest(), max_episode_steps=10)
)

crr_config = (
    CRRConfig()
    .rollouts(num_rollout_workers=0)
    .resources(num_gpus=0)
    .environment("test")
)
algo = crr_config.build()
result = algo.train()
print(result)

Versions:
gymnasium==0.28.1
ray==2.5.1
torch==2.0.1

Can we simply average the loss vector?
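
To make the question concrete, something like this is what I have in mind (a rough sketch only: I am assuming that CRRTorchPolicy.loss() is the right override point and that the class is importable from ray.rllib.algorithms.crr; both may differ between Ray versions):

from ray.rllib.algorithms.crr import CRRTorchPolicy


class ScalarLossCRRTorchPolicy(CRRTorchPolicy):
    def loss(self, model, dist_class, train_batch):
        # Sketch: reduce any non-scalar loss terms to their mean so
        # autograd gets scalar outputs. Whether this averaging is
        # mathematically sound for CRR is exactly what I am asking.
        losses = super().loss(model, dist_class, train_batch)
        if isinstance(losses, (list, tuple)):
            return [l.mean() for l in losses]
        return losses.mean()

The subclass would then presumably be plugged in by subclassing CRR and overriding get_default_policy_class, but I would prefer a supported fix if one exists.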