How severely does this issue affect your experience of using Ray?
- High: It blocks me from completing my task.
I have implemented my own gym environment and am now building a custom neural network for it. My action space and observation space are defined as follows:
import gym
import numpy as np

h, w, k = 4, 8, 3
action_space = gym.spaces.Discrete(h * w)
observation_space = gym.spaces.Dict({
    "array1": gym.spaces.Box(low=np.float32(0),
                             high=np.float32(1),
                             shape=[h, w],
                             dtype=np.float32),
    "array2": gym.spaces.Box(low=np.zeros([k], dtype=np.float32),
                             high=np.array([h, w, 1], dtype=np.float32),
                             dtype=np.float32),
})
print(action_space, observation_space["array1"].shape, observation_space["array2"].shape)
Discrete(32) (4, 8) (3,)
Since the action space is Discrete(32), I expected num_outputs to be 32, but it is 256. The short script below reproduces the behavior with the CartPole environment, whose action space is Discrete(2).
import torch as th
import torch.nn as nn
import ray
from ray.rllib.agents import dqn
from ray.rllib.models import ModelCatalog
from ray.rllib.models.torch.torch_modelv2 import TorchModelV2


class CustomTorchModel(TorchModelV2):
    def __init__(self, obs_space, action_space, num_outputs, model_config, name):
        super().__init__(obs_space, action_space, num_outputs, model_config, name)
        print(obs_space.shape, action_space, num_outputs, name)
        print()

    def forward(self, input_dict, state, seq_lens):
        return th.zeros([32, 256]), state


ModelCatalog.register_custom_model("my_torch_model", CustomTorchModel)
ray.init(ignore_reinit_error=True)
trainer = dqn.DQNTrainer(env="CartPole-v0",
                         config={"framework": "torch",
                                 "model": {"custom_model": "my_torch_model",
                                           "custom_model_config": {}}})
(4,) Discrete(2) 256 q_func
(4,) Discrete(2) 256 target_q_func
So num_outputs is again 256 rather than 2, the size of the action space. Why is this the case, and where can I find documentation on it?
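One possible explanation, offered as an unverified guess rather than a confirmed answer: the DQN config has a `hiddens` option (default `[256]`) for layers that DQN stacks on top of the model. If a non-empty `hiddens` causes RLlib to treat the custom model as a feature extractor and request a feature vector instead of Q-values, that would account for the 256 above. The sketch below expresses that guess in plain Python. It is not RLlib code, and `guess_num_outputs` and its arguments are hypothetical names chosen for illustration.

```python
def guess_num_outputs(num_actions, hiddens, fcnet_hiddens):
    """Hypothetical helper: guess the num_outputs passed to a custom model.

    Assumption (not confirmed against the RLlib source): when DQN's
    `hiddens` is non-empty, the model is asked for features whose size is
    the last entry of `fcnet_hiddens`, falling back to 256. Otherwise the
    model is asked for one output per discrete action.
    """
    if hiddens:
        # DQN would add its own Q-value head on top of these features.
        return ([256] + list(fcnet_hiddens))[-1]
    # No extra head: the model itself must emit one value per action.
    return num_actions


# Assumed default settings: hiddens=[256], fcnet_hiddens=[256, 256].
print(guess_num_outputs(2, [256], [256, 256]))   # CartPole, default config
print(guess_num_outputs(2, [], [256, 256]))      # CartPole, hiddens disabled
print(guess_num_outputs(32, [], [256, 256]))     # Discrete(32), hiddens disabled
```

If this guess is right, passing `"hiddens": []` in the DQN config might make num_outputs match the size of the action space, but I have not confirmed this.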