Offline RL; incompatible dimensions

How severely does this issue affect your experience of using Ray?

  • Medium: It contributes to significant difficulty in completing my task, but I can work around it.

Hello!

We are working with offline RL in Ray 2.0.0.

I would like to reproduce this example but using the tune.run() functionality.

If I copy the example exactly, collecting data with PG and running DQN offline with CartPole-v0, everything works smoothly.

If I replace CartPole-v0 with the simulator I want to use, I suddenly get dimensionality issues of this type:

ValueError: Cannot feed value of shape (m, n) for Tensor default_policy/obs:0, which has shape (?, x)

On the other hand, if I use PG for both data collection and offline RL, I do not suffer from these dimensionality issues.

Any idea what I could be doing wrong here? I am using the default config except for the input and output flags.

Please post a repro script! :slight_smile:

Sure, but since you do not have access to my simulator you might not be able to reproduce the error!

For data collection:

import ray.tune as tune
from ray.rllib.algorithms.dqn.dqn import DQNConfig
from ray.rllib.algorithms.pg.pg import PGConfig

config = PGConfig().to_dict()

config["output"] = "/tmp/cartpole-out"
config["output_max_file_size"] = 5000000
config["env"] = "CartPole-v0"  # <- everything works smoothly when I use this, but not with my own gym env

tune.run(
    "PG",
    stop={"timesteps_total": 4000},
    config=config,
)

Offline RL:

config = DQNConfig().to_dict()

config["input"] = "/tmp/cartpole-out"
config["explore"] = False
config["env"] = "CartPole-v0"

tune.run(
    "DQN",  # <- My custom gym env works if I use the same algorithm for collection and for offline training
    config=config,
)

Hi @fksvensson,

What if you make a RandomEnv with the same observation_space and action_space as your custom environment? Does that fail in the same way?

Here is an example of how to use it:
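Something along these lines (a minimal sketch, assuming RandomEnv lives in ray.rllib.examples.env.random_env and using MyCustomEnv as a stand-in for your own environment):

from ray.rllib.examples.env.random_env import RandomEnv

# Instantiate the custom env once, just to grab its spaces
# (MyCustomEnv is a placeholder for your own gym env).
env = MyCustomEnv()

config["env"] = RandomEnv
config["env_config"] = {
    "action_space": env.action_space,
    "observation_space": env.observation_space,
}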

Thank you for the advice!

I tried a simple version just using

config["env"] = RandomEnv
config["env_config"] = {"action_space": env.action_space, "observation_space": env.observation_space}

And I still get the same error, unfortunately :frowning:

I could play around with some options for dummy environments, but even if I worked that out, I would not be able to use any online evaluation of my offline learning…
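(For context, by "online evaluation" I mean roughly this kind of setup, where the offline-trained policy is periodically rolled out in the real environment; a sketch, assuming the standard RLlib evaluation keys:)

config["input"] = "/tmp/cartpole-out"   # learn from the offline data...
config["evaluation_interval"] = 1       # evaluate after every training iteration
config["evaluation_num_workers"] = 1
config["evaluation_duration"] = 10      # episodes per evaluation round
config["evaluation_config"] = {
    "input": "sampler",                 # ...but evaluate by sampling from the live env
    "explore": False,
}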

The error looks slightly different when I switch to torch as the framework; maybe that can give a clue:

RuntimeError: mat1 and mat2 shapes cannot be multiplied (32x2 and 96x256)

@fksvensson,

Can you share your observation and action spaces?

env.action_space

Discrete(16)

env.observation_space

MultiDiscrete([16 80])

@fksvensson,

I think there is possibly a bug, but try adding this to your DQN config to get it working:

config["_disable_preprocessor_api"] = True

@fksvensson Can you provide the full repro script that produces this latest error, along with the log, in a GH issue, please?

Hello! It slipped my mind to answer this, but config["_disable_preprocessor_api"] = True did in fact solve my bug.

Thank you @mannyv
