Offline RL; incompatible dimensions

fksvensson · October 11, 2022, 11:29am

How severe does this issue affect your experience of using Ray?

Medium: It contributes to significant difficulty to complete my task, but I can work around it.

Hello!

We are working with offline RL in Ray 2.0.0.

I would like to reproduce this example but using the tune.run() functionality.

If I copy the example exactly, collecting data with PG and running DQN offline with CartPole-v0, everything works smoothly.

If I replace CartPole-v0 with the simulator I want to use, I suddenly get dimensionality errors of this type:

ValueError: Cannot feed value of shape (m, n) for Tensor default_policy/obs:0, which has shape (?, x)

On the other hand, if I use PG for both data collection and offline RL, I do not suffer from these dimensionality issues.

Any idea what I could be doing wrong here? I am using the default config except for the input and output flags.

arturn · October 12, 2022, 8:45am

Please post a repro script!

fksvensson · October 12, 2022, 3:06pm

Sure, but since you do not have access to my simulator, you might not be able to reproduce the error!

For data collection:

import ray.tune as tune
from ray.rllib.algorithms.dqn.dqn import DQNConfig
from ray.rllib.algorithms.pg.pg import PGConfig

config = PGConfig().to_dict()

config["output"] = "/tmp/cartpole-out"
config["output_max_file_size"] = 5000000
config["env"] = "CartPole-v0"  # <- everything works smoothly when I use this, but not with my own gym env

tune.run(
    "PG",
    stop={"timesteps_total": 4000},
    config=config,
)

Offline RL:

config = DQNConfig().to_dict()

config["input"] = "/tmp/cartpole-out"
config["explore"] = False
config["env"] = "CartPole-v0"

tune.run(
    "DQN",  # <- my custom gym env works if I use the same algorithm for collection and offline training
    config=config,
)

mannyv · October 12, 2022, 3:18pm

Hi @fksvensson,

What if you make a RandomEnv with the same observation_space and action_space as your custom environment. Does that fail in the same way?

ray-project/ray/blob/master/rllib/examples/env/random_env.py#L9

import copy
import gym
from gym.spaces import Discrete, Tuple
import numpy as np

from ray.rllib.examples.env.multi_agent import make_multi_agent


class RandomEnv(gym.Env):
    """A randomly acting environment.

    Can be instantiated with arbitrary action-, observation-, and reward
    spaces. Observations and rewards are generated by simply sampling from the
    observation/reward spaces. The probability of a `done=True` after each
    action can be configured, as well as the max episode length.
    """

    def __init__(self, config=None):
        config = config or {}

Here is an example of how to use it:

ray-project/ray/blob/42864d711d1eb2013a83670efc284ad22a62b929/rllib/models/tests/test_lstms.py#L22

    @classmethod
    def setUpClass(cls) -> None:
        ray.init(num_cpus=5)

    @classmethod
    def tearDownClass(cls) -> None:
        ray.shutdown()

    def test_lstm_w_prev_action_and_prev_reward(self):
        """Tests LSTM prev-a/r input insertions using complex actions."""
        config = {
            "env": RandomEnv,
            "env_config": {
                "action_space": Dict(
                    {
                        "a": Box(-1.0, 1.0, ()),
                        "b": Box(-1.0, 1.0, (2,)),
                        "c": Tuple(
                            [
                                Discrete(2),
                                MultiDiscrete([2, 3]),

fksvensson · October 12, 2022, 4:01pm

Thank you for the advice!

I tried a simple version just using

config["env"] = RandomEnv
config["env_config"] = {"action_space": env.action_space, "observation_space": env.observation_space}

and I unfortunately still get the same error.

I could play around with some options for dummy environments, but even if I worked that out, I would not be able to run any online evaluation of my offline learning…

The error looks slightly different when I switch the framework to torch; maybe that gives a clue:

RuntimeError: mat1 and mat2 shapes cannot be multiplied (32x2 and 96x256)
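The torch message reads as a plain matrix-multiplication mismatch: the first operand is a batch of 32 observations with 2 features each, while the second looks like the weight matrix of the first dense layer, which expects 96 input features and (presumably) has 256 units. A minimal NumPy sketch of the same failure, assuming exactly those interpretations of the two shapes:

```python
import numpy as np

# Shapes taken from the torch error; their meaning (obs batch vs. first-layer
# weights) is an assumption, not confirmed in this thread.
batch_size, raw_obs_dim = 32, 2      # "mat1": the batch actually fed to the model
expected_in, hidden = 96, 256        # "mat2": first dense layer's weight matrix

obs_batch = np.zeros((batch_size, raw_obs_dim))
first_layer_w = np.zeros((expected_in, hidden))


def forward(obs: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Dense layer without bias: (B, in) @ (in, out) -> (B, out)."""
    return obs @ w


try:
    forward(obs_batch, first_layer_w)
except ValueError as err:
    # NumPy raises ValueError here; torch raises the RuntimeError quoted above.
    print("shape mismatch:", err)
```

Feeding a (32, 96) batch instead goes through and yields a (32, 256) output, so the question is why the offline batches arrive with 2 features instead of 96.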

mannyv · October 12, 2022, 4:15pm

@fksvensson,

Can you share your observation and action spaces?

fksvensson · October 13, 2022, 8:09am

env.action_space

Discrete(16)

env.observation_space

MultiDiscrete([16 80])
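With these spaces, the 96 in the torch error has a natural explanation: if each component of the MultiDiscrete observation is one-hot encoded and the pieces are concatenated (which, as far as I know, is what RLlib's default preprocessor does for MultiDiscrete spaces), the model input has 16 + 80 = 96 features, while a raw observation has only 2. A small NumPy sketch of that encoding; `one_hot_multidiscrete` is a hypothetical helper for illustration, not an RLlib function:

```python
import numpy as np

NVEC = [16, 80]  # from MultiDiscrete([16 80]) above


def one_hot_multidiscrete(obs, nvec):
    """Concatenate one one-hot vector per MultiDiscrete component.

    Hypothetical helper illustrating the encoding; not an RLlib API.
    """
    parts = []
    for value, n in zip(obs, nvec):
        if not 0 <= value < n:
            raise ValueError(f"value {value} out of range for Discrete({n})")
        vec = np.zeros(n, dtype=np.float32)
        vec[value] = 1.0
        parts.append(vec)
    return np.concatenate(parts)


raw_obs = np.array([3, 42])                 # a raw observation: 2 integers
encoded = one_hot_multidiscrete(raw_obs, NVEC)
print(raw_obs.shape, encoded.shape)         # (2,) (96,)
```

If the offline `obs` column holds the raw 2-integer form while DQN builds its model for the encoded 96-feature input, that would be consistent with both the TF and the torch errors.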

mannyv · October 13, 2022, 4:24pm

@fksvensson,

I think this is possibly a bug, but try adding this to your DQN config to get it working:

config["_disable_preprocessor_api"] = True

arturn · October 17, 2022, 1:27pm

@fksvensson Can you provide the full repro script that gives you your latest error and log together with a GH issue, please?

fksvensson · October 25, 2022, 8:54am

Hello! It slipped my mind to answer this, but config["_disable_preprocessor_api"] = True did in fact fix my problem.

Thank you @mannyv
