How severely does this issue affect your experience of using Ray?
- High: It blocks me from completing my task.
Hi,
I am running into a weird issue (full log below): I am providing a correctly shaped observation, yet I still get an error. However, the client and server keep communicating, and the training loop completes successfully (not that there is much progress yet, as I am still testing things out). The error also only shows up once, at the top of the log.
Essentially, I just want to double-check whether this error is legitimate or a false positive.
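For context, here is a small standalone check (just a sketch, not part of my actual setup) that compares a dummy frame against the same Box space I declare in the server script below; the random float64 array only stands in for the raw observation the log reports:

import numpy as np
from gymnasium.spaces import Box

# Same space as declared on the policy server below.
obs_space = Box(low=0, high=1, shape=(240, 320, 1), dtype=np.float32)

# Dummy observation shaped like the one in the warning; np.random.rand
# returns float64, which is also the dtype the log reports for the raw obs.
obs_f64 = np.random.rand(240, 320, 1)
obs_f32 = obs_f64.astype(np.float32)

print(obs_f64.shape == obs_space.shape)   # True: the shapes line up
print(obs_space.contains(obs_f64))        # may be False on newer gym/gymnasium,
                                          # which also compare dtypes via np.can_cast
print(obs_space.contains(obs_f32))        # True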
Policy server:
from gym import spaces
import ray
from ray.rllib.agents import with_common_config
from ray.rllib.agents.ppo import PPOTrainer
from ray.rllib.env import PolicyServerInput
from ray.rllib.algorithms.ppo import PPOConfig
from ray.rllib.examples.env.random_env import RandomEnv
import numpy as np
import argparse
from gymnasium.spaces import MultiDiscrete, Box
ppo_config = PPOConfig()
parser = argparse.ArgumentParser(description='Optional app description')
parser.add_argument('-ip', type=str, help='IP of this device')
parser.add_argument('-checkpoint', type=str, help='location of checkpoint to restore from')
args = parser.parse_args()
def _input(ioctx):
    return PolicyServerInput(
        ioctx,
        args.ip,
        55556,
    )
x = 320
y = 240
# coef = 0.5
# x = int(x * coef)
# y = int(y * coef)
# ignored:
# kl_coeff, ->
# vf_loss_coeff used to be 0.01??
# "entropy_coeff": 0.00005,
# "clip_param": 0.1,
ppo_config.gamma = 0.998 # default 0.99
ppo_config.lambda_ = 0.99 # default 1.0???
ppo_config.kl_target = 0.01 # used to use 0.02
ppo_config.rollout_fragment_length = 16
ppo_config.train_batch_size = 2560
ppo_config.sgd_minibatch_size = 128
ppo_config.num_sgd_iter = 1 # default 30???
ppo_config.lr = 3.5e-5 # 5e-5
ppo_config.model = {
    # Share layers for value function. If you set this to True, it's
    # important to tune vf_loss_coeff.
    "vf_share_layers": False,
    "use_lstm": True,
    "max_seq_len": 32,
    "lstm_cell_size": 128,
    "lstm_use_prev_action": True,
    # 'use_attention': True,
    # "max_seq_len": 128,
    # "attention_num_transformer_units": 1,
    # "attention_dim": 1024,
    # "attention_memory_inference": 128,
    # "attention_memory_training": 128,
    # "attention_num_heads": 8,
    # "attention_head_dim": 64,
    # "attention_position_wise_mlp_dim": 512,
    # "attention_use_n_prev_actions": 0,
    # "attention_use_n_prev_rewards": 0,
    # "attention_init_gru_gate_bias": 2.0,
    "conv_filters": [
        # [4, [3, 4], [1, 1]],
        # [16, [6, 8], [3, 3]],
        # [32, [6, 8], [3, 4]],
        # [64, [6, 6], 3],
        # [256, [9, 9], 1],
        # 480 x 640
        # [4, [7, 7], [3, 3]],
        # [16, [5, 5], [3, 3]],
        # [32, [5, 5], [2, 2]],
        # [64, [5, 5], [2, 2]],
        # [256, [5, 5], [3, 5]],
        # 240 x 320
        [16, [5, 5], 3],
        [32, [5, 5], 3],
        [64, [5, 5], 3],
        [128, [3, 3], 2],
        [256, [3, 3], 2],
        [512, [3, 3], 2],
    ],
    "conv_activation": "relu",
    "post_fcnet_hiddens": [512],
    "post_fcnet_activation": "relu",
}
ppo_config.batch_mode = "complete_episodes"
ppo_config.simple_optimizer = True
ppo_config.num_gpus = 0
# ppo_config.input_ = (
# lambda ioctx: PolicyServerInput(ioctx, args.ip, 55556)
# )
ppo_config.rollouts(num_rollout_workers=0)
ppo_config.offline_data(input_=_input)
ppo_config.env = None
ppo_config.observation_space = Box(low=0, high=1, shape=(y, x, 1), dtype=np.float32)
ppo_config.action_space = MultiDiscrete(
    [
        2,  # W
        2,  # A
        2,  # S
        2,  # D
        2,  # Space
        2,  # H
        2,  # J
        2,  # K
        2,  # L
    ]
)
ppo_config.env_config = {
    "sleep": True,
}
ppo_config.framework_str = 'tf'
ppo_config.log_sys_usage = False
ppo_config.compress_observations = True
ppo_config.shuffle_sequences = False
print(ppo_config.to_dict())
tempyy = ppo_config.to_dict()
ray.init(num_cpus=2, num_gpus=0, log_to_driver=False)
trainer = PPOTrainer
from ray import tune
name = "" + args.checkpoint
print(f"Starting: {name}")
tune.run(trainer,
         resume='AUTO',
         config=ppo_config.to_dict(),
         name=name,
         keep_checkpoints_num=None,
         checkpoint_score_attr="episode_reward_mean",
         max_failures=1,
         # restore="C:\\Users\\denys\\ray_results\\mediumbrawl-attention-256Att-128MLP-L2\\PPOTrainer_RandomEnv_1e882_00000_0_2022-06-02_15-13-44\\checkpoint_000028\\checkpoint-28",
         checkpoint_freq=5,
         checkpoint_at_end=True)
Policy client:
import os
import cv2
from ray.rllib.env import PolicyClient
from pathlib import Path
from environment import BrawlEnv
import logging
import time
import argparse
logging.basicConfig(level=logging.INFO)
parser = argparse.ArgumentParser(description='Optional app description')
parser.add_argument('-ip', type=str,
                    help='IP of this device')
parser.add_argument('-speed', type=float,
                    help='gameFactor, default 1.0')
parser.add_argument('-update', type=float,
                    help='seconds how often to update from main process')
parser.add_argument('-local', type=str,
                    help='Whether to create and update a local copy of the AI (adds delay) or query server for each action. '
                         'possible values: "local" or "remote"')
args = parser.parse_args()
update = 3600.0
local = 'local'
remoteee = False
if args.update:
update = args.update
# remoteee = True
if args.local:
local = args.local
if local == 'remote':
remoteee = True
print(f"Going to update {local}-y at {update} seconds interval")
print('trying to launch policy client')
print(f"http://{args.ip}:55556")
# Setting update_interval to false, so it doesn't update in middle of games, will be manually updating it between games
client = PolicyClient(address=f"http://{args.ip}:55556", update_interval=False, inference_mode=local)
# client = PolicyClient(address=f"http://{args.ip}:55556", update_interval=60, inference_mode=local)
forced = True
root = None
env = BrawlEnv({'sleep': True})
print('trying to get initial eid')
episode_id = client.start_episode()
# if local == 'remote':
# env.underlord.startNewGame()c
# gameObservation = env.underlord.getObservation()
reward = 0
print('starting main loop')
replayList = []
update = True
runningReward = 0
counter = 0
runningCounter = 0
numLoops = 0
startTime = time.time()
endTime = time.time()
fps = 5
actionTimeOut = 1.0 / fps
print(f"action time: {actionTimeOut}")
actionTime = time.time()
env.restartRound()
x = 320
y = 240
epochActions = 4096
actionsUntilEpoch = 4096
epochNum = 0
needReset = False
numActions = 0
old_id = None
gameTime = time.time()
while True:
    # if needReset:
    #     env.releaseAllKeys()
    if numActions % 500 == 0:
        env.refreshWindow()
    elapsed_time = time.time() - actionTime
    if elapsed_time < actionTimeOut:
        time.sleep(actionTimeOut - elapsed_time)
        # continue
    actionTime = time.time()
    # average out to ~30 actions a second
    counter = counter + 1
    runningCounter = runningCounter + 1
    endTime = time.time()
    if (endTime - startTime) > 1:
        print(f"actions per second: {counter}")
        startTime = time.time()
        counter = 0
        numLoops = numLoops + 1
    # timeStart = time.time()
    gameObservation, reward, gameOver = env.getObservation()
    # print(f"Time to get obs: {time.time() - timeStart}")
    # print('got observation')
    # print(gameObservation)
    # print(env.observation_space.contains(gameObservation))
    # print(reward, gameOver)
    # if not env.observation_space.contains(gameObservation):
    #     print(gameObservation)
    #     print("Not lined up 1")
    #     print(env.underlord.heroAlliances)
    #     sys.exit()
    action = None
    # timeStart = time.time()
    action = client.get_action(episode_id=episode_id, observation=gameObservation)
    # print(f"Time to get action: {time.time() - timeStart}")
    if needReset:
        print('starting reset!')
        if local == 'local':
            print("updating weights")
            client.update_policy_weights()
            print('finished updating weights')
            time.sleep(0.25)
        env.refreshWindow()
        time.sleep(0.25)
        # env.releaseAllKeys()
        env.restartRound()
        needReset = False
        reward = 0
        numLoops = 0
        runningCounter = 0
        counter = 0
        gameOver = False
        print('resetFinished!')
    else:
        # timeStart = time.time()
        env.act(action)
        # print(f"Time to act: {time.time() - timeStart}")
        # print('took action')
    # print('got action')
    runningReward += reward
    # act_time = time.time() - act_time
    # print("--- %s seconds to get do action ---" % (time.time() - start_time))
    # print(f"running reward: {reward}")
    client.log_returns(episode_id=episode_id, reward=reward)
    # print('logged returns')
    # Updating the model after every game in case there is a new one
    numActions = numActions + 1
    if gameOver and numActions > 25:
        # if elapsed_time > 20:
        #     print("restarting due to elapsed time")
        env.releaseAllKeys()
        env.resetHP()
        numActions = 0
        if reward <= -1:
            print(f"GAME OVER! WE Lost final reward: {runningReward}! Number of actions: {runningCounter}")
            env.gameLog += f"GAME OVER! WE Lost final reward: {runningReward}! Number of actions: {runningCounter}\n"
        else:
            print(f"GAME OVER! WE Won final reward: {runningReward}! Number of actions: {runningCounter}")
            env.gameLog += f"GAME OVER! WE Won final reward: {runningReward}! Number of actions: {runningCounter}\n"
        env.gameLog += str(env.rewards)
        if runningReward >= -0.6:
            folderString = f"reward-{round(runningReward, 4)}-{epochNum}-{runningCounter}"
            fullString = os.getcwd() + "/replays/" + folderString
            if reward >= 0.0:
                fullString = os.getcwd() + "/replays/positive/" + folderString
            elif reward >= -0.3:
                fullString = os.getcwd() + "/replays/good/" + folderString
            else:
                fullString = os.getcwd() + "/replays/meh/" + folderString
            Path(fullString).mkdir(parents=True, exist_ok=True)
            f = open(fullString + "/log.txt", "a")
            f.write(env.gameLog)
            # this would be a 10 minute long game
            video_fps = ((runningCounter - counter) / numLoops) + (counter / fps)
            if len(env.images) <= 6000:
                fourcc = cv2.VideoWriter_fourcc('M', 'J', 'P', 'G')
                video = cv2.VideoWriter(fullString + '/video.avi', fourcc, video_fps, (x, y), False)
                for img in env.images:
                    # img = img * 255.0
                    video.write(img.astype('uint8'))
                video.release()
        env.images = []
        env.gameLog = ""
        actionsUntilEpoch = actionsUntilEpoch - runningCounter
        if actionsUntilEpoch < 0:
            epochNum = epochNum + 1
        print(f"Actions until epoch: {actionsUntilEpoch}, current epoch: {epochNum}")
        print(env.rewards)
        if actionsUntilEpoch < 0:
            actionsUntilEpoch = epochActions
        runningReward = 0
        runningCounter = 0
        reward = 0
        numLoops = 0
        # need to call a reset of env here
        finalObs, reward, gameOver = env.getObservation()
        old_id = episode_id
        client.end_episode(episode_id=episode_id, observation=finalObs)
        episode_id = client.start_episode(episode_id=None)
        needReset = True
        time.sleep(0.25)
    # print('finished logging step')
    # print("--- %s seconds to get finish logging return ---" % (time.time() - start_time))
    # replayList.append((gameObservation, action, reward))
    # print(f"Round: {gameObservation[5]} - Time Left: {gameObservation[12]} - Obs duration: {obs_time} - Act
    # duration: {act_time} - Overall duration: {time.time() - start_time}")
Error log:
INFO:ray.rllib.evaluation.sampler:Raw obs from env: { 'c14d2a6b5fd645dbb34e18f7278d1f4d': { 'agent0': np.ndarray((240, 320, 1), dtype=float64, min=0.0, max=0.996, mean=0.666)}}
INFO:ray.rllib.evaluation.sampler:Info return from env: {'c14d2a6b5fd645dbb34e18f7278d1f4d': {'agent0': {}}}
INFO:ray.rllib.evaluation.sampler:Preprocessed obs: np.ndarray((240, 320, 1), dtype=float64, min=0.0, max=0.996, mean=0.666)
INFO:ray.rllib.evaluation.sampler:Filtered obs: np.ndarray((240, 320, 1), dtype=float64, min=0.0, max=0.996, mean=0.666)
WARNING:ray.rllib.evaluation.collectors.agent_collector:Provided tensor
[[[0.23529412]
[0.23137255]
[0.22745098]
...
[0.21960784]
[0.22745098]
[0.23137255]]
[[0.23529412]
[0.23137255]
[0.22352941]
...
[0.21568627]
[0.22352941]
[0.22745098]]
[[0.23137255]
[0.23137255]
[0.21960784]
...
[0.21176471]
[0.21960784]
[0.22352941]]
...
[[0.23529412]
[0.23137255]
[0.22745098]
...
[0.14509804]
[0.14901961]
[0.15294118]]
[[0.23529412]
[0.23137255]
[0.22745098]
...
[0.14901961]
[0.15294118]
[0.15686275]]
[[0.23529412]
[0.23529412]
[0.23137255]
...
[0.15294118]
[0.15686275]
[0.15686275]]]
does not match space of view requirements obs.
Provided tensor has shape (240, 320, 1) and view requirement has shape shape (240, 320, 1).Make sure dimensions match to resolve this warning.