Observation space issue with unexpected (?, 61) shape

Hi,
I was recently working on a research project for my graduate degree, using RLlib with a custom environment. I trained several models on this environment (PPO, DQN, ARS, and a custom model) and everything worked fine. However, my issues started to appear when I wanted to evaluate.

One example is running the PPO trainer from the documentation with mostly default configuration: the training phase goes well, the reward increases, and all is okay. But when I get to the evaluation part, I keep facing this error:

ValueError: Cannot feed value of shape (61,) for Tensor default_policy/obs:0, which has shape (?, 61)

Here’s how I do my evaluation:

and this is my Observation space defined in the environment:

        self.observation_space_dict = Dict({
            'action_mask': Box(0, 1, shape=(self.buffer_length,), dtype=np.float32),
            'avail_actions': Box(-np.inf, np.inf, shape=(self.buffer_length,), dtype=np.float32),
            'Online_Buffer': Box(low=-2, high=np.inf, shape=(self.buffer_length,), dtype=np.float32),
            'C_jobs': MultiBinary(self.buffer_length),
            'RemLaxity_jobs': Box(low=-np.inf, high=np.inf, shape=(self.buffer_length, 2), dtype=np.float32),
            'ProcessorSpeed': Box(low=np.array([0.]), high=np.array([np.inf]), dtype=np.float32),
        })

        self.observation_space = flatten_space(self.observation_space_dict)
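As a sanity check on where the 61 comes from: a standalone version of this space flattens to exactly 61 entries when `buffer_length` is 10 (4 × 10 + 10 × 2 + 1), which matches the error message. The value 10 is an assumption here, not taken from the original code:

```python
import numpy as np

# gym on older installs, gymnasium on newer ones; both expose the same utilities.
try:
    from gym.spaces import Box, Dict, MultiBinary
    from gym.spaces.utils import flatdim, flatten, flatten_space
except ImportError:
    from gymnasium.spaces import Box, Dict, MultiBinary
    from gymnasium.spaces.utils import flatdim, flatten, flatten_space

# Assumed value: 4 * 10 + (10 * 2) + 1 = 61, matching the error message.
buffer_length = 10

space_dict = Dict({
    'action_mask': Box(0, 1, shape=(buffer_length,), dtype=np.float32),
    'avail_actions': Box(-np.inf, np.inf, shape=(buffer_length,), dtype=np.float32),
    'Online_Buffer': Box(low=-2, high=np.inf, shape=(buffer_length,), dtype=np.float32),
    'C_jobs': MultiBinary(buffer_length),
    'RemLaxity_jobs': Box(low=-np.inf, high=np.inf, shape=(buffer_length, 2), dtype=np.float32),
    'ProcessorSpeed': Box(low=np.array([0.]), high=np.array([np.inf]), dtype=np.float32),
})

print(flatdim(space_dict))              # total flat size of the space
print(flatten_space(space_dict).shape)  # shape of the flattened Box space
# A single flattened observation is 1-D; it has no batch axis.
print(flatten(space_dict, space_dict.sample()).shape)
```

Note that the flattened observation itself is a plain 1-D vector, while the policy's input tensor expects a leading batch dimension on top of it.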

and this is how I update it:

        obs_dict = {
            'action_mask': self.action_mask,
            'avail_actions': self.action_assignments,
            'Online_Buffer': np.array(self.online_buffer),
            'C_jobs': np.array(self.workbuffer[:, 3]).flatten(),    # Criticality column
            'RemLaxity_jobs': np.array(self.workbuffer[:, 4:6]),    # Remaining time and adjusted priority
            'ProcessorSpeed': np.array([self.speed]).flatten(),
        }
        obs_out = flatten(self.observation_space_dict, obs_dict)

I have been debugging this issue for hours and I can't seem to get to the core of the problem. It keeps reading my observation space in a weird way, and I need to evaluate my models to continue with my research. I would be grateful for any help or a nudge in the right direction.

I also tried using a Tuple instead of a Dict, shown below:

        self.observation_space_dict = Tuple((
            Box(0, 1, shape=(self.buffer_length,)),
            Box(-np.inf, np.inf, shape=(self.buffer_length,)),
            Box(low=-2, high=np.inf, shape=(self.buffer_length,)),
            Box(low=0, high=1, shape=(self.buffer_length,)),
            Box(low=-np.inf, high=np.inf, shape=(self.buffer_length, 2)),
            Box(low=np.array([0.]), high=np.array([np.inf])),
        ))

but I still get the exact same error. I really can't seem to find where the structure with shape (?, 61) comes from, or how to adjust or reshape it.

Hi @omarelseadawy,

The ? dimension is the batch dimension. I think it should have worked without it, but as a quick check, what happens if you try:

        action = policy.compute_actions(np.expand_dims(state, 0))
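In plain NumPy terms (61 being just the flat observation length from the error message), `expand_dims` turns the unbatched vector into a batch of one, which is what the (?, 61) placeholder expects:

```python
import numpy as np

# An unbatched flattened observation, as the environment returns it.
state = np.zeros(61, dtype=np.float32)

# Prepend a batch axis of size 1 so it matches a (?, 61) input tensor.
batched = np.expand_dims(state, 0)

print(state.shape)    # (61,)
print(batched.shape)  # (1, 61)
```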

Hi @mannyv,

Thank you for your reply.
I was slightly confused between compute_action (which was deprecated in favor of compute_single_action) and compute_actions and their respective functionalities. I readjusted my evaluation framework to use compute_single_action instead, and it worked fine.
Sorry, I saw your solution late, so I'm not sure whether it would also have fixed the issue. Thank you for your help! :slight_smile: