Hi,
I am working on a research project for my graduate degree, using RLlib with a custom environment. Training different models on the environment (PPO, DQN, ARS, and a custom model) worked out fine. However, my issue starts to appear when I want to evaluate.
One example is PPO, trained as in the documentation with mostly default configuration. The training phase goes well: the reward increases and everything is okay. But when I come to the evaluation part, I keep facing this error:
ValueError: Cannot feed value of shape (61,) for Tensor default_policy/obs:0, which has shape (?, 61)
Here’s how I do my evaluation. The snippet below is trimmed to the relevant loop, and MyCustomEnv, the config, and the checkpoint path are placeholders for my actual ones:
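import ray
from ray.rllib.agents import ppo

ray.init()
config = {"num_workers": 0}   # plus the same model/env settings used for training
agent = ppo.PPOTrainer(config=config, env=MyCustomEnv)   # MyCustomEnv: my custom env class
agent.restore(checkpoint_path)   # checkpoint_path: path to the checkpoint saved after training

env = MyCustomEnv(env_config)
obs = env.reset()
done = False
episode_reward = 0.0
while not done:
    action = agent.compute_action(obs)   # the ValueError above is raised in this loop
    obs, reward, done, info = env.step(action)
    episode_reward += reward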
and this is my observation space as defined in the environment:
self.observation_space_dict = Dict({
    'action_mask': Box(0, 1, shape=(self.buffer_length,), dtype=np.float32),
    'avail_actions': Box(-np.inf, np.inf, shape=(self.buffer_length,), dtype=np.float32),
    'Online_Buffer': Box(low=-2, high=np.inf, shape=(self.buffer_length,), dtype=np.float32),
    'C_jobs': MultiBinary(self.buffer_length),
    'RemLaxity_jobs': Box(low=-np.inf, high=np.inf, shape=(self.buffer_length, 2), dtype=np.float32),
    'ProcessorSpeed': Box(low=np.array([0.]), high=np.array([np.inf]), dtype=np.float32),
})
self.observation_space = flatten_space(self.observation_space_dict)
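For what it’s worth, the 61 in the error is consistent with this space: flattening gives 6 * buffer_length + 1 entries (10 + 10 + 10 + 10 + 20 + 1), which implies buffer_length = 10 in this run. A standalone check, using only gym and no RLlib:

from gym.spaces import Box, Dict, MultiBinary
from gym.spaces.utils import flatten_space
import numpy as np

buffer_length = 10   # 6 * 10 + 1 = 61, matching the shape in the error

space = Dict({
    'action_mask': Box(0, 1, shape=(buffer_length,), dtype=np.float32),
    'avail_actions': Box(-np.inf, np.inf, shape=(buffer_length,), dtype=np.float32),
    'Online_Buffer': Box(low=-2, high=np.inf, shape=(buffer_length,), dtype=np.float32),
    'C_jobs': MultiBinary(buffer_length),
    'RemLaxity_jobs': Box(low=-np.inf, high=np.inf, shape=(buffer_length, 2), dtype=np.float32),
    'ProcessorSpeed': Box(low=np.array([0.]), high=np.array([np.inf]), dtype=np.float32),
})
print(flatten_space(space).shape)   # -> (61,)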
and this is how I update it:
obs_dict = {
    'action_mask': self.action_mask,
    'avail_actions': self.action_assignments,
    'Online_Buffer': np.array(self.online_buffer),
    'C_jobs': np.array(self.workbuffer[:, 3]).flatten(),   # criticality column
    'RemLaxity_jobs': np.array(self.workbuffer[:, 4:6]),   # remaining time and adjusted priority
    'ProcessorSpeed': np.array([self.speed]).flatten(),
}
obs_out = flatten(self.observation_space_dict, obs_dict)
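My best guess from the (61,) vs. (?, 61) mismatch is that a batch dimension is missing somewhere, so I tried feeding the observation with a leading axis added by hand. This is only a workaround attempt, not something I know to be the intended fix:

import numpy as np

# obs_out comes out of flatten() with shape (61,), but the placeholder
# default_policy/obs:0 expects a batch: (?, 61).
batched_obs = obs_out[None, :]   # shape (1, 61)
# equivalently: np.expand_dims(obs_out, axis=0)

From what I can tell, Policy.compute_actions() expects a batch of observations while Trainer.compute_action() takes a single one and batches it internally, so maybe I am mixing the two levels of the API somewhere.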
I have been debugging this for hours and can’t seem to get to the core of the problem. It keeps reading my observation space in a strange way, and I need to evaluate my models to continue with my research. I would be grateful for any help or a nudge in the right direction.