ERROR: "Data columns must be same length" when attempting to create tf serving model

I’m getting the following error when attempting to export a TF Serving model from the checkpoint file. For context, I’m using action masking on a Tuple action space, and I wasn’t getting this error before I added action masking, so perhaps that is where the issue lies. The error message lists out a number of arrays, but which ones have the shape mismatch?

Traceback (most recent call last):
  File "train-ray.py", line 420, in <module>
    MyLauncher().train_main()
  File "/opt/ml/code/sagemaker_rl/ray_launcher.py", line 350, in train_main
    launcher.launch()
  File "train-ray.py", line 389, in launch
    use_pytorch=use_pytorch)
  File "/opt/ml/code/sagemaker_rl/ray_launcher.py", line 251, in save_checkpoint_and_serving_model
    self.create_tf_serving_model(algorithm, env_string)
  File "/opt/ml/code/sagemaker_rl/ray_launcher.py", line 240, in create_tf_serving_model
    agent = cls(env=env_string, config=config)
  File "/usr/local/lib/python3.6/dist-packages/ray/rllib/agents/trainer_template.py", line 90, in __init__
    Trainer.__init__(self, config, env, logger_creator)
  File "/usr/local/lib/python3.6/dist-packages/ray/rllib/agents/trainer.py", line 448, in __init__
    super().__init__(config, logger_creator)
  File "/usr/local/lib/python3.6/dist-packages/ray/tune/trainable.py", line 174, in __init__
    self._setup(copy.deepcopy(self.config))
  File "/usr/local/lib/python3.6/dist-packages/ray/rllib/agents/trainer.py", line 591, in _setup
    self._init(self.config, self.env_creator)
  File "/usr/local/lib/python3.6/dist-packages/ray/rllib/agents/trainer_template.py", line 117, in _init
    self.config["num_workers"])
  File "/usr/local/lib/python3.6/dist-packages/ray/rllib/agents/trainer.py", line 662, in _make_workers
    logdir=self.logdir)
  File "/usr/local/lib/python3.6/dist-packages/ray/rllib/evaluation/worker_set.py", line 61, in __init__
    RolloutWorker, env_creator, policy, 0, self._local_config)
  File "/usr/local/lib/python3.6/dist-packages/ray/rllib/evaluation/worker_set.py", line 279, in _make_worker
    extra_python_environs=extra_python_environs)
  File "/usr/local/lib/python3.6/dist-packages/ray/rllib/evaluation/rollout_worker.py", line 391, in __init__
    policy_dict, policy_config)
  File "/usr/local/lib/python3.6/dist-packages/ray/rllib/evaluation/rollout_worker.py", line 859, in _build_policy_map
    policy_map[name] = cls(obs_space, act_space, merged_conf)
  File "/usr/local/lib/python3.6/dist-packages/ray/rllib/policy/eager_tf_policy.py", line 258, in __init__
    self._initialize_loss_with_dummy_batch()
  File "/usr/local/lib/python3.6/dist-packages/ray/rllib/policy/eager_tf_policy.py", line 632, in _initialize_loss_with_dummy_batch
    SampleBatch(dummy_batch))
  File "/usr/local/lib/python3.6/dist-packages/ray/rllib/policy/sample_batch.py", line 62, in __init__
    self.data, lengths)
AssertionError: ('data columns must be same length', {'obs': array([[-0.16753997, -0.8131003 , -0.3058182 ,  0.2200041 , -0.8453412 ,
        -0.2022055 ,  0.85689116,  0.24618384, -0.7730741 , -0.11723867,
         0.24153124,  0.06563056, -0.9881637 ,  0.3914931 , -0.42937464,
        -0.61651164,  0.16560812,  0.43424338, -0.48306796, -0.16379589,
         0.6629045 , -0.83640736,  0.55711555, -0.99590707,  0.4565502 ,
         0.5004547 ,  0.09825385, -0.6208752 ,  0.30217493, -0.95540255,
        -0.9390568 , -0.15556492,  0.9635776 , -0.7789579 , -0.08081707,
        -0.5235052 ,  0.36796525, -0.2596724 ,  0.2986152 , -0.9335729 ,
        -0.8205236 , -0.07025883, -0.17298482,  0.59653604,  0.78505236,
        -0.37407067,  0.3451196 ,  0.8478205 ,  0.6491549 , -0.3552222 ,
        -0.9410455 , -0.19240941,  0.2433002 ,  0.92845184,  0.13893473,
        -0.46645886,  0.77974063,  0.9887864 , -0.5882755 , -0.16081837,
         0.18821715, -0.48406398, -0.18187997, -0.26430282, -0.09947821,
        -0.6070124 , -0.00425298,  0.60601455,  0.478472  , -0.73200834,
        -0.9221117 , -0.57326376,  0.8292637 , -0.27201414, -0.49593267]],
      dtype=float32), 'new_obs': array([[ 0.8084266 ,  0.60376465,  0.78310597,  0.45286018, -0.5389207 ,
        -0.26351827, -0.19457214, -0.45364273, -0.7939361 , -0.6026908 ,
        -0.366684  , -0.26129752,  0.45846698,  0.9235534 , -0.6174818 ,
        -0.27158985, -0.3849102 , -0.42492867, -0.01951972, -0.9497818 ,
         0.6311238 ,  0.3333413 , -0.8332515 ,  0.5431858 ,  0.65956146,
         0.02491495, -0.12982146,  0.2892342 , -0.06229191,  0.21960628,
         0.03798034, -0.8105211 ,  0.45083812,  0.5011283 ,  0.9295937 ,
        -0.347093  , -0.2709405 , -0.40550342,  0.33635706,  0.6673113 ,
        -0.81663233, -0.7930882 , -0.09971744, -0.982902  ,  0.4535259 ,
         0.67291045, -0.623508  ,  0.7011699 , -0.65178597, -0.36389652,
         0.48030367,  0.8339387 , -0.29725808, -0.48181954,  0.9213085 ,
         0.332354  , -0.75827074,  0.78023595, -0.2888754 , -0.3171853 ,
         0.10102597,  0.48887405,  0.7183128 ,  0.9172779 , -0.13899948,
         0.06689885,  0.05988894,  0.9012435 , -0.5165003 ,  0.7978972 ,
         0.65684813, -0.3593439 ,  0.2804876 , -0.96951115, -0.4018025 ]],
      dtype=float32), 'dones': array([False]), 'actions': array([[4],
       [5],
       [0],
       [0]]), 'rewards': array([0.], dtype=float32), 'prev_actions': array([[4],
       [5],
       [0],
       [0]]), 'prev_rewards': array([0.], dtype=float32), 'action_prob': array([1.], dtype=float32), 'action_logp': array([0.], dtype=float32), 'action_dist_inputs': array([[-1.8790666e-03, -1.2569092e-03,  8.0820790e-04,  1.5254925e-03,
         3.0071309e-03, -6.0335770e-03,  1.9848817e-03, -5.2638206e-05,
        -1.4693772e-03, -1.6271004e-03, -6.9682496e-03, -2.8726761e-03,
         5.1925275e-03,  2.4756842e-04, -7.4656908e-03,  2.0904108e-03,
         8.5534025e-03, -6.3283751e-03, -5.6225746e-03,  2.0334723e-03,
         8.9268445e-04,  2.4471004e-03,  3.7310375e-03, -1.2613479e-03,
        -1.7964891e-03, -1.6982645e-03, -3.7542414e-03, -2.3088434e-03,
        -4.4651306e-03,  3.2427113e-03, -6.2494841e-03,  1.1091821e-03]],
      dtype=float32), 'vf_preds': array([-0.00676239], dtype=float32)}, [1, 1, 1, 4, 1, 4, 1, 1, 1, 1, 1])

I get this error after running the following:

import json
import os

import ray

# Rebuild the trainer class from the saved run config, restore the checkpoint,
# then export the TF Serving model.
if ray.__version__ >= "0.6.5":
    from ray.rllib.agents.registry import get_agent_class
else:
    from ray.rllib.agents.agent import get_agent_class

cls = get_agent_class(algorithm)
with open(os.path.join(MODEL_OUTPUT_DIR, "params.json")) as config_json:
    config = json.load(config_json)
print("Loaded config for TensorFlow serving.")

config["monitor"] = False
config["num_workers"] = 1
config["num_gpus"] = 0

agent = cls(env=env_string, config=config)
checkpoint = os.path.join(MODEL_OUTPUT_DIR, "checkpoint")
agent.restore(checkpoint)
export_tf_serving(agent, MODEL_OUTPUT_DIR)

Thanks for posting this question, @pgigioli!
It’s your actions and prev_actions that are causing this error: they are Tuple actions (4 sub-components), but they are encoded as plain np.arrays, so their batch size looks different from that of all the other columns (4 vs. 1).
Your actual batch size is 1.
Your action array is np.array([[4], [5], [0], [0]]), which has shape (4, 1); it should have shape (1, 4, 1).
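
Roughly speaking, the SampleBatch constructor just compares the len() of every column, so a Tuple action needs its batch axis first. A minimal numpy sketch of the mismatch and the expected layout (illustrative only, not the actual RLlib code):

import numpy as np

obs = np.zeros((1, 75), dtype=np.float32)      # batch size 1 -> len(obs) == 1
actions = np.array([[4], [5], [0], [0]])       # shape (4, 1)  -> len(actions) == 4 (mismatch!)

# Expected layout: batch axis first, then the 4 Tuple sub-actions.
actions_fixed = actions[None, ...]             # shape (1, 4, 1) -> len(actions_fixed) == 1

lengths = [len(col) for col in (obs, actions_fixed)]
assert len(set(lengths)) == 1, ("data columns must be same length", lengths)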

This could be a bug in RLlib somewhere. Would you be able to create a self-contained reproduction script that shows this error, file a GitHub issue, and then ping me back here? Thanks!

Hi @sven1977, I’ve updated my custom model class to match the class in the action-masking example, but now I’m getting a different error (it might be related to the same issue):

2021-01-25 17:42:46,690 ERROR trial_runner.py:519 -- Trial PPO_TORIEnv_00000: Error processing event.
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/ray/tune/trial_runner.py", line 467, in _process_trial
    result = self.trial_executor.fetch_result(trial)
  File "/usr/local/lib/python3.6/dist-packages/ray/tune/ray_trial_executor.py", line 431, in fetch_result
    result = ray.get(trial_future[0], DEFAULT_GET_TIMEOUT)
  File "/usr/local/lib/python3.6/dist-packages/ray/worker.py", line 1515, in get
    raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(InvalidArgumentError): ray::PPO.train() (pid=420, ip=10.169.129.5)
  File "python/ray/_raylet.pyx", line 463, in ray._raylet.execute_task
  File "python/ray/_raylet.pyx", line 417, in ray._raylet.execute_task.function_executor
  File "/usr/local/lib/python3.6/dist-packages/ray/rllib/agents/trainer.py", line 495, in train
    raise e
  File "/usr/local/lib/python3.6/dist-packages/ray/rllib/agents/trainer.py", line 484, in train
    result = Trainable.train(self)
  File "/usr/local/lib/python3.6/dist-packages/ray/tune/trainable.py", line 261, in train
    result = self._train()
  File "/usr/local/lib/python3.6/dist-packages/ray/rllib/agents/trainer_template.py", line 151, in _train
    fetches = self.optimizer.step()
  File "/usr/local/lib/python3.6/dist-packages/ray/rllib/optimizers/multi_gpu_optimizer.py", line 148, in step
    self.train_batch_size)
  File "/usr/local/lib/python3.6/dist-packages/ray/rllib/optimizers/rollout.py", line 25, in collect_samples
    next_sample = ray_get_and_free(fut_sample)
  File "/usr/local/lib/python3.6/dist-packages/ray/rllib/utils/memory.py", line 32, in ray_get_and_free
    return ray.get(object_ids)
ray.exceptions.RayTaskError(InvalidArgumentError): ray::RolloutWorker.sample() (pid=468, ip=10.169.129.5)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1352, in _run_fn
    target_list, run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1445, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Received a label value of 4 which is outside the valid range of [0, 4).  Label values: 4
	 [[{{node default_policy/SparseSoftmaxCrossEntropyWithLogits_2/SparseSoftmaxCrossEntropyWithLogits}}]]
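
If I’m reading that right, it’s the standard range check inside TF’s sparse softmax cross-entropy: with 4 logits the valid label values are 0–3, so a label of 4 produces exactly this message. A standalone snippet that triggers the same failure (not my training code, just an illustration):

import tensorflow as tf

logits = tf.constant([[0.1, 0.2, 0.3, 0.4]])   # 4 classes -> valid labels are 0..3
labels = tf.constant([4])                      # 4 is outside [0, 4)

# Raises InvalidArgumentError: "Received a label value of 4 which is outside
# the valid range of [0, 4)". Under TF1 graph mode the error only surfaces
# at session.run() time, which matches the traceback above.
loss = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=logits)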

The model seems to work during the sample collection phase but fails during the optimization phase. My custom model class is:

import tensorflow as tf
from gym import spaces
# Import paths as in the RLlib version I'm running:
from ray.rllib.models.tf.tf_modelv2 import TFModelV2
from ray.rllib.models.tf.fcnet_v2 import FullyConnectedNetwork


class ActionMaskingModel(TFModelV2):
    def __init__(self, obs_space, action_space, num_outputs, model_config,
                 name, true_obs_shape=(11,), action_embed_size=32, **kw):
        super(ActionMaskingModel, self).__init__(
            obs_space, action_space, num_outputs, model_config, name, **kw)

        # Embedding network that only sees the "state" part of the observation.
        self.action_embed_model = FullyConnectedNetwork(
            spaces.Box(-1, 1, shape=true_obs_shape), action_space,
            action_embed_size, model_config, name)
        self.register_variables(self.action_embed_model.variables())

    def forward(self, input_dict, state, seq_lens):
        # Extract the available actions tensor from the observation.
        avail_actions = tf.cast(tf.concat(input_dict["obs"]["avail_actions"], axis=1), tf.float32)
        action_mask = tf.cast(tf.concat(input_dict["obs"]["action_mask"], axis=1), tf.float32)
        
        # Compute the predicted action embedding
        action_embedding, _ = self.action_embed_model({"obs": input_dict["obs"]["state"]})
        
        # Expand the model output to [BATCH, 1, EMBED_SIZE]. Note that the
        # avail actions tensor is of shape [BATCH, MAX_ACTIONS, EMBED_SIZE].
        intent_vector = tf.expand_dims(action_embedding, 1)
 
        # Batch dot product => shape of logits is [BATCH, MAX_ACTIONS].
        action_logits = tf.reduce_sum(avail_actions * intent_vector, axis=1)
 
        # Mask out invalid actions (use tf.float32.min for stability)
        inf_mask = tf.maximum(tf.math.log(action_mask), tf.float32.min)
        
        return action_logits + inf_mask, state
 
    def value_function(self):
        return self.action_embed_model.value_function()

This class is almost identical to the class in the action-masking example, except that I am concatenating the action masks of each sub action space into a single action mask so I can mask the Tuple actions all at once. Is this the right approach? Also, am I correct in setting action_embed_size = sum(Tuple action space sizes)? The output of forward() gives me a tensor with shape [BATCH_SIZE, sum(Tuple action space sizes)].
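
For reference, this is roughly how I compute the combined size and mask; the sub-space sizes below are made up, my real environment uses different ones:

from gym import spaces
import numpy as np

# Hypothetical Tuple action space; my real sub-space sizes differ.
action_space = spaces.Tuple([spaces.Discrete(5), spaces.Discrete(6),
                             spaces.Discrete(4), spaces.Discrete(4)])

# What I mean by action_embed_size = sum(Tuple action space sizes):
action_embed_size = sum(sp.n for sp in action_space.spaces)   # 19 in this example

# One 0/1 mask per sub-space, concatenated into a single flat mask,
# matching what forward() does with input_dict["obs"]["action_mask"].
sub_masks = [np.ones(sp.n, dtype=np.float32) for sp in action_space.spaces]
flat_mask = np.concatenate(sub_masks)
assert flat_mask.shape[0] == action_embed_size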

Thanks!

Would you be able to reproduce this using our example script and then make the necessary small changes to get the error? Like using a Tuple action space, etc…

The reason I’m asking is that I need something concrete to debug this on. Then I would be able to help you very quickly, I think.

Hi @sven1977, the issue was resolved after downgrading TensorFlow from 2.1.0 to 1.15.4. Thanks for your help!