[RLlib] Impossible actions

This “available actions” approach seems helpful when you have to handle huge action spaces and/or a number of available actions that varies from step to step.
A small example of how I interpret it:
all_actions = {0, 1, 2, 3, 4, 5}, i.e. n = 6 actions in total, action_embedding_size = 2
=> the action embedding matrix E is 6x2 (one row per action)
m = 3 < n actions are available at a specific timestep, e.g. avail_actions = (0, 2, 4)
=> the embeddings of the available actions form a 3x2 matrix E* made of the 1st, 3rd and 5th rows of E
Finally, you take the dot product of the intent vector output by the NN (of length action_embedding_size) with each row of E*, which gives m = 3 logits, one per available action.
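To make the arithmetic concrete, here is a minimal NumPy sketch of those steps. The random matrices stand in for learned weights and the intent vector is a hypothetical network output. The second half shows an equivalent formulation: score all n actions, then push the unavailable ones to a huge negative logit before the softmax so they get zero probability. The `log(mask)` clamped at the float32 minimum trick is, as far as I remember, the pattern used in RLlib's parametric-actions example model, but treat that as an assumption, not RLlib's exact code.

```python
import numpy as np

# Sizes from the example: n = 6 actions, action_embedding_size d = 2.
n_actions, embed_size = 6, 2
rng = np.random.default_rng(0)

# Action embedding matrix E (n x d); random values stand in for learned weights.
E = rng.normal(size=(n_actions, embed_size))

# Actions available at this timestep (m = 3).
avail_actions = np.array([0, 2, 4])

# E* = rows 0, 2 and 4 of E, shape (m, d).
E_star = E[avail_actions]

# Hypothetical intent vector produced by the policy network (length d).
intent = rng.normal(size=embed_size)

# One logit per available action: E* @ intent has shape (m,).
avail_logits = E_star @ intent


def softmax(x):
    z = np.exp(x - np.max(x))
    return z / z.sum()


probs = softmax(avail_logits)
# Greedy pick for illustration (a policy would sample); map the index back to the action id.
chosen = avail_actions[np.argmax(avail_logits)]

# Equivalent view: score all n actions, then mask out the unavailable ones.
action_mask = np.zeros(n_actions, dtype=np.float32)
action_mask[avail_actions] = 1.0
all_logits = E @ intent
with np.errstate(divide="ignore"):  # log(0) = -inf is intended here
    inf_mask = np.maximum(np.log(action_mask), np.finfo(np.float32).min)
masked_probs = softmax(all_logits + inf_mask)

print(avail_logits.shape)  # (3,)
print(np.allclose(masked_probs[avail_actions], probs))  # True
print(chosen, masked_probs)
```

Both views give the same distribution over the available actions; the E* version only computes m dot products, which is what makes it attractive when n is huge.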

Code example showing the embeddings:

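```python
import numpy as np
import tensorflow as tf

# Embedding table E: 6 actions, each mapped to a 2-dimensional vector.
model = tf.keras.Sequential()
embed = tf.keras.layers.Embedding(6, 2)
model.add(embed)
model.summary()
print(embed.get_weights())  # the 6x2 matrix E

# Look up the embeddings of the available actions (rows 0, 2 and 4 of E).
# predict() works without compile(), so no optimizer/loss is set.
avail_actions = np.array([0, 2, 4])
out = model.predict(avail_actions)
for i in range(out.shape[0]):
    print("{}     {}".format(avail_actions[i], out[i]))
```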

Output:

```
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
embedding (Embedding)        (None, None, 2)           12
=================================================================
Total params: 12
Trainable params: 12
Non-trainable params: 0
_________________________________________________________________
[array([[-0.04093417, -0.02362244],
       [-0.01528452, -0.02044444],
       [ 0.04733466,  0.01246139],
       [ 0.01975517,  0.02948004],
       [ 0.03812562,  0.0137356 ],
       [ 0.04121368, -0.01421856]], dtype=float32)]
0     [[-0.04093417 -0.02362244]]
2     [[0.04733466 0.01246139]]
4     [[0.03812562 0.0137356 ]]
```
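The vectors printed for 0, 2 and 4 are exactly rows 0, 2 and 4 of the weight matrix, i.e. the Embedding layer is just a row lookup that builds E*. A small NumPy sketch of that lookup, reusing the weights printed above (copied by hand), also shows the equivalent one-hot matrix product:

```python
import numpy as np

# Weight matrix E as printed by embed.get_weights() above (6 actions x 2 dims).
E = np.array([
    [-0.04093417, -0.02362244],
    [-0.01528452, -0.02044444],
    [ 0.04733466,  0.01246139],
    [ 0.01975517,  0.02948004],
    [ 0.03812562,  0.0137356 ],
    [ 0.04121368, -0.01421856],
], dtype=np.float32)

avail_actions = np.array([0, 2, 4])

# An embedding lookup is plain row indexing: E* = E[avail_actions].
E_star = E[avail_actions]

# Same result as multiplying one-hot encodings of the actions with E.
one_hot = np.eye(E.shape[0], dtype=np.float32)[avail_actions]
print(np.allclose(one_hot @ E, E_star))  # True

for a, row in zip(avail_actions, E_star):
    print(a, row)
```

Inside a TensorFlow model the same lookup can be written as `tf.gather(weights, avail_actions)`, which avoids materializing the one-hot matrix.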