Adds action masking, as requested in https://github.com/openai/gym/issues/2823, so that spaces can mask out certain actions when sampling. Masks use the positive convention: `1` means the action can be taken and `0` means it cannot. For all gym spaces, this PR adds a `mask` parameter to `sample(mask=...)`; the required mask type depends on the space. `Box` is a special case: masking is not implemented because a neural network cannot provide mask values for a continuous distribution. If a good use case is found, this could be added.
The gym Taxi environment now returns a new info key, `"action_mask"`. Exposing the mask through `info` is also the recommended approach for custom environments.
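As a rough illustration of how an agent might consume the `"action_mask"` info key, the sketch below picks the best-scoring valid action from a vector of action values. The helper name `masked_greedy_action`, the hand-written `info` dict, and the example values are assumptions made for illustration; they are not part of this PR.

```python
import numpy as np


def masked_greedy_action(action_values, action_mask):
    """Return the index of the highest-valued action whose mask entry is 1.

    Hypothetical helper, not part of gym.
    """
    action_values = np.asarray(action_values, dtype=np.float64)
    valid = np.asarray(action_mask).astype(bool)
    if action_values.shape != valid.shape:
        raise ValueError("action_values and action_mask must have the same shape")
    if not valid.any():
        raise ValueError("action_mask does not allow any action")
    # Invalid actions get -inf so that argmax can never select them.
    masked_values = np.where(valid, action_values, -np.inf)
    return int(np.argmax(masked_values))


# Illustrative info dict shaped like what a 6-action environment such as Taxi
# could return; the mask values here are made up.
info = {"action_mask": np.array([1, 0, 1, 0, 0, 0], dtype=np.int8)}
action_values = np.array([0.1, 0.9, 0.3, 0.8, 0.2, 0.0])
action = masked_greedy_action(action_values, info["action_mask"])
```

Here the highest-valued actions (indices 1 and 3) are masked out, so the helper falls back to the best remaining valid action.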
### Example masks
```python
>>> import numpy as np
>>> from gym import spaces
# Fundamental spaces (Box has no mask support)
>>> space = spaces.Discrete(4)
>>> space.sample(mask=np.array([0, 1, 1, 1], dtype=np.int8))
2
>>> space.sample(mask=np.array([0, 0, 0, 0], dtype=np.int8))
0
>>> space = spaces.MultiDiscrete([4, 2])
>>> space.sample(mask=(np.array([0, 1, 0, 1], dtype=np.int8), np.array([0, 0], dtype=np.int8)))
[1 0]
>>> space = spaces.MultiDiscrete(np.array([[4, 2], [3, 4]]))
>>> space.sample(mask=((np.array([1, 1, 1, 1], dtype=np.int8), np.array([0, 1], dtype=np.int8)), (np.array([0, 0, 0], dtype=np.int8), np.array([1, 1, 0, 0], dtype=np.int8))))
[[2 1]
 [0 1]]
>>> space = spaces.MultiBinary([2, 3])
>>> space.sample(mask=np.array([[0, 0, 1], [1, 1, 0]], dtype=np.int8))
[[0 0 0]
 [1 1 0]]
# Composite spaces (Dict, Tuple and Graph)
>>> space = spaces.Dict(a=spaces.Discrete(3), b=spaces.Box(0, 1, (1,)))
>>> space.sample(mask={"a": np.array([0, 1, 1], dtype=np.int8), "b": None})
OrderedDict([('a', 1), ('b', array([0.6812336], dtype=float32))])
>>> space = spaces.Tuple((spaces.Box(0, 1, (1,)), spaces.Discrete(3)))
>>> space.sample(mask=(None, np.array([0, 0, 0], dtype=np.int8)))
(array([0.74909943], dtype=float32), 0)
>>> space = spaces.Graph(node_space=spaces.Box(0, 1, (1,)), edge_space=spaces.Discrete(3))
>>> space.sample(mask=(None, np.array([0, 1, 1], dtype=np.int8)), num_nodes=4)
GraphInstance(nodes=array([[0.5791068 ], [0.43347424], [0.6848027 ], [0.23124644]], dtype=float32), edges=array([2, 2]), edge_links=array([[1, 0], [2, 1]]))
>>> space.sample(mask=(None, (np.array([1, 1, 1], dtype=np.int8), np.array([1, 0, 0], dtype=np.int8), np.array([0, 1, 0], dtype=np.int8), np.array([0, 1, 1], dtype=np.int8))), num_nodes=4, num_edges=4)
GraphInstance(nodes=array([[0.21213211], [0.4872798 ], [0.69442934], [0.92085034]], dtype=float32), edges=array([2, 0, 1, 2]), edge_links=array([[0, 2], [0, 0], [3, 2], [2, 0]]))
```
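For readers who want the mask semantics spelled out, here is a minimal NumPy sketch that reproduces the behaviour of the `Discrete` examples above: it samples uniformly among the entries marked `1` and returns `0` when every entry is `0`. The function `sample_discrete` and its validation rules are assumptions for illustration only, not the implementation in this PR.

```python
import numpy as np


def sample_discrete(n, mask=None, rng=None):
    """Sample an integer in [0, n), optionally restricted by a 0/1 mask.

    Hypothetical stand-in for masked sampling on a Discrete(n) space.
    """
    rng = np.random.default_rng() if rng is None else rng
    if mask is None:
        return int(rng.integers(n))
    mask = np.asarray(mask)
    if mask.shape != (n,):
        raise ValueError(f"mask must have shape ({n},), got {mask.shape}")
    if not np.all((mask == 0) | (mask == 1)):
        raise ValueError("mask entries must be 0 or 1")
    valid_actions = np.flatnonzero(mask)
    if valid_actions.size == 0:
        # Mirrors the all-zero example above, which returns 0.
        return 0
    return int(rng.choice(valid_actions))


rng = np.random.default_rng(0)
samples = [sample_discrete(4, np.array([0, 1, 1, 1], dtype=np.int8), rng) for _ in range(100)]
fallback = sample_discrete(4, np.array([0, 0, 0, 0], dtype=np.int8), rng)
```

With the first mask, action `0` never appears in `samples`; with the all-zero mask, `fallback` is `0`.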
- [x] Add tests for sample mask
- [x] Add tests for sample with discrete distributions (Discrete, MultiDiscrete, MultiBinary)
- [ ] Add test for Box sample
- [x] Add docstrings
- [x] Add taxi docstrings
- [x] Fix taxi action mask