Hi,

I am working on a custom policy that uses the `state_batches` in `compute_actions` to keep track of an internal policy state, which gets updated at each timestep of the environment (think of an updated expectation value of observations). I use the *Trajectory View API* with the following settings in my policy’s `__init__()`:

```
self.view_requirements['state_in_0'] = \
    ViewRequirement('state_out_0',
                    shift=-1,
                    used_for_training=False,
                    used_for_compute_actions=True)
```

The initial state is defined as:

```
# Initial state in custom policy:
def get_initial_state(self):
    return [np.zeros(8, dtype=np.float64)]
```

By the way, this initial state has already been reshaped to the following by the time it arrives in `compute_actions()`:

```
# What happened here?
[array([[0., 0., 0., 0., 0., 0., 0., 0.]], dtype=float32)]
```
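
My guess is that this is just the usual batching step: the per-episode state of shape `(8,)` gets a leading batch dimension and is cast to `float32`. The numpy-only sketch below reproduces the printed value under that assumption. `batch_initial_state` is a hypothetical helper of mine, not an RLlib function, and I have not confirmed that RLlib does it this way internally.

```python
import numpy as np

STATE_SIZE = 8  # length of my internal state vector


def get_initial_state():
    # Same as in my policy: one unbatched float64 vector per state tensor.
    return [np.zeros(STATE_SIZE, dtype=np.float64)]


def batch_initial_state(states, batch_size=1):
    # Hypothetical helper (assumption, not RLlib code): add a leading batch
    # dimension to every state tensor and cast it to float32.
    return [
        np.repeat(s[np.newaxis, :], batch_size, axis=0).astype(np.float32)
        for s in states
    ]


batched = batch_initial_state(get_initial_state())
print(batched[0].shape, batched[0].dtype)
```

Under this assumption the result is a list with a single array of shape `(BATCH_SIZE, STATE_SIZE)`, which matches what I see arriving in `compute_actions()`.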

Furthermore, as the second return value of my policy's `compute_actions()` function, I return a list of `BATCH_SIZE` numpy arrays of shape `(STATE_SIZE,)`.

When I analyze the `SampleBatch` of an episode after training, I can see that the `state_out_0` is not identical to the `state_in_0` of the next timestep (why? normalization?):

```
print(batch.__getitem__('state_in_0')[:2])
print(batch.__getitem__('state_out_0')[:2])
[[ 0.          0.          0.          0.          0.          0.
   0.          0.        ]
 [-0.05122958  1.07819295  1.02973902  1.037444    0.89701152 -0.04827012
  -0.0305887  -0.01290726]]  # <- this should equal
[[0.02288876 0.58843416 0.31873515 0.56406111 0.29287788 0.
  0.         0.        ]  # <- this
 [0.0270972  0.74560356 0.45348084 0.71006745 0.40950587 0.0121695
  0.00722509 0.00228068]]
```
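
To make the mismatch explicit, here is the check I would expect to pass if `state_in_0` at timestep t+1 were simply `state_out_0` at timestep t. It is a plain numpy sketch using the values printed above. `states_are_chained` is a hypothetical helper name, not part of RLlib.

```python
import numpy as np

# First two rows of each column, copied from the printout above.
state_in = np.array([
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [-0.05122958, 1.07819295, 1.02973902, 1.037444,
     0.89701152, -0.04827012, -0.0305887, -0.01290726],
])
state_out = np.array([
    [0.02288876, 0.58843416, 0.31873515, 0.56406111,
     0.29287788, 0.0, 0.0, 0.0],
    [0.0270972, 0.74560356, 0.45348084, 0.71006745,
     0.40950587, 0.0121695, 0.00722509, 0.00228068],
])


def states_are_chained(state_in, state_out, atol=1e-6):
    # Hypothetical helper: True if state_in[t + 1] == state_out[t] for all t.
    return bool(np.allclose(state_in[1:], state_out[:-1], atol=atol))


chained = states_are_chained(state_in, state_out)
print(chained)
```

For my batch this check fails, so the difference is not just floating-point noise.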

I took a look at the definition of `compute_actions()`, which returns the new `state_batches` in shape `[STATE_SIZE, BATCH_SIZE]` and type `List[TensorType]`. So I thought I had to change the output shape and did so:

```
# [STATE_SIZE, BATCH_SIZE] = [8, 1]
[array([0.02288876]), array([0.58843414]), array([0.31873516]), array([0.56406113]), array([0.29287789]), array([0.]), array([0.]), array([0.])]
```
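
To spell out the return layouts in question, here is a numpy-only sketch with `BATCH_SIZE = 1` and `STATE_SIZE = 8`. Layouts A and B are the two I actually tried. Layout C is only a guess on my part, based on the shape in which the initial state arrives, and I have not verified that it is what RLlib expects.

```python
import numpy as np

BATCH_SIZE, STATE_SIZE = 1, 8
new_state = np.array([0.02288876, 0.58843414, 0.31873516, 0.56406113,
                      0.29287789, 0.0, 0.0, 0.0])

# Layout A (my first attempt): BATCH_SIZE arrays of shape (STATE_SIZE,).
layout_a = [new_state.copy() for _ in range(BATCH_SIZE)]

# Layout B (my second attempt): STATE_SIZE arrays of shape (BATCH_SIZE,).
layout_b = [np.full(BATCH_SIZE, value) for value in new_state]

# Layout C (untested guess): one array of shape (BATCH_SIZE, STATE_SIZE),
# i.e. the same layout in which the initial state arrives.
layout_c = [np.tile(new_state, (BATCH_SIZE, 1))]

# Print the list length and the shape of the first entry for each layout.
for name, layout in [("A", layout_a), ("B", layout_b), ("C", layout_c)]:
    print(name, len(layout), layout[0].shape)
```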

However, in the next timestep the `state_in_0` variable has the shape:

```
[array([0.02288876])]
```

which necessarily gives an error. I am confused. **Can anyone tell me how to correctly define the initial state and return the state_batches?** (Maybe also give a hint as to where in the source code the `state_batches` are processed.)

Thanks for your help

Simon