Trajectory View API Example

How severely does this issue affect your experience of using Ray?

  • None: Just asking a question out of curiosity

The example manual inference loop seems to set all of the prev_n_* values to the same constant value on every step. Shouldn't these instead be built from the observations, actions, and rewards returned by previous iterations of the loop?

For example, should that be something like:

...
import collections

import numpy as np

# env, algo (the trained Algorithm) and num_frames are assumed to be set up above.
obs = env.reset()
done = False
episode_reward = 0.0

# Seed the frame-stacking buffers the same way the example seeds its constant
# inputs: the initial observation, action 0 and reward 1.0.
prev_observation = collections.deque([obs] * num_frames, maxlen=num_frames)
prev_action = collections.deque([0] * num_frames, maxlen=num_frames)
prev_reward = collections.deque([1.0] * num_frames, maxlen=num_frames)

while not done:
    action, state, logits = algo.compute_single_action(
        input_dict={
            "obs": obs,
            "prev_n_obs": np.stack(prev_observation),
            "prev_n_actions": np.stack(prev_action),
            "prev_n_rewards": np.stack(prev_reward),
        },
        full_fetch=True,
    )
    obs, reward, done, info = env.step(action)
    # Append on the right so each stack stays ordered oldest -> newest,
    # matching how the view requirements collect past timesteps during training.
    prev_observation.append(obs)
    prev_action.append(action)
    prev_reward.append(reward)
    episode_reward += reward
...
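
For context, the prev_n_* keys above come from the view requirements declared by the frame-stacking model in the trajectory view example. A minimal sketch of those declarations, assuming the custom model from RLlib's trajectory_view_api example (exact shift ranges may differ between versions):

from ray.rllib.policy.view_requirement import ViewRequirement

# Inside the custom model's __init__ (sketch, not the exact example code):
# "prev_n_obs" covers the current obs plus the (num_frames - 1) obs before it,
# while actions and rewards cover the num_frames steps before the current one.
self.view_requirements["prev_n_obs"] = ViewRequirement(
    data_col="obs", shift="-{}:0".format(num_frames - 1), space=obs_space)
self.view_requirements["prev_n_actions"] = ViewRequirement(
    data_col="actions", shift="-{}:-1".format(num_frames), space=self.action_space)
self.view_requirements["prev_n_rewards"] = ViewRequirement(
    data_col="rewards", shift="-{}:-1".format(num_frames))

If that is indeed how the shifts are defined, stacking oldest-to-newest in the manual loop should line up with the frame ordering the model saw during training.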