Skipping some actions

How severely does this issue affect your experience of using Ray?

  • Medium: It contributes to significant difficulty to complete my task, but I can work around it.

I have an environment where I want only some of its state-action pairs to be learned. For example, given a sequence of state-action pairs like (S1, A1), (S2, A2), (S3, A3), (S4, A4), (S5, A5), I want only, e.g., (S1, A1) and (S4, A4) to be learned, and the other states skipped. No action or state should be saved for those other states of the environment; their actions come from some internal logic of the environment. E.g., consider a scheduler simulator that schedules some work on nodes every 5 seconds but has a state for every 1 second; those intermediate states either don't have an action, or the action comes from some internal logic rather than from the scheduler agent.

Hey @saeid93, great question. You could use a custom callback and override its on_postprocess_trajectory method. See here on how to easily do this:


    # Method of a custom callbacks class (a subclass of RLlib's DefaultCallbacks).
    # Assumes `import numpy as np` plus the RLlib types used in the annotations.
    def on_postprocess_trajectory(
        self,
        *,
        worker: RolloutWorker,
        episode: Episode,
        agent_id: str,
        policy_id: str,
        policies: Dict[str, Policy],
        postprocessed_batch: SampleBatch,  # <- this is the batch you would like to filter. Make sure this happens in-place (see example code below)
        original_batches: Dict[str, Tuple[Policy, SampleBatch]],
        **kwargs,
    ):
        print("postprocessed {} steps".format(postprocessed_batch.count))
        indices = np.array([0, 3, 7])  # e.g. the indices you would like to keep
        postprocessed_batch["obs"] = np.take(postprocessed_batch["obs"], indices, axis=0)
        postprocessed_batch["actions"] = ...
        # Do this with every key in postprocessed_batch to make sure the batch dimension remains the same across all columns (actions, states, rewards, dones, etc.)
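
To make the "do this with every key" step concrete, here is a minimal sketch that uses a plain dict of NumPy arrays as a stand-in for the batch. The helper name `keep_rows`, the toy columns, and the "every 5th step" rule are assumptions for illustration, not part of RLlib; it also assumes every column is an array with the time step as its first axis.

```python
import numpy as np


def keep_rows(batch, indices):
    """Keep only the rows at `indices` in every column of `batch`, in place."""
    indices = np.asarray(indices)
    for key in list(batch.keys()):
        # Same indices for every column, so all columns keep the same length.
        batch[key] = np.take(batch[key], indices, axis=0)
    return batch


# Toy stand-in for a postprocessed batch: 10 one-second simulator steps.
batch = {
    "obs": np.arange(20, dtype=np.float32).reshape(10, 2),
    "actions": np.arange(10),
    "rewards": np.linspace(0.0, 0.9, 10),
}

# Hypothetical rule: the agent only decides on every 5th step (steps 0 and 5).
indices = np.arange(0, len(batch["actions"]), 5)
keep_rows(batch, indices)
```

Inside the callback, the same loop could be applied to `postprocessed_batch` directly. Note that this sketch simply drops the rewards of the skipped steps; if they should count toward the kept steps, they would need to be accumulated before filtering.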


Hi @sven1977, that's exactly what I need! I was looking into a hack that involved changing my environment, but this is a much cleaner solution. Many thanks!