Change action space within episode


I have a custom environment that outputs a property, "p1", whose value ranges from 100 down to 10^-4.
Algo: PPO
Action space: Box(-0.05, +0.05, shape=(3,))
Episode: 10 steps
The aim of the agent is to reach a state where p1 is as close to 10^-4 as possible.
Problem: states where p1 < 0.1 occur very rarely (0.5%), so the learned policy is sub-optimal, i.e. it reaches states where p1 = 0.1 rather than 0.0001.

I want the model to take larger actions, i.e. Box(±0.05), for the first five steps and smaller actions, like Box(±0.001), for the last five steps.

Is it possible to do this?

In my opinion:

Possible solution 1: manually shrink the action input in the step(self, action) method of your custom environment. For example:

if step > 5:
    action = action / SOME_CONSTANT

But I don’t think this is a good idea.
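For completeness, solution 1 could look like the sketch below. All names here are hypothetical (MyCustomEnv, ACTION_SHRINK), and everything except the shrinking logic is stubbed out: the environment counts its own steps and divides the incoming action after step five, so a policy output of ±0.05 effectively becomes ±0.001 in the second half of the episode.

```python
import numpy as np

ACTION_SHRINK = 50.0  # hypothetical constant: 0.05 / 0.001


class MyCustomEnv:
    """Minimal sketch; only the action-shrinking logic is shown."""

    def __init__(self):
        self.step_count = 0

    def step(self, action):
        self.step_count += 1
        action = np.asarray(action, dtype=np.float64)
        if self.step_count > 5:
            # After the first five steps, shrink the action magnitude
            # so +/-0.05 inputs act like +/-0.001.
            action = action / ACTION_SHRINK
        # ... apply `action` to the real dynamics, compute p1 and the
        # reward, and return (obs, reward, done, info) here ...
        return action
```

Note the policy never sees this rescaling, which is one reason it can be a questionable idea: the agent learns a single action distribution while the environment silently reinterprets it mid-episode.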

Possible solution 2: change the entropy coefficient and the learning rate to encourage exploration by the PPO agent.
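If you are using stable-baselines3 (an assumption; adapt to your framework), both knobs are constructor arguments of PPO. The values below are hypothetical starting points to tune, not recommendations:

```python
from stable_baselines3 import PPO

model = PPO(
    "MlpPolicy",
    env,                 # your custom environment instance
    ent_coef=0.01,       # entropy bonus; SB3's default is 0.0, larger values push exploration
    learning_rate=1e-4,  # lower than the 3e-4 default, for gentler policy updates
)
model.learn(total_timesteps=100_000)
```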

One way of looking at your problem is that, with the action space as you defined it, the desired later actions, Box(±0.001), are a very tiny subset of the overall action space.

To address this, you might consider using a transform of your action space, perhaps something similar to the log-modulus transform (see "A log transformation of positive and negative values" on The DO Loop blog), though you might need to change the formula a bit, say by multiplying your x by a large constant.

The right transform may make it much easier for exploration to discover the good regions of the action space.
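The transform idea can be sketched as follows. This is a hypothetical mapping (the constant K and the exact formula are my assumptions, not taken from the DO Loop post): the policy still outputs raw actions in Box(±0.05), but an inverse-log-modulus-style map hands the environment mostly small-magnitude actions, so the region around ±0.001 is no longer a tiny sliver of the action space.

```python
import numpy as np

# Hypothetical constants: K is chosen so that |u| = 0.05 maps to the
# largest environment action; C rescales the output into [-0.05, 0.05].
K = 60.0
C = (10.0 ** (K * 0.05) - 1.0) / 0.05


def raw_to_env_action(u):
    """Map a raw policy action u in [-0.05, 0.05] to an environment action
    in [-0.05, 0.05], spending most of the raw range on small magnitudes
    (log-modulus style: the sign is kept, the magnitude is exponentiated)."""
    u = np.asarray(u, dtype=np.float64)
    return np.sign(u) * (10.0 ** (K * np.abs(u)) - 1.0) / C
```

With these constants, half of the raw range (|u| <= 0.025) maps to environment actions smaller than about ±0.0016, so exploring near ±0.001 becomes far more likely than with a plain linear action space.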