Below is the snapshot of my code ( I made a custom Gym Environment )
from ray.rllib.agents.ppo import PPOTrainer, DEFAULT_CONFIG
from ray.tune.logger import pretty_print

agent = PPOTrainer(config, env="fss-v1")
for _ in range(1):
    print("Entered _ :", _)
    result = agent.train()
Since my Gym environment is custom, I would like to make a few changes to how Ray selects actions (currently I am guessing it uses the sample() function). However, I am not able to find the location of the train function that connects to the Gym environment and calls the action-selection and step functions.
It would probably be easier to ask what you want to change. RLlib is a large library with lots of abstraction, so looking at the train function is not likely to be useful. RLlib has many configuration options and callback hooks that can be used to customize most aspects of the process.
Whether the action is a stochastic or deterministic sample depends on the configuration option "explore".
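As a minimal sketch (the "explore" key matches RLlib's config; the env name is taken from the question, everything else is illustrative):

```python
# Sketch of an RLlib config dict: with "explore": True (the default),
# PPO samples from the action distribution; with False, action
# selection is greedy/deterministic.
config = {
    "env": "fss-v1",   # the custom env registered in the question
    "explore": True,   # set to False for deterministic actions
}
```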
Basically, I have 144 actions (MultiDiscrete [12, 12]) and not all of them are legal. I would like to filter out the illegal actions early on, so that the agent only considers legal actions and can optimise the solution.
My project is on job scheduling, so my actions are [Workstation, Job]. Since not all workstations can work with all jobs, the actions need to be filtered based on qualification.
I understand that action masking means making this change on the neural-network side, but the agent keeps selecting illegal actions.
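To illustrate the masking idea itself, here is a minimal NumPy sketch, independent of RLlib: push the logits of illegal (workstation, job) pairs toward -inf before softmax/argmax, so they get effectively zero probability. The qualification matrix and all names here are hypothetical; in RLlib this logic would live inside a custom model that receives the mask as part of the observation.

```python
import numpy as np

FLOAT_MIN = np.finfo(np.float32).min

# Hypothetical qualification matrix: qualified[w, j] == 1 iff
# workstation w can process job j (12 workstations x 12 jobs -> 144 pairs).
rng = np.random.default_rng(0)
qualified = rng.integers(0, 2, size=(12, 12)).astype(np.float32)

def mask_logits(logits, action_mask):
    """Add log(mask) to the logits: log(1) = 0 leaves legal actions
    unchanged, log(0) = -inf (clipped to FLOAT_MIN) buries illegal ones."""
    with np.errstate(divide="ignore"):
        inf_mask = np.clip(np.log(action_mask), FLOAT_MIN, 0.0)
    return logits + inf_mask

# Flatten the 12x12 grid into 144 logits, one per (workstation, job) pair.
logits = rng.normal(size=144).astype(np.float32)
masked = mask_logits(logits, qualified.ravel())

best = int(np.argmax(masked))  # greedy choice is now guaranteed legal
w, j = divmod(best, 12)        # map the flat index back to (workstation, job)
```

The same additive-mask trick is what RLlib's parametric-actions example uses inside the model's forward pass; because the illegal logits become astronomically negative, both sampling and argmax ignore them, which addresses the "agent keeps selecting illegal actions" problem at the distribution level rather than in the environment.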