Below is a snapshot of my code (I made a custom Gym environment):
import ray
from ray.rllib.agents.ppo import PPOTrainer, DEFAULT_CONFIG
from ray.tune.logger import pretty_print

# `config` and the registered "fss-v1" environment are assumed to be defined elsewhere in the script.
agent = PPOTrainer(config, env="fss-v1")
for i in range(1):
    print("Entered i:", i)
    result = agent.train()
    print(pretty_print(result))
ray.shutdown()
Since my Gym environment is custom, I would like to make a few changes to how Ray selects actions (I am guessing it currently uses the sample() function). However, I cannot find the train function that connects to the Gym environment and calls the action and step functions.
Hi @Archana_R,
It would probably be easier to ask what you want to change. RLlib is a large library with lots of abstraction, so looking at the train function is not likely to be useful. RLlib has lots of configuration options and callback hooks that can be used to customize most aspects of the process.
Whether the action is a stochastic or deterministic sample depends on the configuration option “explore”.
Basically, I have 144 actions (MultiDiscrete [12, 12]) and not all of them are legal. I would like to filter out the illegal actions early on, so that the agent only chooses among the legal actions and optimises the solution.
My project is on job scheduling, so my actions are [Workstation, Job] pairs. Not all workstations can work on all jobs, so the actions need to be filtered based on qualification.
I understand that action masking means making this change on the neural network side. But the agent keeps selecting illegal actions.
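To make the filtering step concrete, here is a minimal sketch of how a legality mask could be derived from a qualification table. Everything in it is hypothetical: the `QUALIFIED` table is made-up data, and `legal_action_mask` and `unflatten` are illustrative helpers, not part of RLlib or Gym. It flattens the (workstation, job) pair into a single index in 0..143, because a MultiDiscrete space samples its two components independently and so cannot express a constraint on the pair through two separate masks.

```python
# Hypothetical qualification table: QUALIFIED[w] is the set of job indices
# that workstation w may process. Real data would come from the scheduling
# problem; this pattern is invented purely for illustration.
NUM_WORKSTATIONS = 12
NUM_JOBS = 12
QUALIFIED = {
    w: {j for j in range(NUM_JOBS) if (w + j) % 3 != 0}
    for w in range(NUM_WORKSTATIONS)
}


def legal_action_mask(qualified, num_workstations, num_jobs):
    """Return a flat 0/1 list; entry w * num_jobs + j is 1 if (w, j) is legal."""
    mask = []
    for w in range(num_workstations):
        allowed = qualified.get(w, set())
        for j in range(num_jobs):
            mask.append(1 if j in allowed else 0)
    return mask


def unflatten(index, num_jobs):
    """Map a flat action index back to a (workstation, job) pair."""
    return divmod(index, num_jobs)


mask = legal_action_mask(QUALIFIED, NUM_WORKSTATIONS, NUM_JOBS)
print(len(mask), sum(mask))
print(unflatten(13, NUM_JOBS))
```

An environment would typically return such a mask as part of its observation so that the policy model can use it when producing logits.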
What kind of algorithm are you using? If you are using DQN, then action masking is not straightforward. If you are using a PG algorithm like A2C or PPO, then you want to do action masking like this:
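As a rough, library-independent illustration of what action masking does for a policy-gradient method (this is not RLlib's own code, and the names `mask_logits`, `softmax`, and `FLOAT_MIN` are chosen here for the sketch): the logits of illegal actions are pushed to a very large negative value before the softmax, so those actions end up with probability zero and are never sampled.

```python
import math

# Stand-in for "minus infinity"; a large finite value avoids NaNs.
FLOAT_MIN = -1e9


def mask_logits(logits, mask):
    """Keep logits of legal actions (mask == 1), suppress the rest."""
    return [logit if m else FLOAT_MIN for logit, m in zip(logits, mask)]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    top = max(logits)
    exps = [math.exp(logit - top) for logit in logits]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 0.5, 1.0, -0.3]  # hypothetical model output for 4 actions
mask = [1, 0, 1, 0]             # actions 1 and 3 are illegal
probs = softmax(mask_logits(logits, mask))
print(probs)
```

In a real setup this masking would live inside a custom model's forward pass, with the mask supplied by the environment as part of the observation.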
As for the action selection, that will be a combination of an exploration algorithm, which are found here:
Thank you for this. But in the files you have shared, these are empty functions. Can you please help me locate where the action sample gets picked for the environment?
What do you mean by "they are empty functions"? Those are the functions that are called to convert the logits returned by your model into actions that are passed into your environment.
In particular, you probably want to look at the sample or deterministic_sample methods in the Categorical action_distribution.
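The difference between the two methods can be sketched in plain Python. This is a simplified stand-in, not RLlib's implementation: a stochastic sample draws an action index with probability given by the softmax of the logits, while a deterministic sample takes the index of the largest logit. The function names mirror the methods mentioned above, but their signatures here are assumptions made for the sketch.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    top = max(logits)
    exps = [math.exp(logit - top) for logit in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample(logits, rng):
    """Stochastic choice: draw an index according to softmax(logits)."""
    return rng.choices(range(len(logits)), weights=softmax(logits), k=1)[0]


def deterministic_sample(logits):
    """Greedy choice: index of the largest logit."""
    return max(range(len(logits)), key=lambda i: logits[i])


logits = [0.1, 2.0, -1.0]  # hypothetical model output for 3 actions
rng = random.Random(0)     # seeded only to make runs repeatable
draws = [sample(logits, rng) for _ in range(10)]
greedy = deterministic_sample(logits)
print(draws, greedy)
```

Because both choices are driven entirely by the logits, changing which actions can be selected is a matter of changing the logits the model emits rather than editing the sampling code.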