Hi,
I'm having trouble training a PPO model to get my agent to do a very simple job. My RL problem is a single car that has to be moved to the next road in a road network, step by step, until it reaches its destination; once it arrives, it receives a reward of 1.
I have defined my own custom Gymnasium environment, and it works well when I test the env on its own.
When I use Stable Baselines3 to train a PPO model, everything looks fine and it produces a nice result that I can visualize with TensorBoard. So that confirms that my environment is working properly.
But when I use RLlib through Ray's tune.run(), it runs the simulation endlessly, even though I tried many ways to force it to stop after a certain number of timesteps or iterations. No dice!
Here is the config:
import ray
from ray import tune
from ray.tune.registry import register_env
from ray.tune.stopper import MaximumIterationStopper


def main():
    ray.init(ignore_reinit_error=True)
    # create_env is my environment factory (defined elsewhere in my code)
    register_env('car_env/car-v0', create_env)
    custom_config = {
        "lr": 0.0001,  # Learning rate
        "entropy_coeff": 0.01,  # Entropy coefficient
        "num_steps": 1,  # Number of steps or iterations
    }
    # Raw string so the backslashes in the Windows path are not read as escapes (e.g. "\r")
    log_dir = r'E:\My files\sumo\my example\Car Control\experiments\rllib'
    max_iterations = 0
    stopper = MaximumIterationStopper(max_iter=max_iterations)
    config = {
        "env": "car_env/car-v0",
        "framework": "torch",
        "num_envs_per_worker": 1,
        "seed": 123,
        'log_level': 'ERROR',
        'ignore_worker_failures': False,
        'lr_schedule': [[0, 1e-1], [int(1e2), 1e-2], [int(1e3), 1e-3]],
        # "evaluation_interval": 2,
        # "evaluation_num_episodes": 4,
        "num_gpus": 0,
        "num_rollout_workers": 1,
        # "num_evaluation_workers": 1,
        **custom_config,
    }
    analysis = tune.run(
        "PPO",
        name='experiment1',
        config=config,
        # stop={
        #     'training_iteration': 1,
        #     "episode_reward_mean": 1,
        #     'timesteps_total': 2,
        # },
        local_dir=log_dir + '/net2',
        checkpoint_at_end=True,
        resume=False,
        stop=stopper,
    )


if __name__ == '__main__':
    main()
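To make the stopping behaviour I am after concrete, here is a minimal sketch of a stop condition written as a plain function. As far as I know, tune.run() accepts a callable of the form stop(trial_id, result) in addition to a dict or a Stopper. The names should_stop, MAX_ITERATIONS and MAX_TIMESTEPS and their values are made up for illustration, and the result keys are the same ones used in the commented-out stop dict above. Note that Tune only evaluates a stop condition when a trial reports a result, so it cannot fire before the first training iteration has finished.

```python
from typing import Any, Dict

# Hypothetical budgets, chosen only for illustration.
MAX_ITERATIONS = 1
MAX_TIMESTEPS = 2000


def should_stop(trial_id: str, result: Dict[str, Any]) -> bool:
    """Return True once either the iteration or the timestep budget is reached."""
    iterations = result.get("training_iteration", 0)
    # The timestep counter may be missing or None in some results; treat that as 0.
    timesteps = result.get("timesteps_total") or 0
    return iterations >= MAX_ITERATIONS or timesteps >= MAX_TIMESTEPS


# Intended use (not executed here):
#   tune.run("PPO", config=config, stop=should_stop)
```

If this function is never called at all, that would suggest the first training iteration never completes, rather than the stop condition being ignored.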
I'm not even sure whether training has started, but the simulation is definitely running and moving the car in the SUMO simulator.
Just ignore the warning "no connection between edges … and …": I wrote a line of code that truncates the episode if the car is directed onto a road (edge) from which there is no route to its final destination, so this is part of the problem statement.
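For clarity, the truncation rule works roughly like the sketch below. This is not my actual environment code: the road network, the edge names and the helpers has_route and step are a made-up toy example with no SUMO involved. It only mirrors the Gymnasium step convention of returning (observation, reward, terminated, truncated, info).

```python
from collections import deque
from typing import Dict, List, Tuple

# Hypothetical toy road network: edge -> edges that can be entered next.
ROAD_NETWORK: Dict[str, List[str]] = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": [],  # dead end: no route to the destination from here
    "D": [],  # destination
}
DESTINATION = "D"


def has_route(network: Dict[str, List[str]], start: str, goal: str) -> bool:
    """Breadth-first search: True if goal is reachable from start."""
    seen = {start}
    queue = deque([start])
    while queue:
        edge = queue.popleft()
        if edge == goal:
            return True
        for nxt in network.get(edge, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


def step(current_edge: str, next_edge: str) -> Tuple[str, float, bool, bool, dict]:
    """Move the car to next_edge and return (obs, reward, terminated, truncated, info)."""
    if next_edge not in ROAD_NETWORK.get(current_edge, []):
        raise ValueError(f"no connection between edges {current_edge} and {next_edge}")
    if next_edge == DESTINATION:
        return next_edge, 1.0, True, False, {}  # reached the goal: reward of 1
    if not has_route(ROAD_NETWORK, next_edge, DESTINATION):
        # Car can no longer reach the destination: cut the episode short.
        return next_edge, 0.0, False, True, {"reason": "no route to destination"}
    return next_edge, 0.0, False, False, {}
```

Every episode in this sketch ends with either terminated or truncated set to True, which is what a training loop relies on to reset the environment.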