I am using an environment generated by NS3-Gym and am running into an issue where RLlib does not stop training after the number of steps I configured.
For testing purposes I set rollout_fragment_length, train_batch_size, and horizon all to 1, and batch_mode to complete_episodes (although, from my understanding, the latter shouldn't make a difference if horizon is 1).
I am now expecting the agent to execute 1 step (= 1 episode) and then terminate. When using one of the default environments (I used CartPole-v1 for testing) this seems to work fine, as the whole script finishes in about 20 s. However, when I'm using my own environment, the env just gets reset after 1 step and train() continues to run. I know that RLlib recognizes that a step has passed, since a) it executed the step and b) the environment would not normally reset after 1 step. I'm honestly at a loss as to why it just won't stop training, so any help would be greatly appreciated.
Attached below is my training script. Since the env is bound to an external NS3 simulation, it won't run on its own, but I figured it still can't hurt to attach it.
import ray
import ray.rllib.algorithms.dqn as dqn
from ray.tune.registry import register_env
import gym
from ns3gym import ns3env
def env_creator(env_config) -> gym.Env:
    env = ns3env.Ns3Env(port=0, startSim=True)
    return env
register_env("my_env", env_creator)
ray.init()
config = dqn.DEFAULT_CONFIG.copy()
config["num_gpus"] = 0
config["num_workers"] = 1
config["framework"] = "tf2"
config["eager_tracing"] = True
config["disable_env_checking"] = True # Reset time is quite high so I disabled env checking
config["rollout_fragment_length"] = 1
config["batch_mode"] = "complete_episodes"
config["train_batch_size"] = 1
config["horizon"] = 1
algo = dqn.DQN(config=config, env="my_env")
algo.train()