I am attempting to train with a custom (single agent, discrete action, continuous observation) environment.
My environment code is passing all of the unit tests I have written for it, yet the training process is crashing with “ValueError: The parameter logits has invalid values.”
I understand that this likely means the network is producing NaN or infinite logits at some point during training, but the error itself is not informative about how I can prevent this through my environment design. For reference, I have trained on other environments without issue, so I do not think it is my installation.
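To make sure I understand the failure mode, here is a plain-numpy illustration of what I believe "invalid values" means (this is just a sketch of the mechanism, not RLlib's actual code): once a single NaN enters the logits, the whole action distribution becomes NaN, which is what the distribution's validation then rejects.

```python
import numpy as np

def softmax(logits):
    """Convert logits to a probability distribution (max-subtracted for stability)."""
    logits = np.asarray(logits, dtype=np.float64)
    z = logits - np.max(logits)  # np.max is NaN if any logit is NaN, so NaN propagates
    e = np.exp(z)
    return e / e.sum()

print(softmax([1.0, 2.0, 3.0]))      # valid distribution, sums to 1
print(softmax([1.0, np.nan, 3.0]))   # one NaN logit poisons every probability
```

So my working theory is that something in my observations or rewards eventually drives the network's outputs to NaN.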
Below is a template of the code I am using to train:
import os

import ray
from ray.rllib.agents.dqn import DQNTrainer
from ray.tune import grid_search

import mypackage.mymodule

test_config = {
    "env": mypackage.mymodule.EnvornmentClass,
    "env_config": env_config,
    # Use GPUs iff `RLLIB_NUM_GPUS` env var set to > 0.
    "num_gpus": int(os.environ.get("RLLIB_NUM_GPUS", "0")),
    "model": {
        "custom_model": "my_model",
        "vf_share_layers": True,
    },
    "lr": grid_search([1e-2, 1e-4, 1e-6]),  # try different lrs (grid_search keys require running through tune)
    "num_workers": 1,  # parallelism
    "framework": "torch",
}

ray.init(local_mode=True)
trainer = DQNTrainer(config=test_config)  # configure the trainer
trainer.train()
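One thing I am considering is validating the observations my environment emits before they ever reach the model. This is only a sketch (`check_obs` is my own helper name, and the bounds are placeholders for my actual observation space limits), but something like this called at the end of `reset()` and `step()` should catch non-finite or out-of-range values:

```python
import numpy as np

def check_obs(obs, low=-np.inf, high=np.inf):
    """Raise immediately if an observation contains NaN/inf or leaves its bounds."""
    arr = np.asarray(obs, dtype=np.float64)
    if not np.all(np.isfinite(arr)):
        raise ValueError(f"non-finite observation: {arr}")
    if np.any(arr < low) or np.any(arr > high):
        raise ValueError(f"observation outside [{low}, {high}]: {arr}")
    return obs

check_obs([0.1, -0.5])            # passes silently
# check_obs([float("nan")])       # would raise ValueError
```

Is this the right direction, or are there other environment-design pitfalls (reward scale, unbounded observations, etc.) that commonly trigger this error?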