Crash during training: parameter logits has invalid values

I am attempting to train with a custom (single agent, discrete action, continuous observation) environment.

My environment code is passing all of the unit tests I have written for it, yet the training process is crashing with “ValueError: The parameter logits has invalid values.”

I understand that this likely has something to do with the underlying deep learning model becoming poorly conditioned somehow, but this error itself is not informative as to how I can prevent this from happening with my environment design. For reference, I have trained on other environments without issue - so I do not think it is my installation.

Below is a template of the code I am using to train:

    test_config = {
        "env": mypackage.mymodule.EnvornmentClass,
        "env_config": env_config,
        # Use GPUs iff `RLLIB_NUM_GPUS` env var is set to > 0.
        "num_gpus": int(os.environ.get("RLLIB_NUM_GPUS", "0")),
        "model": {
            "custom_model": "my_model",
            "vf_share_layers": True,
        },
        "lr": grid_search([1e-2, 1e-4, 1e-6]),  # try different learning rates
        "num_workers": 1,  # parallelism
        "framework": "torch",
    }

    trainer = DQNTrainer(config=test_config)  # configure the trainer

I assume your model starts to predict inf/NaN values, which causes the error. There are many ways to prevent this from happening; I propose adding an L2 regularizer for your model weights.
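For what it's worth, the message itself comes from torch's distribution argument validation: once a NaN reaches the logits of the action distribution, constructing the distribution raises exactly this ValueError. A minimal reproduction in plain PyTorch, independent of RLlib:

```python
import torch
from torch.distributions import Categorical

# A finite logits vector constructs without complaint.
dist = Categorical(logits=torch.tensor([0.1, 0.2, 0.3]))

# Once the network emits a NaN, argument validation rejects the logits
# with the same ValueError seen during training.
try:
    Categorical(logits=torch.tensor([float("nan"), 0.0, 0.0]))
    raised = False
except ValueError:
    raised = True
```

So the crash is not in your environment code directly; it fires whenever the policy network's output has gone non-finite by the time an action is sampled.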

Right now, I am using the model defined in the custom environment example from the Ray/RLlib documentation.

Looking for documentation on how to edit this model to incorporate your proposed revisions - do you have a pointer to a page on this?

You could use something like this for your custom model:
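As a sketch of that idea in plain PyTorch (all names below are hypothetical, and in RLlib you would add the same penalty term inside your custom model's `custom_loss` method rather than in a training loop):

```python
import torch
import torch.nn as nn


class TinyPolicyNet(nn.Module):
    """Small MLP producing action logits; a hypothetical stand-in for the
    custom model from the RLlib docs example."""

    def __init__(self, obs_dim: int, num_actions: int, l2_coeff: float = 1e-4):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(obs_dim, 64),
            nn.ReLU(),
            nn.Linear(64, num_actions),
        )
        self.l2_coeff = l2_coeff  # regularization strength; tune per problem

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.body(obs)

    def l2_penalty(self) -> torch.Tensor:
        # Sum of squared weights, scaled; add this to the policy loss.
        return self.l2_coeff * sum(p.pow(2).sum() for p in self.parameters())


# Usage: total loss = task loss + L2 penalty.
model = TinyPolicyNet(obs_dim=8, num_actions=4)
obs = torch.randn(16, 8)
logits = model(obs)
task_loss = logits.mean()  # placeholder for the real RL loss
total_loss = task_loss + model.l2_penalty()
total_loss.backward()
```

The penalty discourages the weights from growing without bound, which is one common way logits drift toward inf/NaN. Gradient clipping (the `grad_clip` config key) is another knob worth trying alongside it.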

I hope that helps.