ValueError: Expected parameter logits (...) to satisfy the constraint IndependentConstraint(Real(), 1)

See also @mannyv's answer in another topic. You also need to set "sgd_minibatch_size" > "max_seq_len". Since I cannot see your code, it's hard to make remote guesses.
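A minimal sketch of that rule as a sanity check you could run on your config dict before building the trainer. check_rnn_batch_config is a hypothetical helper, not an RLlib function, and the dict layout assumes the older dict-style PPO config where "max_seq_len" lives under "model". The values are placeholders.

```python
def check_rnn_batch_config(config: dict) -> None:
    """Hypothetical helper (not part of RLlib): enforce that
    "sgd_minibatch_size" is strictly greater than "max_seq_len"."""
    minibatch = config["sgd_minibatch_size"]
    max_seq_len = config["model"]["max_seq_len"]
    if not minibatch > max_seq_len:
        raise ValueError(
            f'"sgd_minibatch_size" ({minibatch}) must be > '
            f'"max_seq_len" ({max_seq_len})'
        )


# Illustrative dict-style config; the numbers are placeholders, not recommendations.
config = {
    "sgd_minibatch_size": 128,
    "model": {"use_lstm": True, "max_seq_len": 20},
}
check_rnn_batch_config(config)  # passes silently; 16 or 20 would raise
```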

Whether your model is custom or default, I would debug a whole Trainer/Sampler iteration to see what happens to the logits after evaluation/training. You say you use a custom RNN? Either it already outputs the NaNs, or they appear somewhere later in the Trainer/Sampler iteration; these values must come from somewhere. When running on Kubernetes, you might be able to execute the code in a single container with "local_mode"=True (i.e. ray.init(local_mode=True)), which runs everything in one process so a debugger can step through it. You could also use a local Minikube cluster and see whether you can replicate the error there.
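A minimal sketch for the first case (the custom RNN already emits NaNs), assuming a PyTorch model: forward hooks registered with torch.nn.Module.register_forward_hook record every submodule whose output contains NaN, and passing NaN logits to torch.distributions.Categorical with validate_args=True raises the same kind of ValueError as in the title. TinyRNN and attach_nan_hooks are hypothetical names; swap in your own model class.

```python
import torch
import torch.nn as nn
from torch.distributions import Categorical


class TinyRNN(nn.Module):
    """Hypothetical stand-in for a custom RNN policy model."""

    def __init__(self, obs_dim: int = 4, hidden: int = 8, num_actions: int = 3):
        super().__init__()
        self.gru = nn.GRU(obs_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_actions)

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        out, _ = self.gru(obs)
        return self.head(out[:, -1])  # logits for the last time step


def attach_nan_hooks(model: nn.Module, report: list) -> None:
    """Append the name of every submodule whose output contains NaN."""

    def make_hook(name):
        def hook(module, inputs, output):
            # nn.GRU returns a tuple (output, h_n); check every tensor in it.
            tensors = output if isinstance(output, tuple) else (output,)
            if any(torch.isnan(t).any() for t in tensors if torch.is_tensor(t)):
                report.append(name)
        return hook

    for name, module in model.named_modules():
        if name:  # skip the root module itself
            module.register_forward_hook(make_hook(name))


report = []
model = TinyRNN()
attach_nan_hooks(model, report)

# Simulate a bad upstream value: an observation batch that is entirely NaN.
logits = model(torch.full((1, 5, 4), float("nan")))
print(report)  # ['gru', 'head']

try:
    Categorical(logits=logits, validate_args=True)
except ValueError as err:
    print("ValueError:", err)
```

Hooks fire in execution order, so the first name in the list is the earliest layer whose output is NaN. If even the first layer reports NaN, the problem is likely in the observations rather than in the model.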