How severely does this issue affect your experience of using Ray?
- High: It blocks me from completing my task.
I am training an agent for a game using the PPO algorithm. At each step, a different set of legal actions is available to the agent, so I am using action masking. Everything works fine at first, but after many iterations I get the following warning from the _mean() function in NumPy:
Runtime warning: overflow encountered in reduce
ret = umr_sum(arr, axis, dtype, out, keepdims, where=where)
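For reference, the same warning can be reproduced outside of Ray with plain NumPy. The snippet below is only a minimal sketch, not the RLlib code path: it assumes the values being averaged are float32 and close to the float32 maximum, and the helper name `mean_overflows` is hypothetical.

```python
import warnings

import numpy as np


def mean_overflows(values: np.ndarray) -> bool:
    """Return True if np.mean emits an overflow RuntimeWarning for `values`."""
    with warnings.catch_warnings(record=True) as caught:
        # Record every warning instead of printing it once and suppressing repeats.
        warnings.simplefilter("always")
        np.mean(values)
    return any(
        issubclass(w.category, RuntimeWarning) and "overflow" in str(w.message)
        for w in caught
    )


# Assumption for illustration: float32 values near the largest representable number.
big = np.full(4, np.finfo(np.float32).max, dtype=np.float32)
small = np.ones(4, dtype=np.float32)

print(mean_overflows(big))    # the float32 sum exceeds the representable range
print(mean_overflows(small))  # ordinary values sum without overflow
```

The point of the sketch is that the warning comes from the summation inside the mean, so it appears whenever the summed values exceed the range of their dtype, whatever produced those values.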
My configuration is:
from ray import air, tune
from ray.rllib.algorithms import ppo

# MyEnv and ActionMaskModel are my own classes (definitions not shown here).
config = (
    ppo.PPOConfig()
    .environment(MyEnv)
    .training(
        model={"custom_model": ActionMaskModel},
        gamma=0.9,
        lr=0.0001,
        kl_coeff=0.3,
        grad_clip=4,
    )
    .framework("tf2")
    .resources(num_gpus=0)
    .rollouts(num_rollout_workers=1)
)

# Stopping criteria for the Tune run.
stop = {
    "training_iteration": 1,
    "episodes_total": 1,
    "timesteps_total": 100,
}

tuner = tune.Tuner(
    "PPO",
    param_space=config.to_dict(),
    run_config=air.RunConfig(
        stop=stop,
        verbose=2,
        failure_config=air.FailureConfig(fail_fast=True),
    ),
)
tuner.fit()
I suspect it is caused by exploding gradients, but I am not completely sure. What could be the reason for the warning?
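One way to narrow this down is to turn the overflow warning into an exception, so that the array that overflows can be identified. The following is a rough sketch under assumptions: the helper `first_overflowing` and the stat names `reward` and `vf_loss` are hypothetical, and it only applies to arrays that are available in the process where the check runs.

```python
from typing import Dict, Optional

import numpy as np


def first_overflowing(stats: Dict[str, np.ndarray]) -> Optional[str]:
    """Return the name of the first array whose mean overflows, or None."""
    for name, values in stats.items():
        try:
            # Raise FloatingPointError instead of printing a RuntimeWarning.
            with np.errstate(over="raise"):
                np.mean(values)
        except FloatingPointError:
            return name
    return None


# Hypothetical stats for illustration only; not taken from an actual run.
example_stats = {
    "reward": np.ones(3, dtype=np.float32),
    "vf_loss": np.full(3, np.finfo(np.float32).max, dtype=np.float32),
}
print(first_overflowing(example_stats))
```

Knowing which quantity overflows would help tell exploding gradients apart from other sources of very large values.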