How severely does this issue affect your experience of using Ray?
- None: Just asking a question out of curiosity.
- Low: It annoys or frustrates me for a moment.
- Medium: It makes completing my task significantly harder, but I can work around it.
- High: It blocks me from completing my task.
High
Hi all,
I am training PPO with my custom environment. The code runs, but the process stops after 3 or 4 steps within a single episode (my environment has time steps). I would really appreciate it if someone could help me with the code. Below is my config code; after it I have added a simplified sketch of how my environment's step is shaped.
from ray.rllib.algorithms.ppo import PPOConfig

# env_config, CustomCallbacks, and the "my_env" registration are defined elsewhere in my code.


def train_ppo():
    config = (
        PPOConfig()
        .training(
            train_batch_size_per_learner=64,
            mini_batch_size_per_learner=64,
            lambda_=0.95,
            kl_coeff=0.5,
            clip_param=0.1,
            vf_clip_param=10.0,
            entropy_coeff=0.01,
            num_sgd_iter=10,
            lr=0.00015,
            grad_clip=100.0,
            grad_clip_by="global_norm",
        )
        .environment("my_env", env_config=env_config)
        .rollouts(
            num_rollout_workers=1,
            rollout_fragment_length=50,
        )
        .rl_module(
            model_config_dict={
                "fcnet_hiddens": [512, 512],
                "fcnet_activation": "tanh",
            }
        )
        .resources(
            num_gpus=0,
        )
        .callbacks(CustomCallbacks)
        .debugging(log_level="DEBUG")
    )

    algo = config.build()
    result = algo.train()


train_ppo()
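In case it helps, my environment follows the standard Gymnasium API. This is not my real environment, just a minimal sketch of the reset/step shape (the spaces, names, and reward value are placeholders):

import gymnasium as gym
import numpy as np


class MyEnvSketch(gym.Env):
    """Minimal stand-in for my real environment (everything here is a placeholder)."""

    def __init__(self, config=None):
        self.observation_space = gym.spaces.Box(-1.0, 1.0, shape=(4,), dtype=np.float32)
        self.action_space = gym.spaces.Discrete(2)
        self.max_steps = 50  # my real env also runs for a number of time steps
        self.t = 0

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        self.t = 0
        return self.observation_space.sample(), {}

    def step(self, action):
        self.t += 1
        obs = self.observation_space.sample()
        reward = 1.0                           # I do return a reward from step()
        terminated = False                     # task-specific end-of-episode flag
        truncated = self.t >= self.max_steps   # time-limit end of episode
        return obs, reward, terminated, truncated, {}

The real environment sets terminated/truncated from its own logic; I only include them here to show the return signature I am using.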
Another problem is that I don't know how to output the reward I got. There is no reward in result, and I can confirm that I do return a reward from the step function in my environment.
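This is roughly what I tried when looking for the reward in the result dict; the key names are guesses from the docs and may not match my Ray version, which is part of what I am asking:

result = algo.train()

# Keys I tried; neither seems to contain my reward:
print(result.get("episode_reward_mean"))                         # old API stack metric name (a guess)
print(result.get("env_runners", {}).get("episode_return_mean"))  # new API stack metric name (a guess)

# Dumping the top-level keys to see what is actually reported:
print(sorted(result.keys()))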
Thank you!!
BR