I’m training an SAC agent on a custom env with a mostly default config (see config below). I noticed that after 80K steps my trainer slows down drastically, only producing a new episode every hour. Is this normal, or am I screwing up a config setting?
config['timesteps_per_iteration'] = 1000
config['learning_starts'] = 1000
config["num_gpus"] = 1
These are just the output files for each episode, but notice the time difference between them.
Should I try setting the replay buffer capacity to something smaller?
Hi @Stale_neutrino ,
This is definitely not normal. The three config settings you posted look just fine.
You can have a look at your buffer’s estimated size with buffer._est_size_bytes, which is also part of the buffer’s stats under the key “est_size_bytes”.
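To get a feeling for whether the buffer can plausibly be the culprit before digging into the stats, a back-of-the-envelope estimate helps. The following is only a rough sketch and not RLlib code: the function name estimate_buffer_bytes, the observation and action shapes, and the per-transition layout (obs, next obs, action, float32 reward, bool done) are all assumptions made for illustration. The real buffer stores extra fields and has Python overhead, so treat the result as a lower bound.

```python
import numpy as np


def estimate_buffer_bytes(capacity, obs_shape, action_shape,
                          obs_dtype=np.float32, action_dtype=np.float32):
    """Rough lower bound on the memory needed for `capacity` transitions.

    Hypothetical helper for illustration; it does not query RLlib.
    """
    obs_bytes = int(np.prod(obs_shape)) * np.dtype(obs_dtype).itemsize
    act_bytes = int(np.prod(action_shape)) * np.dtype(action_dtype).itemsize
    # Assumed layout per transition:
    # obs + next_obs + action + reward (float32, 4 bytes) + done (bool, 1 byte)
    per_transition = 2 * obs_bytes + act_bytes + 4 + 1
    return capacity * per_transition


if __name__ == "__main__":
    # Example shapes are made up; plug in the shapes of your own env.
    total = estimate_buffer_bytes(1_000_000, obs_shape=(84, 84, 3),
                                  action_shape=(2,))
    print(f"estimated buffer size: {total / 1e9:.1f} GB")
```

If the estimate for your env and capacity is in the range of your machine's RAM, the buffer filling up (and the machine starting to swap) is a likely explanation for a slowdown that only shows up after many steps.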
Another thing to look at to figure out if something is unhealthy is the ray dashboard.
Lastly: What version of ray are you using?
Hey @arturn, thanks for the feedback. I’ll keep an eye on my ray dash while training. My Ray version is 1.12.0
@arturn one more thing, I’m running Tune on my SAC agent and currently going through 10 tuning trials. Here’s the current memory usage. Does this seem normal? Sadly I didn’t enable the ray dash for this run. I’ll start a new one, and once it’s done I’ll post the dash output.
Outputs from my terminal
Here’s the dash output, not sure why it’s not showing the PID of my current SAC
Hey @Stale_neutrino ,
Is your environment taking super long to initialize? Because if not, 27 minutes of running the algorithm but having only 1 CPU at work means that you have no rollout workers interacting with environments.
Have these screenshots been taken after your training drastically slows down? If so:
Have a look at your tensorboard and look at sampling or training times. If sampling times and training times stay low, then you can be almost entirely sure it’s your buffer filling up that is the issue here.
Hey @arturn I have num workers = 0, since any time I have more than 0 I get a creation error. Regarding agent performance, here’s the sampling rate, the action processing and mean env wait time.
Dumb question, how can I clear my buffer?
For the current nightly, you can set
"capacity=<xy>" in your
Training can also simply slow down once learning_starts timesteps/agent steps have been sampled, because the training iteration function then starts to include gradient computation etc.
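One way to tell these two cases apart is to time every training iteration and see where the jump happens. Below is a minimal sketch in plain Python; time_iterations and first_slowdown are hypothetical helpers (not part of Ray), and step_fn stands for whatever runs one iteration in your setup, for example your trainer's train method. The threshold factor of 3 is an arbitrary choice.

```python
import statistics
import time


def time_iterations(step_fn, num_iterations):
    """Call step_fn num_iterations times and return each call's duration in seconds."""
    durations = []
    for _ in range(num_iterations):
        start = time.perf_counter()
        step_fn()
        durations.append(time.perf_counter() - start)
    return durations


def first_slowdown(durations, factor=3.0):
    """Return the index of the first iteration that took more than
    `factor` times the median of all earlier iterations, or None."""
    for i in range(1, len(durations)):
        baseline = statistics.median(durations[:i])
        if durations[i] > factor * baseline:
            return i
    return None


if __name__ == "__main__":
    # Made-up durations: cheap iterations first, then a jump.
    example = [1.0, 1.1, 0.9, 5.0, 5.2]
    print("first slow iteration:", first_slowdown(example))
```

Assuming each iteration collects roughly timesteps_per_iteration = 1000 steps, a jump around the iteration where learning_starts = 1000 is reached would just be gradient updates kicking in, while a jump around iteration 80 would point more towards the buffer.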
If you want to share more data, you can use
tensorboard dev upload --logdir <dir>