Hi,
I’m currently trying to fine-tune my policy using a pretrained actor and critic. Since they have different architectures, I’m using the following to configure the algorithm:
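from ray.rllib.algorithms.sac import SACConfig
from ray.rllib.algorithms.callbacks import MemoryTrackingCallbacks

# Set up your SAC agent configuration
config = SACConfig()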
# ROLLOUT PARAMS
config = config.rollouts(num_rollout_workers=0, num_envs_per_worker=8)
# num_rollout_workers – Number of rollout worker actors to create for parallel sampling. Setting this to 0 will force rollouts to be done in the local worker (a.k.a. the algorithm's actor)
# num_envs_per_worker – Number of environments to evaluate vector-wise per worker. This enables model inference batching, which can improve performance for inference bottlenecked workloads.
# Evaluation will be run in the algorithm process (local worker) if not specified explicitly in .evaluation()
# RESOURCE PARAMS
config = config.resources(num_cpus_per_worker=8, num_gpus=1.0)
# If you specify num_gpus
config = config.framework('torch')
config = config.environment(env="voxel-v0", normalize_actions=False, clip_actions=True, disable_env_checking=True)
config = config.training(
    # model={
    #     "custom_model": "voxelgym_model",
    #     "custom_model_config": {},
    # },
    gamma=0,
    initial_alpha=0.5,
    train_batch_size=8,
    policy_model_config={
        "custom_model": "voxelgym_policy_model",
        "custom_model_config": {"pretrained_actor_path": actor_checkpoint_file_path},
    },
    q_model_config={
        "custom_model": "voxelgym_q_model",
        "custom_model_config": {"pretrained_critic_path": critic_checkpoint_file_path},
    },
    replay_buffer_config={"type": "MultiAgentReplayBuffer", "capacity": 1600},
    num_steps_sampled_before_learning_starts=16,
    optimization_config={
        "actor_learning_rate": 6e-5,
        "critic_learning_rate": 1.5e-5,
        "entropy_learning_rate": 3e-4,
    },
    # _deterministic_loss=True,
)
config = config.callbacks(MemoryTrackingCallbacks)
config = config.reporting(min_sample_timesteps_per_iteration=8, min_train_timesteps_per_iteration=8) # does this affect training? https://docs.ray.io/en/latest/rllib/rllib-training.html#specifying-training-options
algo = config.build()
print(config.to_dict())
However, when I run a few training iterations with this config, sampling is extremely slow. I suspect “mean_raw_obs_processing_ms: 255653.7306457758” (roughly 256 seconds) is the cause, but I’m struggling to figure out where to look. The actor and critic are quite large compared to the default models, but judging by the inference time (“mean_inference_ms: 103.63717079162598”, about 0.1 seconds), model size doesn’t seem to be the issue.
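One hypothesis worth ruling out (not confirmed): the config registers `MemoryTrackingCallbacks`. As far as I can tell from RLlib's source, that callback calls `tracemalloc.start(10)` when it is constructed and, in `on_episode_end`, takes a `tracemalloc.take_snapshot()` and records the top source lines as the `tracemalloc/...` custom metrics seen in the result below. Every episode here lasts one step (`episode_len_mean: 1.0`), so a snapshot would be taken for every environment step, and the tracemalloc metrics show single lines (for example PIL's `Image.py:514`) holding one to two million traced allocations. If episode-end callbacks are timed as part of raw observation processing (an assumption I have not verified), that cost would land in `mean_raw_obs_processing_ms`. The stdlib-only sketch below does not call RLlib; it just times a snapshot plus `statistics("lineno")` as the number of live traced allocations grows. `snapshot_seconds` is a hypothetical helper.

```python
import time
import tracemalloc


def snapshot_seconds(n_live_allocations: int, nframe: int = 10) -> float:
    """Time one take_snapshot() + statistics("lineno") with n allocations alive.

    Hypothetical helper: it mimics the kind of work a tracemalloc-based
    episode-end callback does; it is not RLlib's actual code path.
    """
    tracemalloc.start(nframe)  # record up to `nframe` frames per allocation
    try:
        # Keep many small objects alive so tracemalloc has many traces to copy.
        live = [bytearray(8) for _ in range(n_live_allocations)]
        start = time.perf_counter()
        snapshot = tracemalloc.take_snapshot()
        snapshot.statistics("lineno")  # group traces by file:line
        elapsed = time.perf_counter() - start
        del live
    finally:
        tracemalloc.stop()  # stop tracing and discard the recorded traces
    return elapsed


if __name__ == "__main__":
    for n in (10_000, 100_000, 300_000):
        ms = snapshot_seconds(n) * 1000
        print(f"{n:>7} live allocations -> {ms:.1f} ms per snapshot")
```

If snapshot time grows roughly linearly with live allocations, a quick comparison is to run the same training once with `config.callbacks(MemoryTrackingCallbacks)` removed and see whether `mean_raw_obs_processing_ms` drops.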
Another thing: as far as I know, I should see some CPU and GPU usage while the trainer is running, but “ray status” shows no CPU or GPU usage at all (output below). Could you have a look and tell me what I’m doing wrong? I still haven’t fully understood how the resources and rollouts should be configured, so my configuration might be the cause.
Any input is appreciated. Thank you in advance!
======== Autoscaler status: 2023-06-29 14:24:41.151989 ========
Node status
---------------------------------------------------------------
Healthy:
 1 node_3fdeaf8d0b6ec409680e874ebc415d952180dbf3eeedf25bc40e8418
Pending:
 (no pending nodes)
Recent failures:
 (no failures)

Resources
---------------------------------------------------------------
Usage:
 0.0/20.0 CPU
 0.0/1.0 GPU
 0B/11.17GiB memory
 0B/5.58GiB object_store_memory

Demands:
 (no resource demands)
(ivienv) C:\Users\kimtw\IVI\voxelgym-mimic-astar>
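The 0.0/20.0 CPU reading above is probably expected: `ray status` reports logical resources reserved by Ray tasks, actors and placement groups, not measured CPU or GPU load (measured load appears under `perf` in the result below, e.g. `cpu_util_percent`). With `num_rollout_workers=0` no rollout worker actors are created, and an `Algorithm` built directly with `config.build()` runs in the driver process, which holds no logical CPUs; `num_cpus_per_worker=8` applies per remote rollout worker, so it has no effect here. The sketch below is a simplified model, assuming that when the algorithm runs as a Tune trial RLlib requests one bundle for the local worker (`num_cpus_for_local_worker`, `num_gpus`) plus one bundle per rollout worker, and ignoring evaluation workers, learner workers and custom resources. `rough_rllib_bundles` and `totals` are hypothetical helpers, not RLlib API.

```python
from typing import Dict, List


def rough_rllib_bundles(
    num_rollout_workers: int,
    num_cpus_per_worker: float,
    num_gpus: float,
    num_cpus_for_local_worker: float = 1,
    num_gpus_per_worker: float = 0,
) -> List[Dict[str, float]]:
    """Hypothetical, simplified model of the resource bundles for one trial."""
    # First bundle: the local worker (the algorithm's own process) plus its GPUs.
    bundles = [{"CPU": num_cpus_for_local_worker, "GPU": num_gpus}]
    # One bundle per remote rollout worker; none when num_rollout_workers=0.
    bundles += [
        {"CPU": num_cpus_per_worker, "GPU": num_gpus_per_worker}
        for _ in range(num_rollout_workers)
    ]
    return bundles


def totals(bundles: List[Dict[str, float]]) -> Dict[str, float]:
    """Sum each resource type over all bundles."""
    out: Dict[str, float] = {}
    for bundle in bundles:
        for name, amount in bundle.items():
            out[name] = out.get(name, 0) + amount
    return out


# The posted config: num_rollout_workers=0, num_cpus_per_worker=8, num_gpus=1.0.
print(totals(rough_rllib_bundles(0, 8, 1.0)))  # {'CPU': 1, 'GPU': 1.0}
# For comparison: two remote rollout workers with 8 CPUs each.
print(totals(rough_rllib_bundles(2, 8, 1.0)))  # {'CPU': 17, 'GPU': 1.0}
```

Under this model, the 8 CPUs per worker only come into play once `num_rollout_workers` is at least 1, and then they would show up in `ray status` as reserved by those worker actors.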
date: 2023-06-29_14-14-15
done: false
episode_len_mean: 1.0
episode_media: {}
episode_reward_max: 0.35000000000000003
episode_reward_mean: 0.28828125
episode_reward_min: 0.2
episodes_this_iter: 8
episodes_total: 32
hostname: Paranoia
info:
  last_target_update_ts: 32
  learner:
    default_policy:
      custom_metrics: {}
      diff_num_grad_updates_vs_sampler_policy: 0.625
      learner_stats:
        actor_loss: 957.0989379882812
        alpha_loss: 103.95216369628906
        alpha_value: 0.5003001689910889
        critic_loss: 0.0004399276222102344
        log_alpha_value: -0.692547082901001
        max_q: 0.3004091680049896
        mean_q: 0.25926417112350464
        min_q: 0.2091337889432907
        policy_t: 0.012088620103895664
        target_entropy: -1764.0
      mean_td_error: 0.02397516369819641
      model: {}
      num_grad_updates_lifetime: 2.0
      td_error: [0.018754109740257263, 0.048659585416316986, 0.012076392769813538,
        0.004243031144142151, 0.034527674317359924, 0.023390352725982666, 0.04329864680767059,
        0.006851509213447571]
  num_agent_steps_sampled: 32
  num_agent_steps_trained: 16
  num_env_steps_sampled: 32
  num_env_steps_trained: 16
  num_target_updates: 2
iterations_since_restore: 2
node_ip: 127.0.0.1
num_agent_steps_sampled: 32
num_agent_steps_trained: 16
num_env_steps_sampled: 32
num_env_steps_sampled_this_iter: 8
num_env_steps_trained: 16
num_env_steps_trained_this_iter: 8
num_faulty_episodes: 0
num_healthy_workers: 0
num_in_flight_async_reqs: 0
num_remote_worker_restarts: 0
num_steps_trained_this_iter: 8
perf:
  cpu_util_percent: 1.9754545454545456
  gpu_util_percent0: 0.19686363636363638
  ram_util_percent: 70.44000000000001
  vram_util_percent0: 0.37611150568181817
pid: 15504
policy_reward_max: {}
policy_reward_mean: {}
policy_reward_min: {}
sampler_perf:
  mean_action_processing_ms: 2.971327304840088
  mean_env_render_ms: 0.0
  mean_env_wait_ms: 30.17140030860901
  mean_inference_ms: 103.63717079162598
  mean_raw_obs_processing_ms: 255653.7306457758
sampler_results:
connector_metrics:
ObsPreprocessorConnector_ms: 0.0
StateBufferConnector_ms: 0.6659079757001665
ViewRequirementAgentConnector_ms: 0.45820590522554183
custom_metrics:
tracemalloc/c:\Users\kimtw\anaconda3\envs\ivienv\lib\site-packages\PIL\Image.py:3263/count_max: 1000001
tracemalloc/c:\Users\kimtw\anaconda3\envs\ivienv\lib\site-packages\PIL\Image.py:3263/count_mean: 1000001.0
tracemalloc/c:\Users\kimtw\anaconda3\envs\ivienv\lib\site-packages\PIL\Image.py:3263/count_min: 1000001
tracemalloc/c:\Users\kimtw\anaconda3\envs\ivienv\lib\site-packages\PIL\Image.py:3263/size_max: 46875.5234375
tracemalloc/c:\Users\kimtw\anaconda3\envs\ivienv\lib\site-packages\PIL\Image.py:3263/size_mean: 46875.5234375
tracemalloc/c:\Users\kimtw\anaconda3\envs\ivienv\lib\site-packages\PIL\Image.py:3263/size_min: 46875.5234375
tracemalloc/c:\Users\kimtw\anaconda3\envs\ivienv\lib\site-packages\PIL\Image.py:514/count_max: 2000000
tracemalloc/c:\Users\kimtw\anaconda3\envs\ivienv\lib\site-packages\PIL\Image.py:514/count_mean: 2000000.0
tracemalloc/c:\Users\kimtw\anaconda3\envs\ivienv\lib\site-packages\PIL\Image.py:514/count_min: 2000000
tracemalloc/c:\Users\kimtw\anaconda3\envs\ivienv\lib\site-packages\PIL\Image.py:514/size_max: 390625.0
tracemalloc/c:\Users\kimtw\anaconda3\envs\ivienv\lib\site-packages\PIL\Image.py:514/size_mean: 390625.0
tracemalloc/c:\Users\kimtw\anaconda3\envs\ivienv\lib\site-packages\PIL\Image.py:514/size_min: 390625.0
tracemalloc/c:\Users\kimtw\anaconda3\envs\ivienv\lib\site-packages\PIL\ImageFile.py:277/count_max: 1000000
tracemalloc/c:\Users\kimtw\anaconda3\envs\ivienv\lib\site-packages\PIL\ImageFile.py:277/count_mean: 1000000.0
tracemalloc/c:\Users\kimtw\anaconda3\envs\ivienv\lib\site-packages\PIL\ImageFile.py:277/count_min: 1000000
tracemalloc/c:\Users\kimtw\anaconda3\envs\ivienv\lib\site-packages\PIL\ImageFile.py:277/size_max: 54687.5
tracemalloc/c:\Users\kimtw\anaconda3\envs\ivienv\lib\site-packages\PIL\ImageFile.py:277/size_mean: 54687.5
tracemalloc/c:\Users\kimtw\anaconda3\envs\ivienv\lib\site-packages\PIL\ImageFile.py:277/size_min: 54687.5
tracemalloc/c:\Users\kimtw\anaconda3\envs\ivienv\lib\site-packages\PIL\ImageFile.py:295/count_max: 1000000
tracemalloc/c:\Users\kimtw\anaconda3\envs\ivienv\lib\site-packages\PIL\ImageFile.py:295/count_mean: 1000000.0
tracemalloc/c:\Users\kimtw\anaconda3\envs\ivienv\lib\site-packages\PIL\ImageFile.py:295/count_min: 1000000
tracemalloc/c:\Users\kimtw\anaconda3\envs\ivienv\lib\site-packages\PIL\ImageFile.py:295/size_max: 31250.0
tracemalloc/c:\Users\kimtw\anaconda3\envs\ivienv\lib\site-packages\PIL\ImageFile.py:295/size_mean: 31250.0
tracemalloc/c:\Users\kimtw\anaconda3\envs\ivienv\lib\site-packages\PIL\ImageFile.py:295/size_min: 31250.0
tracemalloc/c:\Users\kimtw\anaconda3\envs\ivienv\lib\site-packages\PIL\PngImagePlugin.py:362/count_max: 1000000
tracemalloc/c:\Users\kimtw\anaconda3\envs\ivienv\lib\site-packages\PIL\PngImagePlugin.py:362/count_mean: 1000000.0
tracemalloc/c:\Users\kimtw\anaconda3\envs\ivienv\lib\site-packages\PIL\PngImagePlugin.py:362/count_min: 1000000
tracemalloc/c:\Users\kimtw\anaconda3\envs\ivienv\lib\site-packages\PIL\PngImagePlugin.py:362/size_max: 62500.0
tracemalloc/c:\Users\kimtw\anaconda3\envs\ivienv\lib\site-packages\PIL\PngImagePlugin.py:362/size_mean: 62500.0
tracemalloc/c:\Users\kimtw\anaconda3\envs\ivienv\lib\site-packages\PIL\PngImagePlugin.py:362/size_min: 62500.0
tracemalloc/c:\Users\kimtw\anaconda3\envs\ivienv\lib\site-packages\PIL\PngImagePlugin.py:363/count_max: 1000000
tracemalloc/c:\Users\kimtw\anaconda3\envs\ivienv\lib\site-packages\PIL\PngImagePlugin.py:363/count_mean: 1000000.0
tracemalloc/c:\Users\kimtw\anaconda3\envs\ivienv\lib\site-packages\PIL\PngImagePlugin.py:363/count_min: 1000000
tracemalloc/c:\Users\kimtw\anaconda3\envs\ivienv\lib\site-packages\PIL\PngImagePlugin.py:363/size_max: 62500.0
tracemalloc/c:\Users\kimtw\anaconda3\envs\ivienv\lib\site-packages\PIL\PngImagePlugin.py:363/size_mean: 62500.0
tracemalloc/c:\Users\kimtw\anaconda3\envs\ivienv\lib\site-packages\PIL\PngImagePlugin.py:363/size_min: 62500.0
tracemalloc/c:\Users\kimtw\anaconda3\envs\ivienv\lib\site-packages\PIL\PngImagePlugin.py:431/count_max: 1000000
tracemalloc/c:\Users\kimtw\anaconda3\envs\ivienv\lib\site-packages\PIL\PngImagePlugin.py:431/count_mean: 1000000.0
tracemalloc/c:\Users\kimtw\anaconda3\envs\ivienv\lib\site-packages\PIL\PngImagePlugin.py:431/count_min: 1000000
tracemalloc/c:\Users\kimtw\anaconda3\envs\ivienv\lib\site-packages\PIL\PngImagePlugin.py:431/size_max: 54687.5
tracemalloc/c:\Users\kimtw\anaconda3\envs\ivienv\lib\site-packages\PIL\PngImagePlugin.py:431/size_mean: 54687.5
tracemalloc/c:\Users\kimtw\anaconda3\envs\ivienv\lib\site-packages\PIL\PngImagePlugin.py:431/size_min: 54687.5
tracemalloc/c:\Users\kimtw\anaconda3\envs\ivienv\lib\site-packages\PIL\PngImagePlugin.py:724/count_max: 1000000
tracemalloc/c:\Users\kimtw\anaconda3\envs\ivienv\lib\site-packages\PIL\PngImagePlugin.py:724/count_mean: 1000000.0
tracemalloc/c:\Users\kimtw\anaconda3\envs\ivienv\lib\site-packages\PIL\PngImagePlugin.py:724/count_min: 1000000
tracemalloc/c:\Users\kimtw\anaconda3\envs\ivienv\lib\site-packages\PIL\PngImagePlugin.py:724/size_max: 54687.5
tracemalloc/c:\Users\kimtw\anaconda3\envs\ivienv\lib\site-packages\PIL\PngImagePlugin.py:724/size_mean: 54687.5
tracemalloc/c:\Users\kimtw\anaconda3\envs\ivienv\lib\site-packages\PIL\PngImagePlugin.py:724/size_min: 54687.5
tracemalloc/c:\Users\kimtw\anaconda3\envs\ivienv\lib\site-packages\datasets\features\image.py:177/count_max: 1000001
tracemalloc/c:\Users\kimtw\anaconda3\envs\ivienv\lib\site-packages\datasets\features\image.py:177/count_mean: 1000001.0
tracemalloc/c:\Users\kimtw\anaconda3\envs\ivienv\lib\site-packages\datasets\features\image.py:177/count_min: 1000001
tracemalloc/c:\Users\kimtw\anaconda3\envs\ivienv\lib\site-packages\datasets\features\image.py:177/size_max: 78125.5390625
tracemalloc/c:\Users\kimtw\anaconda3\envs\ivienv\lib\site-packages\datasets\features\image.py:177/size_mean: 78125.5390625
tracemalloc/c:\Users\kimtw\anaconda3\envs\ivienv\lib\site-packages\datasets\features\image.py:177/size_min: 78125.5390625
tracemalloc/c:\Users\kimtw\anaconda3\envs\ivienv\lib\site-packages\datasets\formatting\formatting.py:147/count_max: 1000464
tracemalloc/c:\Users\kimtw\anaconda3\envs\ivienv\lib\site-packages\datasets\formatting\formatting.py:147/count_mean: 1000448.21875
tracemalloc/c:\Users\kimtw\anaconda3\envs\ivienv\lib\site-packages\datasets\formatting\formatting.py:147/count_min: 1000442
tracemalloc/c:\Users\kimtw\anaconda3\envs\ivienv\lib\site-packages\datasets\formatting\formatting.py:147/size_max: 208623.8515625
tracemalloc/c:\Users\kimtw\anaconda3\envs\ivienv\lib\site-packages\datasets\formatting\formatting.py:147/size_mean: 208621.26245117188
tracemalloc/c:\Users\kimtw\anaconda3\envs\ivienv\lib\site-packages\datasets\formatting\formatting.py:147/size_min: 208620.2421875
tracemalloc/worker/rss_max: 14482673664
tracemalloc/worker/rss_mean: 14237046016.0
tracemalloc/worker/rss_min: 13700857856
tracemalloc/worker/vms_max: 20168679424
tracemalloc/worker/vms_mean: 19596363392.0
tracemalloc/worker/vms_min: 18766987264
episode_len_mean: 1.0
episode_media: {}
episode_reward_max: 0.35000000000000003
episode_reward_mean: 0.28828125
episode_reward_min: 0.2
episodes_this_iter: 8
hist_stats:
episode_lengths: [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
episode_reward: [0.35000000000000003, 0.225, 0.275, 0.30000000000000004, 0.30000000000000004,
0.30000000000000004, 0.225, 0.325, 0.25, 0.35000000000000003, 0.35000000000000003,
0.25, 0.2, 0.275, 0.30000000000000004, 0.25, 0.35000000000000003, 0.275, 0.275,
0.275, 0.325, 0.25, 0.25, 0.325, 0.275, 0.225, 0.325, 0.30000000000000004, 0.325,
0.30000000000000004, 0.325, 0.30000000000000004]
num_faulty_episodes: 0
policy_reward_max: {}
policy_reward_mean: {}
policy_reward_min: {}
sampler_perf:
mean_action_processing_ms: 2.971327304840088
mean_env_render_ms: 0.0
mean_env_wait_ms: 30.17140030860901
mean_inference_ms: 103.63717079162598
mean_raw_obs_processing_ms: 255653.7306457758
time_since_restore: 1290.4770221710205
time_this_iter_s: 269.1462426185608
time_total_s: 1290.4770221710205
timers:
  learn_throughput: 15.576
  learn_time_ms: 513.6
  load_throughput: 4569.581
  load_time_ms: 1.751
  sample_time_ms: 322334.412
  synch_weights_time_ms: 0.0
  training_iteration_time_ms: 322609.99
timestamp: 1688040855
timesteps_total: 32
training_iteration: 2
trial_id: default
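Putting the posted numbers side by side makes the bottleneck explicit. The snippet below only does arithmetic on values copied from the result above (no RLlib calls). Raw observation processing accounts for almost all of the summed per-step sampler time, sampling accounts for almost all of the iteration time, and throughput works out to roughly 0.03 env steps per second (8 env steps in about 269 s). The `sampler_perf` and `timers` values are smoothed averages, so treat the shares as approximate.

```python
# Values copied from the training result above (training_iteration: 2).
sampler_perf_ms = {
    "action_processing": 2.971327304840088,
    "env_render": 0.0,
    "env_wait": 30.17140030860901,
    "inference": 103.63717079162598,
    "raw_obs_processing": 255653.7306457758,
}
timers_ms = {"sample": 322334.412, "learn": 513.6, "training_iteration": 322609.99}
env_steps_this_iter = 8  # num_env_steps_sampled_this_iter
time_this_iter_s = 269.1462426185608


def shares(parts):
    """Return each part's fraction of the summed total."""
    total = sum(parts.values())
    return {name: value / total for name, value in parts.items()}


sampler_shares = shares(sampler_perf_ms)
dominant = max(sampler_shares, key=sampler_shares.get)
sample_fraction = timers_ms["sample"] / timers_ms["training_iteration"]
steps_per_second = env_steps_this_iter / time_this_iter_s

print(f"dominant sampler stage: {dominant} ({sampler_shares[dominant]:.2%})")
print(f"sampling share of iteration time: {sample_fraction:.2%}")
print(f"env steps per second: {steps_per_second:.3f}")
```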