Training with pre-trained actor and critic using SAC is too slow

Hi :slight_smile:.
I’m currently trying to fine-tune my policy using a pretrained actor and critic. Since they have different architectures, I’m using the following to configure the algorithm.

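from ray.rllib.algorithms.callbacks import MemoryTrackingCallbacks
from ray.rllib.algorithms.sac import SACConfig

# Set up your SAC agent configuration
config = SACConfig()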

# ROLLOUT PARAMS
config = config.rollouts(num_rollout_workers=0, num_envs_per_worker=8)
# num_rollout_workers – Number of rollout worker actors to create for parallel sampling. Setting this to 0 will force rollouts to be done in the local worker (a.k.a. the algorithm's actor)
# num_envs_per_worker – Number of environments to evaluate vector-wise per worker. This enables model inference batching, which can improve performance for inference bottlenecked workloads.
# Evaluation will be run in the algorithm process (local worker) if not specified explicitly in .evaluation()

# RESOURCE PARAMS
config = config.resources(num_cpus_per_worker=8, num_gpus=1.0)
# If you specify num_gpus
    
config = config.framework('torch')
config = config.environment(env="voxel-v0", normalize_actions=False, clip_actions=True, disable_env_checking=True)
config = config.training(
        # model={
        #     "custom_model": "voxelgym_model",
        #     "custom_model_config": {},
        # },
        gamma = 0,
        initial_alpha = 0.5,
        train_batch_size = 8,
        policy_model_config={"custom_model": "voxelgym_policy_model",
                             "custom_model_config": {'pretrained_actor_path': actor_checkpoint_file_path}},
        q_model_config={"custom_model": "voxelgym_q_model",
                        "custom_model_config": {'pretrained_critic_path': critic_checkpoint_file_path}},
        replay_buffer_config={'type': 'MultiAgentReplayBuffer', 'capacity': 1600},
        num_steps_sampled_before_learning_starts=16,
        optimization_config={"actor_learning_rate": 6e-5,
                             "critic_learning_rate": 1.5e-5,
                             "entropy_learning_rate": 3e-4}
        # _deterministic_loss=True)
    )
config = config.callbacks(MemoryTrackingCallbacks)
config = config.reporting(min_sample_timesteps_per_iteration=8, min_train_timesteps_per_iteration=8)  # Does this setting affect training? See https://docs.ray.io/en/latest/rllib/rllib-training.html#specifying-training-options

algo = config.build()
print(config.to_dict())

However, when I run a few training steps with this config, the sampling process is extremely slow. I assume that “mean_raw_obs_processing_ms: 255653.7306457758” might be the cause, but I’m still struggling to find where to look. The actor and critic are pretty large compared to the default ones, but judging by the inference time (mean_inference_ms is about 104 ms), this doesn’t seem to be the issue.
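One way to narrow this down is to time the environment directly, outside RLlib. The sketch below is only an illustration: `StandInEnv` and `time_env` are hypothetical names, and it assumes the real environment follows the gymnasium-style API (`reset()` returns `(obs, info)`, `step()` returns a 5-tuple). Since every episode in the log lasts a single step (episode_len_mean: 1.0), `reset()` runs as often as `step()`, so the two are measured separately.

```python
import time
from statistics import mean


class StandInEnv:
    """Hypothetical placeholder; swap in the real environment instance."""

    def reset(self, *, seed=None, options=None):
        return [0.0], {}

    def step(self, action):
        # Returns obs, reward, terminated, truncated, info.
        # Episodes end after one step, as in the log above.
        return [0.0], 0.0, True, False, {}


def time_env(env, make_action, episodes=5):
    """Return the mean wall-clock milliseconds spent in reset() and step()."""
    reset_ms, step_ms = [], []
    for _ in range(episodes):
        start = time.perf_counter()
        env.reset()
        reset_ms.append((time.perf_counter() - start) * 1000.0)

        done = False
        while not done:
            start = time.perf_counter()
            _obs, _reward, terminated, truncated, _info = env.step(make_action())
            step_ms.append((time.perf_counter() - start) * 1000.0)
            done = terminated or truncated
    return {"reset_ms": mean(reset_ms), "step_ms": mean(step_ms)}


timings = time_env(StandInEnv(), make_action=lambda: 0)
print(timings)
```

If either number is in the range of seconds for the real environment, the slowdown is likely in the environment code rather than in the models or in RLlib itself.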

Another thing: as far as I know, I should see some CPU and GPU usage after starting the trainer, but “ray status” shows none. Could you have a look at the output below and tell me what I’m doing wrong? I still haven’t fully understood how the resources and rollouts have to be configured, so the initial configuration might be the cause. :frowning:

Any kind of input is appreciated! Thank you in advance!!

======== Autoscaler status: 2023-06-29 14:24:41.151989 ========
Node status
---------------------------------------------------------------
Healthy:
 1 node_3fdeaf8d0b6ec409680e874ebc415d952180dbf3eeedf25bc40e8418
Pending:
 (no pending nodes)
Recent failures:
 (no failures)

Resources
---------------------------------------------------------------
Usage:
 0.0/20.0 CPU
 0.0/1.0 GPU
 0B/11.17GiB memory
 0B/5.58GiB object_store_memory

Demands:
 (no resource demands)

(ivienv) C:\Users\kimtw\IVI\voxelgym-mimic-astar>
date: 2023-06-29_14-14-15
done: false
episode_len_mean: 1.0
episode_media: {}
episode_reward_max: 0.35000000000000003
episode_reward_mean: 0.28828125
episode_reward_min: 0.2
episodes_this_iter: 8
episodes_total: 32
hostname: Paranoia
info:
  last_target_update_ts: 32
  learner:
    default_policy:
      custom_metrics: {}
      diff_num_grad_updates_vs_sampler_policy: 0.625
      learner_stats:
        actor_loss: 957.0989379882812
        alpha_loss: 103.95216369628906
        alpha_value: 0.5003001689910889
        critic_loss: 0.0004399276222102344
        log_alpha_value: -0.692547082901001
        max_q: 0.3004091680049896
        mean_q: 0.25926417112350464
        min_q: 0.2091337889432907
        policy_t: 0.012088620103895664
        target_entropy: -1764.0
      mean_td_error: 0.02397516369819641
      model: {}
      num_grad_updates_lifetime: 2.0
      td_error: [0.018754109740257263, 0.048659585416316986, 0.012076392769813538,
        0.004243031144142151, 0.034527674317359924, 0.023390352725982666, 0.04329864680767059,
        0.006851509213447571]
  num_agent_steps_sampled: 32
  num_agent_steps_trained: 16
  num_env_steps_sampled: 32
  num_env_steps_trained: 16
  num_target_updates: 2
iterations_since_restore: 2
node_ip: 127.0.0.1
num_agent_steps_sampled: 32
num_agent_steps_trained: 16
num_env_steps_sampled: 32
num_env_steps_sampled_this_iter: 8
num_env_steps_trained: 16
num_env_steps_trained_this_iter: 8
num_faulty_episodes: 0
num_healthy_workers: 0
num_in_flight_async_reqs: 0
num_remote_worker_restarts: 0
num_steps_trained_this_iter: 8
perf:
  cpu_util_percent: 1.9754545454545456
  gpu_util_percent0: 0.19686363636363638
  ram_util_percent: 70.44000000000001
  vram_util_percent0: 0.37611150568181817
pid: 15504
policy_reward_max: {}
policy_reward_mean: {}
policy_reward_min: {}
sampler_perf:
  mean_action_processing_ms: 2.971327304840088
  mean_env_render_ms: 0.0
  mean_env_wait_ms: 30.17140030860901
  mean_inference_ms: 103.63717079162598
  mean_raw_obs_processing_ms: 255653.7306457758
sampler_results:
  connector_metrics:
    ObsPreprocessorConnector_ms: 0.0
    StateBufferConnector_ms: 0.6659079757001665
    ViewRequirementAgentConnector_ms: 0.45820590522554183
  custom_metrics:
    tracemalloc/c:\Users\kimtw\anaconda3\envs\ivienv\lib\site-packages\PIL\Image.py:3263/count_max: 1000001
    tracemalloc/c:\Users\kimtw\anaconda3\envs\ivienv\lib\site-packages\PIL\Image.py:3263/count_mean: 1000001.0
    tracemalloc/c:\Users\kimtw\anaconda3\envs\ivienv\lib\site-packages\PIL\Image.py:3263/count_min: 1000001
    tracemalloc/c:\Users\kimtw\anaconda3\envs\ivienv\lib\site-packages\PIL\Image.py:3263/size_max: 46875.5234375
    tracemalloc/c:\Users\kimtw\anaconda3\envs\ivienv\lib\site-packages\PIL\Image.py:3263/size_mean: 46875.5234375
    tracemalloc/c:\Users\kimtw\anaconda3\envs\ivienv\lib\site-packages\PIL\Image.py:3263/size_min: 46875.5234375
    tracemalloc/c:\Users\kimtw\anaconda3\envs\ivienv\lib\site-packages\PIL\Image.py:514/count_max: 2000000
    tracemalloc/c:\Users\kimtw\anaconda3\envs\ivienv\lib\site-packages\PIL\Image.py:514/count_mean: 2000000.0
    tracemalloc/c:\Users\kimtw\anaconda3\envs\ivienv\lib\site-packages\PIL\Image.py:514/count_min: 2000000
    tracemalloc/c:\Users\kimtw\anaconda3\envs\ivienv\lib\site-packages\PIL\Image.py:514/size_max: 390625.0
    tracemalloc/c:\Users\kimtw\anaconda3\envs\ivienv\lib\site-packages\PIL\Image.py:514/size_mean: 390625.0
    tracemalloc/c:\Users\kimtw\anaconda3\envs\ivienv\lib\site-packages\PIL\Image.py:514/size_min: 390625.0
    tracemalloc/c:\Users\kimtw\anaconda3\envs\ivienv\lib\site-packages\PIL\ImageFile.py:277/count_max: 1000000
    tracemalloc/c:\Users\kimtw\anaconda3\envs\ivienv\lib\site-packages\PIL\ImageFile.py:277/count_mean: 1000000.0
    tracemalloc/c:\Users\kimtw\anaconda3\envs\ivienv\lib\site-packages\PIL\ImageFile.py:277/count_min: 1000000
    tracemalloc/c:\Users\kimtw\anaconda3\envs\ivienv\lib\site-packages\PIL\ImageFile.py:277/size_max: 54687.5
    tracemalloc/c:\Users\kimtw\anaconda3\envs\ivienv\lib\site-packages\PIL\ImageFile.py:277/size_mean: 54687.5
    tracemalloc/c:\Users\kimtw\anaconda3\envs\ivienv\lib\site-packages\PIL\ImageFile.py:277/size_min: 54687.5
    tracemalloc/c:\Users\kimtw\anaconda3\envs\ivienv\lib\site-packages\PIL\ImageFile.py:295/count_max: 1000000
    tracemalloc/c:\Users\kimtw\anaconda3\envs\ivienv\lib\site-packages\PIL\ImageFile.py:295/count_mean: 1000000.0
    tracemalloc/c:\Users\kimtw\anaconda3\envs\ivienv\lib\site-packages\PIL\ImageFile.py:295/count_min: 1000000
    tracemalloc/c:\Users\kimtw\anaconda3\envs\ivienv\lib\site-packages\PIL\ImageFile.py:295/size_max: 31250.0
    tracemalloc/c:\Users\kimtw\anaconda3\envs\ivienv\lib\site-packages\PIL\ImageFile.py:295/size_mean: 31250.0
    tracemalloc/c:\Users\kimtw\anaconda3\envs\ivienv\lib\site-packages\PIL\ImageFile.py:295/size_min: 31250.0
    tracemalloc/c:\Users\kimtw\anaconda3\envs\ivienv\lib\site-packages\PIL\PngImagePlugin.py:362/count_max: 1000000
    tracemalloc/c:\Users\kimtw\anaconda3\envs\ivienv\lib\site-packages\PIL\PngImagePlugin.py:362/count_mean: 1000000.0
    tracemalloc/c:\Users\kimtw\anaconda3\envs\ivienv\lib\site-packages\PIL\PngImagePlugin.py:362/count_min: 1000000
    tracemalloc/c:\Users\kimtw\anaconda3\envs\ivienv\lib\site-packages\PIL\PngImagePlugin.py:362/size_max: 62500.0
    tracemalloc/c:\Users\kimtw\anaconda3\envs\ivienv\lib\site-packages\PIL\PngImagePlugin.py:362/size_mean: 62500.0
    tracemalloc/c:\Users\kimtw\anaconda3\envs\ivienv\lib\site-packages\PIL\PngImagePlugin.py:362/size_min: 62500.0
    tracemalloc/c:\Users\kimtw\anaconda3\envs\ivienv\lib\site-packages\PIL\PngImagePlugin.py:363/count_max: 1000000
    tracemalloc/c:\Users\kimtw\anaconda3\envs\ivienv\lib\site-packages\PIL\PngImagePlugin.py:363/count_mean: 1000000.0
    tracemalloc/c:\Users\kimtw\anaconda3\envs\ivienv\lib\site-packages\PIL\PngImagePlugin.py:363/count_min: 1000000
    tracemalloc/c:\Users\kimtw\anaconda3\envs\ivienv\lib\site-packages\PIL\PngImagePlugin.py:363/size_max: 62500.0
    tracemalloc/c:\Users\kimtw\anaconda3\envs\ivienv\lib\site-packages\PIL\PngImagePlugin.py:363/size_mean: 62500.0
    tracemalloc/c:\Users\kimtw\anaconda3\envs\ivienv\lib\site-packages\PIL\PngImagePlugin.py:363/size_min: 62500.0
    tracemalloc/c:\Users\kimtw\anaconda3\envs\ivienv\lib\site-packages\PIL\PngImagePlugin.py:431/count_max: 1000000
    tracemalloc/c:\Users\kimtw\anaconda3\envs\ivienv\lib\site-packages\PIL\PngImagePlugin.py:431/count_mean: 1000000.0
    tracemalloc/c:\Users\kimtw\anaconda3\envs\ivienv\lib\site-packages\PIL\PngImagePlugin.py:431/count_min: 1000000
    tracemalloc/c:\Users\kimtw\anaconda3\envs\ivienv\lib\site-packages\PIL\PngImagePlugin.py:431/size_max: 54687.5
    tracemalloc/c:\Users\kimtw\anaconda3\envs\ivienv\lib\site-packages\PIL\PngImagePlugin.py:431/size_mean: 54687.5
    tracemalloc/c:\Users\kimtw\anaconda3\envs\ivienv\lib\site-packages\PIL\PngImagePlugin.py:431/size_min: 54687.5
    tracemalloc/c:\Users\kimtw\anaconda3\envs\ivienv\lib\site-packages\PIL\PngImagePlugin.py:724/count_max: 1000000
    tracemalloc/c:\Users\kimtw\anaconda3\envs\ivienv\lib\site-packages\PIL\PngImagePlugin.py:724/count_mean: 1000000.0
    tracemalloc/c:\Users\kimtw\anaconda3\envs\ivienv\lib\site-packages\PIL\PngImagePlugin.py:724/count_min: 1000000
    tracemalloc/c:\Users\kimtw\anaconda3\envs\ivienv\lib\site-packages\PIL\PngImagePlugin.py:724/size_max: 54687.5
    tracemalloc/c:\Users\kimtw\anaconda3\envs\ivienv\lib\site-packages\PIL\PngImagePlugin.py:724/size_mean: 54687.5
    tracemalloc/c:\Users\kimtw\anaconda3\envs\ivienv\lib\site-packages\PIL\PngImagePlugin.py:724/size_min: 54687.5
    tracemalloc/c:\Users\kimtw\anaconda3\envs\ivienv\lib\site-packages\datasets\features\image.py:177/count_max: 1000001
    tracemalloc/c:\Users\kimtw\anaconda3\envs\ivienv\lib\site-packages\datasets\features\image.py:177/count_mean: 1000001.0
    tracemalloc/c:\Users\kimtw\anaconda3\envs\ivienv\lib\site-packages\datasets\features\image.py:177/count_min: 1000001
    tracemalloc/c:\Users\kimtw\anaconda3\envs\ivienv\lib\site-packages\datasets\features\image.py:177/size_max: 78125.5390625
    tracemalloc/c:\Users\kimtw\anaconda3\envs\ivienv\lib\site-packages\datasets\features\image.py:177/size_mean: 78125.5390625
    tracemalloc/c:\Users\kimtw\anaconda3\envs\ivienv\lib\site-packages\datasets\features\image.py:177/size_min: 78125.5390625
    tracemalloc/c:\Users\kimtw\anaconda3\envs\ivienv\lib\site-packages\datasets\formatting\formatting.py:147/count_max: 1000464
    tracemalloc/c:\Users\kimtw\anaconda3\envs\ivienv\lib\site-packages\datasets\formatting\formatting.py:147/count_mean: 1000448.21875
    tracemalloc/c:\Users\kimtw\anaconda3\envs\ivienv\lib\site-packages\datasets\formatting\formatting.py:147/count_min: 1000442
    tracemalloc/c:\Users\kimtw\anaconda3\envs\ivienv\lib\site-packages\datasets\formatting\formatting.py:147/size_max: 208623.8515625
    tracemalloc/c:\Users\kimtw\anaconda3\envs\ivienv\lib\site-packages\datasets\formatting\formatting.py:147/size_mean: 208621.26245117188
    tracemalloc/c:\Users\kimtw\anaconda3\envs\ivienv\lib\site-packages\datasets\formatting\formatting.py:147/size_min: 208620.2421875
    tracemalloc/worker/rss_max: 14482673664
    tracemalloc/worker/rss_mean: 14237046016.0
    tracemalloc/worker/rss_min: 13700857856
    tracemalloc/worker/vms_max: 20168679424
    tracemalloc/worker/vms_mean: 19596363392.0
    tracemalloc/worker/vms_min: 18766987264
  episode_len_mean: 1.0
  episode_media: {}
  episode_reward_max: 0.35000000000000003
  episode_reward_mean: 0.28828125
  episode_reward_min: 0.2
  episodes_this_iter: 8
  hist_stats:
    episode_lengths: [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
      1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
    episode_reward: [0.35000000000000003, 0.225, 0.275, 0.30000000000000004, 0.30000000000000004,
      0.30000000000000004, 0.225, 0.325, 0.25, 0.35000000000000003, 0.35000000000000003,
      0.25, 0.2, 0.275, 0.30000000000000004, 0.25, 0.35000000000000003, 0.275, 0.275,
      0.275, 0.325, 0.25, 0.25, 0.325, 0.275, 0.225, 0.325, 0.30000000000000004, 0.325,
      0.30000000000000004, 0.325, 0.30000000000000004]
  num_faulty_episodes: 0
  policy_reward_max: {}
  policy_reward_mean: {}
  policy_reward_min: {}
  sampler_perf:
    mean_action_processing_ms: 2.971327304840088
    mean_env_render_ms: 0.0
    mean_env_wait_ms: 30.17140030860901
    mean_inference_ms: 103.63717079162598
    mean_raw_obs_processing_ms: 255653.7306457758
time_since_restore: 1290.4770221710205
time_this_iter_s: 269.1462426185608
time_total_s: 1290.4770221710205
timers:
  learn_throughput: 15.576
  learn_time_ms: 513.6
  load_throughput: 4569.581
  load_time_ms: 1.751
  sample_time_ms: 322334.412
  synch_weights_time_ms: 0.0
  training_iteration_time_ms: 322609.99
timestamp: 1688040855
timesteps_total: 32
training_iteration: 2
trial_id: default
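
To put the `timers` values above in proportion, here is a small worked calculation. It only uses numbers copied from the log; the variable names are made up for illustration, and the timers are taken at face value (they may be averaged over iterations, which would explain the difference from time_this_iter_s).

```python
# Values copied from the `timers` section and the per-iteration counters of the log.
sample_time_ms = 322334.412
learn_time_ms = 513.6
training_iteration_time_ms = 322609.99
env_steps_this_iter = 8  # num_env_steps_sampled_this_iter

# Fraction of one training iteration spent sampling vs. learning.
sample_share = sample_time_ms / training_iteration_time_ms
learn_share = learn_time_ms / training_iteration_time_ms

# Average wall-clock time per sampled environment step.
seconds_per_env_step = sample_time_ms / 1000.0 / env_steps_this_iter

print(f"sampling share of iteration: {sample_share:.1%}")
print(f"learning share of iteration: {learn_share:.2%}")
print(f"seconds per sampled env step: {seconds_per_env_step:.1f}")
```

Sampling accounts for roughly 99.9% of each iteration, at about 40 seconds per environment step, while the gradient update takes about half a second. This suggests the bottleneck is on the sampling side, not in the learner.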