Utilization of resources by RLlib

I am working on a project for algorithmic trading and Black-Litterman portfolio optimization with reinforcement learning. I am using RLlib's PPO implementation, with hyperparameter optimization via Ray Tune.
Link to project: GitHub - Athe-kunal/Black-Litterman-Portfolio-Optimization-using-RL

On my university cluster, I have one V100 GPU and a 2-core Xeon CPU. Here are my configuration parameters:

num_workers = 1
num_samples = 20
num_gpus = 1
num_cpus = 2
training_iterations = 200
checkpoint_freq = 1
num_envs_per_worker = 100
worker_cpu = 0.5
worker_gpu = 0.5
log_level = "DEBUG"
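
Roughly, the intended mapping of these onto RLlib's config dict and `tune.run` looks like this. This is a simplified sketch, not the exact code in main.py: the env name is a placeholder, and I am assuming worker_cpu/worker_gpu correspond to num_cpus_per_worker/num_gpus_per_worker.

```python
from ray import tune

config = {
    "env": "BlackLittermanEnv",      # placeholder name (registered below)
    "num_workers": 1,                # rollout workers per trial
    "num_envs_per_worker": 100,      # vectorized copies of the small env
    "num_gpus": 1,                   # GPUs for the learner process
    "num_cpus_per_worker": 0.5,     # presumably what worker_cpu sets
    "num_gpus_per_worker": 0.5,     # presumably what worker_gpu sets
    "log_level": "DEBUG",
}

analysis = tune.run(
    "PPO",
    config=config,
    num_samples=20,                       # 20 Tune trials
    stop={"training_iteration": 200},
    checkpoint_freq=1,
)
```

Tune computes each trial's resource request from the config: num_gpus for the learner plus num_workers × num_gpus_per_worker, and one driver CPU plus num_workers × num_cpus_per_worker. That sum decides how many of the 20 samples can run concurrently. If each trial claims the whole V100, only one trial runs at a time, which would match a flat utilization line; fractional values such as num_gpus = 0.5 let two trials share the card.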

It is a small financial environment with only 206 time steps. To run the code:

python main.py --if_confidence true --model mlp
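
Since it is a custom environment, it has to be registered with Ray by name before "PPO" can resolve it. A generic sketch of that registration, with placeholder module and class names rather than the repo's actual identifiers:

```python
from ray.tune.registry import register_env

def env_creator(env_config):
    # hypothetical import; the real environment class lives in this repo
    from my_envs import BlackLittermanEnv
    return BlackLittermanEnv(env_config)

# makes "BlackLittermanEnv" resolvable via the config's "env" field
register_env("BlackLittermanEnv", env_creator)
```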

Issue that I am facing:
The Ray trials are not utilizing the hardware properly. I have only 2 CPU cores (though they are Xeon cores, which may expose additional logical threads via hyper-threading). I am logging all my results to Weights & Biases here: Weights & Biases

In the sample_perf tab, you can see the resource utilization: a flat line. How can I ensure that I am using the hardware effectively? This is a headless server environment, so I cannot access the Ray dashboard; the Weights & Biases report is my only view into utilization. As I am still learning Ray and RLlib, can someone help me debug this and understand how to use my resources effectively?
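
One low-tech check that works without the dashboard (my own workaround, not an RLlib feature) is to attach to the running cluster and print Ray's view of allocation:

```python
import ray

# attach to the already-running Ray cluster instead of starting a new one
ray.init(address="auto", ignore_reinit_error=True)

print("Total resources:    ", ray.cluster_resources())
print("Currently available:", ray.available_resources())
# If available_resources() shows most of the GPU/CPUs free while trials
# are running, the per-trial request is too small, or the bottleneck is
# the environment stepping itself rather than the hardware.
```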

@sven1977 Can you please take a look and suggest how I can improve performance? Currently, a single training iteration of one trial takes 40-50 seconds; with 200 training iterations per trial followed by hyperparameter optimization across 20 samples, the full run will take a very long time.

Hi @sven1977, please do have a look at this.