I am trying to run the following command:
rllib train file rllib/tuned_examples/apex_dqn/atari-apex-dqn.yaml
and I get the error "Error: No available node types can fulfill resource request {'GPU': 1.0, 'CPU': 5.0}. Add suitable node types to this cluster to resolve this issue." The script keeps running, but I do not see any output in TensorBoard, and the same error is printed periodically (sometimes with the keys in the other order, {'CPU': 5.0, 'GPU': 1.0}), so I guess it is stuck.
I tried setting ray_num_cpus=13 and ray_num_gpus=0, so that the ray.init() call on line 381 of train.py receives these values, but I get the same behavior: it still asks for {'GPU': 1.0, 'CPU': 5.0}.
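
To make my understanding of the message concrete, here is a minimal sketch in plain Python of the check I assume is failing. It is not Ray's actual scheduler code: can_fulfill is a hypothetical helper I wrote for illustration, and the node dictionary simply mirrors the values I passed to ray.init().

```python
# Hypothetical helper, NOT part of Ray: returns True only if the node
# offers at least the requested amount of every requested resource.
def can_fulfill(request, node_resources):
    return all(
        node_resources.get(name, 0.0) >= amount
        for name, amount in request.items()
    )

# Resource request copied from the error message.
request = {"GPU": 1.0, "CPU": 5.0}

# Resources I assume the local node advertises after
# ray_num_cpus=13 and ray_num_gpus=0.
node = {"CPU": 13.0, "GPU": 0.0}

print(can_fulfill(request, node))        # the GPU part cannot be met
print(can_fulfill({"CPU": 5.0}, node))   # the CPU part alone would fit
```

If this picture is right, the CPU part of the request is fine and only the GPU part can never be satisfied, which would explain why changing ray_num_cpus and ray_num_gpus does not change the request itself.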