Ape-X not working after 1.3 update

After updating to Ray 1.3, my Ape-X run is not working anymore. I let it run for one hour with no change; it just says "pending".

Hey @faehndrichm , could you share your config here?
As of 1.3, we also account for the number of replay workers in our resource allocation logic, which we did not do prior to 1.3. That was basically a bug, because the replay workers of course need to run in parallel.

Quick fix: Try running this on a larger machine (with more CPUs), or reduce the num_workers or optimizer.num_replay_buffer_shards config values.
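A minimal sketch of that quick fix, assuming you launch via tune.run from Python (the same keys apply in a YAML file like the one posted below):

    from ray import tune

    tune.run(
        "APEX",
        config={
            "env": "SpaceInvadersDeterministic-v4",
            "num_gpus": 1,
            # Fewer rollout workers -> fewer CPUs requested in total.
            "num_workers": 7,
            # Fewer replay buffer shards also frees CPUs on the driver node.
            "optimizer": {
                "num_replay_buffer_shards": 4,
            },
        },
    )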

Hello!
I set workers from 8 to 7 and it worked, with 12/12 CPUs, thank you! :slight_smile:
Is it then 7 for the workers, 1 for the learner, and 4 for the replay buffer shards?

I am doing research, and because of the experiment design I am tied to this machine, which has 12 CPUs.
How would you optimally assign the resources? Something like 6 workers and 3 replay buffer shards?

config was:

    apex:
      env: SpaceInvadersDeterministic-v4
      run: APEX
      checkpoint_at_end: true
      checkpoint_freq: 10
      stop:
        timesteps_total: 10000000
      config:
        # Works for both torch and tf.
        framework: tf
        double_q: true
        dueling: true
        num_atoms: 51
        v_min: -10.0
        v_max: 10.0
        noisy: false
        #prioritized_replay: true
        n_step: 3
        lr: .0001
        adam_epsilon: .00015
        hiddens: [512]
        buffer_size: 1000000
        exploration_config:
          #type: EpsilonGreedy
          final_epsilon: 0.01
          epsilon_timesteps: 200000
        prioritized_replay_alpha: 0.5
        final_prioritized_replay_beta: 1.0
        prioritized_replay_beta_annealing_timesteps: 2000000
        num_gpus: 1
        # APEX
        num_workers: 8
        num_envs_per_worker: 8
        rollout_fragment_length: 20
        train_batch_size: 512
        target_network_update_freq: 50000
        timesteps_per_iteration: 100000

Hey @faehndrichm , so glad this worked. Sorry about this (soft) “break”, but the old way was just incorrect and could have led to things not running in parallel where the user would expect them to.

You can always check an algo's default_resource_request method (e.g. rllib/agents/dqn/apex.py, or rllib/agents/trainer.py for the default implementation) to see how much and what type of compute it requests.
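For example, here is a minimal sketch of inspecting that request from Python (assuming Ray/RLlib 1.3 and the ApexTrainer / APEX_DEFAULT_CONFIG names from rllib/agents/dqn/apex.py):

    import copy
    from ray.rllib.agents.dqn.apex import APEX_DEFAULT_CONFIG, ApexTrainer

    config = copy.deepcopy(APEX_DEFAULT_CONFIG)
    config["num_workers"] = 7
    config["num_gpus"] = 1
    config["optimizer"]["num_replay_buffer_shards"] = 4

    # Returns the resource request (including the bundles described below)
    # that Tune will try to reserve before the trial can start.
    print(ApexTrainer.default_resource_request(config))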

For APEX, the requested bundles are listed further below. Ray will try to place all resources inside one bundle on a single node; you can change that behavior via RLlib’s placement_strategy config key, set to one of the following (see the sketch after this list):

    # "PACK": Packs bundles into as few nodes as possible.
    # "SPREAD": Places bundles across distinct nodes as even as possible.
    # "STRICT_PACK": Packs bundles into one node. The group is not allowed
    #   to span multiple nodes.
    # "STRICT_SPREAD": Packs bundles across distinct nodes.

APEX bundles:

1. Driver bundle: 1 CPU (num_cpus_for_driver) + one CPU per replay buffer shard (optimizer.num_replay_buffer_shards) + num_gpus GPUs
2. One bundle per rollout worker (num_workers in total), each with: num_cpus_per_worker CPUs + num_gpus_per_worker GPUs

Placing the replay buffer shards on the same node as the driver (the central learner) makes sure we don’t have to transfer data over the network all the time. Also, giving each shard its own CPU here makes sure we can insert data from the workers into the buffer(s) independently, without affecting/interrupting the learner throughput.
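As a worked example for the 12-CPU machine in this thread (assuming the defaults num_cpus_for_driver=1, num_cpus_per_worker=1, and optimizer.num_replay_buffer_shards=4):

    # CPU accounting on the 12-CPU machine (assumed RLlib defaults noted above).
    num_cpus_for_driver = 1
    num_replay_buffer_shards = 4   # each shard gets its own CPU in the driver bundle
    num_workers = 7
    num_cpus_per_worker = 1

    total = (num_cpus_for_driver
             + num_replay_buffer_shards
             + num_workers * num_cpus_per_worker)
    print(total)  # 12 -> fits exactly; num_workers=8 would need 13 CPUs and stay "pending"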