Order-of-magnitude slower performance on GCP cluster vs. MacBook Pro

Hi all, I am trying to train my deep RL algorithm implemented with RLlib. It runs fine on my MacBook Pro (13 inch, 2018, 4 cores/8 threads) but is, as expected, pretty slow. To speed it up, I have been using a Ray cluster on GCP.

However, this cluster is an order of magnitude slower than my MacBook! The cluster is set to have ~18 cores; for reference, the core speeds listed by GCP are roughly the same as my MacBook's (or better). Despite this, getting to the first postprocessing phase of training takes about 5 minutes using all 18 cores at full load. On my MacBook, getting to this phase using only 1 worker takes about 1 minute - all training parameters are the same in both cases. Using a GPU does not help.

According to the Ray dashboard, the bottleneck is in RolloutWorker.par_iter_next(), which is the same bottleneck as on my MacBook. However, this should be much faster on the cloud cluster. Any advice?

So this is a really frustrating error, but it seems a large part of this is because the default pip installation of numpy does not include BLAS/LAPACK, and the images provided for GCP do not include them either, so numpy runs much slower there. The conda install of numpy does include them, so it runs faster. See here for some more discussion.

In summary, this isn’t really a Ray/RLlib error, but for future reference it is important to note that RLlib (at least) is heavily bottlenecked by numpy when numpy is not running as efficiently as expected.
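For anyone who wants to compare raw numpy speed between two machines before blaming Ray, a small timing script along these lines might help. This is only a sketch: the function name `time_matmul`, the matrix size, and the repeat count are my own illustrative choices, not anything from RLlib, and a matrix multiply is just one operation that tends to depend heavily on BLAS.

```python
import time

import numpy as np


def time_matmul(n=500, repeats=3):
    """Return the best wall-clock time (seconds) for one n x n matrix multiply.

    Hypothetical helper for illustration; n and repeats are arbitrary.
    """
    rng = np.random.default_rng(0)
    a = rng.standard_normal((n, n))
    b = rng.standard_normal((n, n))
    a @ b  # warm-up run so one-time setup cost is not measured
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        a @ b
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    print(f"numpy {np.__version__}: best matmul time {time_matmul():.4f} s")
```

Running the same script on the laptop and on a cluster node gives a rough idea of whether numpy itself is the slow part, independent of RLlib.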


@sven1977 It would be nice to mention this explicitly in the docs if you haven’t yet!

Thanks for sharing this important finding. We are aware that some bottlenecks in RLlib are caused by “out-of-graph” (i.e. mostly numpy) code. The fact that it’s 5 times slower is scary, though. I didn’t know that. I’m reposting the solution for Linux here:

  1. Use np.show_config() to find out whether “blas” is installed.
  2. If not, fix this via:

     sudo pip3 uninstall numpy
     sudo apt-get install build-essential python-dev python-setuptools libboost-python-dev libboost-thread-dev -y
     sudo pip3 install numpy
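
Step 1 can also be scripted, e.g. to run the check on every node of a cluster. The sketch below captures what np.show_config() prints and keeps only the lines mentioning BLAS or LAPACK. The helper names `numpy_config_text` and `blas_lines` are hypothetical, and the output format of np.show_config() differs between numpy versions, so the filtered lines still need to be read by a human (on some versions a missing library appears as a "NOT AVAILABLE" entry on a separate line).

```python
import contextlib
import io

import numpy as np


def numpy_config_text():
    """Capture the text that np.show_config() prints to stdout."""
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        np.show_config()
    return buf.getvalue()


def blas_lines(config_text):
    """Return the lines of config_text that mention BLAS or LAPACK.

    Hypothetical helper: a plain substring filter, not a real detection
    of whether an optimized BLAS is actually linked.
    """
    return [
        line
        for line in config_text.splitlines()
        if "blas" in line.lower() or "lapack" in line.lower()
    ]


if __name__ == "__main__":
    for line in blas_lines(numpy_config_text()):
        print(line)
```

If nothing relevant shows up, or the entries are marked as unavailable, that would be a hint to try the reinstall in step 2 (or the conda build of numpy mentioned above).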