Python code with large dependencies

I have Python code that depends on other packages. If I execute my code on Ray, all the dependencies are serialized and transferred to Ray. Is there a way to avoid serializing the packages my code depends on? Maybe I can install them on the machines where the Ray cluster is running, and then my code will be able to use them? Is there another approach?

Hi Gil, generally dependencies are not serialized; each node should have the dependencies installed.

Only your distributed function is serialized. Even if you use Ray libraries, you're expected to have the same dependencies (and the same versions) installed on all nodes.
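For example, here is a minimal sketch of what actually gets shipped (numpy is just a stand-in for any third-party dependency):

import ray
import numpy as np  # must already be installed on every node

ray.init(address="auto")  # connect to the running cluster

@ray.remote
def f(n):
    # Only the body of f is pickled and shipped to a worker; the numpy
    # import is resolved on the worker at run time, so the package itself
    # is never transferred over the network.
    return int(np.arange(n).sum())

print(ray.get(f.remote(10)))  # fails with ModuleNotFoundError on a node lacking numpy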

I guess if they are small dependencies, they are serialized… that is what I observed. Maybe it also depends on what the dependencies of the code are.
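That would match how Ray's serializer (cloudpickle) works: functions and classes defined in the driver script itself are pickled by value and shipped together with the task, while installed packages are only referenced by name. A small sketch (helper is a made-up name):

import ray

ray.init(address="auto")

def helper(x):
    # Defined in the driver script (__main__), so cloudpickle serializes it
    # by value and sends it along with the task; nothing extra needs to be
    # installed on the worker nodes.
    return x * 2

@ray.remote
def task(x):
    return helper(x)

print(ray.get(task.remote(21)))  # 42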

I think the easiest way to avoid problems with dependencies (including the Ray version and Python version) is to use Docker images.

I was sceptical about it, but in practice, in my opinion, it is the easiest way to get the same versions of all the software on all nodes.
I built my own Docker image for this purpose and it works fine; however, it's very important to set the docker run options properly (directory paths, network access, port mapping, CPU, GPU, etc.), as sketched below.
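For example, something along these lines is a reasonable starting point (the paths are placeholders, --gpus requires the NVIDIA Container Toolkit on the host, and you can use explicit -p port mappings instead of host networking):

# --shm-size matters because Ray keeps its object store in /dev/shm
docker run --rm -it \
  --gpus all \
  --network host \
  --shm-size=2g \
  -v /path/on/host:/path/in/container \
  peterpirogtf/ray_tf2:latest \
  ray start --head --block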

Now I am trying to work out how to configure GPU access in Docker on Windows 10.

Peter

So you suggest building a Docker image with the dependencies and running the Ray cluster on those images?

@Gil_Vernik, I built a Docker image with ray[all], gym[atari], gym[box2d], modin, scikit-learn, and argparse for TensorFlow 2:

docker pull peterpirogtf/ray_tf2:latest

If you want, I can write the docker run options that work for me.