RLlib's PolicyServer and external simulator as client

Hello Ray community,

I use RLlib in combination with a custom external simulator. For this purpose, I use a PolicyServer on RLlib’s side and a client on the external simulator’s side (HTTP server/client).
My problem is that I cannot speed up the simulation any further (i.e. step the env and get an action faster), because communication between client and server currently takes about 100-300ms per step on average.
The time horizon in the env is several hours (or infinite), and each step advances simulated time by 1s. Thus, episodes may still take a (too) long time.
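
To put rough numbers on this, here is a back-of-the-envelope sketch. The 8-hour horizon is an assumption picked for illustration (the post only says "several hours"); the 1s step and the 100-300ms round trip are the figures given above.

```python
# Assumption: an 8-hour horizon; the post only says "several hours".
HORIZON_HOURS = 8
SIM_SECONDS_PER_STEP = 1          # each env step advances simulated time by 1s
ROUND_TRIP_SECONDS = (0.1, 0.3)   # 100-300ms of client/server communication per step

steps_per_episode = HORIZON_HOURS * 3600 // SIM_SECONDS_PER_STEP


def communication_minutes(round_trip_s: float) -> float:
    """Wall-clock minutes spent on client/server communication in one episode."""
    return steps_per_episode * round_trip_s / 60


low, high = (communication_minutes(rt) for rt in ROUND_TRIP_SECONDS)
print(f"{steps_per_episode} steps: {low:.0f} to {high:.0f} min of communication per episode")
```

Under these assumptions a single episode spends between roughly three quarters of an hour and well over two hours on communication alone.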

Any recommendations on this dilemma?

Can you run several simulators (clients) connecting to the same server?
Or several simulators using the same client (vectorized)?

Not yet. I can only run one instance of the simulator per computer. Later on, the idea would be to run one instance of the simulator on each of several computers :see_no_evil:

Correct me if I’m wrong, but several instances of the simulator don’t solve this “client/server communication bottleneck”. Is there any chance to increase the throughput (e.g. by using plain sockets or TCP/IP communication instead of HTTP)?

Yeah, the 100-300ms seems like a lot :confused:

Did you try using inference_mode=local to compute the action on the client side? That way, we only have to send data to the server for training, never for single action computations.

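    from ray.rllib.env.policy_client import PolicyClient

    client = PolicyClient(
        "http://" + args.server,
        inference_mode="local",  # compute actions on the client, not on the server
        # Seconds between weight pulls from the server; 10.0 is only a placeholder, tune it.
        update_interval=10.0,
    )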

Sorry, I forgot to mention that the client side (external simulator) runs outside Python. Otherwise, I would certainly use the PolicyClient class with inference_mode='local', but the simulator/client runs in C# :see_no_evil:


I’ve found out that a call of my custom NN model in forward_rnn seems to take more than 50% of the time of a complete client-server round trip. From your experience, would you say that this is a normal proportion? Also, it seems that this amount of time doesn’t really scale with the number of params in the NN model :thinking:

Ah yes, sorry, I remember your use-case (no python on the client side :stuck_out_tongue_winking_eye:)
Hmm, I wouldn’t say it’s abnormal for a model to take a long time to do a computation. How many parameters does your model have? Also, torch or tf?


And yes, if you are not doing any batching for action computations (like parallelizing your env/simulator), this does seem like quite an inefficient setup. You are spending lots of time on a) sending a single observation over the wire (plus the slowness of HTTP) and b) doing a forward pass (on a possibly large model) on a batch of size 1.
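
To illustrate the batching point, here is a minimal sketch in plain Python. `CountingPolicy` is a hypothetical toy stand-in for the model, not an RLlib class; it only counts forward passes so the difference between per-observation calls and one batched call is visible.

```python
from typing import List, Sequence

Observation = Sequence[float]


class CountingPolicy:
    """Hypothetical toy policy that counts how many forward passes it performs."""

    def __init__(self, weights: Sequence[float]) -> None:
        self.weights = list(weights)
        self.forward_calls = 0

    def forward(self, batch: List[Observation]) -> List[int]:
        """One forward pass over the whole batch; returns one binary action per row."""
        self.forward_calls += 1
        return [
            int(sum(w * x for w, x in zip(self.weights, obs)) > 0)
            for obs in batch
        ]


def act_one_by_one(policy: CountingPolicy, observations: List[Observation]) -> List[int]:
    """One forward pass (and, in a real setup, one round trip) per observation."""
    return [policy.forward([obs])[0] for obs in observations]


def act_batched(policy: CountingPolicy, observations: List[Observation]) -> List[int]:
    """A single forward pass for the observations of all simulators."""
    return policy.forward(list(observations))
```

With observations collected from several parallel simulators, both functions return the same actions, but the batched variant pays the fixed per-call cost only once.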

Would it be difficult to write an inference-only C# policy so you could do local inference with it and from time to time update its weights that are coming from the server?
I think this would speed up everything considerably.
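
The idea can be sketched as follows, in Python for brevity even though the real client would be C#. Everything here is hypothetical: `LocalInferencePolicy`, the `fetch_weights` callback, and the linear argmax policy are placeholders for whatever model and transport you actually use.

```python
import time
from typing import Callable, List, Sequence

Weights = List[List[float]]  # hypothetical linear policy: one weight row per action


class LocalInferencePolicy:
    """Computes actions locally and refreshes its weights only every few seconds."""

    def __init__(
        self,
        fetch_weights: Callable[[], Weights],
        update_interval: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch_weights = fetch_weights      # e.g. a request to the server
        self._update_interval = update_interval  # seconds between weight pulls
        self._clock = clock
        self._weights = fetch_weights()
        self._last_update = clock()

    def compute_action(self, obs: Sequence[float]) -> int:
        """Return the argmax action; contacts the server only when weights are stale."""
        now = self._clock()
        if now - self._last_update >= self._update_interval:
            self._weights = self._fetch_weights()
            self._last_update = now
        scores = [sum(w * x for w, x in zip(row, obs)) for row in self._weights]
        return max(range(len(scores)), key=scores.__getitem__)
```

The per-step cost is then just the local forward pass; the network is touched once per `update_interval` instead of once per step.
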
Sure, we could also provide a faster client/server protocol for RLlib, maybe using msgpack w/ tcp. I’ll add this to our list of improvements (not sure whether this would make it for Q2, though).
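
For a sense of what a leaner wire format could look like, here is a minimal sketch of length-prefixed messages over a TCP socket. It is not an RLlib protocol, and it uses the standard-library `struct` module instead of msgpack so it runs without extra dependencies; the function names are made up for this example.

```python
import socket
import struct
from typing import List, Sequence

_HEADER = struct.Struct("!I")  # 4-byte big-endian payload length


def send_floats(sock: socket.socket, values: Sequence[float]) -> None:
    """Send a float vector as one message: length header plus float32 payload."""
    payload = struct.pack(f"!{len(values)}f", *values)
    sock.sendall(_HEADER.pack(len(payload)) + payload)


def _recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes; TCP may deliver a message in several chunks."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("socket closed mid-message")
        buf.extend(chunk)
    return bytes(buf)


def recv_floats(sock: socket.socket) -> List[float]:
    """Receive one message written by send_floats."""
    (size,) = _HEADER.unpack(_recv_exact(sock, _HEADER.size))
    payload = _recv_exact(sock, size)
    return list(struct.unpack(f"!{size // 4}f", payload))
```

A quick local check is possible with `socket.socketpair()`: send an observation on one end and read it back on the other. Keeping one connection open across steps avoids the per-request overhead of HTTP.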


Yes, that’s me :sweat_smile:
It’s a TF model which is mostly shared between two policies atm. There are two transportation agents and thus at most two NN model calls per step. My custom model has four “entrance branches” with dense layers, a pooling layer, a concatenation, a dense layer, an LSTM and a dense layer. A small test config has about 295k params, the “original” config has about 2.5M params (see screenshots).

@klausk55 have you looked into python bindings for your C# sim?

@rusu24edward Do you mean something like Python.NET or IronPython?
If so, do you have any experience with such “bindings”? I don’t.
Additionally, I have concerns about compatibility with RLlib (i.e. can I still use RLlib?).

I don’t have experience with C# bindings, but I’ve experimented with C++ bindings for a simulation with RLlib, and I was able to train.

I accidentally moved this conversation to a personal message between klausk and me. Here are some important follow-ups for anyone reading this convo:

me: Here’s a small demo of what I did before: rusu24edward/pybind11-demo on GitHub, which demonstrates how to call a C++ class from Python using pybind11. Basically, I recreated the simple corridor example in C++ and used pybind11 to create Python bindings. I then created a driver script that trained a policy using RLlib. It was all very seamless once I got the connection between C++ and Python.

klausk: I’ve experimented with Python.NET, which allows embedding Python in C#, and it works really nicely so far! :grinning:
So far, I initialize and call a PolicyClient from C# using inference_mode="local", which saves me from doing a server-client round trip each step. This generates rollouts much faster!
It should probably also work without PolicyServer/Client, by using the ExternalEnv or BaseEnv class directly, but at this stage of prototyping I’m really happy with the current workaround :raised_hands:
Thanks a lot for your tip, maybe it’s a milestone in further progress!


@sven1977, do you think it would be beneficial to include this guidance in the rllib docs for external environments? We can use and format the demo I linked to for the C++/Python bindings.