How severely does this issue affect your experience of using Ray?
Medium: It contributes to significant difficulty to complete my task, but I can work around it.
I am working on a reinforcement learning project and would like to use Ray RLlib to help scale our training process. Due to the nature of the simulator I’m working with, I need to create a Policy Server and Policy Client. One issue I’m having is that I need to access the policy client inside the simulator. Long story short, the challenge I’m facing is that the simulator runs in its own Python variable space, so I need to figure out how to give the simulator access to the Policy Client object. One solution I came up with was to instantiate a Policy Client object, pickle it, and then unpickle it during an episode rollout. I tested this idea this morning and I’m getting an error when I try to pickle a Policy Client object. The exact error is:
AttributeError: Can't pickle local object '_auto_wrap_external.<locals>.wrapped_creator.<locals>._ExternalEnvWrapper'
My first question is: is the Policy Client object even pickle-able? If not, what might be some other ways I can try to access the Policy Client from within a Python script with its own variable space? Can I set up the Policy Client as a Flask app and then access it that way or something? Definitely open to any and all suggestions. Thanks!
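For context on why that pickling attempt fails in general: pickle serializes classes by their importable path, and a class defined inside a function body has none. Here is a minimal sketch that reproduces the same kind of error with a made-up local class (`make_wrapper`/`_Wrapper` are invented names for illustration, not RLlib internals):

```python
import pickle

def make_wrapper():
    # Classes defined inside a function are "local objects": pickle records
    # classes by their module-level import path, and this one has none.
    class _Wrapper:
        pass
    return _Wrapper()

obj = make_wrapper()
try:
    pickle.dumps(obj)
except (AttributeError, pickle.PicklingError) as e:
    print(e)  # e.g. Can't pickle local object 'make_wrapper.<locals>._Wrapper'
```

The `_ExternalEnvWrapper` class in the traceback is defined the same way (inside a wrapping function), so instances that reference it cannot be pickled with the standard mechanism.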
Is there a reason you cannot create a PolicyClient object in the process that the simulator is running in?
Alternatively, if you are comfortable using remote inference, the PolicyServer is basically just a REST server, so you could just send your data to the appropriate URIs using HTTP requests. Here is the HTTP handler used by PolicyServer.
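To make the "just send HTTP requests" idea concrete, here is a self-contained sketch of that pattern. The `/get_action` route and the JSON payload shape are assumptions made up for illustration; RLlib's actual PolicyServerInput speaks its own command format, so check the linked handler source for the real endpoints. The server here is a stand-in that echoes a dummy action:

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import Request, urlopen

class _FakePolicyHandler(BaseHTTPRequestHandler):
    """Stand-in for the server side: reads an observation, returns an action."""
    def do_POST(self):
        length = int(self.headers["Content-Length"])
        payload = json.loads(self.rfile.read(length))
        # A real server would run policy inference here; we echo a dummy action.
        body = json.dumps({"episode_id": payload["episode_id"], "action": 1}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the demo output quiet
        pass

server = HTTPServer(("127.0.0.1", 0), _FakePolicyHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# Client side: what the simulator process would do each step, with no
# PolicyClient object at all -- just a plain HTTP POST.
req = Request(
    f"http://127.0.0.1:{server.server_port}/get_action",
    data=json.dumps({"episode_id": "ep-0", "obs": [0.1, 0.2]}).encode(),
    headers={"Content-Type": "application/json"},
)
with urlopen(req) as resp:
    action = json.loads(resp.read())["action"]
print(action)  # 1
server.shutdown()
```

The appeal of this route is that the simulator process only needs the standard library, so you avoid importing Ray inside each short-lived episode process.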
Hi @mannyv, thanks for the reply. There are a couple reasons why I think I wouldn’t want to make a PolicyClient in the process that’s running the simulator, but maybe they’re not valid concerns. So let’s poke at them some.
First, the simulator is based in C++ with a thin Python API. Each simulator “episode” starts and stops its own Python process, which I’m relying on for importing various functions to interact with the PolicyClient (get_action, log_returns, etc). The Python process exits at the end of the episode, and so all the variables (including the PolicyClient object) are removed. This is what made me wonder if I could just instantiate and pickle a PolicyClient before the episode kicks off and un-pickle it inside the sim when the episode starts up. My concern is that the Ray module import takes a good bit of time, and I want to save that overhead if possible since I’ll be doing thousands of episodes, but maybe there’s a workaround that I’m just not aware of?
The second reason I’m considering not making a PolicyClient each episode largely stems from my ignorance about Ray, but involves keeping track of episode IDs. If I make a fresh PolicyClient object each episode, what happens with the episode IDs to ensure they are unique across many parallel rollouts? Is this even a real concern, and are there some easy workarounds?
Regarding your remote inference suggestion, are you saying I could cut out using a PolicyClient altogether and just connect directly to the PolicyServer? Thanks for sharing the code snippet. Do you have any other examples of how I might use this Handler class object to interact with the server? I’m somewhat familiar with REST, but by no means operational with it.
My current solution is to start up a PolicyClient in a subprocess, make a socket connection to it inside my sim episode, and then interact with the PolicyClient via some command/args interface for passing data back and forth. This feels clunky, but it’s where my head is at right now.
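The command/args bridge described above might look something like the following sketch: a long-lived process owns the client object and serves newline-delimited JSON commands over a TCP socket. `StubClient`, the command names, and the wire format are all invented for illustration; the stub stands in for a real PolicyClient (which would actually call the policy server):

```python
import json
import socket
import threading

class StubClient:
    """Placeholder for RLlib's PolicyClient (get_action, log_returns, ...)."""
    def get_action(self, episode_id, obs):
        return 1  # a real client would query the policy server here

    def log_returns(self, episode_id, reward):
        return None

def serve(sock, client):
    # Accept one simulator connection and dispatch each JSON line
    # {"command": ..., "args": [...]} to the matching client method.
    conn, _ = sock.accept()
    with conn, conn.makefile("rw") as f:
        for line in f:
            msg = json.loads(line)
            result = getattr(client, msg["command"])(*msg["args"])
            f.write(json.dumps({"result": result}) + "\n")
            f.flush()

bridge = socket.socket()
bridge.bind(("127.0.0.1", 0))
bridge.listen(1)
threading.Thread(target=serve, args=(bridge, StubClient()), daemon=True).start()

# Simulator side: connect at episode start and issue commands as JSON lines.
sim = socket.create_connection(("127.0.0.1", bridge.getsockname()[1]))
with sim, sim.makefile("rw") as f:
    f.write(json.dumps({"command": "get_action", "args": ["ep-0", [0.1, 0.2]]}) + "\n")
    f.flush()
    action = json.loads(f.readline())["result"]
print(action)  # 1
```

Since the bridge process stays alive across episodes, the expensive Ray import happens once, and the short-lived simulator processes only pay for a socket connect.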
@rusu24edward once I started coding up my solution, this is what I ultimately settled on as well – calling and starting my sim via a subprocess and passing data via socket connection to the process running the policy client. Thanks for your input. Glad to hear I wasn’t barking up the wrong tree or overcomplicating my project.