How severely does this issue affect your experience of using Ray?
Medium: It contributes to significant difficulty to complete my task, but I can work around it.
I am working on a reinforcement learning project and would like to use Ray RLlib to help scale our training process. Due to the nature of the simulator I’m working with, I need to create a Policy Server and Policy Client. One issue I’m having is that I need to access the policy client inside the simulator. Long story short, the challenge I’m facing is that the simulator runs in its own Python variable space, so I need to figure out how to give the simulator access to the Policy Client object. One solution I came up with was to instantiate a Policy Client object, pickle it, and then unpickle it during an episode rollout. I tested this idea this morning and I’m getting an error when I try to pickle a Policy Client object. The exact error is:
AttributeError: Can't pickle local object '_auto_wrap_external.<locals>.wrapped_creator.<locals>._ExternalEnvWrapper'
My first question is: is the Policy Client object even pickle-able? If not, what might be some other ways I can try to access the Policy Client from within a Python script with its own variable space? Can I set up the Policy Client as a Flask app and then access it that way or something? Definitely open to any and all suggestions. Thanks!
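For context on why that pickling attempt fails in general: pickle serializes classes by their importable path, and a class defined inside a function body has none. Here is a minimal sketch that reproduces the same kind of error with a made-up local class (`make_wrapper`/`_Wrapper` are invented names for illustration, not RLlib internals):

```python
import pickle

def make_wrapper():
    # Classes defined inside a function are "local objects": pickle records
    # classes by their module-level import path, and this one has none.
    class _Wrapper:
        pass
    return _Wrapper()

obj = make_wrapper()
try:
    pickle.dumps(obj)
except (AttributeError, pickle.PicklingError) as e:
    print(e)  # e.g. Can't pickle local object 'make_wrapper.<locals>._Wrapper'
```

The `_ExternalEnvWrapper` class in the traceback is defined the same way (inside a wrapping function), so instances that reference it cannot be pickled with the standard mechanism.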
Is there a reason you cannot create a PolicyClient object in the process that the simulator is running in?
Alternatively, if you are comfortable using remote inference, the PolicyServer is basically just a REST server, so you could just send your data to the appropriate URIs using HTTP requests. Here is the HTTP handler used by PolicyServer.
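To make the "just send HTTP requests" idea concrete, here is a self-contained sketch of that pattern. The `/get_action` route and the JSON payload shape are assumptions made up for illustration; RLlib's actual PolicyServerInput speaks its own command format, so check the linked handler source for the real endpoints. The server here is a stand-in that echoes a dummy action:

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import Request, urlopen

class _FakePolicyHandler(BaseHTTPRequestHandler):
    """Stand-in for the server side: reads an observation, returns an action."""
    def do_POST(self):
        length = int(self.headers["Content-Length"])
        payload = json.loads(self.rfile.read(length))
        # A real server would run policy inference here; we echo a dummy action.
        body = json.dumps({"episode_id": payload["episode_id"], "action": 1}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the demo output quiet
        pass

server = HTTPServer(("127.0.0.1", 0), _FakePolicyHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# Client side: what the simulator process would do each step, with no
# PolicyClient object at all -- just a plain HTTP POST.
req = Request(
    f"http://127.0.0.1:{server.server_port}/get_action",
    data=json.dumps({"episode_id": "ep-0", "obs": [0.1, 0.2]}).encode(),
    headers={"Content-Type": "application/json"},
)
with urlopen(req) as resp:
    action = json.loads(resp.read())["action"]
print(action)  # 1
server.shutdown()
```

The appeal of this route is that the simulator process only needs the standard library, so you avoid importing Ray inside each short-lived episode process.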
Hi @mannyv, thanks for the reply. There are a couple reasons why I think I wouldn’t want to make a PolicyClient in the process that’s running the simulator, but maybe they’re not valid concerns. So let’s poke at them some.
First, the simulator is based in C++ with a thin Python API. Each simulator “episode” starts and stops its own Python process, which I’m relying on for importing various functions to interact with the PolicyClient (get_action, log_returns, etc). The Python process exits at the end of the episode, and so all the variables (including the PolicyClient object) are removed. This is what made me wonder if I could just instantiate and pickle a PolicyClient before the episode kicks off and un-pickle it inside the sim when the episode starts up. My concern is that the Ray module import takes a good bit of time, and I want to save that overhead if possible since I’ll be doing thousands of episodes, but maybe there’s a workaround that I’m just not aware of?
The second reason I’m considering not making a PolicyClient each episode largely stems from my ignorance about Ray, but involves keeping track of episode IDs. If I make a fresh PolicyClient object each episode, what happens with the episode IDs to ensure they are unique across many parallel rollouts? Is this even a real concern, and are there some easy workarounds?
Regarding your remote inference suggestion, are you saying I could cut out using a PolicyClient altogether and just connect directly to the PolicyServer? Thanks for sharing the code snippet. Do you have any other examples of how I might use this Handler class object to interact with the server? I’m somewhat familiar with REST, but by no means operational with it.
My current solution is to start up a PolicyClient in a subprocess, make a socket connection to it inside my sim episode, and then interact with the PolicyClient via some command/args interface for passing data back and forth. This feels clunky, but it’s where my head is at right now.
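The command/args bridge described above might look something like the following sketch: a long-lived process owns the client object and serves newline-delimited JSON commands over a TCP socket. `StubClient`, the command names, and the wire format are all invented for illustration; the stub stands in for a real PolicyClient (which would actually call the policy server):

```python
import json
import socket
import threading

class StubClient:
    """Placeholder for RLlib's PolicyClient (get_action, log_returns, ...)."""
    def get_action(self, episode_id, obs):
        return 1  # a real client would query the policy server here

    def log_returns(self, episode_id, reward):
        return None

def serve(sock, client):
    # Accept one simulator connection and dispatch each JSON line
    # {"command": ..., "args": [...]} to the matching client method.
    conn, _ = sock.accept()
    with conn, conn.makefile("rw") as f:
        for line in f:
            msg = json.loads(line)
            result = getattr(client, msg["command"])(*msg["args"])
            f.write(json.dumps({"result": result}) + "\n")
            f.flush()

bridge = socket.socket()
bridge.bind(("127.0.0.1", 0))
bridge.listen(1)
threading.Thread(target=serve, args=(bridge, StubClient()), daemon=True).start()

# Simulator side: connect at episode start and issue commands as JSON lines.
sim = socket.create_connection(("127.0.0.1", bridge.getsockname()[1]))
with sim, sim.makefile("rw") as f:
    f.write(json.dumps({"command": "get_action", "args": ["ep-0", [0.1, 0.2]]}) + "\n")
    f.flush()
    action = json.loads(f.readline())["result"]
print(action)  # 1
```

Since the bridge process stays alive across episodes, the expensive Ray import happens once, and the short-lived simulator processes only pay for a socket connect.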
@rusu24edward once I started coding up my solution, this is what I ultimately settled on as well – calling and starting my sim via a subprocess and passing data via socket connection to the process running the policy client. Thanks for your input. Glad to hear I wasn’t barking up the wrong tree or overcomplicating my project.