My real endgame is running a trained RL model on a Coral [from here: https://coral.ai/ ; I have the USB variant, and it’s working fine with other things].
As far as I can tell, the first step is to convert the model to TFLite, so that it can then be compiled down to something that runs on the Coral [using this tool: Edge TPU Compiler | Coral].
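Roughly, the pipeline I have in mind looks like this; it’s only a sketch, assuming I can first get a plain TensorFlow SavedModel out of RLlib (the “exported_policy” directory name is hypothetical):
import tensorflow as tf

# Hypothetical directory containing a TF SavedModel exported from RLlib
converter = tf.lite.TFLiteConverter.from_saved_model("exported_policy")
# The Edge TPU only executes fully-integer-quantized ops, so some form of
# quantization (with a representative dataset) would presumably go here too
tflite_model = converter.convert()

with open("policy.tflite", "wb") as f:
    f.write(tflite_model)

# ...and then on the command line: edgetpu_compiler policy.tflite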
So, am I chasing silly ideas? Or is there a good path from a trained RLLib model, to TFLite? Or is there some completely-other approach I’d be better off using?
Cheers!
Gary
I’d imagine yes. What format is your exported RLlib model in right now?
Currently I’m just using the command line tool “rllib”:
#!/bin/sh
ALG=PPO
EXPERIMENT=${ALG}
WORKDIR=rtamray_results
rllib train --run ${ALG} --env RTAM-v0 \
--checkpoint-freq 5 --checkpoint-at-end --keep-checkpoints-num 3 \
--experiment-name ${EXPERIMENT} --local-dir ${WORKDIR} \
--config '{"num_workers": 4}'
Which outputs opaque “checkpoint” files:
rtamray_results/PPO$ ls -l PPO_RTAM-v0_fcf10_00000_0_2021-09-24_14-25-49/checkpoint_000015/
total 2820
-rw-rw-r-- 1 chunky chunky 2879800 Sep 24 14:27 checkpoint-15
-rw-rw-r-- 1 chunky chunky 181 Sep 24 14:27 checkpoint-15.tune_metadata
chunky@gbriggs-desktop:~/src/rtam_openai/agents/ray/rtamray_results/PPO$
I’m also a little unsure how much extra “magic” is going on: even if I were able to get a .tflite file out, what additional steps would it take to execute that model and get real actions out? [And are there any other odd layers that would need setting up between the observation and the first layer of the .tflite model?]
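To make that concrete, my mental model of the execution side is roughly the following; this is only a sketch, and it assumes the exported model takes a single flat observation tensor and emits the action directly, which is exactly the part I’m unsure about:
import numpy as np
import tflite_runtime.interpreter as tflite

# "policy.tflite" and the single obs-in / action-out layout are assumptions
interpreter = tflite.Interpreter(
    model_path="policy.tflite",
    experimental_delegates=[tflite.load_delegate("libedgetpu.so.1")])
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Feed an observation shaped/typed to match the model's input tensor
obs = np.zeros(input_details[0]["shape"], dtype=input_details[0]["dtype"])
interpreter.set_tensor(input_details[0]["index"], obs)
interpreter.invoke()
action = interpreter.get_tensor(output_details[0]["index"])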
Thanks,
Gary
This is the entire class I currently am using for the forward pass:
import time

from ray.rllib.agents.es import ESTrainer
from ray.rllib.agents.trainer import with_common_config

# DreamInterface is our own abstract base class (import omitted here)

class RayDream(DreamInterface):
    def __init__(self, dreamenv, checkpoint, navmode):
        rtam_config = with_common_config({
            "env": "RTAM-v0",
            "num_workers": 0
        })
        self.navmode = navmode
        self.dreamenv = dreamenv
        # Restore the trained agent from the RLlib checkpoint
        self.agent = ESTrainer(rtam_config, env="RTAM-v0")
        self.agent.restore(checkpoint)

    def get_single_action(self, obs):
        # Single forward pass: observation in, action out
        action_step = self.agent.compute_action(obs)
        return action_step

    def get_one_path(self, loc, heading_deg, deadline_t=None, moved_players={}, tgts=None):
        # Roll out one episode in the "dream" environment, recording the path
        obs = self.dreamenv.reset(loc, heading_deg, moved_players, tgts)
        curr_t = time.time()
        path = [self.dreamenv.loc]
        ep_complete = False
        while not ep_complete and (deadline_t is None or curr_t < deadline_t):
            action_step = self.agent.compute_action(obs)
            obs, reward, ep_complete, debug = self.dreamenv.step(action_step)
            path.append(self.dreamenv.loc)
            curr_t = time.time()
        return path
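For what it’s worth, the only export hook I’ve spotted so far is export_policy_model on the restored trainer; something like the snippet below (a sketch, reusing the names from the class above, and I haven’t confirmed the resulting SavedModel is usable without Ray):
# Reusing rtam_config / checkpoint from the class above
agent = ESTrainer(rtam_config, env="RTAM-v0")
agent.restore(checkpoint)
agent.export_policy_model("exported_policy")  # directory name is arbitrary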
Cheers!
Gary
I added a comment to a GitHub issue that aligns with this question:
I have a PPO policy based model that I train with RLLib using the Ray Tune API on some standard gym environments (with no fancy preprocessing). I have model checkpoints saved which I can load from and restore for further training.
I want to export my model for production onto a system that should ideally have no dependencies on Ray or RLLib. Is there a simple way to do this?
I know that there is an interface `export_model` in the `rllib.policy.tf_policy` class, but it doesn't seem particularly easy to use. For instance, after calling `export_model('savedir')` in my training script, and in another context loading via `model = tf.saved_model.load('savedir')`, the resulting `model` object is troublesome to feed the correct inputs into for evaluation (something like `model.signatures['serving_default'](gym_observation)` doesn't work). I'm ideally looking for a method that allows for easy, out-of-the-box model loading and evaluation on observation objects.
Any guidance would be greatly appreciated.
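That matches where I’m stuck; if I ever get as far as a SavedModel, the first thing I’d try is inspecting its serving signature to see which inputs it actually expects. A minimal sketch, assuming TF2 and the “savedir” name from the issue:
import tensorflow as tf

loaded = tf.saved_model.load("savedir")
serving_fn = loaded.signatures["serving_default"]

# Prints the tensor names, shapes and dtypes the signature expects/returns,
# e.g. whether it wants just the observation or extra inputs as well
print(serving_fn.structured_input_signature)
print(serving_fn.structured_outputs)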
Gary
gentle bump
I feel a little out of my depth with this; how to go from a trained RLlib checkpoint to a functioning model without the Ray/RLlib infrastructure is still a fairly opaque process to me.
Thanks,
Gary
I already added a comment to the above-mentioned GitHub issue, but for anyone reading this thread, here is the link to a tool that transforms an RLlib checkpoint into an ONNX model and can run inference using ONNX Runtime: GitHub - airboxlab/rllib-fast-serve: Tools and examples to export policies trained with Ray RLlib for lightweight and fast inference
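As a rough idea of what the serving side then looks like with no Ray dependency (a sketch only, not taken from that repo; the file name, input name, and observation shape are assumptions that depend on how the policy was exported):
import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession("policy.onnx")

# Inspect the exported graph's actual input names first
print([i.name for i in sess.get_inputs()])

obs = np.zeros((1, 42), dtype=np.float32)  # hypothetical observation shape
outputs = sess.run(None, {"obs": obs})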
Thanks for that, @antoine-galataud!