Using a trained RL model with TFLite?

My real endgame is running a trained RL model on a Coral Edge TPU [I have the USB variant, and it’s working fine with other things].

As far as I can tell, the first step is to convert the model to TFLite, so that it can be compiled down to something that runs on the Coral [using this tool: Edge TPU Compiler | Coral ].
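For concreteness, my mental model of the TF-side step is something like the sketch below. This is untested; it assumes I can somehow get the policy network out as a TensorFlow SavedModel first, and representative_obs_batches is a placeholder for real observation data I'd collect from the environment. The Edge TPU compiler also wants full integer quantization, as far as I can tell:

import tensorflow as tf

# Untested sketch: convert a SavedModel to a fully-int8-quantized .tflite,
# which is what the Edge TPU compiler expects as input.
saved_model_dir = "exported_policy"  # placeholder: wherever the SavedModel lands

def representative_dataset():
    # Yield a few hundred real observation batches so the converter can
    # calibrate the int8 quantization ranges. representative_obs_batches
    # is a placeholder for observations collected from the environment.
    for obs_batch in representative_obs_batches:
        yield [obs_batch.astype("float32")]

converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

with open("policy.tflite", "wb") as f:
    f.write(converter.convert())

…after which, presumably, running edgetpu_compiler on policy.tflite would produce the Coral-ready model.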

So, am I chasing silly ideas? Or is there a good path from a trained RLlib model to TFLite? Or is there some completely-other approach I’d be better off using?


I’d imagine yes. What format is your exported RLlib model in right now?

Currently I’m just using the command-line tool “rllib”:


rllib train --run ${ALG} --env RTAM-v0 \
        --checkpoint-freq 5 --checkpoint-at-end --keep-checkpoints-num 3 \
        --experiment-name ${EXPERIMENT} --local-dir ${WORKDIR} \
        --config '{"num_workers": 4}'

Which outputs opaque “checkpoint” files:

rtamray_results/PPO$ ls -l PPO_RTAM-v0_fcf10_00000_0_2021-09-24_14-25-49/checkpoint_000015/
total 2820
-rw-rw-r-- 1 chunky chunky 2879800 Sep 24 14:27 checkpoint-15
-rw-rw-r-- 1 chunky chunky     181 Sep 24 14:27 checkpoint-15.tune_metadata
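If there’s a way to turn one of those checkpoints back into a SavedModel, I’m guessing it looks roughly like this (untested; I’m not sure whether export_policy_model works for every algorithm, and you'd substitute whichever trainer class produced the checkpoint):

import ray
from ray.rllib.agents.es import ESTrainer

ray.init()
agent = ESTrainer(config={"num_workers": 0}, env="RTAM-v0")

# Restore the opaque checkpoint, then ask RLlib to export the policy
# network as a TensorFlow SavedModel that TFLiteConverter can read.
agent.restore("rtamray_results/PPO/PPO_RTAM-v0_fcf10_00000_0_2021-09-24_14-25-49/"
              "checkpoint_000015/checkpoint-15")
agent.export_policy_model("exported_policy")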

I’m also a little unsure how much extra “magic” is going on, such that even if I were able to get a .tflite file out, I don’t know what additional steps it would take to execute that model and get real actions out [and whether there are any other weird layers that would need setting up between the observation and the first layer of the tflite model].
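To make the question concrete: my guess is that executing a .tflite policy looks something like this (untested; on the Coral itself I’d presumably use tflite_runtime with the Edge TPU delegate instead of full TensorFlow), but I don’t know what has to happen to obs before it goes in:

import numpy as np
import tensorflow as tf
# On the Coral itself this would presumably become:
#   from tflite_runtime.interpreter import Interpreter, load_delegate
#   interpreter = Interpreter("policy_edgetpu.tflite",
#                             experimental_delegates=[load_delegate("libedgetpu.so.1")])

interpreter = tf.lite.Interpreter(model_path="policy.tflite")
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

def get_single_action(obs):
    # Add a batch dimension of 1; dtype must match what the converter produced.
    batch = np.expand_dims(obs, 0).astype(input_details[0]["dtype"])
    interpreter.set_tensor(input_details[0]["index"], batch)
    interpreter.invoke()
    # Assumes the first output tensor is the action (or action logits).
    return interpreter.get_tensor(output_details[0]["index"])[0]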


This is the entire class I’m currently using for the forward pass:

import time

from ray.rllib.agents.es import ESTrainer
from ray.rllib.agents.trainer import with_common_config


class RayDream(DreamInterface):  # DreamInterface is defined elsewhere in this project
    def __init__(self, dreamenv, checkpoint, navmode):
        rtam_config = with_common_config({
            "env": "RTAM-v0",
            "num_workers": 0,
        })
        self.navmode = navmode
        self.dreamenv = dreamenv
        self.agent = ESTrainer(rtam_config, env="RTAM-v0")
        # Load the trained weights from the checkpoint.
        self.agent.restore(checkpoint)

    def get_single_action(self, obs):
        return self.agent.compute_action(obs)

    def get_one_path(self, loc, heading_deg, deadline_t=None, moved_players=None, tgts=None):
        obs = self.dreamenv.reset(loc, heading_deg, moved_players or {}, tgts)
        curr_t = time.time()
        path = [self.dreamenv.loc]
        ep_complete = False

        while not ep_complete and (deadline_t is None or curr_t < deadline_t):
            action_step = self.agent.compute_action(obs)
            obs, reward, ep_complete, debug = self.dreamenv.step(action_step)
            # Record the new location so the returned path actually grows.
            path.append(self.dreamenv.loc)
            curr_t = time.time()

        return path
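Part of the “magic” I’m worried about is RLlib’s observation preprocessing: as far as I can tell, compute_action runs the raw observation through a preprocessor before it reaches the network, so a TFLite port would need to replicate that step. Something like this, I think (untested; env here stands for an instance of the RTAM-v0 environment):

from ray.rllib.models.preprocessors import get_preprocessor

# Untested: replicate RLlib's observation flattening outside of ray/rllib,
# so the same transform can be applied before feeding the TFLite interpreter.
# `env` is assumed to be an instance of the RTAM-v0 gym environment.
prep = get_preprocessor(env.observation_space)(env.observation_space)
flat_obs = prep.transform(obs)  # e.g. flattens Dict/Tuple observations to a 1-D array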


I added a comment to a GitHub issue that aligns with this question.

Any guidance would be greatly appreciated.


gentle bump

I feel a little out of my depth with this; how to go from a trained RLlib checkpoint to a functioning model without the ray/rllib infrastructure is still a fairly opaque process to me.