Using a trained RL model with TFLite?

My real endgame is running a trained RL model on a Coral [from here: https://coral.ai/ ; I have the USB variant, and it’s working fine with other things ].

As far as I can tell, the first step is to convert the model to TFLite, so that it can then be compiled into something that runs on the Coral [using this tool: Edge TPU Compiler | Coral ]
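
For concreteness, here is roughly what I'm picturing for that conversion step (an untested sketch; "policy_export_dir" is a placeholder for wherever the policy would end up as a TF SavedModel, and as I understand it the Edge TPU Compiler also wants the model fully int8-quantized, which this sketch skips):

# Untested sketch: turn an exported TF SavedModel into a .tflite file,
# which would then go through the Edge TPU Compiler separately.
# "policy_export_dir" is a placeholder path, not something RLlib produces by default.
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model("policy_export_dir")
tflite_model = converter.convert()

with open("policy.tflite", "wb") as f:
    f.write(tflite_model)

# Then, outside Python: edgetpu_compiler policy.tflite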

So, am I chasing silly ideas? Or is there a good path from a trained RLLib model, to TFLite? Or is there some completely-other approach I’d be better off using?

Cheers!
Gary

I’d imagine yes. What format is your exported RLlib model in right now?

Currently I’m just using the command line tool “rllib”:

#!/bin/sh
ALG=PPO
EXPERIMENT=${ALG}
WORKDIR=rtamray_results

rllib train --run ${ALG} --env RTAM-v0 \
        --checkpoint-freq 5 --checkpoint-at-end --keep-checkpoints-num 3 \
        --experiment-name ${EXPERIMENT} --local-dir ${WORKDIR} \
        --config '{"num_workers": 4}'

Which outputs opaque “checkpoint” files:

rtamray_results/PPO$ ls -l PPO_RTAM-v0_fcf10_00000_0_2021-09-24_14-25-49/checkpoint_000015/
total 2820
-rw-rw-r-- 1 chunky chunky 2879800 Sep 24 14:27 checkpoint-15
-rw-rw-r-- 1 chunky chunky 181 Sep 24 14:27 checkpoint-15.tune_metadata
chunky@gbriggs-desktop:~/src/rtam_openai/agents/ray/rtamray_results/PPO$
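
My best guess is that I'd first need to restore the checkpoint into a trainer and export the policy as a TF SavedModel before TFLite even enters the picture, something like this (untested sketch; I'm assuming Trainer.export_policy_model() does what I think it does, and "policy_export_dir" is just a placeholder):

# Untested sketch: restore the checkpoint and export the policy graph as a
# TF SavedModel. export_policy_model() is my assumption about the RLlib API;
# this also assumes the custom RTAM-v0 env is registered the same way as during training.
import ray
from ray.rllib.agents.ppo import PPOTrainer

ray.init()
trainer = PPOTrainer(config={"num_workers": 0}, env="RTAM-v0")
trainer.restore("rtamray_results/PPO/PPO_RTAM-v0_fcf10_00000_0_2021-09-24_14-25-49/checkpoint_000015/checkpoint-15")
trainer.export_policy_model("policy_export_dir")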

I’m also a little unsure how much extra “magic” is going on under the hood: even if I were able to get a .tflite file out, what additional steps would it take to execute that model and get real actions out? [And are there any other odd layers that would need setting up between the observation and the first layer of the tflite model?]
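
For example, the kind of step I'm worried about is the observation preprocessing RLlib does before the first layer. My guess at what reproducing that outside RLlib would look like (untested sketch, assuming ray.rllib.models.preprocessors.get_preprocessor behaves the way I think it does, and that RTAM-v0 is registered with gym):

# Untested sketch: reproduce RLlib's observation preprocessing outside the Trainer.
import gym
from ray.rllib.models.preprocessors import get_preprocessor

env = gym.make("RTAM-v0")
prep_cls = get_preprocessor(env.observation_space)
prep = prep_cls(env.observation_space)

obs = env.reset()
flat_obs = prep.transform(obs)   # this flattened array is what the model's first layer would see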

Thanks,
Gary

This is the entire class I’m currently using for the forward pass:

import time

# Ray 1.x import paths assumed here; DreamInterface comes from elsewhere in this project
from ray.rllib.agents.es import ESTrainer
from ray.rllib.agents.trainer import with_common_config


class RayDream(DreamInterface):
    def __init__(self, dreamenv, checkpoint, navmode):
        # Rebuild a trainer config that matches training, then restore the checkpoint
        rtam_config = with_common_config({
            "env": "RTAM-v0",
            "num_workers": 0
        })
        self.navmode = navmode
        self.dreamenv = dreamenv
        self.agent = ESTrainer(rtam_config, env="RTAM-v0")
        self.agent.restore(checkpoint)

    def get_single_action(self, obs):
        # Ask the restored policy for a single action from one observation
        action_step = self.agent.compute_action(obs)
        return action_step

    def get_one_path(self, loc, heading_deg, deadline_t=None, moved_players={}, tgts=None):
        # Roll out one episode in the dream env, recording the path until it
        # finishes or the optional wall-clock deadline passes
        obs = self.dreamenv.reset(loc, heading_deg, moved_players, tgts)
        curr_t = time.time()
        path = [self.dreamenv.loc]
        ep_complete = False

        while not ep_complete and (deadline_t is None or curr_t < deadline_t):
            action_step = self.agent.compute_action(obs)
            obs, reward, ep_complete, debug = self.dreamenv.step(action_step)
            path.append(self.dreamenv.loc)
            curr_t = time.time()

        return path
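
If the TFLite route works out, the idea would be to swap agent.compute_action() for something like this on the Coral (untested sketch; "policy_edgetpu.tflite" is a placeholder, the tensor shapes/dtypes depend on how the model was exported, and a quantized Edge TPU model would also need input scaling, glossed over here):

# Untested sketch of the Coral-side forward pass, replacing agent.compute_action().
import numpy as np
from tflite_runtime.interpreter import Interpreter, load_delegate

interpreter = Interpreter(model_path="policy_edgetpu.tflite",
                          experimental_delegates=[load_delegate("libedgetpu.so.1")])
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

def get_single_action(flat_obs):
    # flat_obs: preprocessed observation as a numpy array matching the model input
    interpreter.set_tensor(input_details[0]["index"],
                           np.expand_dims(flat_obs, 0).astype(input_details[0]["dtype"]))
    interpreter.invoke()
    return interpreter.get_tensor(output_details[0]["index"])[0]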

Cheers!
Gary

I added a comment to a GitHub issue that aligns with this question:

Any guidance would be greatly appreciated.

Gary

gentle bump

I feel a little out of my depth with this; how to go from a trained RLlib checkpoint to a functioning model without the ray/rllib infrastructure is still a fairly opaque process to me.

Thanks,
Gary

I already added a comment to the above-mentioned GitHub PR, but for anyone reading this thread, here is a link to a tool that converts an RLlib checkpoint to an ONNX model and can run inference with ONNX Runtime: GitHub - airboxlab/rllib-fast-serve: Tools and examples to export policies trained with Ray RLlib for lightweight and fast inference
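
Once you have the ONNX file, inference without any Ray dependency looks roughly like this (a minimal sketch; "policy.onnx", the input name and the observation shape are placeholders that depend on how the policy was exported):

# Minimal sketch of Ray-free inference on an exported policy with ONNX Runtime.
import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession("policy.onnx")
input_name = sess.get_inputs()[0].name

obs = np.zeros((1, 42), dtype=np.float32)   # placeholder flattened observation
outputs = sess.run(None, {input_name: obs})
print(outputs)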

Thanks for that @antoine-galataud !