Using a state embedding with PPO

I have a GNN component in my custom model that creates an embedding of my state. I want to train an RL algorithm (ideally PPO) on the embedding rather than on the raw state. I read a paper that used embeddings as the state together with a policy-gradient method. I don't think it is easily possible to define such a state directly when using gym.Env while still training the GNN end-to-end.
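One way to stay inside the gym.Env interface is to serialize the graph into a fixed-size flat observation. The sketch below is purely illustrative (the names `pack_graph`, `MAX_NODES`, etc. are my own, not from any library): it pads node features and edge indices into one 1-D array that fits a `gym.spaces.Box`.

```python
import numpy as np

# Illustrative padding limits; pick these to cover your largest graph.
MAX_NODES = 8
MAX_EDGES = 16
FEAT_DIM = 3

def pack_graph(node_feats: np.ndarray, edge_index: np.ndarray) -> np.ndarray:
    """Flatten (n, FEAT_DIM) node features and (2, e) edge indices into one
    1-D float32 array laid out as [n, e, padded feats..., padded edges...]."""
    n, e = node_feats.shape[0], edge_index.shape[1]
    feats = np.zeros((MAX_NODES, FEAT_DIM), dtype=np.float32)
    feats[:n] = node_feats
    edges = np.full((2, MAX_EDGES), -1.0, dtype=np.float32)  # -1 marks padding
    edges[:, :e] = edge_index
    return np.concatenate([[n, e], feats.ravel(), edges.ravel()]).astype(np.float32)

# A 3-node path graph with two edges:
obs = pack_graph(np.ones((3, FEAT_DIM)), np.array([[0, 1], [1, 2]]))
print(obs.shape)  # (58,) = 2 + MAX_NODES*FEAT_DIM + 2*MAX_EDGES
```

The observation space would then simply be `Box(low=-np.inf, high=np.inf, shape=(58,))`, and the env never needs to see the GNN at all.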

I see that the current Ray RLlib dev version seems to have changed the training part of the PPO agent, which might make it easier to modify, but guidance would be greatly appreciated.

Hey, thanks for the question.
I think it basically comes down to encoding your state in the observation array (as a NumPy array), unpacking it into PyTorch Geometric format in the policy, and then using the unpacked data to run and train the model.
@smorad1 has done this before.
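To sketch the policy-side half of that idea: the custom model would invert the env's packing before feeding the graph to the GNN. This is a hand-rolled example, not RLlib API; `unpack_graph` and the size constants are hypothetical and must mirror whatever packing scheme the env uses.

```python
import numpy as np

# Must match the env-side packing layout: [n, e, padded feats, padded edges].
MAX_NODES, MAX_EDGES, FEAT_DIM = 8, 16, 3

def unpack_graph(obs: np.ndarray):
    """Inverse of the env-side packing: returns (node_feats, edge_index),
    ready to wrap in e.g. torch_geometric.data.Data inside the model's
    forward pass, where gradients can flow through the GNN."""
    n, e = int(obs[0]), int(obs[1])
    offset = 2 + MAX_NODES * FEAT_DIM
    feats = obs[2:offset].reshape(MAX_NODES, FEAT_DIM)[:n]
    edges = obs[offset:].reshape(2, MAX_EDGES)[:, :e]
    return feats, edges.astype(np.int64)

# Round-trip check against a hand-packed observation:
obs = np.zeros(2 + MAX_NODES * FEAT_DIM + 2 * MAX_EDGES, dtype=np.float32)
obs[0], obs[1] = 3, 2            # 3 nodes, 2 edges
feats, edges = unpack_graph(obs)
print(feats.shape, edges.shape)  # (3, 3) (2, 2)
```

Because the unpacking happens inside the policy's forward pass, the GNN lives in the trainable model and gets updated end-to-end by PPO; the env only ever ships flat arrays.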


Thanks for your answer.

Any idea how to encode the state, or how to access the same model in the gym env? You need to somehow have access to the same model in the policy and then, I assume, call it with no_grad. I think it might be easier to just implement it from scratch without using RLlib.

I have seen @smorad1's graph-conv-memory-paper repo linked here before, but from a quick look I assume this is not where they did that, right?