I have an RL algorithm/model built on a TF 2.1 model.
It is a tf.keras.Model subclass that implements the call method. The inputs are observations and actions, and the output is a large dict of outputs such as latent variables, but NO actions.
In my TF2 training loop I currently only sample actions uniformly, using .sample() from the gym wrapper.
My goal is to run my algorithm to generate these latent variables and then use already implemented policies (e.g. DQN, PPO) to derive actions, with the choice of policy set as a hyperparameter.
So, to summarize: I have this cool model that I wish to integrate with the RLlib ecosystem, without having my TF2 model depend on RLlib. I'm having trouble finding proper documentation on how to proceed, or a general recommendation on whether this is beneficial for my use case.
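To make the idea concrete, here is a minimal sketch of one way the model could stay independent of RLlib: a thin environment wrapper that replaces raw observations with latents, so that any policy only ever sees the latent vector. Everything here is hypothetical: `LatentObservationEnv`, `ToyEnv` and `toy_encode` are illustrative names, the wrapper assumes the older gym-style API where `step()` returns a 4-tuple, and `toy_encode` stands in for a call into the real Keras model.

```python
import numpy as np


class LatentObservationEnv:
    """Wraps any env with gym-style reset()/step() and returns latents as observations."""

    def __init__(self, env, encode_fn):
        self.env = env
        # Hypothetical hook: (observation, previous_action) -> latent vector.
        # In practice this would call the Keras model and pick one entry of its output dict.
        self.encode_fn = encode_fn
        self._prev_action = None

    def reset(self):
        obs = self.env.reset()
        self._prev_action = None
        return self.encode_fn(obs, self._prev_action)

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self._prev_action = action
        return self.encode_fn(obs, action), reward, done, info


class ToyEnv:
    """Stand-in environment so the sketch runs without gym installed."""

    def __init__(self):
        self.t = 0

    def reset(self):
        self.t = 0
        return np.zeros(3, dtype=np.float32)

    def step(self, action):
        self.t += 1
        obs = np.full(3, self.t, dtype=np.float32)
        return obs, 1.0, self.t >= 3, {}


def toy_encode(obs, prev_action):
    # Placeholder for the real model: keep two features and scale them.
    return obs[:2] * 2.0


env = LatentObservationEnv(ToyEnv(), encode_fn=toy_encode)
z0 = env.reset()
z1, reward, done, info = env.step(0)
print(z0.shape, z1)
```

With this layout the policy library only needs to know about the wrapped environment, and the model itself imports nothing from it. Whether this is the recommended integration path for RLlib is exactly what I am unsure about.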
So, my use case: generally I would like to
- run many experiments across many environments,
- use Tune for hparam search,
- run the two above in parallel and aggregate the results to TensorBoard or a similar tool.
I've also looked into using Tune only, but then I could not find any good way of utilizing the envs in RLlib.
Can anyone push me in the right direction here?
Also, I've spent quite a while trying to find a good method to accumulate batches of observation sequences. I have not found a good way of doing this, but perhaps there is some example code somewhere?
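To illustrate what is meant by accumulating batches of observation sequences, here is a minimal NumPy sketch, not taken from any library. `SequenceBatcher` is a hypothetical helper: it cuts the observation stream into fixed-length sequences, discards a partial sequence when an episode ends so that no sequence spans two episodes (an assumption; padding would be another option), and stacks completed sequences into batches of shape `(batch_size, seq_len, *obs_shape)`.

```python
import numpy as np


class SequenceBatcher:
    """Collects per-step observations into fixed-length sequences, then into batches."""

    def __init__(self, seq_len, batch_size):
        self.seq_len = seq_len
        self.batch_size = batch_size
        self._current = []    # steps of the sequence currently being built
        self._sequences = []  # completed sequences waiting to be batched

    def add(self, obs, done=False):
        self._current.append(np.asarray(obs, dtype=np.float32))
        if len(self._current) == self.seq_len:
            self._sequences.append(np.stack(self._current))
            self._current = []
        elif done:
            # Assumption: drop partial sequences at episode end instead of padding.
            self._current = []

    def ready(self):
        return len(self._sequences) >= self.batch_size

    def pop_batch(self):
        if not self.ready():
            raise ValueError("not enough complete sequences for a batch yet")
        batch = np.stack(self._sequences[: self.batch_size])
        self._sequences = self._sequences[self.batch_size:]
        return batch  # shape: (batch_size, seq_len, *obs_shape)


# Usage: 8 steps of 3-dimensional observations give 2 sequences of length 4.
batcher = SequenceBatcher(seq_len=4, batch_size=2)
for t in range(8):
    batcher.add(np.full(3, t))
batch = batcher.pop_batch()
print(batch.shape)
```

Because every batch comes out with the same fixed shape, it can be fed directly to a model that needs a known batch size.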
As mentioned, my model is a TF2 Keras model, and I would prefer to run it in graph mode. However, during initialization, the dummy data that gets inserted has an unknown batch size. In my case I need the correct batch size during setup (it is known, so I don't understand why it shows up as ? during init).
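For comparison, here is a minimal sketch of pinning the batch size in plain TF2 graph mode, using `tf.function` with a fully specified `input_signature`. The sizes and `TinyModel` are hypothetical stand-ins for the real model, and this only covers code that I trace myself; it does not change what another framework feeds in as dummy data during its own initialization.

```python
import tensorflow as tf

BATCH_SIZE, SEQ_LEN, OBS_DIM = 8, 5, 3  # hypothetical sizes


class TinyModel(tf.keras.Model):
    """Stand-in for the real subclassed model."""

    def __init__(self):
        super().__init__()
        self.dense = tf.keras.layers.Dense(4)

    def call(self, inputs):
        return self.dense(inputs)


model = TinyModel()


@tf.function(
    input_signature=[tf.TensorSpec([BATCH_SIZE, SEQ_LEN, OBS_DIM], tf.float32)]
)
def forward(obs):
    # With a fully specified signature the static shape is known at trace time,
    # so obs.shape[0] is a plain Python int rather than None.
    assert obs.shape[0] == BATCH_SIZE
    return model(obs)


out = forward(tf.zeros([BATCH_SIZE, SEQ_LEN, OBS_DIM]))
print(out.shape)
```

If the batch dimension in the signature were `None` instead, `obs.shape[0]` would be `None` while tracing, which looks like the same "?" I am seeing during init.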