Offline reinforcement learning without environment

fst · February 24, 2023, 7:31pm

I have offline data that contains training patterns with:

recordings of inputs from a physical machine
actions of a behavior policy (=a person) operating the machine
rewards for the performance of that behavior policy for each time step

I have now converted all these (episodes, actions and rewards) into a JSON file via the SampleBatchBuilder.
Now I am wondering - why do I still have to provide an environment, even if I set “explore=false”?
Wouldn’t the offline dataset contain all information needed for offline off-policy training?
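
For illustration, per-time-step recordings like the ones described above could be grouped into episodes before being handed to a batch builder. This is only a sketch: the column names (episode_id, t, sensor_a, sensor_b, action, reward) and the sample rows are made up, since the actual layout of the recordings is not given in this thread.

```python
import csv
import io
from collections import defaultdict

# Hypothetical column names and values; the real recordings' layout is not specified.
CSV_TEXT = """episode_id,t,sensor_a,sensor_b,action,reward
0,0,0.10,0.20,1,0.0
0,1,0.15,0.25,0,1.0
1,0,0.30,0.40,1,0.5
"""


def rows_to_episodes(csv_file):
    """Group per-time-step rows into episodes of transition dicts."""
    by_episode = defaultdict(list)
    for row in csv.DictReader(csv_file):
        obs = [float(row["sensor_a"]), float(row["sensor_b"])]
        by_episode[int(row["episode_id"])].append(
            (int(row["t"]), obs, int(row["action"]), float(row["reward"]))
        )

    episodes = {}
    for eps_id, steps in by_episode.items():
        steps.sort(key=lambda s: s[0])  # order by time step
        transitions = []
        for i, (t, obs, action, reward) in enumerate(steps):
            last = i == len(steps) - 1
            # The last recorded step has no successor; reuse its own observation.
            next_obs = obs if last else steps[i + 1][1]
            transitions.append({
                "t": t, "obs": obs, "action": action,
                "reward": reward, "new_obs": next_obs, "done": last,
            })
        episodes[eps_id] = transitions
    return episodes


episodes = rows_to_episodes(io.StringIO(CSV_TEXT))
```

Each transition dict then carries the per-step fields one would pass on to the SampleBatchBuilder when writing the JSON file.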

Wrapping my data into an environment would be quite artificial, because the step function would only work for actions that the behavior policy (=person) has performed in the recordings.
Training a realistic surrogate model isn’t very feasible for the given data.
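
To make the limitation concrete, here is a minimal sketch of what such a wrapper would amount to. ReplayOnlyEnv is a hypothetical class written for this illustration (it is not part of RLlib or gym), and the episode data is invented.

```python
class ReplayOnlyEnv:
    """Hypothetical wrapper that replays one recorded episode (not an RLlib class)."""

    def __init__(self, episode):
        # episode: list of (obs, action, reward) tuples taken from the recordings
        self.episode = episode
        self.t = 0

    def reset(self):
        self.t = 0
        return self.episode[0][0]

    def step(self, action):
        _, recorded_action, reward = self.episode[self.t]
        if action != recorded_action:
            # The recordings say nothing about how the machine would have reacted.
            raise ValueError(f"no recorded outcome for action {action} at step {self.t}")
        self.t += 1
        done = self.t >= len(self.episode)
        next_obs = self.episode[-1][0] if done else self.episode[self.t][0]
        return next_obs, reward, done


env = ReplayOnlyEnv([([0.1], 1, 0.0), ([0.2], 0, 1.0)])
first_obs = env.reset()
obs, reward, done = env.step(1)  # matches the recording, so this works
try:
    env.step(1)  # the person chose 0 here, so there is nothing to return
    failed = False
except ValueError:
    failed = True
```

Any policy that deviates from the person's recorded actions hits the error branch, which is why such an environment adds nothing beyond the dataset itself.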

Why are the trainers for offline RL insisting on an environment?

joshml · March 4, 2023, 2:20pm

Hey - I’m pretty new to RLlib, but you still need to specify the env, because the algorithms query the env object for reasons other than obtaining the data; e.g., the CQL model checks that the environment is not discrete. I just create a very lightweight env (see below) and make sure the “disable_env_checking” environment option is set to True.

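import gym  # newer Ray versions expect gymnasium instead of gym
import numpy as np
from gym.spaces import Box, Discrete
from ray.tune.registry import register_env

# state_vars_list is assumed to be defined elsewhere (the list of state variables).
# One observation dimension per state variable, bounded to [0, 1].
state_space = Box(
    low=np.zeros(len(state_vars_list)),
    high=np.ones(len(state_vars_list)),
    dtype=np.float32)

class MyEnv(gym.Env):
    # Only provides the spaces; there is no step() or reset().
    def __init__(self, env_config):
        self.action_space = Discrete(2)
        self.observation_space = state_space

def env_creator(env_config):
    return MyEnv(env_config=env_config)

# Register the env under a name that can be used in the algorithm config.
register_env("MyEnv", env_creator)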

fst · March 14, 2023, 5:34pm

Hi @joshml, thanks for your feedback, and sorry, I forgot to check back on the thread.

I found a viable solution - one can provide the entries “observation_space” and “action_space” in the environment config of the offline algorithm. That way one doesn’t have to create an artificial environment just to provide the space definitions.

E.g.:

import gym
from ray.rllib.algorithms.crr import CRRConfig

config = CRRConfig()
config = config.offline_data(
    input_="my_offline_samplebatches.json"
)
# Declare the spaces directly instead of wrapping the data in an environment.
config = config.environment(
    observation_space=gym.spaces.Dict({
        'obs': gym.spaces.Box(low=-1, high=1, shape=(47*336,))
    }),
    action_space=gym.spaces.Discrete(24),
)

# [...]

Imho, having this in the environment sub-config is a little confusing, because in this case it is more a specification of the offline dataset than of the environment.
I think this is because in most toy examples (and all online tutorials I could find) the offline data is sampled from an existing model-based environment. That doesn’t make much sense: if I have an environment, and thus a model, I could use much more effective on-policy approaches that exploit the ability to explore that model.
I think a more realistic tutorial, where the offline data really comes from e.g. a CSV file recorded from the sensors of a physical machine, would be much closer to the situation a typical offline RL user faces.
But long story short - it works that way!
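
Since the spaces here effectively describe the dataset, it can be worth checking that the recorded steps actually fit them before training. A minimal sketch in plain Python, using the bounds and sizes from the config above; the function name and the sample data are made up for illustration.

```python
OBS_DIM = 47 * 336             # matches shape=(47*336,) in the config above
OBS_LOW, OBS_HIGH = -1.0, 1.0  # matches low=-1, high=1
NUM_ACTIONS = 24               # matches Discrete(24)


def find_space_violations(transitions):
    """Return (index, reason) for recorded steps that do not fit the declared spaces."""
    problems = []
    for i, (obs, action) in enumerate(transitions):
        if len(obs) != OBS_DIM:
            problems.append((i, f"obs has length {len(obs)}, expected {OBS_DIM}"))
        elif any(not (OBS_LOW <= x <= OBS_HIGH) for x in obs):
            problems.append((i, "obs value outside [-1, 1]"))
        if not (isinstance(action, int) and 0 <= action < NUM_ACTIONS):
            problems.append((i, f"action {action!r} not in 0..{NUM_ACTIONS - 1}"))
    return problems


# Tiny made-up sample: one valid step, one with an out-of-range action.
sample = [([0.0] * OBS_DIM, 3), ([0.5] * OBS_DIM, 24)]
violations = find_space_violations(sample)
```

An empty result means every recorded observation and action lies inside the declared spaces.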

Edit: Regarding the proposal to create a dummy environment: this also works, but one apparently has to add artificial step and reset functions that return an observation in the specified format, otherwise the environment checker will complain. So unless one can really implement a meaningful step function (so that pre-recorded data can be combined with freshly sampled exploration data), it seems easier to just set the observation and action spaces in the algorithm config.

Monsieur_Wave · November 29, 2023, 1:36pm

I stumbled on your post whilst working on a very similar problem: offline learning from a CSV file with real-world observations.

Could you quickly summarize how you made it work in the end?
