“env_config” in Tune

  • High: It blocks me from completing my task.

Hi all,
Sorry if my question is basic.

My environment takes arguments for initialization,
like MyEnv(arg1, arg2).

How can I define “env_config” in Tune so that my model trains with different arg1 and arg2?

Thanks

Does this work for you?
https://docs.ray.io/en/latest/rllib/rllib-env.html#configuring-environments
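
The pattern described there is roughly the following (a minimal sketch; MyEnv and the argument names are placeholders):

from ray import tune

def env_creator(env_config):
    # env_config is the dict you put under "env_config" in the trainer/Tune config
    return MyEnv(env_config["arg1"], env_config["arg2"])

tune.register_env("MyEnv", env_creator)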

Thanks for the reply.
Unfortunately, no. In the examples in the link, the environment's arguments are fixed for the whole training run, but I need to train my model over a range of different argument values.

Can you share the constructor of your environment? What are the things you want to configure?

cc @kourosh

Thanks @xwjiang2010

This is my environment constructor:

import copy

import gym
import numpy as np
from ray.rllib.env.multi_agent_env import MultiAgentEnv


class Energy_Trading(MultiAgentEnv):
    def __init__(self, config):
        # Order books and transaction records
        self.tx_count = 0
        self.buy_book = []
        self.sell_book = []
        self.fill_book = []
        self.transactions = []
        self.dones = set()
        self.dissatisfaction = 2
        self.total_timestep = 50

        # `config` is the array-like p of per-agent power values
        power = np.array(config)
        self.l = power.size            # total number of agents
        self.lp = (power > 0).sum()    # agents with positive power
        self.ln = self.l - self.lp     # remaining agents

        self.agents = [f"agent{i}" for i in range(self.l)]
        self.dict_agents = dict(zip(self.agents, power))
        self.state = copy.deepcopy(self.dict_agents)

        obs = gym.spaces.Box(low=-1, high=1, shape=(1,))
        act = gym.spaces.Box(low=0, high=1, shape=(1,))
        self.observation_space = gym.spaces.Dict({key: obs for key in self.agents})
        self.action_space = gym.spaces.Dict({key: act for key in self.agents})

        self._agent_ids = set(self.agents)
        self._spaces_in_preferred_format = True
        super().__init__()

In each episode, my environment should get a new config. In fact, this config variable initializes part of my environment for each episode.

I train my model using these lines of code:

# Note: this lambda ignores the `config` it receives and always uses the captured `p`.
tune.register_env("MyEnv", lambda config: Energy_Trading(p))


ray.init(local_mode=True)
tune.run(
    "PPO",
    stop={"episode_reward_mean": 200},
    config={
        "disable_env_checking": True,
        "env": "MyEnv",
        "env_config": {
            "config": p,
        },
        "num_gpus": 0,
        "num_workers": 1,
        "multiagent": {
            "policies": policies,
            "policy_mapping_fn": policy_mapping_fn,
        },
        "framework": "tf",
    },
)

I’m able to define the environment config once (by passing p to env_config).
But I want my model to be trained with different configs, meaning my code should pass different p to env_config.
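
For context: something like tune.grid_search over env_config would only give a different p per Tune trial (the config stays fixed within a trial), so it would not change p per episode. A sketch, where p1 and p2 are placeholder power vectors and the registered env creator is assumed to actually read env_config:

p1 = [1.0, -0.5, 0.3]  # placeholder vector
p2 = [0.8, -0.2, 0.6]  # placeholder vector

tune.run(
    "PPO",
    config={
        "env": "MyEnv",
        # grid_search launches one separate trial per value; within a trial,
        # env_config stays fixed for every episode
        "env_config": {"config": tune.grid_search([p1, p2])},
    },
)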

Also, if I don't use Tune and use a for loop instead, I have something like this:

trainer = PPOTrainer(
    config={
        # "observation_filter": "NoFilter",
        "disable_env_checking": True,
        "env": "MyEnv",
        "env_config": {
            "config": p,
        },
        "num_gpus": 0,
        "num_workers": 1,
        "multiagent": {
            "policies": policies,
            "policy_mapping_fn": policy_mapping_fn,
        },
        "framework": "tf",
    }
)

for i in range(100):
    print("STARTING A NEW EPISODE:")
    result = trainer.train()  # one call runs a full training iteration, not a single episode

So every episode uses the same config, while I need my model to be trained with a different config in each episode.

I hope I was clear enough.

Hi @Mehdi, it should be something like:

config = {
    ...
    "env": Energy_Trading,
    "env_config": {
        "arg1": arg1_value,
    },
}

This will be equivalent to constructing with Energy_Trading({'arg1': arg1_value})
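
For instance, the constructor could read the value out of the dict it receives (a minimal sketch; the key name and attribute are placeholders):

import numpy as np
from ray.rllib.env.multi_agent_env import MultiAgentEnv

class Energy_Trading(MultiAgentEnv):
    def __init__(self, config):
        # `config` here is exactly the dict you set under "env_config"
        self.power = np.array(config["arg1"])
        ...
        super().__init__()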

Thanks @kourosh
That's true, but in my problem arg1_value should change for each episode, whereas this way we only pass arg1_value to the environment once.

If that's the case, you should use RLlib callbacks to modify the parameter at the start of each episode. Trainer.train() also does not have episode granularity.
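
A minimal sketch of that idea, assuming the environment exposes a hypothetical set_power() helper (you could also mutate its attributes directly); note that the exact import path depends on your Ray version:

import numpy as np
from ray.rllib.algorithms.callbacks import DefaultCallbacks  # ray.rllib.agents.callbacks on older Ray versions

class RandomizeConfigCallback(DefaultCallbacks):
    def on_episode_start(self, *, worker, base_env, policies, episode, env_index, **kwargs):
        # Draw a fresh power vector for this episode and push it into every sub-environment.
        new_p = np.random.uniform(-1, 1, size=5)  # placeholder sampling logic
        for env in base_env.get_sub_environments():  # get_unwrapped() on very old versions
            env.set_power(new_p)  # hypothetical helper on Energy_Trading

You would then enable it with "callbacks": RandomizeConfigCallback in the trainer/Tune config.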
