I want to solve a problem that requires action shielding and has a mixed continuous–discrete action space. I found code that can mask discrete actions: it works by adding -inf to the logits of the masked actions in the action distribution.
However, I don't know how to mask continuous actions — the method above doesn't work for them. Maybe it's easy; could I get some effective suggestions here? My English is not very good, so please forgive me if anything reads awkwardly.
Below is an example environment that requires action shielding and has a mixed continuous and discrete action space.
Can I use PPO to solve this problem?
import random
import gym
import gym.spaces
import numpy as np
import traceback
import pprint
class GridEnv1(gym.Env):
    """A 1-D corridor task with a mixed continuous/discrete action space.

    The agent moves along a line segment. It starts at position 0 and
    the episode alternates between two action modes each step:

    * discrete mode (`isOdd == 1`): ``action['odd']`` in {0, 1} moves the
      agent -1 or +1 unit;
    * continuous mode (`isOdd == 0`): ``action['even']`` is a Box(-10, 10)
      value added directly to the position.

    Rewards: landing in [9, 10] wins (+100 and done). Leaving [0, 10] or
    exceeding 10 steps ends the episode with the step cost plus a -100
    penalty (net -101 on that step).

    NOTE(review): the dict keys 'even'/'odd' look swapped relative to the
    `isOdd` flag and the description above — confirm the intended pairing.
    The observation key 'postion' is a (misspelled) part of the public
    interface, so it is kept as-is.
    """

    def __init__(self, env_config=None):
        # `isOdd` selects which sub-action is consumed; reset() re-seeds it.
        self.isOdd = 0
        self.action_space = gym.spaces.Dict({
            'even': gym.spaces.Box(-10, 10, shape=(1,)),
            'odd': gym.spaces.Discrete(2),
        })
        # `action_mask` exposes 4 flags so a shield/policy can mask the
        # currently invalid half of the hybrid action.
        self.observation_space = gym.spaces.Dict({
            'postion': gym.spaces.Box(-10, 20, shape=(1,)),
            'action_mask': gym.spaces.Box(0, 1, shape=(4,)),
        })
        self.reset()

    def reset(self):
        """Start a new episode at position 0 in discrete mode.

        :return: the initial observation dict
        """
        self.observation = {
            'postion': [0],
            'action_mask': np.array([1, 1, 0, 0]),
        }
        self.done = False
        self.step_num = 0
        # 1 -> discrete sub-action is consumed first.
        self.isOdd = 1
        return self.observation

    def step(self, action) -> tuple:
        """Apply one hybrid action and advance the episode.

        :param action: dict with keys 'even' (Box) and 'odd' (Discrete);
            only the one matching the current mode is read.
        :return: (observation, reward, done, info)
        """
        self.step_num += 1
        reward = -1.0  # per-step cost

        # Pick the displacement from whichever sub-action is active.
        if self.isOdd == 0:
            delta = action['even'][0]
        else:
            # Discrete: map 0 -> -1, 1 -> +1.
            delta = -1 if action['odd'] != 1 else action['odd']

        position = self.observation['postion']
        position[0] += delta

        # Flip the mode and the corresponding action mask together.
        self.isOdd = 1 - self.isOdd
        self.observation['action_mask'] = 1 - self.observation['action_mask']

        x = position[0]
        if self.step_num > 10 or x < 0 or x > 10:
            # Timeout or left the corridor: heavy penalty, episode over.
            reward -= 100.0
            self.done = True
        elif 9 <= x <= 10:
            # Reached the goal interval.
            reward = 100.0
            self.done = True
        return self.observation, reward, self.done, {}

    def render(self, mode='human'):
        # Visualisation intentionally not implemented.
        pass