Combining image and continuous observations

How severely does this issue affect your experience of using Ray?

  • High: It blocks me from completing my task.

I’m currently working on using a DQN algorithm to train an autonomous vehicle in CARLA.
I want to combine an image-based observation with continuous observations. I am currently using a Dict space containing two Box spaces.

    # assumes: import numpy as np; from gym.spaces import Box, Dict
    def get_observation_space(self):
        """
        Set observation space: vehicle state values normalized to [0, 1]
        (location in x, y starting at (0, 0) and ending at (1, 1)),
        plus a depth camera image.
        :return: the Dict observation space
        """
        spaces = {
            'values': Box(low=np.array([0, 0, 0, 0, 0, 0, 0]),
                          high=np.array([1, 1, 1, 1, 1, 1, 50]),
                          dtype=np.float32),
            'depth_camera': Box(low=0, high=256, shape=(240, 320, 3), dtype=np.float32),
        }

        obs_space = Dict(spaces)

        return obs_space

In this case,

  1. Is the image observation flattened, so that no meaningful spatial information can be extracted by the algorithm?
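As a quick size check (a pure-NumPy sketch using the shapes from the Dict space above, not RLlib code): flattening the 240×320×3 image into 1-D yields 230,400 entries whose 2-D neighbourhood structure is lost, which is why image components are normally routed through a CNN rather than flattened.

```python
import numpy as np

# Components with the same shapes as the Dict space above.
image = np.zeros((240, 320, 3), dtype=np.float32)   # 'depth_camera'
values = np.zeros(7, dtype=np.float32)              # 'values'

# Naive flattening: image pixels and state values end up in one long vector,
# discarding which pixels were adjacent to which.
flat = np.concatenate([image.reshape(-1), values])
print(flat.shape)  # (230407,)
```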

All the examples I have looked at use only an image observation, feeding it directly to the algorithm as a Box space.

    def get_observation_space(self):
        """
        Set observation space to a depth camera image.
        :return: the Box observation space
        """
        obs_space = Box(low=0, high=256, shape=(240, 320, 3), dtype=np.float32)

        return obs_space

In this case,

  2. How does a DQN algorithm process an image observation?
  3. Can I customize how the algorithm processes the image observation?

Any help is appreciated and thank you for your time!

Hi @Daniel_Attard,

Welcome to the forum.

This should be supported by RLlib out of the box.
All you need to do is provide either a Tuple or Dict observation space where some of the components have 3 dimensions.

This is the torch version of the model that handles this setup. RLlib will choose it automatically given the appropriate observation space.

ray/complex_input_net.py at master · ray-project/ray · GitHub

There is a tf version as well.

You can customize the shapes, sizes, and number of layers via the model config.
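For example, with a non-standard image size like 240×320 you would typically need to supply your own conv stack. `conv_filters` and `post_fcnet_hiddens` are real keys in RLlib's model config (each `conv_filters` entry is `[out_channels, kernel, stride]`), but the values below are illustrative assumptions, not tuned settings:

```python
# Sketch of a model config dict, e.g. passed as config.training(model=...)
# in the old API stack. Values are illustrative, not tuned.
model_config = {
    # Conv stack applied to the image component of the observation.
    "conv_filters": [
        [16, [8, 8], 4],
        [32, [4, 4], 2],
        [64, [11, 11], 1],
    ],
    # Dense layers applied after the image and vector branches are combined.
    "post_fcnet_hiddens": [256, 256],
}
```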

If you want to customize how the model is constructed or behaves, you can create a custom model.
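To illustrate what such a two-branch model typically does, here is a framework-agnostic NumPy sketch (layer sizes, weights, and the 5-action output are made-up illustrations, not RLlib's actual implementation): the image goes through a small conv stack, its flattened output is concatenated with the continuous vector, and dense layers map the result to Q-values.

```python
import numpy as np

def conv2d(x, w, stride):
    """Naive valid convolution + ReLU: x is (H, W, Cin), w is (kh, kw, Cin, Cout)."""
    kh, kw, _, cout = w.shape
    out_h = (x.shape[0] - kh) // stride + 1
    out_w = (x.shape[1] - kw) // stride + 1
    out = np.zeros((out_h, out_w, cout))
    for i in range(out_h):
        for j in range(out_w):
            patch = x[i * stride:i * stride + kh, j * stride:j * stride + kw, :]
            out[i, j] = np.tensordot(patch, w, axes=([0, 1, 2], [0, 1, 2]))
    return np.maximum(out, 0.0)

rng = np.random.default_rng(0)
image = rng.random((64, 64, 3))   # image component of the Dict observation
vector = rng.random(6)            # continuous component of the Dict observation

# Image branch: two small conv layers, then flatten to a feature vector.
h = conv2d(image, 0.01 * rng.standard_normal((8, 8, 3, 4)), stride=4)  # -> (15, 15, 4)
h = conv2d(h, 0.01 * rng.standard_normal((4, 4, 4, 8)), stride=2)      # -> (6, 6, 8)
features = h.reshape(-1)

# Concatenate image features with the continuous vector, then dense layers -> Q-values.
x = np.concatenate([features, vector])
w1 = 0.01 * rng.standard_normal((x.size, 64))
w2 = 0.01 * rng.standard_normal((64, 5))   # 5 discrete actions (assumption)
q = np.maximum(x @ w1, 0.0) @ w2
print(q.shape)  # (5,)
```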

Hi @mannyv,

My observation space is
Dict(image: Box(0,1,(64,64,5)), vector: Box(-1,1,(6,)))
but when I try to run it with SACConfig, Ray complains that there is no default encoder config for this observation space.

I am new to RLlib, and the docs are just too big to navigate. Can you please point me in the direction where I can solve this?

Hi @iykim ,

If you are using the new API stack, you probably encountered the same issue as me. You can look up my post here and the “hacky” solution I proposed: Using Dict observation space with custom RLModule - #6 by adrienJeg


Thanks. Based on your approach, I subclassed the encoder configs and used them with SAC. I had to change some code inside the original RLlib SAC algorithm, but it’s working pretty well now! Thank you for your time and help.