Is there no option to train SAC with a convolutional network?

Hi,

I noticed that the SAC agent does not appear to have the ability to be trained with a convolutional network. Attached is a screenshot of the default config. I noticed that there are two models here to specify, Q_model and policy_model, but I do not see a way to add a conv_filters option. Am I making a mistake?

CONFIG = """{
    'policy_model': {
        'dim': 12,
        'conv_filters': [[16, [4, 4], 1],
                         [32, [3, 3], 2],
                         [512, [6, 6], 1]]
    },
    'Q_model': {
        'dim': 12,
        'conv_filters': [[16, [4, 4], 1],
                         [32, [3, 3], 2],
                         [512, [6, 6], 1]]
    }
}"""
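For context, each `conv_filters` entry in an RLlib model config is, as I understand it, a triple of `[num_output_channels, [kernel_height, kernel_width], stride]`. A quick sanity check of the filter specs above (plain Python, no RLlib required):

```python
# Each conv_filters entry is assumed to follow RLlib's convention:
# [num_output_channels, [kernel_height, kernel_width], stride].
conv_filters = [
    [16, [4, 4], 1],
    [32, [3, 3], 2],
    [512, [6, 6], 1],
]

# Verify that every entry has the expected shape and types.
for num_channels, kernel, stride in conv_filters:
    assert isinstance(num_channels, int) and num_channels > 0
    assert len(kernel) == 2 and all(isinstance(k, int) for k in kernel)
    assert isinstance(stride, int) and stride > 0

print("all filter specs well-formed")
```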

Link to SAC
https://docs.ray.io/en/master/rllib-algorithms.html#soft-actor-critic-sac

Thanks,
Sam

Hi Samuel,

How do you pass the config you posted to RLlib?
The code you posted does not look like RLlib's code.
Generally, you should be able to train any form of neural network with RLlib and SAC, provided it has the appropriate input and output shapes.

Cheers

Hi @arturn,

The code I am running augments RLlib for specific research purposes, but it does not alter any of the source code for how RLlib ingests the config (or the training, for that matter). I need to meet a deadline tonight, but tomorrow I will provide a reproducibility script for this with a generic RLlib CLI command. I have used this tool with at least 8 other agents RLlib offers, and none of them show this kind of error when defining the model configuration.

Thanks,
Sam

Maybe the CONFIG object gets put into the model field in the tune.run() call?

The following code works for me:

import ray
from ray import tune

if __name__ == "__main__":
    ray.init()

    config = {
        "env": "Pendulum-v0",
        "Q_model": {
            "conv_filters": [[16, [4, 4], 1],
                             [32, [3, 3], 2],
                             [512, [6, 6], 1]],
        },
        "policy_model": {
            "conv_filters": [[16, [4, 4], 1],
                             [32, [3, 3], 2],
                             [512, [6, 6], 1]],
        },
        "num_workers": 0,
        "framework": "torch",
    }

    stop = {
        "training_iteration": 1,
    }
    
    tune.run("SAC", stop=stop, config=config, verbose=2)

    ray.shutdown()
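For what it's worth, the pitfall hinted at above is nesting the filters under the top-level "model" key, where most other algorithms expect them; for SAC they belong under the dedicated "Q_model" and "policy_model" keys instead. A minimal sketch of the difference (plain dicts only, not validated against any particular RLlib version):

```python
# Sketch of where conv_filters land for SAC vs. most other RLlib algorithms.
# This only illustrates the config structure; the keys are taken from the
# thread above, and "ppo_style" is a hypothetical contrasting example.

filters = [[16, [4, 4], 1], [32, [3, 3], 2], [512, [6, 6], 1]]

# Most algorithms (e.g. PPO): filters go under the top-level "model" key.
ppo_style = {"model": {"conv_filters": filters}}

# SAC: filters go under the separate "Q_model" and "policy_model" keys.
sac_style = {
    "Q_model": {"conv_filters": filters},
    "policy_model": {"conv_filters": filters},
}

# The two placements are mutually exclusive in this sketch.
assert "conv_filters" in sac_style["Q_model"]
assert "model" not in sac_style
```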