Passing dataset to Trainable

How severe does this issue affect your experience of using Ray?

  • Medium: It contributes to significant difficulty to complete my task, but I can work around it.

I’m currently using Ray Tune with PyTorch Geometric; my goal is simply to explore the hyperparameter space and see which hyperparameters best fit my data and model.

The problem is that to set up the model I need the number of unique users in the dataset. The dataset is currently only a few MB, but it might grow to about 2 GB.

class TrainLightGCN(tune.Trainable):
    def setup(self, config: dict):
        self.config = config
        self.model = LightGCN(
            num_nodes=...,  # needs the number of unique users in the dataset
            embedding_dim=config['embedding_dim'],
            num_layers=config['conv_layers'],
        ).to(device)
        self.optimizer = torch.optim.Adam(self.model.parameters(), lr=config['learning_rate'])

What would be good practice for populating this value?

  • Using a global variable?
  • Reading the dataset inside the setup function
  • Passing the value through the config

I’m not convinced by any of these options. There should be a better way, but I can’t find it in the FAQ. Also: should I pass in the whole dataset and split it into train/test outside or inside the trainable?

I knew there had to be a “ray way” of doing it.

This is what tune.with_parameters is for:

https://docs.ray.io/en/latest/tune/api/doc/ray.tune.with_parameters.html#