Greetings to the community!!
I am trying to grid search some parameters of my training function using Ray Tune.
The training and test inputs to train_cifar() are two lists of dimensions
400x13000 and 40x13000, respectively.
Because of the data size I cannot provide a reproducible example, but below I show three
different ways I have tried to tune my model with Ray Tune.
In each case I receive the following error:
The actor ImplicitFunc is very large (95 MiB). Check that its definition is not implicitly
capturing a large array or other object in scope. Tip: use ray.put() to put large objects
in the Ray object store.
or this one:
debug_error_string = "{"created":"@1643300850.335447653","description":
"Error received from peer ipv4:172.28.0.2:45437", "file":"src/core/lib/surface/call.cc", "file_line":1074, "grpc_message": "Received message larger than max (137418486 vs. 104857600)","grpc_status":8}"
I don't understand why I am hitting a size limit (95 MiB in the first message), since my lists are really small.
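For reference, here is a rough back-of-the-envelope check of the sizes involved. It is only a sketch: it assumes every element is stored as an 8-byte (64-bit) float, which is an assumption since the element type is not stated here, and the helper name raw_size_mib is made up for illustration. The two byte counts are copied from the gRPC message above.

```python
# Rough size check. Assumes 8-byte (float64) elements, which is an assumption;
# plain Python lists carry extra per-element overhead on top of this.

MIB = 1024 * 1024  # bytes per MiB


def raw_size_mib(rows, cols, bytes_per_element=8):
    """Return the raw payload size in MiB of a rows x cols numeric table."""
    return rows * cols * bytes_per_element / MIB


train_mib = raw_size_mib(400, 13000)  # training data, 400x13000
test_mib = raw_size_mib(40, 13000)    # test data, 40x13000

# Byte counts taken from the gRPC error message.
grpc_sent_mib = 137418486 / MIB
grpc_limit_mib = 104857600 / MIB

print(f"train: {train_mib:.1f} MiB, test: {test_mib:.1f} MiB")
print(f"gRPC message: {grpc_sent_mib:.1f} MiB, limit: {grpc_limit_mib:.1f} MiB")
```

Under that assumption the training data alone is roughly 40 MiB of raw payload, and the rejected gRPC message is about 131 MiB against a 100 MiB limit, so the data may not be as small as it looks once it is serialized.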
Any ideas about what I am doing wrong?
I am running the following code on Google Colab.
Kostas
CODE I
def train_cifar(config, data=None, checkpoint_dir=None):
    X_scaled_train_tmp = config["data1"]
    X_scaled_train2 = ray.get(X_scaled_train_tmp)
    X_scaled_test_tmp = config["data2"]
    X_scaled_test2 = ray.get(X_scaled_test_tmp)

def tunerTrain():
    config = {
        "data1": X_scaled_train1,
        "data2": X_scaled_test1,
    }
    scheduler = ASHAScheduler(
        ...
    )
    reporter = CLIReporter(
        ...
    )
    result = tune.run(
        partial(train_cifar, data_dir=data_dir),
        ...
    )

tunerTrain()
CODE II
X_scaled_train = ...
X_scaled_test = ...
ray.init()
X_scaled_train1 = ray.put(X_scaled_train)
X_scaled_test1 = ray.put(X_scaled_test)

def train_cifar(config, data=None, checkpoint_dir=None):
    X_scaled_train2 = ray.get(data[0])
    X_scaled_test2 = ray.get(data[2])

def tunerTrain():
    config = {
        ...
    }
    scheduler = ASHAScheduler(
        ...
    )
    reporter = CLIReporter(
        ...
    )
    result = tune.run(
        tune.with_parameters(train_cifar, data=[X_scaled_train1, X_scaled_train_trait,
                                                X_scaled_test1, X_scaled_test_trait]),
        ...
    )

tunerTrain()
CODE III
X_scaled_train = ...
X_scaled_test = ...

def train_cifar(config, data=None, checkpoint_dir=None):
    X_scaled_train2 = data[0]
    X_scaled_test2 = data[2]

def tunerTrain():
    config = {
        ...
    }
    scheduler = ASHAScheduler(
        ...
    )
    reporter = CLIReporter(
        ...
    )
    result = tune.run(
        tune.with_parameters(train_cifar, data=[X_scaled_train, X_scaled_train_trait,
                                                X_scaled_test, X_scaled_test_trait]),
        ...
    )

tunerTrain()