I am using Ray Tune to optimize a deep learning model.
I am currently getting an error like this:
(TemporaryActor pid=90906) Traceback (most recent call last):
(TemporaryActor pid=90906)   File "/Users/luca/opt/anaconda3/envs/mlmod/lib/python3.9/site-packages/ray/_private/function_manager.py", line 594, in _load_actor_class_from_gcs
(TemporaryActor pid=90906)     actor_class = pickle.loads(pickled_class)
(TemporaryActor pid=90906) ModuleNotFoundError: No module named 'mlmod'
mlmod is my package module. I had a similar setup before for optimizing time series models, and that always worked.
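From the traceback, the worker fails inside pickle.loads, so the problem seems to be on the deserialization side. As far as I understand (my own reading, not something from the Ray docs), Python's pickle stores a module-level function only as a module-plus-name reference, so the receiving process must be able to import that module. A minimal stdlib demonstration of that mechanism:

```python
import math
import pickle

# A module-level function pickles as a reference (module name + attribute
# name), not as code: unpickling re-imports the module and looks it up.
data = pickle.dumps(math.sqrt)
print(b"math" in data and b"sqrt" in data)  # True: the reference is embedded

# Unpickling only works because "math" is importable here; a process that
# cannot import the referenced module raises ModuleNotFoundError instead,
# which looks exactly like the error my Ray worker hits with 'mlmod'.
assert pickle.loads(data) is math.sqrt
```

So my guess is that something I hand to Ray carries a reference into mlmod that the worker process then fails to resolve.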
So, my code is something like:
ray.init(ignore_reinit_error=True)
result = tune.run(
    tune.with_parameters(train_model, data=data, hydra_config=config, hydra_state=state),
    resources_per_trial=resources_per_trial,
    config=search_config,
    num_samples=num_samples,
    metric="loss",
    mode="min",
    scheduler=scheduler,
    # TODO: We will probably need to add this if we run ray on the cloud.
    # sync_config=tune.SyncConfig(upload_dir="s3://something"),
    resume="AUTO",
)
def train_model(ray_config, data, hydra_config: DictConfig, hydra_state: Any):
    # required to avoid https://github.com/facebookresearch/hydra/issues/903
    Singleton.set_state(hydra_state)
    # map Ray Tune parameters onto the Hydra config
    for param, value in ray_config.items():
        OmegaConf.update(hydra_config, param, value, merge=False)
    from mlmod.apps.train import train
    loss = train(hydra_config, None)
    tune.report(loss=loss)
and the called train function, at the moment, just does:
# file: mlmod/apps/train.py
def train(config: DictConfig, datamodule: LightningDataModule) -> float:
    import numpy as np
    return np.random.random()
I do not quite understand what is being serialized here or why this issue is happening. I am at a loss as to what I can try and how to debug this.
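In case it helps to narrow things down, this is the kind of helper I would use to find which argument is the problem: round-trip each object passed to tune.with_parameters (data, hydra_config, hydra_state) through pickle locally, which roughly imitates what the worker does. This is my own sketch, not a Ray API:

```python
import pickle


def survives_pickle(obj) -> bool:
    """Return True if obj can be serialized and rebuilt from bytes alone,
    roughly what a Ray worker does when it receives a task or actor."""
    try:
        pickle.loads(pickle.dumps(obj))
        return True
    except Exception as exc:  # PicklingError, ModuleNotFoundError, ...
        print(f"{type(exc).__name__}: {exc}")
        return False


# Check each object handed to tune.with_parameters one at a time:
print(survives_pickle({"lr": 1e-3}))  # True: plain data round-trips
print(survives_pickle(lambda x: x))   # False: plain pickle rejects lambdas
```

Running it on hydra_config and hydra_state separately should at least tell me whether one of them drags mlmod objects into the pickle.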