I am using Ray Tune together with PyTorch Lightning to fine-tune my model.
After running
analysis = tune.run(
    trainable,
    resources_per_trial={
        "cpu": 1,
        "gpu": gpus
    },
    metric="loss",
    mode="min",
    config=variable_config,
    num_samples=num_samples,
    name="tune_model")
the run crashes with the following output:
2021-10-14 21:14:43,985 - ray.tune.tune - INFO - Initializing Ray automatically.For cluster usage or custom Ray initialization, call `ray.init(...)` before `tune.run`.
2021-10-14 21:14:44,230 WARNING services.py:1739 -- WARNING: The object store is using /tmp instead of /dev/shm because /dev/shm has only 67108864 bytes available. This will harm performance! You may be able to free up space by deleting files in /dev/shm. If you are inside a Docker container, you can increase /dev/shm size by passing '--shm-size=10.24gb' to 'docker run' (or add it to the run_options list in a Ray cluster config). Make sure to set this to more than 30% of available RAM.
2021-10-14 21:14:45,436 WARNING function_runner.py:559 -- Function checkpointing is disabled. This may result in unexpected behavior when using checkpointing features or certain schedulers. To enable, set the train function arguments to be `func(config, checkpoint_dir=None)`.
[2021-10-14 21:14:45,897 C 8403 8403] scheduling_resources.cc:35: Check failed: resource_pair.second > 0
*** StackTrace Information ***
ray::SpdLogMessage::Flush()
ray::RayLog::~RayLog()
ray::ResourceSet::ResourceSet()
ray::BundleSpecification::ComputeResources()
ray::PlacementGroupSpecification::ConstructBundles()
ray::core::CoreWorker::CreatePlacementGroup()
__pyx_pw_3ray_7_raylet_10CoreWorker_57create_placement_group()
_PyMethodDef_RawFastCallKeywords
Could you help me with this? I don't understand what is causing the `Check failed: resource_pair.second > 0` error.