I am using Ray Tune together with PyTorch Lightning to fine-tune my model.
After running
analysis = tune.run(
    trainable,
    resources_per_trial={
        "cpu": 1,
        "gpu": gpus
    },
    metric="loss",
    mode="min",
    config=variable_config,
    num_samples=num_samples,
    name="tune_model")
the run crashes with the following output:
2021-10-14 21:14:43,985 - ray.tune.tune - INFO - Initializing Ray automatically.For cluster usage or custom Ray initialization, call `ray.init(...)` before `tune.run`.
2021-10-14 21:14:44,230 WARNING services.py:1739 -- WARNING: The object store is using /tmp instead of /dev/shm because /dev/shm has only 67108864 bytes available. This will harm performance! You may be able to free up space by deleting files in /dev/shm. If you are inside a Docker container, you can increase /dev/shm size by passing '--shm-size=10.24gb' to 'docker run' (or add it to the run_options list in a Ray cluster config). Make sure to set this to more than 30% of available RAM.
2021-10-14 21:14:45,436 WARNING function_runner.py:559 -- Function checkpointing is disabled. This may result in unexpected behavior when using checkpointing features or certain schedulers. To enable, set the train function arguments to be `func(config, checkpoint_dir=None)`.
[2021-10-14 21:14:45,897 C 8403 8403] scheduling_resources.cc:35: Check failed: resource_pair.second > 0
*** StackTrace Information ***
ray::SpdLogMessage::Flush()
ray::RayLog::~RayLog()
ray::ResourceSet::ResourceSet()
ray::BundleSpecification::ComputeResources()
ray::PlacementGroupSpecification::ConstructBundles()
ray::core::CoreWorker::CreatePlacementGroup()
__pyx_pw_3ray_7_raylet_10CoreWorker_57create_placement_group()
_PyMethodDef_RawFastCallKeywords
Could you help me with this? I don't understand what is causing the `Check failed: resource_pair.second > 0` error.