All Ray resources mapped to only two physical cores

Hi,

I am using Ray 2.8.1 with a single-agent RL environment, a Torch ModelV2, and the PPO algorithm. My problem is the following:
I can specify the resources (CPUs in this case) that Ray is allowed to use in ray.init().
I can specify resources in the PPOConfig, and if I understand correctly, those apply per trial, e.g. num_cpus_per_worker and num_cpus_for_local_worker.
If I run tune.Tuner(…).fit(), the resource specifications are respected. For example, with ray.init(num_cpus=12), num_cpus_for_local_worker=4 and num_rollout_workers=0, it runs 3 trials in parallel, as expected. With ray.init(num_cpus=12), num_cpus_for_local_worker=1, num_rollout_workers=1 and num_cpus_per_worker=1, it runs 6 trials in parallel.
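
The concurrency I see matches this back-of-the-envelope calculation (a rough sketch of the arithmetic only, not Ray's actual scheduler code):

    # Rough sketch: concurrent trials = total CPU budget divided by the CPUs
    # requested per trial (local worker + rollout workers).
    def max_concurrent_trials(total_cpus, cpus_local_worker, num_rollout_workers, cpus_per_worker):
        cpus_per_trial = cpus_local_worker + num_rollout_workers * cpus_per_worker
        return total_cpus // cpus_per_trial

    print(max_concurrent_trials(12, 4, 0, 1))  # -> 3 trials in parallel
    print(max_concurrent_trials(12, 1, 1, 1))  # -> 6 trials in parallel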
When I inspect the CPU usage with htop, however, all trials are executed on the same two physical cores, splitting the CPU% between them (see picture).

How can I configure Ray Tune to distribute the load across all available physical resources? Or is this something I have to take up with the cluster admins?

This is all the configuration I am doing; the config files are all related to the application itself. Let me know if I should provide more info.

    import ray
    from ray import air, tune
    from ray.rllib.algorithms.ppo import PPOConfig

    # CommunicationV1_env, GNN_PyG, run_name, env_config and the tune_config
    # dict are defined elsewhere in the application.
    ray.init(num_cpus=12)
    
    tune.register_env("CommunicationV1_env", lambda env_config: CommunicationV1_env(env_config))
    tunable_model_config = ...
    model = {"custom_model": GNN_PyG,
            "custom_model_config": tunable_model_config}

    # ppo config
    ppo_config = (
        PPOConfig()
        .environment(
            "CommunicationV1_env", # @todo: need to build wrapper
            env_config=env_config)
        .training(
            model=model,
            _enable_learner_api=False,
        )
        .rollouts(num_rollout_workers=0)
        .resources(
            num_cpus_per_worker=2,
            num_cpus_for_local_worker=4,
            placement_strategy="PACK",
        )
        .rl_module(_enable_rl_module_api=False)
    )

    # run and checkpoint config
    run_config = air.RunConfig(
        name=run_name,
        stop={"timesteps_total": tune_config["max_timesteps"]}
    )

    # tune config
    tune_config = tune.TuneConfig(
            num_samples=tune_config["num_samples"]
        )

    tuner = tune.Tuner(
        "PPO",
        run_config=run_config,
        tune_config=tune_config,
        param_space=ppo_config.to_dict()
    )

    tuner.fit()

Hi, did you manage to solve the problem? I’m having the same issue: cores 1 and 17 are at 100%, while the rest are idle.

This issue, where only a subset of CPU cores (e.g., cores 1 and 17) is fully utilized while the others remain idle, often occurs when Ray tasks are not parallelized as expected, or when the underlying code is single-threaded or limited by environment variables controlling thread usage.

In particular, if you are using libraries like PyTorch or NumPy, you may need to explicitly set the number of threads (e.g., with torch.set_num_threads(num_cpus)) to ensure all cores are used. Otherwise, Ray may schedule tasks across all CPUs, but the actual computation only uses a single core per task, leaving the rest of the CPUs underutilized.

This was confirmed as the root cause in a similar case, where setting torch.set_num_threads(num_cpus) resolved the issue on VMs, while leaving it at the default value caused only one core to be used despite Ray's resource allocation settings. See the discussion for more details: Usage of CPU resource on RayCluster GCloud.
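
For concreteness, here is a minimal sketch of two places where the thread count could be raised, assuming PyTorch/NumPy doing CPU-bound work. NUM_CPUS_PER_TRIAL and the 12-CPU budget are placeholders taken from the original post, not values Ray sets automatically:

    import ray

    NUM_CPUS_PER_TRIAL = 4  # placeholder: match num_cpus_for_local_worker in the PPOConfig

    # Option A: propagate the thread count to every Ray worker process through
    # the runtime environment, so OpenMP-backed libraries (PyTorch, NumPy) inside
    # each trial may use the CPUs Ray reserved for that trial.
    ray.init(
        num_cpus=12,
        runtime_env={"env_vars": {"OMP_NUM_THREADS": str(NUM_CPUS_PER_TRIAL)}},
    )

    # Option B: call torch.set_num_threads() from code that runs inside the trial
    # process itself, e.g. at the top of the custom model's __init__; calling it
    # only in the driver script does not affect the trial processes:
    #
    #     import torch
    #     torch.set_num_threads(NUM_CPUS_PER_TRIAL)

Either way, the thread count has to be visible inside the processes that actually run the trials, not just in the driver script.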

Would you like a step-by-step guide on how to diagnose and fix this in your environment?
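
As a first diagnostic step, it can also help to rule out core pinning by the cluster scheduler itself (e.g. cgroups or a batch system), since no Ray or Tune setting can override that. A small sketch, assuming a Linux host:

    import os

    import ray

    ray.init(num_cpus=12)

    # CPUs Ray believes it can schedule on.
    print(ray.cluster_resources().get("CPU"))

    # Cores this process is actually allowed to run on (Linux only). If this set
    # contains only two cores, the pinning comes from the cluster scheduler and
    # needs to be fixed there, not in Ray or Tune.
    print(os.sched_getaffinity(0))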
