First of all, thanks for a great project.
I’m having some issues with Memory Aware Scheduling on a cluster with multiple node types.
I’m not familiar with the Ray code, but the issue seems to be that the memory part of the `resources` attribute in `available_node_types` is interpreted with different units in different parts of the Ray code.
This is my understanding of how it works now: `python/ray/autoscaler/_private/resource_demand_scheduler.py` uses the 50 MB unit, while `python/ray/node.py`, which gets the value through `RAY_OVERRIDE_RESOURCES`, interprets it as bytes.
`RAY_OVERRIDE_RESOURCES` is set in `python/ray/autoscaler/_private/updater.py` from the `node_resources` attribute of `NodeUpdater`.
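To illustrate the mismatch I think I’m seeing, here is a minimal sketch. It is not Ray’s actual code: the constant `MEMORY_UNIT_BYTES`, the helper names, and the example value `160` are all made up for illustration, and I’m assuming the "50 MB" unit means 50 MiB.

```python
# Hypothetical illustration only; none of these names come from the Ray code base.
# Assumption: one "memory unit" is 50 MiB (the real constant may differ).
MEMORY_UNIT_BYTES = 50 * 1024 * 1024


def read_as_units(value: int) -> int:
    """Interpret the configured value as a count of 50 MiB units; return bytes."""
    return value * MEMORY_UNIT_BYTES


def read_as_bytes(value: int) -> int:
    """Interpret the configured value as a plain byte count."""
    return value


# Hypothetical "memory" entry from a node type's resources.
configured_memory = 160

scheduler_view = read_as_units(configured_memory)  # how I think the demand scheduler reads it
node_view = read_as_bytes(configured_memory)       # how I think node.py reads it

print(f"scheduler sees {scheduler_view} bytes, node sees {node_view} bytes")
print(f"the two views differ by a factor of {scheduler_view // node_view}")
```

If this is roughly what happens, the same configured number describes about 7.8 GiB to one component and only 160 bytes to the other, which would explain the scheduling behaviour I’m seeing.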
Thanks in advance for any help.