While a cluster is running, how can Ray shut down a node without losing the objects stored in that node's object store memory? Is there a special strategy for this? Could you show me where the code is? Thanks!
Generally speaking, if you lose a node, you lose the objects stored on that node. In practice it is usually the autoscaler that shuts nodes down, and it won't shut a node down if it detects primary copies of objects on it.
Note that, depending on your use case, you should also check:
(1) Can you reconstruct the object if it is lost?
(2) Is there library-level fault tolerance or checkpointing support that handles the issue for you?
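
If nothing handles (2) for you, here is a rough sketch of what application-level checkpointing can look like. It is plain Python using only the standard library, not Ray's API, and `load_or_compute`, `checkpoint_path`, and `compute_fn` are hypothetical names I made up for illustration. The idea is to persist an expensive result outside the object store so that losing a node only costs you a reload, or at worst one recompute.

```python
import os
import pickle
import tempfile


def load_or_compute(checkpoint_path, compute_fn):
    """Return the checkpointed result if it exists, else compute and save it.

    Hypothetical helper for illustration only; not part of Ray.
    """
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path, "rb") as f:
            return pickle.load(f)

    result = compute_fn()

    # Write to a temp file in the same directory, then rename it into place,
    # so a crash mid-write never leaves a truncated checkpoint behind.
    dir_name = os.path.dirname(os.path.abspath(checkpoint_path))
    fd, tmp_path = tempfile.mkstemp(dir=dir_name)
    with os.fdopen(fd, "wb") as f:
        pickle.dump(result, f)
    os.replace(tmp_path, checkpoint_path)
    return result
```

For this to help with node loss, the checkpoint path would need to live on storage that outlives the node (for example a shared filesystem), which is an assumption about your setup. As for (1), my understanding is that Ray can re-execute tasks to rebuild lost objects that were produced by tasks, while objects created with `ray.put` cannot be rebuilt that way, so it is worth checking which kind you have.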