Training issues with MultiWorkerMirroredStrategy

Ray Train's distributed TensorFlow integration with MultiWorkerMirroredStrategy is not deprecated, but users have reported issues running the official examples across multiple nodes, especially with recent TensorFlow and Keras versions. The error you encountered, "unsupported type (<class 'tensorflow.python.distribute.values.PerReplica'>) to a Tensor", is a known TensorFlow issue with distributed datasets: it typically comes from how the data is sharded or how the dataset is constructed in a distributed context. This problem does not occur on a single node, which matches your experience. According to the Ray distributed TensorFlow guide, Ray should handle TF_CONFIG and worker setup for you, but compatibility issues with newer TensorFlow/Keras versions (e.g., Keras 3.x) can break these examples, as discussed in community threads.
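As a rough sketch of what the multi-node setup can look like when Ray Train provides TF_CONFIG, the example below creates the strategy and the dataset inside the per-worker training loop and pins the tf.data auto-shard policy to DATA, which is one common way to avoid sharding-related PerReplica conversion errors. The synthetic data, layer sizes, and worker count are placeholders, and it assumes a TensorFlow/Keras combination where MultiWorkerMirroredStrategy still works with model.fit (see the Keras 3.x workaround below).

```python
import tensorflow as tf
from ray.train import ScalingConfig
from ray.train.tensorflow import TensorflowTrainer


def train_func(config):
    # Ray Train sets TF_CONFIG on each worker, so the strategy can be
    # created directly inside the per-worker training loop.
    strategy = tf.distribute.MultiWorkerMirroredStrategy()

    def make_dataset(batch_size):
        # Placeholder synthetic data; replace with your real input pipeline.
        x = tf.random.uniform((1024, 10))
        y = tf.random.uniform((1024, 1))
        ds = tf.data.Dataset.from_tensor_slices((x, y)).batch(batch_size)
        # Shard by DATA instead of relying on AUTO/FILE sharding, which is
        # a frequent source of distributed-dataset errors.
        options = tf.data.Options()
        options.experimental_distribute.auto_shard_policy = (
            tf.data.experimental.AutoShardPolicy.DATA
        )
        return ds.with_options(options)

    # Scale the per-replica batch size up to a global batch size.
    global_batch_size = 64 * strategy.num_replicas_in_sync
    dataset = make_dataset(global_batch_size)

    # Build and compile the model under the strategy scope.
    with strategy.scope():
        model = tf.keras.Sequential([
            tf.keras.Input(shape=(10,)),
            tf.keras.layers.Dense(16, activation="relu"),
            tf.keras.layers.Dense(1),
        ])
        model.compile(optimizer="adam", loss="mse")

    model.fit(dataset, epochs=2, verbose=0)


trainer = TensorflowTrainer(
    train_loop_per_worker=train_func,
    scaling_config=ScalingConfig(num_workers=2, use_gpu=False),
)
result = trainer.fit()
```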

Additionally, distributed training with Ray Train and TensorFlow is only beneficial for large datasets, or when the speedup from scaling across multiple GPUs/nodes outweighs the communication overhead. For small datasets or single-node setups, distributed training can be slower or unnecessary, as noted in user discussions. If you are using recent TensorFlow/Keras versions, you may need to set os.environ["TF_USE_LEGACY_KERAS"] = "1" as a workaround (a sketch of how to apply it follows below), but this is not ideal for all users. Would you like a step-by-step breakdown of how to debug or adapt your code for multi-node distributed training with Ray and TensorFlow?
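If you do try the legacy-Keras workaround, note that the variable has to be set before TensorFlow is imported and has to reach the Ray worker processes, not just the driver. A minimal sketch, assuming the tf-keras package is installed and that passing environment variables through Ray's runtime_env is acceptable in your setup:

```python
import os

# Must be set before TensorFlow/Keras is imported anywhere in this process.
os.environ["TF_USE_LEGACY_KERAS"] = "1"

import ray

# Propagate the variable to every Ray worker process as well; otherwise the
# training workers may still pick up Keras 3.
ray.init(runtime_env={"env_vars": {"TF_USE_LEGACY_KERAS": "1"}})
```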

Hint: Mention @RunLLM in the post for followups.