Improve and verify the performance of code on Ray

I have a section of code that my peers think does not execute in parallel across multiple Ray nodes. The code is pasted below. It fetches data from a database; the data is of type MLDataset, the class defined in ray.util.data (dataset.py). After fetching it, I iterate over it using the async method from ray.util.iter, then split it into training and testing data, convert it into tensor slices using the TensorFlow API, and feed it into Ray's TFTrainer class. The TFTrainer class accepts only tensor datasets. So the requirement is to improve the code after the first line and to verify parallelism across multiple nodes. I can share the whole code; any help is appreciated.
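For context, the elided custom method builds the MLDataset roughly like this (a minimal sketch with made-up shard contents; the shard count and DataFrame columns are placeholders, and the real version runs one database query per shard):

import pandas as pd
import ray
import ray.util.iter as parallel_it
from ray.util.data.dataset import MLDataset

ray.init()

# Hypothetical stand-in for the per-shard database query: each of the
# four shards yields one small pandas DataFrame.
para_it = parallel_it.from_items([0, 1, 2, 3], num_shards=4).for_each(
    lambda shard_id: pd.DataFrame({"feature": list(range(8)),
                                   "label": [shard_id % 2] * 8})
)

# Wrap the parallel iterator into an MLDataset of DataFrame batches.
custom_dataset = MLDataset.from_parallel_it(para_it, batch_size=8)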

import tensorflow as tf
import ray
from ray.util.sgd.tf.tf_trainer import TFTrainer
from sklearn.model_selection import train_test_split

# TFTrainer invokes data_creator with a config dict, so the function
# must accept (and may ignore) that argument.
def fetch_values_from_database(config):
    custom_dataset = <custom method returning an MLDataset via MLDataset.from_parallel_it>
    resultList = []
    # gather_async() yields DataFrame batches from every shard as soon
    # as they are ready.
    for df in custom_dataset.gather_async():
        for value in df.values:
            resultList.append(list(value))
    resultColumn = [value[-1] for value in resultList]
    trainColumns = [value[3:-1] for value in resultList]
    X_train, X_test, y_train, y_test = train_test_split(
        trainColumns, resultColumn, test_size=0.20, shuffle=True)
    train_dataset = tf.data.Dataset.from_tensor_slices((X_train, y_train)).batch(32)
    test_dataset = tf.data.Dataset.from_tensor_slices((X_test, y_test)).batch(32)
    return train_dataset, test_dataset


trainer = TFTrainer(
    model_creator=<invoke a method which returns a TF model>,
    data_creator=fetch_values_from_database,
    verbose=True,
)
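For completeness, this is roughly how I run the trainer on the cluster (a sketch; the cluster address, num_replicas=4, the toy create_model, and the epoch count are assumptions, not my actual values):

import tensorflow as tf
import ray
from ray.util.sgd.tf.tf_trainer import TFTrainer

def create_model(config):
    # Hypothetical minimal model; the real model_creator returns my
    # actual compiled TF model.
    model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
    model.compile(optimizer="sgd", loss="mse")
    return model

# Connect to the already-running multi-node cluster instead of starting
# a fresh single-node Ray instance.
ray.init(address="auto")

trainer = TFTrainer(
    model_creator=create_model,
    data_creator=fetch_values_from_database,
    num_replicas=4,  # one training actor per worker node, assumed
    verbose=True,
)

for epoch in range(10):
    stats = trainer.train()  # one pass over the data across all replicas
    print("epoch", epoch, stats)

trainer.shutdown()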

I developed a full working example based on this link: Distributed TensorFlow — Ray v2.0.0.dev0.
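To verify that the replicas are actually spread across machines, I print the host inside data_creator and list the nodes Ray can see (a sketch; socket.gethostname() is just a convenient per-process marker):

import socket
import ray

def fetch_values_from_database(config):
    # Each TFTrainer replica runs data_creator in its own actor process;
    # if the printed hostnames differ, the work spans multiple nodes.
    print("data_creator running on", socket.gethostname())
    ...  # rest of the function as above

# With ray.init(address="auto") done, this lists every cluster node.
for node in ray.nodes():
    print(node["NodeManagerAddress"], node["Alive"], node["Resources"])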