Hi,
I am getting the following error:
ValueError: When dataset is sharded across workers, please specify a reasonable steps_per_epoch
such that all workers will train the same number of steps and each step can get data from dataset without EOF. This is required for allreduce to succeed. We will handle the last partial batch in the future.
My training dataset has 43,846 records, the batch size is 128, and steps_per_epoch is 342.
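For reference, 43,846 / 128 ≈ 342.5, so 342 is the largest number of full batches on a single worker. My guess (and it is only a guess) is that when the dataset is sharded across num_replicas workers, each worker only sees roughly 1/num_replicas of the records, so steps_per_epoch would have to shrink accordingly:

dataset_size = 43846
batch_size = 128

# Single worker: 342 full batches
steps_per_epoch = dataset_size // batch_size  # 342

# My assumption for the sharded case: every worker must be able to
# produce this many batches from its own shard without hitting EOF.
num_replicas = 2  # hypothetical value
steps_per_epoch_sharded = dataset_size // (batch_size * num_replicas)  # 171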
This is my model:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

model = Sequential()
model.add(Dense(10, input_shape=(10,), activation='relu'))
model.add(Dense(8, activation='relu'))
model.add(Dense(1, activation='sigmoid'))  # single sigmoid output for binary classification
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
If I set num_replicas greater than 1 inside TFTrainer, I see this error. Any idea how to solve this problem?
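If I understand the error correctly, sharding means each worker iterates over only a fraction of the dataset, so my original steps_per_epoch of 342 runs past EOF on every worker. Here is a toy tf.data illustration of what I think happens (the shard() call is just my illustration; I don't know whether TFTrainer shards exactly this way):

import tensorflow as tf

# 10 records split across 2 workers: worker 0 only ever sees 5 of them,
# so any steps_per_epoch computed from the full dataset size overruns EOF.
ds = tf.data.Dataset.range(10)
worker_ds = ds.shard(num_shards=2, index=0)
print(list(worker_ds.as_numpy_iterator()))  # [0, 2, 4, 6, 8]

Thanks in advance.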