Specifying schema using from_numpy()

How severely does this issue affect your experience of using Ray?

  • High: It blocks me from completing my task.

When creating a dataset from a list of NumPy arrays, is it possible to specify the schema alongside it? Each NumPy array in the list ([a, b, ...]) has a different trailing size (shapes [N, size_1], [N, size_2], ...), and I’d like to yield batches as a dict, Dict[str, np.ndarray], during training, in the same way as the linked example for consuming multi-column tensor data in the Ray 2.3.0 docs: ML Tensor Support — Ray 2.3.0
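For reference, the batch structure being asked for can be sketched in plain NumPy, independent of Ray. This is a minimal sketch under the assumption that all arrays share the leading dimension N and differ only in their trailing size. The helper `iter_dict_batches` and the column names "a" and "b" are hypothetical and not part of Ray's API. It only illustrates the Dict[str, np.ndarray] batches the question wants a Ray dataset to produce.

```python
from typing import Dict, Iterator

import numpy as np


def iter_dict_batches(
    columns: Dict[str, np.ndarray], batch_size: int
) -> Iterator[Dict[str, np.ndarray]]:
    """Yield row-aligned slices of every array as Dict[str, np.ndarray].

    Hypothetical helper (not a Ray API): every array must share the
    leading dimension N, while trailing sizes (size_1, size_2, ...) may differ.
    """
    if not columns:
        return
    lengths = {name: arr.shape[0] for name, arr in columns.items()}
    n_rows = next(iter(lengths.values()))
    if any(n != n_rows for n in lengths.values()):
        raise ValueError(f"arrays must share the leading dimension N, got {lengths}")
    for start in range(0, n_rows, batch_size):
        stop = min(start + batch_size, n_rows)  # the last batch may be smaller
        # Slice the same rows from every array so the batch stays aligned.
        yield {name: arr[start:stop] for name, arr in columns.items()}


# Example: two arrays with the same N = 10 but different feature sizes.
N = 10
a = np.zeros((N, 3), dtype=np.float32)  # shape [N, size_1]
b = np.ones((N, 5), dtype=np.float32)   # shape [N, size_2]

batches = list(iter_dict_batches({"a": a, "b": b}, batch_size=4))
for batch in batches:
    print({name: arr.shape for name, arr in batch.items()})
```

Each key maps to one of the original arrays, so in dataset terms the assumption is that every array becomes its own named column rather than being stacked into a single tensor column.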

Hey @Justin-Tan,

Thanks for posting your question.

What would specifying the schema let you do? Is it so that you can yield batches of dict[str, np.ndarray] instead of np.ndarray?

Also, do you want to yield dicts with multiple keys? If so, what would that look like?