I would like to use RL to solve a permutation problem where my input is a sparse 3D array of zeros and ones.

This 3D array defines a cloud of points in 3D space. The array is large (1000x500x50) and sparse: ones make up less than 1% of its entries.
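For scale, here is a minimal NumPy sketch (illustrative only; `SHAPE`, `to_coords` and `to_dense` are hypothetical names) that stores just the coordinates of the ones instead of the dense array. The byte counts assume float32 for the dense array and int32 for the coordinates.

```python
import numpy as np

# Array shape from the question: timeslots x rooms x subjects.
SHAPE = (1000, 500, 50)

total_cells = int(np.prod(SHAPE))       # 25_000_000 cells
max_ones = total_cells // 100           # "below 1 %" -> fewer than 250_000 ones
dense_float32_bytes = total_cells * 4   # 100_000_000 bytes as a dense float32 array
coords_int32_bytes = max_ones * 3 * 4   # 3_000_000 bytes as an (N, 3) int32 list


def to_coords(dense: np.ndarray) -> np.ndarray:
    """Return an (N, 3) int32 array holding the index of every 1."""
    return np.argwhere(dense == 1).astype(np.int32)


def to_dense(coords: np.ndarray, shape=SHAPE) -> np.ndarray:
    """Rebuild the dense 0/1 array from the coordinate list."""
    dense = np.zeros(shape, dtype=np.uint8)
    dense[coords[:, 0], coords[:, 1], coords[:, 2]] = 1
    return dense
```

Under these assumptions the coordinate list needs about 3 MB, compared with about 100 MB for the dense float32 array.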

My problem is to prepare a school timetable, where:

- x-axis: timeslot
- y-axis: room
- z-axis: subject

The RL algorithm observes the current timetable and then places a 1 at the appropriate timeslot-room-subject index; this position defines the time, room, and subject of one lesson in the timetable.
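With a flat discrete action space, each action index corresponds to exactly one (timeslot, room, subject) cell. A minimal NumPy sketch of that mapping, assuming C (row-major) ordering; `slot_to_action`, `action_to_slot` and `place_lesson` are hypothetical helper names:

```python
import numpy as np

# Axis order from the question: (timeslot, room, subject).
SHAPE = (1000, 500, 50)


def slot_to_action(timeslot: int, room: int, subject: int) -> int:
    """Encode a (timeslot, room, subject) cell as a flat action index."""
    return int(np.ravel_multi_index((timeslot, room, subject), SHAPE))


def action_to_slot(action: int) -> tuple:
    """Decode a flat action index back into (timeslot, room, subject)."""
    return tuple(int(i) for i in np.unravel_index(action, SHAPE))


def place_lesson(timetable: np.ndarray, action: int) -> np.ndarray:
    """Return a copy of the 0/1 timetable with a 1 at the chosen cell."""
    new = timetable.copy()
    new[np.unravel_index(action, timetable.shape)] = 1
    return new
```

With this ordering, `slot_to_action(2, 3, 4)` is 2*25000 + 3*50 + 4 = 50154, and the last cell (999, 499, 49) maps to action 24999999.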

This problem has a very large observation array and a very large action space of 1000x500x50 actions; each possible action is the position of a single lesson in the school timetable (some of the positions are masked).
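The masking idea can be sketched in NumPy: forbidden actions get a logit of minus infinity, so the softmax gives them zero probability. This is an illustration of the general technique under that assumption, not RLlib's own implementation; `masked_softmax` and `sample_action` are hypothetical names.

```python
import numpy as np


def masked_softmax(logits: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Convert flat policy logits into probabilities; masked actions get 0.

    mask is 1 (or True) for allowed actions and 0 for forbidden ones.
    At least one action must be allowed, otherwise the result is NaN.
    """
    masked = np.where(mask.astype(bool), logits, -np.inf)
    masked = masked - masked.max()   # shift for numerical stability
    exp = np.exp(masked)             # exp(-inf) == 0 for masked actions
    return exp / exp.sum()


def sample_action(logits: np.ndarray, mask: np.ndarray, rng) -> int:
    """Sample one flat action index from the masked distribution."""
    probs = masked_softmax(logits, mask)
    return int(rng.choice(len(probs), p=probs))
```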

To solve it, I use a typical RLlib PPO model with action masking. Of course, the process is very slow and the model consumes a lot of memory. I wonder whether I could use 3D convolutions or attention, but:
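A back-of-the-envelope estimate of where the memory goes, assuming (hypothetically) that the flattened observation feeds one fully connected hidden layer of width 256 and that the policy head outputs one float32 logit per action. The real RLlib model layout may differ; only the 1000x500x50 size comes from the question.

```python
# Hypothetical model sizes; only the 1000 x 500 x 50 shape is from the question.
N_OBS = 1000 * 500 * 50       # flattened observation size
N_ACTIONS = 1000 * 500 * 50   # one action per (timeslot, room, subject)
HIDDEN = 256                  # assumed width of the hidden layer
BYTES_PER_PARAM = 4           # float32


def dense_layer_params(n_in: int, n_out: int) -> int:
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


input_params = dense_layer_params(N_OBS, HIDDEN)     # observation -> hidden
head_params = dense_layer_params(HIDDEN, N_ACTIONS)  # hidden -> action logits
total_bytes = (input_params + head_params) * BYTES_PER_PARAM
```

Under these assumptions the two layers alone hold roughly 12.8 billion parameters (about 51 GB in float32), before gradients and optimizer state are counted.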

- 3D convolutions need a lot of CPU time;
- attention: I don't know how to properly calculate similarity between 3D objects or which metric suits this purpose (of course I can flatten the 3D array and use cosine or dot-product similarity, but I don't think that is the right direction);
- I could use some feature extraction on the 3D point cloud to reduce the size of the input array, but I don't know a suitable method; typical feature extractors are designed for images.
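For reference, the flatten-and-compare option from the second bullet can be computed directly from the coordinates of the ones, without building dense arrays: for 0/1 vectors the dot product is the number of shared ones and each norm is the square root of the number of ones. This is a minimal sketch of that option only (`flat_indices` and `binary_similarity` are hypothetical names); it treats every cell as an independent feature, so neighbouring cells such as adjacent timeslots count as completely different.

```python
import numpy as np


def flat_indices(coords: np.ndarray, shape) -> np.ndarray:
    """Flatten (N, 3) integer coordinates of the ones into sorted unique 1D indices."""
    return np.unique(np.ravel_multi_index(tuple(coords.T), shape))


def binary_similarity(coords_a: np.ndarray, coords_b: np.ndarray, shape):
    """Dot product and cosine similarity of two flattened 0/1 arrays."""
    if len(coords_a) == 0 or len(coords_b) == 0:
        return 0, 0.0
    a = flat_indices(coords_a, shape)
    b = flat_indices(coords_b, shape)
    dot = int(np.intersect1d(a, b, assume_unique=True).size)  # shared ones
    return dot, dot / float(np.sqrt(a.size * b.size))
```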

I will be grateful for any suggestions or ideas.