Hi, I would like to train an actor for a custom environment using a transformer-like policy network. I came across the GTrXL net and would like to know whether it supports multimodal input. By this I mean a tokenizer for visual features (images) and one for proprioceptive features (joint states, etc.).
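Roughly, the idea is that each modality gets its own projection into a shared embedding size, so that image patches and the joint-state vector become tokens in one sequence. The sketch below is a minimal, framework-free illustration under these assumptions: grayscale images as nested lists, non-overlapping square patches, and randomly initialized (not trained) projection weights and modality embeddings. All names (`MultimodalTokenizer`, `image_to_patches`, etc.) are hypothetical and are not part of RLlib or its `GTrXLNet`. Positional embeddings and the transformer itself are left out.

```python
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def linear(x: Vector, w: Matrix, b: Vector) -> Vector:
    # Affine projection y = W x + b, with W stored as (out_dim rows, in_dim columns).
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


def random_linear(in_dim: int, out_dim: int, rng: random.Random) -> Tuple[Matrix, Vector]:
    # Small random weights stand in for trained parameters (assumption for illustration).
    w = [[rng.gauss(0.0, 0.02) for _ in range(in_dim)] for _ in range(out_dim)]
    b = [0.0] * out_dim
    return w, b


def image_to_patches(image: Matrix, patch: int) -> List[Vector]:
    # Split an H x W grayscale image into flattened, non-overlapping patch x patch blocks.
    h, w = len(image), len(image[0])
    patches = []
    for r in range(0, h - patch + 1, patch):
        for c in range(0, w - patch + 1, patch):
            patches.append([image[r + i][c + j] for i in range(patch) for j in range(patch)])
    return patches


class MultimodalTokenizer:
    """Hypothetical tokenizer: image patches plus one joint-state token, all of size d_model."""

    def __init__(self, patch: int, joint_dim: int, d_model: int, seed: int = 0):
        rng = random.Random(seed)
        self.patch = patch
        self.img_proj = random_linear(patch * patch, d_model, rng)
        self.joint_proj = random_linear(joint_dim, d_model, rng)
        # Per-modality "type" embeddings let the transformer tell token sources apart.
        self.img_type = [rng.gauss(0.0, 0.02) for _ in range(d_model)]
        self.joint_type = [rng.gauss(0.0, 0.02) for _ in range(d_model)]

    def __call__(self, image: Matrix, joints: Vector) -> List[Vector]:
        tokens = []
        for p in image_to_patches(image, self.patch):
            e = linear(p, *self.img_proj)
            tokens.append([a + t for a, t in zip(e, self.img_type)])
        j = linear(joints, *self.joint_proj)
        tokens.append([a + t for a, t in zip(j, self.joint_type)])
        return tokens


if __name__ == "__main__":
    image = [[float(r * 8 + c) for c in range(8)] for r in range(8)]  # 8x8 dummy image
    joints = [0.1] * 7  # e.g. a 7-DoF arm's joint positions (dummy values)
    tokenizer = MultimodalTokenizer(patch=4, joint_dim=7, d_model=16)
    tokens = tokenizer(image, joints)
    print(len(tokens), len(tokens[0]))  # 4 image tokens + 1 joint token, each of size 16
```

With an 8x8 image and 4x4 patches this yields four image tokens plus one joint-state token, each of length 16, which is the kind of token sequence a transformer encoder would then attend over.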