Right now, there are the use_lstm and use_attention config flags that append an LSTM or a GTrXL to the output of the MLP. I’ve just landed a commit that enables using a differentiable neural computer (DNC) as a model, and I’d like to add my own memory model soon.
I was wondering whether it is worth discussing a more general way to append memory modules after the MLP, since use_lstm, use_attention, use_dnc, use_sgm seem to pollute the config. Would it be possible to have a general config like:
{
    ...
    # Can be None, lstm, attention, dnc, sgm
    "memory_module": "lstm",
    "memory_module_config": {
        "lstm_cell_size": 64,
    },
}
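To make the idea concrete, here is a rough sketch of the kind of dispatch this would enable, i.e. one lookup on "memory_module" instead of separate use_* branches. None of this is existing RLlib code; build_memory_wrapper and the stand-in wrappers are purely hypothetical:

# Hypothetical sketch: dispatch on one "memory_module" key instead of
# separate use_lstm / use_attention / use_dnc / use_sgm flags.

def build_memory_wrapper(base_model, model_config, wrappers):
    """Wrap `base_model` with the memory module named in the config.

    `wrappers` maps module names (e.g. "lstm", "dnc") to callables that
    take (base_model, **memory_module_config) and return the wrapped model.
    """
    module = model_config.get("memory_module")
    if module is None:
        return base_model  # no memory wrapper appended
    kwargs = model_config.get("memory_module_config", {})
    return wrappers[module](base_model, **kwargs)

# Toy usage with stand-in wrappers (placeholders, not real model classes):
wrappers = {
    "lstm": lambda m, **kw: ("lstm-wrapped", m, kw),
    "dnc": lambda m, **kw: ("dnc-wrapped", m, kw),
}
config = {"memory_module": "lstm", "memory_module_config": {"lstm_cell_size": 64}}
print(build_memory_wrapper("mlp", config, wrappers))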
Yeah, I think this would be great, @smorad! The only thing is that memory_module_config would then have to be flexible with respect to the allowed keys, but I think that’s ok. The user would have to know which args these different wrappers support.
Btw, thanks so much for the DNC PR, which is really awesome! Would you mind putting the above suggestion in a PR?
My suggestion would be to include an “autowrap” key (or something similar), because right now use_lstm/use_attention means “autowrap an LSTM/attention network”. I would want to be able to use this dictionary even if I have a custom memory module implementation.
Edit:
I guess in your proposal memory_module=None would mean don’t autowrap? And if it were None, would the custom model still be able to use memory_module_config?
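For concreteness, here is roughly what I mean by the two cases. This is only an illustration; MyCustomMemoryModel and the config keys under memory_module_config are placeholders, not existing code:

# Case 1: autowrap -- the framework appends the named module after the MLP.
autowrap_config = {
    "memory_module": "lstm",
    "memory_module_config": {"lstm_cell_size": 64},
}

# Case 2: memory_module is None -- nothing is autowrapped, but a custom
# model could still read memory_module_config for its own memory logic.
custom_config = {
    "memory_module": None,
    "memory_module_config": {"num_slots": 32},  # placeholder keys
}

class MyCustomMemoryModel:
    def __init__(self, model_config):
        # The custom implementation interprets the same sub-dict itself.
        self.memory_cfg = model_config.get("memory_module_config", {})

model = MyCustomMemoryModel(custom_config)
print(model.memory_cfg)  # {'num_slots': 32}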