I have an R2D2 checkpoint stored in checkpoint_path. What I do is:
from ray import tune

config = { ... my configuration ... }
config['evaluation_num_workers'] = 1
trainer = tune.registry.get_trainable_cls('R2D2')(config=config)
and I receive the error:
ValueError: replay_sequence_length is calculated automatically to be model->max_seq_len + burn_in!
However, if I do print(config['replay_sequence_length']), the output is -1, which is correct, since R2D2 requires it to be -1 and computes it automatically.
If I do not set config['evaluation_num_workers'] = 1, everything works fine, but for other reasons that I won’t explain here I need it to be 1.
@fedetask,
I would check what config["evaluation_config"] looks like, since this is the config used during evaluation. Maybe you get a default of 1 in there.
That was the problem, thanks!
However, isn’t this a bug? My evaluation config is just {'explore': False}; it does not contain the 'replay_sequence_length' key at all.
The custom evaluation_config is merged with the default (or common) configuration during validation, so that a user does not have to restate all configuration parameters. As a side effect, some values are taken from the default configuration; they are not directly observable and have to be accounted for.
Because RL involves a lot of configuration parameters, the RLlib team opted for a default configuration and lets users change specific parameters. So we only have to learn once what the defaults are, and do not have to set up a full configuration (where we would certainly often forget some parameters, which would then lead to an error).
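To illustrate the merge described above, here is a minimal sketch in plain Python. It is not RLlib's actual merge code: the DEFAULTS dict, its values, and the merge_config helper are hypothetical and only show how a key that the user never wrote can still end up in the effective config.

```python
import copy

# Hypothetical defaults, for illustration only (not RLlib's real default config).
DEFAULTS = {"explore": True, "replay_sequence_length": 1, "num_workers": 0}


def merge_config(defaults, overrides):
    """Return a copy of `defaults` with the user's `overrides` applied on top."""
    merged = copy.deepcopy(defaults)
    merged.update(overrides)
    return merged


# The user only states what should change.
evaluation_config = {"explore": False}
effective = merge_config(DEFAULTS, evaluation_config)

# The effective config also carries every default the user did not restate.
print(effective)
```

The key 'replay_sequence_length' is absent from the user's dict but present in the merged result, which is why inspecting only your own evaluation_config can be misleading.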
OK, I understand. This does indeed cause an issue, since the correct 'replay_sequence_length' in the default R2D2 config is probably replaced with a wrong value when the merge happens. I guess it’s this bug that hasn’t been addressed yet.
Do you think that setting config['evaluation_config']['replay_sequence_length'] = -1 would cause an issue for other algorithms? DQN works anyway, but I don’t know about others.
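For clarity, this is roughly the override I have in mind, sketched on a plain dict that stands in for my real trainer config (the other keys shown are placeholders, and I have not checked how each algorithm validates this value):

```python
# Stand-in for the real trainer config; only the keys discussed here are shown.
config = {
    "evaluation_num_workers": 1,
    "evaluation_config": {"explore": False},
}

# setdefault avoids a KeyError if no evaluation_config was given at all.
eval_cfg = config.setdefault("evaluation_config", {})

# Pin the value explicitly so a merged-in default cannot replace it.
eval_cfg["replay_sequence_length"] = -1

print(config["evaluation_config"])
```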
I guess that as long as you do not use a recurrent model that would work fine.