Cannot create R2D2 trainer with evaluation worker

I have an R2D2 checkpoint stored in checkpoint_path. What I do is:

from ray import tune

config = { ... my configuration ... }
config['evaluation_num_workers'] = 1
trainer = tune.registry.get_trainable_cls('R2D2')(config=config)

and I receive the error:

ValueError: replay_sequence_length is calculated automatically to be model->max_seq_len + burn_in!

However, print(config['replay_sequence_length']) outputs -1, which is correct: R2D2 requires it to be -1 and computes the actual value automatically.

If I do not set config['evaluation_num_workers'] = 1, everything works fine, but for reasons I won’t go into here I need it to be 1.

@fedetask ,

I would check what config["evaluation_config"] looks like, since this is the config used during evaluation. Maybe it gives you a default of 1 there.


That was the problem, thanks!

However, isn’t this a bug? My evaluation config is just {'explore': False}, it does not contain the 'replay_sequence_length' key at all.

The custom evaluation_config is merged with the default (or common) configuration during validation, so that a user does not have to restate every configuration parameter. As a side effect, some values are inherited from the default configuration; they are not directly visible in your own config and have to be accounted for.

Because RL involves a large number of configuration parameters, the RLlib team opted for a default configuration in which users change only specific parameters. So we only have to learn once what the defaults are, instead of setting up a full configuration (and often forgetting some parameters, which then leads to an error).
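
To make the merge behaviour concrete, here is a minimal sketch in plain Python. It is not RLlib's actual merge code: DEFAULTS, its values, and merge_over_defaults are hypothetical names used only for illustration, and the default of 1 is just the guess made earlier in this thread.

```python
import copy

# Hypothetical defaults standing in for an algorithm's default config.
# The value 1 is an assumption (the guess made earlier in this thread),
# not a value read from RLlib.
DEFAULTS = {"explore": True, "replay_sequence_length": 1}


def merge_over_defaults(defaults, overrides):
    """Return a new dict: a copy of defaults with the keys in overrides replaced."""
    merged = copy.deepcopy(defaults)
    merged.update(overrides)
    return merged


# The user only states the one key they care about...
eval_config = merge_over_defaults(DEFAULTS, {"explore": False})

# ...but the merged result also carries the inherited key.
print(eval_config)  # {'explore': False, 'replay_sequence_length': 1}
```

The point is that a key missing from the user's dict is still present after the merge, with whatever value the defaults supplied.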


OK, I understand. This does indeed cause an issue, since the correct 'replay_sequence_length' in the default R2D2 config is probably replaced with a wrong value when the merge happens. I guess it’s this bug that hasn’t been addressed yet.

Do you think that setting config['evaluation_config']['replay_sequence_length'] = -1 would cause an issue for other algorithms? DQN works anyway, but I don’t know about others.
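
For reference, the override I have in mind looks roughly like the sketch below. It only manipulates plain dicts; pin_eval_sequence_length is a hypothetical helper name, and whether this override is sufficient depends on the RLlib version in use.

```python
import copy


def pin_eval_sequence_length(config):
    """Return a copy of config whose evaluation_config sets
    replay_sequence_length to -1 (hypothetical helper, not an RLlib API)."""
    patched = copy.deepcopy(config)
    # Create evaluation_config if it is missing, then pin the key.
    eval_config = patched.setdefault("evaluation_config", {})
    eval_config["replay_sequence_length"] = -1
    return patched


config = {"evaluation_num_workers": 1, "evaluation_config": {"explore": False}}
patched_config = pin_eval_sequence_length(config)
print(patched_config["evaluation_config"])
```

The original config dict is left untouched, so the same base config can still be passed to algorithms that should not receive this override.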


I guess that would work fine, as long as you do not use a recurrent model.
