Cannot create R2D2 trainer with evaluation worker

fedetask · March 6, 2022, 6:43pm

I have an R2D2 checkpoint stored in checkpoint_path. What I do is:

from ray import tune

config = { ... my configuration ... }
config['evaluation_num_workers'] = 1
trainer = tune.registry.get_trainable_cls('R2D2')(config=config)

and I receive the error:

ValueError: replay_sequence_length is calculated automatically to be model->max_seq_len + burn_in!

However, print(config['replay_sequence_length']) outputs -1, which is correct, since R2D2 requires it to be -1 and computes the value automatically.

If I do not set config['evaluation_num_workers'] = 1, everything works fine, but for other reasons that I won’t explain here I need it to be 1.

Lars_Simon_Zehnder · March 6, 2022, 8:09pm

@fedetask ,

I would check what config["evaluation_config"] looks like, since this is the config used during evaluation. Maybe you get a default of 1 in there.

fedetask · March 6, 2022, 8:54pm

That was the problem, thanks!

fedetask · March 8, 2022, 9:58am

However, isn’t this a bug? My evaluation config is just {'explore': False}, it does not contain the 'replay_sequence_length' key at all.

Lars_Simon_Zehnder · March 8, 2022, 10:10am

The custom evaluation_config is merged with the default (or common) configuration during validation, so that a user does not have to restate all configuration parameters. As a side effect, some values are taken from the default configuration; these are not directly observable and have to be accounted for.

Because RL involves so many configuration parameters, the RLlib team opted for a default configuration in which users change only specific parameters. So we have to learn the defaults once, but do not have to set up a full configuration (and almost certainly forget some parameters, which then leads to an error).
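
A minimal sketch of the merge behaviour described above, using plain Python dicts rather than RLlib itself. `DEFAULTS`, `merge_config`, and the default value of 1 for `replay_sequence_length` are assumptions made for illustration; RLlib's actual merge logic and default values may differ.

```python
import copy

# Hypothetical defaults, for illustration only (not RLlib's real default config).
DEFAULTS = {
    "explore": True,
    "replay_sequence_length": 1,
}


def merge_config(defaults, overrides):
    """Return a copy of `defaults` with the keys in `overrides` replaced."""
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Nested dicts are merged recursively instead of being replaced.
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# The user only states what should differ during evaluation.
user_eval_config = {"explore": False}
effective = merge_config(DEFAULTS, user_eval_config)

# The merged result carries a key the user never wrote down.
print(effective)
```

The point of the sketch is that `effective` contains `replay_sequence_length` even though `user_eval_config` does not, which is why inspecting only the dict you wrote yourself can be misleading.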

fedetask · March 8, 2022, 10:28am

Ok, I understand. This does indeed cause an issue, since the correct 'replay_sequence_length' in the default R2D2 config is probably replaced with a wrong value when the merge happens. I guess it’s this bug that hasn’t been addressed yet.

Do you think that setting config['evaluation_config']['replay_sequence_length'] = -1 would cause an issue for other algorithms? DQN works anyway, but I don’t know about others.

Lars_Simon_Zehnder · March 8, 2022, 1:45pm

I guess that would work fine as long as you do not use a recurrent model.
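
For reference, the override discussed above can be sketched as follows. The `config` dict here is a placeholder standing in for the real trainer configuration, and no RLlib call is made; whether this override is safe for a given algorithm is subject to the caveat above about recurrent models.

```python
# Placeholder config; in practice this would be the full trainer configuration.
config = {
    "evaluation_num_workers": 1,
    "evaluation_config": {"explore": False},
}

# Pin the value explicitly in the evaluation config so it does not depend on
# whatever the merge with the defaults would otherwise produce.
# setdefault() also covers the case where "evaluation_config" is missing.
config.setdefault("evaluation_config", {})["replay_sequence_length"] = -1

print(config["evaluation_config"])
```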
