Does beam search make sense in the context of RL?
At every step at inference time, we would expand several actions (assuming the environment is inexpensive to copy and step), follow k trajectories in parallel, and eventually keep the trajectory with the best reward?
I hacked a beam search algorithm for my env in RLlib, ranking each trajectory by the sum of the actions' logprobs (intermediate rewards are 0 in my problem), but the score is lower than with basic sampling.
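For reference, here is a minimal sketch of what I mean, outside of RLlib. `ToyEnv` and `action_score` are stand-ins (the real setup uses my env and the policy's logprobs); the key assumptions are that the env is deterministic, cheap to `deepcopy`, and gives reward only at the end, like my problem:

```python
import copy
import heapq

class ToyEnv:
    """Tiny deterministic env: integer state, 3 steps, sparse terminal reward."""
    def __init__(self):
        self.state = 0
        self.steps = 0

    def legal_actions(self):
        return [-1, 0, 1]

    def step(self, action):
        self.state += action
        self.steps += 1
        done = self.steps >= 3
        # Sparse reward: only at episode end, best when state == 2
        reward = -abs(self.state - 2) if done else 0.0
        return self.state, reward, done

def action_score(env, action):
    # Stand-in for the policy's logprob of `action` in the current state
    return float(action)

def beam_search(make_env, beam_width=2):
    # Each beam entry: (cumulative logprob-like score, cumulative reward, env copy)
    beams = [(0.0, 0.0, make_env())]
    finished = []
    while beams:
        candidates = []
        for score, ret, env in beams:
            for a in env.legal_actions():
                child = copy.deepcopy(env)  # branching requires a copyable env
                _, r, done = child.step(a)
                entry = (score + action_score(env, a), ret + r, child)
                (finished if done else candidates).append(entry)
        # Keep the top-k partial trajectories by cumulative score
        beams = heapq.nlargest(beam_width, candidates, key=lambda e: e[0])
    # Among completed trajectories, return the one with the best actual reward
    return max(finished, key=lambda e: e[1])

best_score, best_return, _ = beam_search(ToyEnv, beam_width=2)
print(best_return)
```

Note the mismatch this exposes: the beam is pruned by logprob-like scores during the rollout, but the final selection is by reward, so a high-likelihood beam can crowd out the high-reward trajectory — which may be why my reranking underperforms sampling.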