Filter feature for Loggers

Hi
Thanks for developing Ray!
I use Ray Tune on a Ray cluster for hyperparameter (HP) search.
The HP space can be large, and creating a directory and related files
per trial can put a heavy load on the filesystem.
So it would be useful to have the option to save only the top
N trials according to some metric/criterion specified
through the Logger API.
AFAIK, such a feature is currently not available.
What’s the best way to implement it? A LoggerCallback?
Also, since it may be of general interest, would you consider it as a feature request?
Thanks!
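To make the request concrete, here is a minimal plain-Python sketch of the selection step alone: given each trial’s final metric value, pick the N best. The trial ids, the `top_n_trials` name and its `mode` parameter are hypothetical illustrations, not part of the Ray Tune API.

```python
import heapq
from typing import Dict, List


def top_n_trials(scores: Dict[str, float], n: int, mode: str = "max") -> List[str]:
    """Return the ids of the n best trials, best first.

    Hypothetical helper (not a Ray Tune API): `scores` maps a trial id to its
    final metric value; mode="max" keeps the highest, mode="min" the lowest.
    """
    if mode not in ("max", "min"):
        raise ValueError("mode must be 'max' or 'min'")
    pick = heapq.nlargest if mode == "max" else heapq.nsmallest
    # Iterating a dict yields its keys; rank them by their score.
    return pick(n, scores, key=scores.__getitem__)


# Example: keep the two trials with the lowest loss.
losses = {"trial_a": 0.42, "trial_b": 0.17, "trial_c": 0.31, "trial_d": 0.55}
print(top_n_trials(losses, n=2, mode="min"))  # ['trial_b', 'trial_c']
```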

Hey @fminiati, sorry for the slow reply!
Hmm, I think you could probably implement this as a custom logger.

Keep in mind, though, that this might break the Analysis object, which I think relies on the output of the CSVLogger.

How big is your experiment, if you don’t mind me asking?

Hi rilaw,

No worries, thanks for your reply.
That would be my guess too, though I agree it’s non-trivial.
Presumably one would have to create and remove the directories associated with trials according to their scores as the parameter search proceeds, while the two JSON summary files, basic-variant-state*** and experiment-state-***, could be updated in memory and then dumped to file at the end.
But it’s easier said than done.
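A rough sketch of that bookkeeping in plain Python, with no Ray dependency: a min-heap holds the N best trials seen so far, the directory of any trial that drops out is deleted, and a summary dict kept in memory is written to JSON once at the end. `TopNDirectoryKeeper`, its methods and the `summary.json` file are hypothetical names; the sketch assumes higher scores are better and does not touch Tune’s own basic-variant-state/experiment-state files. A custom LoggerCallback would presumably call `add()` when a trial finishes, but that wiring is an assumption and is not shown.

```python
import heapq
import json
import shutil
import tempfile
from pathlib import Path


class TopNDirectoryKeeper:
    """Keep the directories of the n best trials seen so far; delete the rest.

    Hypothetical helper, not a Ray Tune API. Higher scores are better
    (negate a loss if you want to minimize it).
    """

    def __init__(self, n: int):
        self.n = n
        self._heap = []    # min-heap of (score, trial_id, path): worst kept trial on top
        self.summary = {}  # in-memory summary of every finished trial

    def add(self, trial_id: str, score: float, path: Path) -> None:
        self.summary[trial_id] = {"score": score, "kept": True}
        entry = (score, trial_id, str(path))
        if len(self._heap) < self.n:
            heapq.heappush(self._heap, entry)
            return
        # heappushpop pushes the new entry and pops the smallest (worst) one,
        # which is the new trial itself if it does not beat the current worst.
        _, evicted_id, evicted_path = heapq.heappushpop(self._heap, entry)
        shutil.rmtree(evicted_path, ignore_errors=True)
        self.summary[evicted_id]["kept"] = False

    def kept(self):
        return sorted(trial_id for _, trial_id, _ in self._heap)

    def dump(self, out_file: Path) -> None:
        # Write the summary once, at the end of the experiment.
        out_file.write_text(json.dumps(self.summary, indent=2, sort_keys=True))


# Usage: four fake trials in a temporary directory, keep the best two.
root = Path(tempfile.mkdtemp())
keeper = TopNDirectoryKeeper(n=2)
for trial_id, score in [("t1", 0.5), ("t2", 0.9), ("t3", 0.1), ("t4", 0.7)]:
    trial_dir = root / trial_id
    trial_dir.mkdir()
    keeper.add(trial_id, score, trial_dir)
keeper.dump(root / "summary.json")
print(keeper.kept())  # ['t2', 't4']
```

A heap keeps each update at O(log N), so the overhead stays small even with tens of thousands of trials. As mentioned earlier in the thread, deleting trial directories like this might break the Analysis object, which reads the loggers’ output.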

My experiment currently includes about 30,000 combinations, so it’s still manageable, although the checkpoint files can quickly fill up hundreds of GiB of disk space.

By the way, I tried passing a list of loggers to “callbacks” in ray.tune, for example two of the default three (TBXLoggerCallback, JsonLoggerCallback, CSVLogger), but I didn’t notice any difference.