Class methods in tune.run

MakGulati · February 17, 2021, 10:35am

I have created a class member function that I am passing to tune.run, but it is not working.
Please see the Keras fit MNIST example: gist_mnist

  File "/Users/mayank/.local/lib/python3.6/site-packages/tensorflow/__init__.py", line 50, in __getattr__
    module = self._load()
  File "/Users/mayank/.local/lib/python3.6/site-packages/tensorflow/__init__.py", line 44, in _load
    module = _importlib.import_module(self.__name__)
  File "/Users/mayank/.local/lib/python3.6/site-packages/tensorflow/__init__.py", line 50, in __getattr__
    module = self._load()
  File "/Users/mayank/.local/lib/python3.6/site-packages/tensorflow/__init__.py", line 44, in _load
    module = _importlib.import_module(self.__name__)
  File "/Users/mayank/.local/lib/python3.6/site-packages/tensorflow/__init__.py", line 50, in __getattr__
    module = self._load()
  File "/Users/mayank/.local/lib/python3.6/site-packages/tensorflow/__init__.py", line 44, in _load
    module = _importlib.import_module(self.__name__)
  File "/Users/mayank/.local/lib/python3.6/site-packages/tensorflow/__init__.py", line 50, in __getattr__
    module = self._load()
  File "/Users/mayank/.local/lib/python3.6/site-packages/tensorflow/__init__.py", line 44, in _load
    module = _importlib.import_module(self.__name__)
  File "/Users/mayank/.local/lib/python3.6/site-packages/tensorflow/__init__.py", line 50, in __getattr__
    module = self._load()
RecursionError: maximum recursion depth exceeded while calling a Python object
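The frames above alternate between `__getattr__` and `_load`, and `_load` reads `self.__name__`. One plausible reading is that the lazily loaded module object was rebuilt in the worker without that attribute set. The lookup then falls back into `__getattr__`, which calls `_load` again, and so on until Python gives up. Below is a minimal sketch of that loop. `LazyLoader` is a hypothetical, simplified stand-in and is not TensorFlow's actual code. The sketch assumes the object is rebuilt without running `__init__`, which is how default pickle-style deserialization restores instances.

```python
import importlib


class LazyLoader:
    """Hypothetical, simplified stand-in for a lazily imported module."""

    def __init__(self, module_name):
        self._module_name = module_name

    def _load(self):
        # If `_module_name` was never set, this lookup itself goes through
        # `__getattr__`, which calls `_load` again: an endless loop.
        return importlib.import_module(self._module_name)

    def __getattr__(self, item):
        # Only called when normal attribute lookup fails.
        return getattr(self._load(), item)


def trigger(loader, attr):
    """Return the exception type raised by `getattr(loader, attr)`, or None."""
    try:
        getattr(loader, attr)
    except Exception as exc:  # RecursionError is a subclass of Exception
        return type(exc)
    return None


healthy = LazyLoader("json")
print(trigger(healthy, "dumps"))  # None: `_module_name` is set, json loads fine

# Rebuild an instance without calling __init__, as deserialization does.
rebuilt = LazyLoader.__new__(LazyLoader)
print(trigger(rebuilt, "dumps"))  # <class 'RecursionError'>
```

A common guard is to have `__getattr__` raise `AttributeError` for the loader's own private attribute names. A half-initialized object then fails fast instead of recursing.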

rliaw · February 17, 2021, 6:21pm

Can you avoid creating a class member function? TensorFlow does not serialize well.
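A bound method carries its instance with it. Serializing `trainer.train` therefore means serializing the whole `trainer` object and everything it holds. A plain module-level function that builds the object only when it runs avoids this. Below is a minimal sketch using the standard `pickle` module. Ray uses its own serializer, so this is only an analogy. `Trainer`, `train_mnist_wrapper`, and the lock attribute are hypothetical: the lock stands in for unpicklable state such as TensorFlow objects.

```python
import pickle
import threading


class Trainer:
    """Hypothetical trainer; the lock stands in for unpicklable state."""

    def __init__(self, hidden=64):
        self.hidden = hidden
        self.lock = threading.Lock()  # cannot be pickled

    def train(self, config):
        return {"hidden": self.hidden, "lr": config["lr"]}


def train_mnist_wrapper(config):
    """Module-level function: pickled by reference, and it only builds the
    Trainer when called, i.e. inside the process that runs the trial."""
    trainer = Trainer(hidden=config.get("hidden", 64))
    return trainer.train(config)


def can_pickle(obj):
    try:
        pickle.dumps(obj)
        return True
    except (TypeError, pickle.PicklingError):
        return False


trainer = Trainer()
print(trainer.train.__self__ is trainer)  # True: the method holds the instance
print(can_pickle(trainer.train))          # False: pickling it pickles the lock too
print(can_pickle(train_mnist_wrapper))    # True
```

With Ray installed, the wrapper is what would be handed to Tune, e.g. `tune.run(train_mnist_wrapper, config={...})`. Here it just returns a dict so the sketch runs without Ray.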

MakGulati · February 17, 2021, 6:55pm

Thanks for the response. In this toy example it is easy to do so, but in the code I am working on I need to use class methods to train. Is there any workaround for this?

rliaw · February 17, 2021, 7:29pm

Actually, I ran it and it gave me this error:

(pid=55616) 2021-02-17 11:28:56,264	ERROR function_runner.py:254 -- Runner Thread raised error.
(pid=55616) Traceback (most recent call last):
(pid=55616)   File "/Users/rliaw/ray/python/ray/tune/function_runner.py", line 248, in run
(pid=55616)     self._entrypoint()
(pid=55616)   File "/Users/rliaw/ray/python/ray/tune/function_runner.py", line 316, in entrypoint
(pid=55616)     self._status_reporter.get_checkpoint())
(pid=55616)   File "/Users/rliaw/ray/python/ray/tune/function_runner.py", line 576, in _trainable_func
(pid=55616)     output = fn()
(pid=55616)   File "test.py", line 52, in train_mnist
(pid=55616)     callbacks=[TuneReportCallback({"mean_accuracy": "acc"})],
(pid=55616)   File "/Users/rliaw/miniconda3/envs/raybuild/lib/python3.7/site-packages/tensorflow/python/keras/engine/training.py", line 1145, in fit
(pid=55616)     callbacks.on_epoch_end(epoch, epoch_logs)
(pid=55616)   File "/Users/rliaw/miniconda3/envs/raybuild/lib/python3.7/site-packages/tensorflow/python/keras/callbacks.py", line 432, in on_epoch_end
(pid=55616)     callback.on_epoch_end(epoch, numpy_logs)
(pid=55616)   File "/Users/rliaw/ray/python/ray/tune/integration/keras.py", line 59, in on_epoch_end
(pid=55616)     self._handle(logs, "epoch_end")
(pid=55616)   File "/Users/rliaw/ray/python/ray/tune/integration/keras.py", line 163, in _handle
(pid=55616)     report_dict[key] = logs[metric]
(pid=55616) KeyError: 'acc'

This seems to be different from what you’re seeing?

MakGulati · February 17, 2021, 8:01pm

OK, this one is caused by callbacks=[TuneReportCallback({"mean_accuracy": "acc"})]. If you change "acc" to "accuracy", it should get further.
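`TuneReportCallback` takes a dict whose keys are the names Tune reports and whose values are the keys Keras writes into its epoch logs. The traceback above fails at `report_dict[key] = logs[metric]` because the logs contain no `"acc"` key. Below is a minimal sketch of that lookup. `build_report` is a hypothetical helper, not Ray's code, and the log values are copied from the Keras output line later in this thread.

```python
def build_report(metrics, logs):
    """Hypothetical helper: map Tune result names to Keras log keys.

    `metrics` has the same shape as the dict passed to TuneReportCallback,
    e.g. {"mean_accuracy": "accuracy"}.
    """
    missing = [key for key in metrics.values() if key not in logs]
    if missing:
        raise KeyError(f"not in Keras logs: {missing}; available: {sorted(logs)}")
    return {tune_name: logs[keras_key] for tune_name, keras_key in metrics.items()}


# Epoch-end logs with the key names Keras printed for this model.
logs = {"loss": 0.8140, "accuracy": 0.7318, "val_loss": 0.4749, "val_accuracy": 0.8723}

print(build_report({"mean_accuracy": "accuracy"}, logs))
# {'mean_accuracy': 0.7318}

try:
    build_report({"mean_accuracy": "acc"}, logs)
except KeyError as err:
    print("KeyError:", err)
```

A bare `KeyError: 'acc'` gives little to go on. Listing the available keys makes this kind of name mismatch quicker to spot. The key name depends on the Keras version: older Keras releases reported `metrics=["accuracy"]` under `acc`.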

This version, using a normal function, works fine. But as I mentioned before, I need to use a class method for my use case.

Thanks a bunch.

rliaw · February 17, 2021, 8:04pm

@MakGulati I ran the script from your original post, the one that uses the class API, with the adjusted TuneReportCallback, and got:

== Status ==
Memory usage on this node: 21.3/64.0 GiB
Using AsyncHyperBand: num_stopped=0
Bracket: Iter 320.000: None | Iter 80.000: None | Iter 20.000: None
Resources requested: 0/16 CPUs, 0/0 GPUs, 0.0/29.35 GiB heap, 0.0/10.11 GiB objects
Current best trial: 28e03_00000 with mean_accuracy=0.7910000085830688 and parameters={'threads': 2, 'lr': 0.05489876727916951, 'momentum': 0.8665391833535001, 'hidden': 291}
Result logdir: /Users/rliaw/ray_results/exp
Number of trials: 10/10 (10 TERMINATED)
+-------------------------+------------+-------+----------+-----------+------------+----------+--------+------------------+
| Trial name              | status     | loc   |   hidden |        lr |   momentum |      acc |   iter |   total time (s) |
|-------------------------+------------+-------+----------+-----------+------------+----------+--------+------------------|
| train_mnist_28e03_00000 | TERMINATED |       |      291 | 0.0548988 |   0.866539 | 0.791    |      5 |         30.7941  |
| train_mnist_28e03_00001 | TERMINATED |       |       92 | 0.0741415 |   0.506089 | 0.790433 |      5 |         26.6802  |
| train_mnist_28e03_00002 | TERMINATED |       |       56 | 0.0539287 |   0.724858 | 0.772333 |      5 |         22.7859  |
| train_mnist_28e03_00003 | TERMINATED |       |      117 | 0.0824197 |   0.360737 | 0.753    |      5 |         18.4978  |
| train_mnist_28e03_00004 | TERMINATED |       |      308 | 0.0663077 |   0.238346 | 0.7661   |      5 |         14.6738  |
| train_mnist_28e03_00005 | TERMINATED |       |      481 | 0.0284464 |   0.168725 | 0.751317 |      5 |         11.0297  |
| train_mnist_28e03_00006 | TERMINATED |       |      236 | 0.0169419 |   0.305452 | 0.735817 |      5 |          7.41396 |
| train_mnist_28e03_00007 | TERMINATED |       |      394 | 0.0563694 |   0.750468 | 0.774083 |      5 |          7.68584 |
| train_mnist_28e03_00008 | TERMINATED |       |      449 | 0.0461859 |   0.509806 | 0.768    |      5 |          3.59093 |
| train_mnist_28e03_00009 | TERMINATED |       |      160 | 0.0830146 |   0.829147 | 0.731833 |      5 |          2.67347 |
+-------------------------+------------+-------+----------+-----------+------------+----------+--------+------------------+


2021-02-17 12:04:03,032	INFO tune.py:545 -- Total run time: 54.51 seconds (52.28 seconds for the tuning loop).
Best hyperparameters found were:  {'threads': 2, 'lr': 0.05489876727916951, 'momentum': 0.8665391833535001, 'hidden': 291}
(pid=55944) 469/469 - 0s - loss: 0.8140 - accuracy: 0.7318 - val_loss: 0.4749 - val_accuracy: 0.8723

What version of TF are you on?

MakGulati · February 17, 2021, 8:06pm

My TF version is 1.15.
So is TF 1 not supported?

rliaw · February 17, 2021, 8:08pm

It looks like this will be hard in TF1. Maybe try importing tensorflow inside the class method?
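Importing inside the method keeps the module out of whatever gets serialized with the class or instance. The import then happens in whichever process actually runs the method. Below is a minimal sketch of the difference. It uses `json` as a stand-in for `tensorflow` and plain `pickle` as a stand-in for Ray's serializer. Both classes are hypothetical, and storing the module on the instance is just one assumed way a module can end up in the serialized payload.

```python
import json
import pickle


class EagerTrainer:
    """Hypothetical: keeps a module reference on the instance."""

    def __init__(self):
        self.backend = json  # module objects cannot be pickled

    def train(self, config):
        return self.backend.dumps(config)


class DeferredTrainer:
    """Hypothetical: imports the library only when the method runs."""

    def train(self, config):
        import json  # resolved in whichever process executes train()
        return json.dumps(config)


def roundtrip(method, config):
    """Pickle a bound method, restore it, and call it; None if pickling fails."""
    try:
        restored = pickle.loads(pickle.dumps(method))
    except (TypeError, pickle.PicklingError):
        return None
    return restored(config)


print(roundtrip(EagerTrainer().train, {"lr": 0.05}))     # None
print(roundtrip(DeferredTrainer().train, {"lr": 0.05}))  # {"lr": 0.05}
```

The import statement runs on every call. Python caches imported modules in `sys.modules`, so repeat imports after the first are cheap.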

MakGulati · February 17, 2021, 8:21pm

It works. Awesome! I can't thank you enough. You are amazing!
