Class methods in tune.run

I have created a class member function that I am passing to tune.run, but it is not working.
Please see the Keras fit MNIST example: gist_mnist

File "/Users/mayank/.local/lib/python3.6/site-packages/tensorflow/__init__.py", line 50, in __getattr__
    module = self._load()
  File "/Users/mayank/.local/lib/python3.6/site-packages/tensorflow/__init__.py", line 44, in _load
    module = _importlib.import_module(self.__name__)
  File "/Users/mayank/.local/lib/python3.6/site-packages/tensorflow/__init__.py", line 50, in __getattr__
    module = self._load()
  File "/Users/mayank/.local/lib/python3.6/site-packages/tensorflow/__init__.py", line 44, in _load
    module = _importlib.import_module(self.__name__)
  File "/Users/mayank/.local/lib/python3.6/site-packages/tensorflow/__init__.py", line 50, in __getattr__
    module = self._load()
  File "/Users/mayank/.local/lib/python3.6/site-packages/tensorflow/__init__.py", line 44, in _load
    module = _importlib.import_module(self.__name__)
  File "/Users/mayank/.local/lib/python3.6/site-packages/tensorflow/__init__.py", line 50, in __getattr__
    module = self._load()
  File "/Users/mayank/.local/lib/python3.6/site-packages/tensorflow/__init__.py", line 44, in _load
    module = _importlib.import_module(self.__name__)
  File "/Users/mayank/.local/lib/python3.6/site-packages/tensorflow/__init__.py", line 50, in __getattr__
    module = self._load()
RecursionError: maximum recursion depth exceeded while calling a Python object

Can you avoid creating a class member function? TensorFlow does not serialize well.

Thanks for the response. In this toy example it is easy to do so, but in the code I am working on I need to use class methods to train. Is there any workaround for this?

Actually, I ran it and it gave me this error:

(pid=55616) 2021-02-17 11:28:56,264	ERROR function_runner.py:254 -- Runner Thread raised error.
(pid=55616) Traceback (most recent call last):
(pid=55616)   File "/Users/rliaw/ray/python/ray/tune/function_runner.py", line 248, in run
(pid=55616)     self._entrypoint()
(pid=55616)   File "/Users/rliaw/ray/python/ray/tune/function_runner.py", line 316, in entrypoint
(pid=55616)     self._status_reporter.get_checkpoint())
(pid=55616)   File "/Users/rliaw/ray/python/ray/tune/function_runner.py", line 576, in _trainable_func
(pid=55616)     output = fn()
(pid=55616)   File "test.py", line 52, in train_mnist
(pid=55616)     callbacks=[TuneReportCallback({"mean_accuracy": "acc"})],
(pid=55616)   File "/Users/rliaw/miniconda3/envs/raybuild/lib/python3.7/site-packages/tensorflow/python/keras/engine/training.py", line 1145, in fit
(pid=55616)     callbacks.on_epoch_end(epoch, epoch_logs)
(pid=55616)   File "/Users/rliaw/miniconda3/envs/raybuild/lib/python3.7/site-packages/tensorflow/python/keras/callbacks.py", line 432, in on_epoch_end
(pid=55616)     callback.on_epoch_end(epoch, numpy_logs)
(pid=55616)   File "/Users/rliaw/ray/python/ray/tune/integration/keras.py", line 59, in on_epoch_end
(pid=55616)     self._handle(logs, "epoch_end")
(pid=55616)   File "/Users/rliaw/ray/python/ray/tune/integration/keras.py", line 163, in _handle
(pid=55616)     report_dict[key] = logs[metric]
(pid=55616) KeyError: 'acc'

This seems to be different from what you’re seeing?

OK, this one comes from callbacks=[TuneReportCallback({"mean_accuracy": "acc"})]. If you change "acc" to "accuracy", it should get further.
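The KeyError above is raised by `logs[metric]` inside the callback: the name given to TuneReportCallback has to match a key that Keras actually writes into its epoch logs, and that name appears to differ between TensorFlow versions ("acc" versus "accuracy"). A minimal sketch of checking which name is present follows. `pick_metric_key` and the sample dicts are hypothetical helpers for illustration only, not part of Ray or Keras.

```python
def pick_metric_key(logs, candidates=("accuracy", "acc")):
    """Return the first candidate metric name present in a Keras-style logs dict.

    Hypothetical helper: `logs` stands in for the dict Keras passes to
    callbacks at the end of an epoch (or for `history.history`).
    """
    for name in candidates:
        if name in logs:
            return name
    raise KeyError(f"none of {candidates} found in logs: {sorted(logs)}")


# Keys as they appear in the progress line printed later in this thread.
newer_logs = {"loss": 0.8140, "accuracy": 0.7318,
              "val_loss": 0.4749, "val_accuracy": 0.8723}
# Assumed shape of the logs on a release that still reports "acc".
older_logs = {"loss": 0.8140, "acc": 0.7318}

newer_key = pick_metric_key(newer_logs)
older_key = pick_metric_key(older_logs)
```

Whichever key turns out to be present is the one to put on the right-hand side of the dict passed to TuneReportCallback.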

The version with a normal function works fine. But as I mentioned before, I need to use a class method for my use case.

Thanks a bunch.

@MakGulati I ran the class-based script from your original post with the adjusted TuneReportCallback, and it ran to completion:

== Status ==
Memory usage on this node: 21.3/64.0 GiB
Using AsyncHyperBand: num_stopped=0
Bracket: Iter 320.000: None | Iter 80.000: None | Iter 20.000: None
Resources requested: 0/16 CPUs, 0/0 GPUs, 0.0/29.35 GiB heap, 0.0/10.11 GiB objects
Current best trial: 28e03_00000 with mean_accuracy=0.7910000085830688 and parameters={'threads': 2, 'lr': 0.05489876727916951, 'momentum': 0.8665391833535001, 'hidden': 291}
Result logdir: /Users/rliaw/ray_results/exp
Number of trials: 10/10 (10 TERMINATED)
+-------------------------+------------+-------+----------+-----------+------------+----------+--------+------------------+
| Trial name              | status     | loc   |   hidden |        lr |   momentum |      acc |   iter |   total time (s) |
|-------------------------+------------+-------+----------+-----------+------------+----------+--------+------------------|
| train_mnist_28e03_00000 | TERMINATED |       |      291 | 0.0548988 |   0.866539 | 0.791    |      5 |         30.7941  |
| train_mnist_28e03_00001 | TERMINATED |       |       92 | 0.0741415 |   0.506089 | 0.790433 |      5 |         26.6802  |
| train_mnist_28e03_00002 | TERMINATED |       |       56 | 0.0539287 |   0.724858 | 0.772333 |      5 |         22.7859  |
| train_mnist_28e03_00003 | TERMINATED |       |      117 | 0.0824197 |   0.360737 | 0.753    |      5 |         18.4978  |
| train_mnist_28e03_00004 | TERMINATED |       |      308 | 0.0663077 |   0.238346 | 0.7661   |      5 |         14.6738  |
| train_mnist_28e03_00005 | TERMINATED |       |      481 | 0.0284464 |   0.168725 | 0.751317 |      5 |         11.0297  |
| train_mnist_28e03_00006 | TERMINATED |       |      236 | 0.0169419 |   0.305452 | 0.735817 |      5 |          7.41396 |
| train_mnist_28e03_00007 | TERMINATED |       |      394 | 0.0563694 |   0.750468 | 0.774083 |      5 |          7.68584 |
| train_mnist_28e03_00008 | TERMINATED |       |      449 | 0.0461859 |   0.509806 | 0.768    |      5 |          3.59093 |
| train_mnist_28e03_00009 | TERMINATED |       |      160 | 0.0830146 |   0.829147 | 0.731833 |      5 |          2.67347 |
+-------------------------+------------+-------+----------+-----------+------------+----------+--------+------------------+


2021-02-17 12:04:03,032	INFO tune.py:545 -- Total run time: 54.51 seconds (52.28 seconds for the tuning loop).
Best hyperparameters found were:  {'threads': 2, 'lr': 0.05489876727916951, 'momentum': 0.8665391833535001, 'hidden': 291}
(pid=55944) 469/469 - 0s - loss: 0.8140 - accuracy: 0.7318 - val_loss: 0.4749 - val_accuracy: 0.8723

What version of TF are you on?

My TF version is 1.15.
So is TF 1 not supported?

It looks like in TF1, it’ll be hard. Maybe try importing tensorflow inside the class method?
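The suggestion to import TensorFlow inside the class method could look roughly like the sketch below. The class name `MnistTrainer`, its `epochs` attribute, and the model layout are assumptions made for illustration; the original script is not reproduced in this thread. The point is only that `tensorflow` is imported inside the method, so the module that defines the class holds no global reference to it.

```python
class MnistTrainer:
    """Hypothetical stand-in for the class in the original script."""

    def __init__(self, epochs=5):
        # Keep only plain-Python state on the instance.
        self.epochs = epochs

    def train_mnist(self, config):
        # Deferred imports: these run when the trial starts in the worker,
        # not when the class is defined or the method is handed to Tune.
        import tensorflow as tf
        from ray.tune.integration.keras import TuneReportCallback

        (x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
        x_train, x_test = x_train / 255.0, x_test / 255.0

        model = tf.keras.models.Sequential([
            tf.keras.layers.Flatten(input_shape=(28, 28)),
            tf.keras.layers.Dense(config["hidden"], activation="relu"),
            tf.keras.layers.Dense(10, activation="softmax"),
        ])
        model.compile(
            loss="sparse_categorical_crossentropy",
            optimizer=tf.keras.optimizers.SGD(
                learning_rate=config["lr"], momentum=config["momentum"]),
            metrics=["accuracy"],
        )
        model.fit(
            x_train, y_train,
            epochs=self.epochs,
            validation_data=(x_test, y_test),
            verbose=0,
            # "acc" is the key from the original script; newer releases
            # report "accuracy" instead, as discussed earlier in the thread.
            callbacks=[TuneReportCallback({"mean_accuracy": "acc"})],
        )
```

With this layout, something like `tune.run(MnistTrainer().train_mnist, config=...)` would be the call that hands the method to Tune. How the original script wired this up is not shown here, so treat that call as an assumption.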

It works. Awesome. I can’t thank you enough. You are amazing!