TypeError: 'NoneType' object is not callable error from ray data `map_batches`

How severely does this issue affect your experience of using Ray?

  • High: It blocks me from completing my task.

I have a simple dataset featurization task, shown below:

import deepchem as dc
import ray

from typing import Any, Dict
from functools import partial


ray.init(num_cpus=4)

def featurize(batch: Dict[str, Any],
              featurizer,
              x='smiles',
              y='logp') -> Dict[str, Any]:
    # Minimal repro: the featurizer is accepted but unused,
    # and the batch is returned unchanged.
    return batch


featurize_batches = partial(featurize, featurizer=dc.feat.CircularFingerprint())

# Featurizing a dataset.
# num_cpus=4 reserves 4 CPUs for each map task, i.e. every CPU given to ray.init.
ds = ray.data.read_csv('zinc1k.csv').map_batches(featurize_batches, num_cpus=4)

# Writing a dataset to disk
ds.write_parquet('data-dir')

# Iterating over the dataset (this executes the read and map steps again)
for i, batch in enumerate(ds.iter_batches()):
    print(i)

ray.shutdown()

The code above featurizes the dataset, but it also prints the following exception message:

Exception ignored in: <function Dataset.__del__ at 0x2aad7d3a0>
Traceback (most recent call last):
  File "/Users/arun/Applications/miniconda3/envs/dc/lib/python3.9/site-packages/ray/data/dataset.py", line 5148, in __del__
TypeError: 'NoneType' object is not callable

I am also not able to catch this exception.

Other details:
Python version: 3.9.17
Ray version: 2.9.0
OS: macOS 13.5 (M1 Chip)

Hey @arunppsg, does your program otherwise run correctly? Also, could you share the full traceback?

Here is the full traceback:

2024-01-12 10:54:18,930	INFO worker.py:1715 -- Started a local Ray instance. View the dashboard at 127.0.0.1:8265 
2024-01-12 10:54:20,226	INFO set_read_parallelism.py:115 -- Using autodetected parallelism=8 for stage ReadCSV to satisfy parallelism at least twice the available number of CPUs (4).
2024-01-12 10:54:20,226	INFO set_read_parallelism.py:122 -- To satisfy the requested parallelism of 8, each read task output is split into 8 smaller blocks.
2024-01-12 10:54:20,227	INFO streaming_executor.py:112 -- Executing DAG InputDataBuffer[Input] -> TaskPoolMapOperator[ReadCSV] -> TaskPoolMapOperator[MapBatches(partial)] -> TaskPoolMapOperator[Write]
2024-01-12 10:54:20,227	INFO streaming_executor.py:113 -- Execution config: ExecutionOptions(resource_limits=ExecutionResources(cpu=None, gpu=None, object_store_memory=None), exclude_resources=ExecutionResources(cpu=0, gpu=0, object_store_memory=0), locality_with_output=False, preserve_order=False, actor_locality_enabled=True, verbose_progress=False)
2024-01-12 10:54:20,228	INFO streaming_executor.py:115 -- Tip: For detailed progress reporting, run `ray.data.DataContext.get_current().execution_options.verbose_progress = True`
(MapBatches(partial) pid=1470) No normalization for AvgIpc. Feature removed!                                           
(MapBatches(partial) pid=1470) WARNING:tensorflow:From /Users/arun/Applications/miniconda3/envs/dc/lib/python3.9/site-packages/tensorflow/python/util/deprecation.py:576: calling function (from tensorflow.python.eager.polymorphic_function.polymorphic_function) with experimental_relax_shapes is deprecated and will be removed in a future version.
(MapBatches(partial) pid=1470) Instructions for updating:                                                              
(MapBatches(partial) pid=1470) experimental_relax_shapes is deprecated, use reduce_retracing instead                   
(MapBatches(partial) pid=1470) Skipped loading some Jax models, missing a dependency. No module named 'jax'            
2024-01-12 10:54:26,230 WARNING plan.py:588 -- Warning: The Ray cluster currently does not have any available CPUs. The Dataset job will hang unless more CPUs are freed up. A common reason is that cluster resources are used by Actors or Tune trials; see the following link for more details: https://docs.ray.io/en/latest/data/data-internals.html#ray-data-and-tune
2024-01-12 10:54:26,237	INFO set_read_parallelism.py:115 -- Using autodetected parallelism=8 for stage ReadCSV to satisfy parallelism at least twice the available number of CPUs (4).
2024-01-12 10:54:26,237	INFO set_read_parallelism.py:122 -- To satisfy the requested parallelism of 8, each read task output is split into 8 smaller blocks.
2024-01-12 10:54:26,237	INFO streaming_executor.py:112 -- Executing DAG InputDataBuffer[Input] -> TaskPoolMapOperator[ReadCSV] -> TaskPoolMapOperator[MapBatches(partial)]
2024-01-12 10:54:26,237	INFO streaming_executor.py:113 -- Execution config: ExecutionOptions(resource_limits=ExecutionResources(cpu=None, gpu=None, object_store_memory=None), exclude_resources=ExecutionResources(cpu=0, gpu=0, object_store_memory=0), locality_with_output=False, preserve_order=False, actor_locality_enabled=True, verbose_progress=False)
2024-01-12 10:54:26,237	INFO streaming_executor.py:115 -- Tip: For detailed progress reporting, run `ray.data.DataContext.get_current().execution_options.verbose_progress = True`
Running: 0.0/4.0 CPU, 0.0/0.0 GPU, 0.04 MiB/387.23 MiB object_store_memory:  62%|████▍  | 5/8 [00:00<00:00, 104.08it/s]
/Users/arun/Applications/miniconda3/envs/dc/lib/python3.9/site-packages/ray/data/_internal/arrow_block.py:148: FutureWarning: promote has been superseded by mode='default'.
  return transform_pyarrow.concat(tables)
0                                                                                                                      
1                                                                                                                      
2                                                                                                                      
3
Exception ignored in: <function Dataset.__del__ at 0x2ae724af0>
Traceback (most recent call last):
  File "/Users/arun/Applications/miniconda3/envs/dc/lib/python3.9/site-packages/ray/data/dataset.py", line 5148, in __del__
TypeError: 'NoneType' object is not callable

I think the error is specific to macOS, because I was not able to reproduce it on a Linux machine.

Ah, yeah. I think I’ve seen this before.

IIRC this error is harmless, although it might be confusing.

@arunppsg would you mind opening a GitHub Issue? We’ll try to fix it.

Opened an issue here: [data] TypeError: ‘NoneType’ object is not callable error from ray data `map_batches` · Issue #42382 · ray-project/ray · GitHub
