How severe does this issue affect your experience of using Ray?
- High: It blocks me to complete my task.
I am working on computing constraint for a node in a graph using NetworkX library.
Following is the key part of my code
ray.init()
data = pd.read_csv('./connection.csv')
data.columns = ['src', 'dst', 'weight']
# build graph, which contains about 84000 nodes
G = nx.Graph()
for _, row in data.iterrows():
G.add_edge(row['src'], row['dst'], weight=row['weight'])
g = ray.put(G)
@ray.remote
def get_constraint(v):
G = ray.get(g)
print(v)
if len(G[v]) == 0:
return {v: float("nan")}
return {v: sum(
local_constraint(G, v, n, 'weight') for n in set(nx.all_neighbors(G, v))
)}
idx = 0
while idx * 9000 < len(G):
futures = [get_constraint.remote(v) for v in list(G.nodes())[idx * 9000:(idx + 1) * 9000]]
constraint = ray.get(futures)
with open(f'constraint_{idx}.pkl', 'wb') as f:
pickle.dump(constraint, f)
idx += 1
Firstly I tried
futures = [get_constraint.remote(v) for v in G]
constraint = ray.get(futures)
Then after finishing 10000 tasks the program will get stuck, and I check “ray summary tasks” I found following result
Then I split the total tasks into groups, each group contains 9000 tasks
idx = 0
while idx * 9000 < len(G):
futures = [get_constraint.remote(v) for v in list(G.nodes())[idx * 9000:(idx + 1) * 9000]]
constraint = ray.get(futures)
with open(f'constraint_{idx}.pkl', 'wb') as f:
pickle.dump(constraint, f)
idx += 1
After two loops (I can get constraint_0 and constraint_1 properly) the program get stuck with the same result from ray summary tasks
I check gcs_server_log.err and raylet.err, no error recorded.
Also I check ps aux and found there are lots of ray workers IDLE, not sure why not schedule to those idle workers.
Any help here? Thanks in advance