Unfortunately, fractional GPU allocation per worker was unsuccessful. Here's the error, which surfaces inside Sentence-Transformers when encode() tries to move the model to CUDA:
ray::RolloutWorker.__init__() (pid=24500, ip=172.20.43.82, repr=<ray.rllib.evaluation.rollout_worker.RolloutWorker object at 0x7f9fc2fb6230>)
  File "/home/aidanmcl/.local/lib/python3.10/site-packages/ray/rllib/evaluation/rollout_worker.py", line 592, in __init__
    check_env(self.env)
  File "/home/aidanmcl/.local/lib/python3.10/site-packages/ray/rllib/utils/pre_checks/env.py", line 88, in check_env
    raise ValueError(
ValueError: Traceback (most recent call last):
  File "/home/aidanmcl/.local/lib/python3.10/site-packages/ray/rllib/utils/pre_checks/env.py", line 75, in check_env
    check_multiagent_environments(env)
  File "/home/aidanmcl/.local/lib/python3.10/site-packages/ray/rllib/utils/pre_checks/env.py", line 299, in check_multiagent_environments
    next_obs, reward, done, info = env.step(sampled_action)
  File "/home/aidanmcl/Topology-alpha/env.py", line 86, in step
    "crawler": self._crawler_observation(),
  File "/home/aidanmcl/Topology-alpha/env.py", line 96, in _crawler_observation
    obs, mask = self.crawler.outgoing_data()
  File "/home/aidanmcl/Topology-alpha/Crawl/crawler.py", line 24, in outgoing_data
    outgoing_data = self.outgoing_metadata(self.length)
  File "/home/aidanmcl/Topology-alpha/Crawl/Subsystems/subsystems.py", line 30, in outgoing_metadata
    return embed_array_st(self.scraper.get_metadata(), length)
  File "/home/aidanmcl/Topology-alpha/Crawl/Subsystems/preprocessing.py", line 62, in embed_array_st
    embeddings = TRANSFORMER_MODEL.encode(np.array(strings), show_progress_bar=False, device="cuda")
  File "/home/aidanmcl/.local/lib/python3.10/site-packages/sentence_transformers/SentenceTransformer.py", line 153, in encode
    self.to(device)
  File "/home/aidanmcl/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1145, in to
    return self._apply(convert)
  File "/home/aidanmcl/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 797, in _apply
    module._apply(fn)
  File "/home/aidanmcl/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 797, in _apply
    module._apply(fn)
  File "/home/aidanmcl/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 797, in _apply
    module._apply(fn)
  [Previous line repeated 1 more time]
  File "/home/aidanmcl/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 820, in _apply
    param_applied = fn(param)
  File "/home/aidanmcl/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1143, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
  File "/home/aidanmcl/.local/lib/python3.10/site-packages/torch/cuda/__init__.py", line 247, in _lazy_init
    torch._C._cuda_init()
RuntimeError: No CUDA GPUs are available
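For what it's worth, the failing call is the hard-coded device="cuda" in embed_array_st. If I understand Ray correctly, it sets CUDA_VISIBLE_DEVICES for each actor from that actor's GPU allocation, so a worker that was granted no GPU sees an empty value and any torch CUDA call raises exactly this error. A guarded device pick (pick_device is a hypothetical helper, not anything from my code or Ray's API) would at least avoid the hard crash:

```python
import os

def pick_device(env=os.environ):
    """Return "cuda" only when some GPU is actually visible to this process.

    Assumption: Ray sets CUDA_VISIBLE_DEVICES per actor from its GPU
    allocation, so "" means the worker got no GPU. When the variable is
    unset we also fall back to CPU, to stay conservative.
    """
    visible = env.get("CUDA_VISIBLE_DEVICES", "")
    return "cuda" if visible not in ("", "-1") else "cpu"

# Hypothetical use inside embed_array_st:
# embeddings = TRANSFORMER_MODEL.encode(
#     np.array(strings), show_progress_bar=False, device=pick_device()
# )
```

Of course that just trades the crash for a silent CPU fallback in the workers, which may or may not be what I want here.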
This is with the config file altered as follows:
config = {
    "env": Env,  # use the registered environment
    "framework": "torch",
    "num_gpus": 1,
    "num_workers": 8,
    "num_envs_per_worker": 1,
    "num_gpus_per_worker": 0.0625,  # fractional GPU per rollout worker
    "train_batch_size": 64,
    "sgd_minibatch_size": 16,
    ...
}
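For reference, here's the total GPU demand I believe this config implies, assuming RLlib sums the driver's num_gpus with the workers' fractions (which is my reading of the resource accounting, so correct me if that's wrong):

```python
# GPU demand implied by the config above, assuming the driver's
# num_gpus and each worker's fraction are summed by Ray's scheduler.
num_gpus = 1                 # driver / trainer process
num_workers = 8
num_gpus_per_worker = 0.0625

total = num_gpus + num_workers * num_gpus_per_worker
print(total)  # 1.5
```

So on a single-GPU machine this would over-subscribe the GPU, which might explain why some workers end up with none.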
No client-server setup here; all local. Let me know if I missed something. Thanks again for taking a look!