Embedding Preprocessing

Priority: High

General question:
Is there a ‘correct way’ to embed textual info as an observation?

In my environment’s step(), I run the text collected by agents through Sentence Transformers, which works fine. However, RLlib seems to set CUDA_VISIBLE_DEVICES on the rollout workers so that no GPU is visible, which forces Sentence Transformers onto the CPU. This leaves a ton of GPU-accelerated embedding on the table and substantially slows down each episode.
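For context, the embedding helper in my preprocessing module looks roughly like this (a simplified sketch: the model name is only an example, the device is left to the library’s default, and the padding/truncation of the result to a fixed observation length is omitted):

import numpy as np
from sentence_transformers import SentenceTransformer

# Loaded once at import time; the model name here is just a placeholder.
TRANSFORMER_MODEL = SentenceTransformer("all-MiniLM-L6-v2")

def embed_array_st(strings, length):
    # Shaping the embeddings to `length` is omitted in this sketch.
    # Without a visible GPU in the worker process, encode() runs on the CPU.
    return TRANSFORMER_MODEL.encode(np.array(strings), show_progress_bar=False)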

I allocated half a GPU to RLlib via the PPO config, but there were still no CUDA devices available.

Here’s my config for reference:

config = {
    "env": Env,
    "framework": "torch",
    "num_gpus": 1,
    "num_workers": 10,
    "num_envs_per_worker": 1,
    "train_batch_size": 256,
    "sgd_minibatch_size": 64,
    "horizon": 1,
    "soft_horizon": True,
    "no_done_at_end": True,
    "gamma": 0.99,
    "lambda": 0.95,
    "clip_param": 0.2,
    "entropy_coeff": 0.01,
    "lr": 0.001,
    "multiagent": {
        "policies": {
            "ag1": PolicySpec(config={"model": {"fcnet_hiddens": [256, 256]}}),
            "ag2": PolicySpec(config={"model": {"fcnet_hiddens": [256, 256]}}),
        },
        "policy_mapping_fn": lambda agent_id, *args, **kwargs: agent_id,
        "policies_to_train": ["ag1"]
    }
}

Thanks for taking a look!

Best,
Aidan

Hi @Aidan_McLaughlin,

You might need to assign a fraction of the GPU to your workers, since that is where the embedding happens (the environment runs on the rollout workers as long as you are not using a client-server setup). For example:

"num_gpus": 1,
"num_workers": 2,
"num_gpus_per_worker": 0.25,

@Aidan_McLaughlin, did this help you to use GPUs for rollouts?


Hi @Lars_Simon_Zehnder,

Thank you so much for this suggestion! I’ve yet to implement it, but I will soon.
Best,
Aidan


Unfortunately, fractional GPU allocation per worker was unsuccessful. Here’s the error coming out of Sentence Transformers:

ray::RolloutWorker.__init__() (pid=24500, ip=172.20.43.82, repr=<ray.rllib.evaluation.rollout_worker.RolloutWorker object at 0x7f9fc2fb6230>)
  File "/home/aidanmcl/.local/lib/python3.10/site-packages/ray/rllib/evaluation/rollout_worker.py", line 592, in __init__
    check_env(self.env)
  File "/home/aidanmcl/.local/lib/python3.10/site-packages/ray/rllib/utils/pre_checks/env.py", line 88, in check_env
    raise ValueError(
ValueError: Traceback (most recent call last):
  File "/home/aidanmcl/.local/lib/python3.10/site-packages/ray/rllib/utils/pre_checks/env.py", line 75, in check_env
    check_multiagent_environments(env)
  File "/home/aidanmcl/.local/lib/python3.10/site-packages/ray/rllib/utils/pre_checks/env.py", line 299, in check_multiagent_environments
    next_obs, reward, done, info = env.step(sampled_action)
  File "/home/aidanmcl/Topology-alpha/env.py", line 86, in step
    "crawler": self._crawler_observation(),
  File "/home/aidanmcl/Topology-alpha/env.py", line 96, in _crawler_observation
    obs, mask = self.crawler.outgoing_data()
  File "/home/aidanmcl/Topology-alpha/Crawl/crawler.py", line 24, in outgoing_data
    outgoing_data = self.outgoing_metadata(self.length)
  File "/home/aidanmcl/Topology-alpha/Crawl/Subsystems/subsystems.py", line 30, in outgoing_metadata
    return embed_array_st(self.scraper.get_metadata(), length)
  File "/home/aidanmcl/Topology-alpha/Crawl/Subsystems/preprocessing.py", line 62, in embed_array_st
    embeddings = TRANSFORMER_MODEL.encode(np.array(strings), show_progress_bar=False, device="cuda")
  File "/home/aidanmcl/.local/lib/python3.10/site-packages/sentence_transformers/SentenceTransformer.py", line 153, in encode
    self.to(device)
  File "/home/aidanmcl/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1145, in to
    return self._apply(convert)
  File "/home/aidanmcl/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 797, in _apply
    module._apply(fn)
  File "/home/aidanmcl/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 797, in _apply
    module._apply(fn)
  File "/home/aidanmcl/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 797, in _apply
    module._apply(fn)
  [Previous line repeated 1 more time]
  File "/home/aidanmcl/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 820, in _apply
    param_applied = fn(param)
  File "/home/aidanmcl/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1143, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
  File "/home/aidanmcl/.local/lib/python3.10/site-packages/torch/cuda/__init__.py", line 247, in _lazy_init
    torch._C._cuda_init()
RuntimeError: No CUDA GPUs are available

This is with the config altered as follows:

config = {
    "env": Env,  # Use the registered environment
    "framework": "torch",
    "num_gpus": 1,
    "num_workers": 8,
    "num_envs_per_worker": 1,
    "num_gpu_per_worker": 0.0625,
    "train_batch_size": 64,
    "sgd_minibatch_size": 16,
    ...
}

No client-server setup here; all local. Let me know if I missed something. Thanks again for taking a look!

@Aidan_McLaughlin, I am sorry to hear this. Fractional GPUs can be tricky, so let’s narrow this down a little.

Could you try the following:

ray.init(num_gpus=1)

...
config = {
    "env": Env,  # Use the registered environment
    "framework": "torch",
    "num_gpus": 0.5,
    "num_workers": 5,
    "num_envs_per_worker": 1,
    "num_gpu_per_worker": 0.1,
    "train_batch_size": 64,
    "sgd_minibatch_size": 16,
    ...
}

The reason behind this is that num_gpus is the fraction of the GPU dedicated to the driver process (and the local worker running there), so that part is used for training. num_gpus_per_worker should then only request what remains of the GPU, here 0.5 in total across the five workers.
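To make the accounting explicit (assuming a single machine with one physical GPU):

# GPU accounting for the config above (1 physical GPU assumed):
#   driver / local worker:  num_gpus                = 0.5
#   rollout workers:        5 * num_gpus_per_worker = 5 * 0.1 = 0.5
#   total requested:        0.5 + 0.5 = 1.0  <=  ray.init(num_gpus=1)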

This becomes even trickier if you are running Tune experiments and define your resources there.

Let me know if this helps. Sorry for my misunderstanding above; I thought you had a couple of GPUs.


I think this worked! Still ironing out some VRAM bumps, but the preprocessors now recognize a CUDA device. Thanks a ton for the help!
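In case it is useful to others: one knob I am looking at for the VRAM spikes is the batch size of the encode call (a sketch; batch_size is a standard SentenceTransformer.encode argument, and 16 is just an example value):

# Smaller encode batches trade some throughput for a lower peak VRAM footprint.
embeddings = TRANSFORMER_MODEL.encode(
    np.array(strings),
    batch_size=16,             # default is 32
    show_progress_bar=False,
    device="cuda",
)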


@Aidan_McLaughlin, great! Glad I could help.