How severely does this issue affect your experience of using Ray?
- High: It blocks me from completing my task.
Hi, thanks for the excellent software.
I am using Ray 2.0 and have been following the example at https://docs.ray.io/en/master/serve/end_to_end_tutorial.html.
I got it to work on CPU, but when I add the @ray.remote decorator it throws an error:

```
TypeError: Remote functions cannot be called directly. Instead of running 'main.init()', try 'main.init.remote()'.
```
My full script, along with the shell script for launching it on Ubuntu 21, is below:
```bash
ray stop
ray start --head --port=8016 --num-gpus=2
python3 torchserverv2.py
```
```python
import ray
from ray import serve
# from fastapi import FastAPI
from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer

# app = FastAPI()

ray.init(address="auto", namespace="serve")
serve.start()


@serve.deployment
class M2M:
    @ray.remote(num_gpus=2)
    def init(self):
        self.model = M2M100ForConditionalGeneration.from_pretrained("facebook/m2m100_418M")
        self.tokenizer = M2M100Tokenizer.from_pretrained("facebook/m2m100_418M")
        self.src_lang = 'zh'
        self.trg_lang = 'en'

    @ray.remote(num_gpus=2)
    def __call__(self, request):
        self.tokenizer.src_lang = self.src_lang
        txt = request.query_params['txt']
        encoded_text = self.tokenizer(txt, return_tensors="pt", padding=True, truncation=True)
        generated_tokens = self.model.generate(
            **encoded_text,
            forced_bos_token_id=self.tokenizer.get_lang_id(self.trg_lang),
        )
        return self.tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)


M2M.deploy()
```
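From reading the Ray 2.0 docs, I suspect the problem is that @ray.remote should not be applied to methods of a Serve deployment at all, and that GPUs should instead be requested via ray_actor_options on @serve.deployment, with the setup moved into a plain __init__. A minimal sketch of what I think the corrected deployment would look like (untested; the method bodies are unchanged from my script above):

```python
import ray
from ray import serve
from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer

ray.init(address="auto", namespace="serve")
serve.start()


# Request GPUs through ray_actor_options instead of decorating methods
# with @ray.remote; Serve manages the deployment's actor resources itself.
@serve.deployment(ray_actor_options={"num_gpus": 2})
class M2M:
    def __init__(self):  # a plain __init__, not a remote init() method
        self.model = M2M100ForConditionalGeneration.from_pretrained("facebook/m2m100_418M")
        self.tokenizer = M2M100Tokenizer.from_pretrained("facebook/m2m100_418M")
        self.src_lang = 'zh'
        self.trg_lang = 'en'

    def __call__(self, request):
        self.tokenizer.src_lang = self.src_lang
        txt = request.query_params['txt']
        encoded_text = self.tokenizer(txt, return_tensors="pt", padding=True, truncation=True)
        generated_tokens = self.model.generate(
            **encoded_text,
            forced_bos_token_id=self.tokenizer.get_lang_id(self.trg_lang),
        )
        return self.tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)


M2M.deploy()
```

Is that the intended pattern in 2.0?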
Question 2: My next step is to get Ray Serve to do batching, so that M2M.__call__ only executes once per 12 requests (mini-batch size 12), or once 0.1 seconds have passed since the last request arrived. The batching tutorials I found are based on v1.12, not the v2 API I am trying to use. Any help implementing this? Many thanks.
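For reference, the closest thing I could find in the 2.0 API reference is the @serve.batch decorator, which takes max_batch_size and batch_wait_timeout_s arguments that seem to map onto exactly these two knobs. A sketch of how I imagine it would apply here (untested; translate_batch is a name I made up):

```python
from ray import serve
from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer


@serve.deployment(ray_actor_options={"num_gpus": 2})
class M2M:
    def __init__(self):
        self.model = M2M100ForConditionalGeneration.from_pretrained("facebook/m2m100_418M")
        self.tokenizer = M2M100Tokenizer.from_pretrained("facebook/m2m100_418M")
        self.src_lang = 'zh'
        self.trg_lang = 'en'

    # @serve.batch queues individual calls and invokes this method once
    # with a list of up to max_batch_size inputs, or with whatever has
    # arrived once batch_wait_timeout_s has elapsed.
    @serve.batch(max_batch_size=12, batch_wait_timeout_s=0.1)
    async def translate_batch(self, txts):
        self.tokenizer.src_lang = self.src_lang
        encoded = self.tokenizer(txts, return_tensors="pt", padding=True, truncation=True)
        generated = self.model.generate(
            **encoded,
            forced_bos_token_id=self.tokenizer.get_lang_id(self.trg_lang),
        )
        # Must return one decoded string per input, in the same order.
        return self.tokenizer.batch_decode(generated, skip_special_tokens=True)

    async def __call__(self, request):
        # Each incoming HTTP request contributes one string to the batch.
        return await self.translate_batch(request.query_params['txt'])
```

Is @serve.batch the right v2 replacement for the v1.12 batching tutorial?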
RT