How does `serve` create replicas and allocate resources when composing deployments?

Here’s the official example code demonstrating deployment composition:

# File name: hello.py
from ray import serve
from ray.serve.handle import DeploymentHandle


@serve.deployment
class LanguageClassifer:
    def __init__(
        self, spanish_responder: DeploymentHandle, french_responder: DeploymentHandle
    ):
        self.spanish_responder = spanish_responder
        self.french_responder = french_responder

    async def __call__(self, http_request):
        request = await http_request.json()
        language, name = request["language"], request["name"]

        if language == "spanish":
            response = self.spanish_responder.say_hello.remote(name)
        elif language == "french":
            response = self.french_responder.say_hello.remote(name)
        else:
            return "Please try again."

        return await response


@serve.deployment
class SpanishResponder:
    def say_hello(self, name: str):
        return f"Hola {name}"


@serve.deployment
class FrenchResponder:
    def say_hello(self, name: str):
        return f"Bonjour {name}"


spanish_responder = SpanishResponder.bind()
french_responder = FrenchResponder.bind()
language_classifier = LanguageClassifer.bind(spanish_responder, french_responder)

Do the `num_replicas` and `ray_actor_options` of a deployment affect the number of replicas and the resources of the deployments it refers to?
In other words, do the settings specified in `serve.deployment` affect only the decorated deployment, or do they also apply recursively to the deployments it uses?

Hi @mk6, thanks for posting to the community. The parameters configured in `@serve.deployment` apply only to the decorated deployment; they do not apply recursively to the child deployments it references. Each deployment is configured separately, which lets it scale up or down independently.