SocketIO support

1. Severity of the issue: (select one)
[x] Medium: Significantly affects my productivity but can find a workaround.

2. Environment:

  • Ray version: 2.46.0
  • Python version: 3.11
  • OS: Linux
  • Cloud/Infrastructure: On Prem Kubernetes
  • Other libs/tools (if relevant): InvokeAI, SocketIO

3. What happened vs. what you expected:

  • Expected: InvokeAI Community Edition served through the Ray Serve FastAPI integration, with the web UI's SocketIO communication working.
  • Actual: Ray Serve does not handle the SocketIO requests, so the web UI does not work properly.

4. Steps to reproduce:

  1. Load the official InvokeAI docker image from ghcr.io/invoke-ai/invokeai:latest
  2. pip install -U "ray[serve]"
  3. Run the script below with python
  4. Check the WebUI SocketIO communication. For example, once a model is installed and an inference request is sent, the queue should update in the web browser and a live image preview should appear, but neither happens (see the handshake check sketched below).
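
To check the Socket.IO handshake outside the browser, a tiny python-socketio client along the lines of the sketch below can be used (the socketio_path is an assumption about where InvokeAI mounts its Socket.IO app behind the /invokeai route prefix; adjust as needed):

import socketio

# Hypothetical handshake check against the Ray Serve endpoint. The
# socketio_path is an assumption; adjust it to the actual mount path.
sio = socketio.Client()
try:
    sio.connect(
        "http://localhost:8000",
        socketio_path="/invokeai/ws/socket.io",
        wait_timeout=10,
    )
    print("Socket.IO handshake succeeded")
    sio.disconnect()
except socketio.exceptions.ConnectionError as e:
    print(f"Socket.IO handshake failed: {e}")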

By the way, are there any plans to add some kind of user-affinity option to Ray Serve?

I’m happy about any help!

import ray
from ray import serve

try:
    from invokeai.app.api_app import app as fastapi_app
    from invokeai.app.services.config import InvokeAIAppConfig
    from invokeai.app.services.config.config_default import get_config
    from invokeai.frontend.cli.arg_parser import InvokeAIArgs
    from invokeai.backend.util.logging import InvokeAILogger
    from invokeai.app.util.torch_cuda_allocator import configure_torch_cuda_allocator

    # Parse CLI args; this is important for InvokeAI to pick up configuration
    # from invokeai.yaml or environment variables. Note: no explicit argv is
    # passed here, so InvokeAI parses this script's own command-line arguments.
    InvokeAIArgs.parse_args()

    # To mimic `invokeai-web --root .`, we explicitly set the root to the current directory.
    # This ensures that InvokeAI looks for `invokeai.yaml`, models, etc., relative to
    # where this Ray Serve script is launched.
    if InvokeAIArgs.args:  # This should be a Namespace object after parse_args()
        InvokeAIArgs.args.root = "."

    app_config = get_config()

    logger = InvokeAILogger.get_logger(config=app_config)

    # Configure the torch CUDA memory allocator.
    # NOTE: It is important that this happens before torch is imported by other parts of InvokeAI.
    if app_config.pytorch_cuda_alloc_conf:
        configure_torch_cuda_allocator(app_config.pytorch_cuda_alloc_conf, logger)

    # Import other necessary modules after potential CUDA allocator configuration
    from invokeai.app.invocations.load_custom_nodes import load_custom_nodes
    from invokeai.app.util.startup_utils import (
        apply_monkeypatches,
        check_cudnn,
        register_mime_types,
    )

    # Perform essential startup tasks that run_app.py normally handles,
    # excluding Uvicorn server setup and port finding as Ray Serve handles networking.
    apply_monkeypatches()
    register_mime_types()
    check_cudnn(logger)

    # Load custom nodes if configured
    if app_config.custom_nodes_path:
        load_custom_nodes(custom_nodes_path=app_config.custom_nodes_path, logger=logger)

    logger.info("InvokeAI App initialized successfully for Ray Serve.")

except ImportError as e:
    print(f"Error importing InvokeAI components: {e}")
    print("Please ensure InvokeAI is installed and accessible in your Python environment.")
    fastapi_app = None
except Exception as e:
    print(f"An error occurred during InvokeAI initialization: {e}")
    fastapi_app = None


# Define the deployment class conditionally only if fastapi_app is available
if fastapi_app:
    @serve.deployment(
        # Each replica will run a full InvokeAI instance.
        num_replicas=1,
    )
    @serve.ingress(fastapi_app)
    class InvokeAIAPIDeployment:
        pass

    # Create the deployment handle
    invoke_ai_deployment = InvokeAIAPIDeployment.bind()
else:
    print("InvokeAI FastAPI app not available. Deployment cannot be created.")
    invoke_ai_deployment = None


def create_app():
    """Create and configure the Ray Serve application."""
    # Start Ray Serve (if not already started)
    serve.start(http_options={"host": "0.0.0.0"})  # Listen on all interfaces

    if invoke_ai_deployment is not None:
        # Deploy the application with the route prefix
        return serve.run(invoke_ai_deployment, route_prefix="/invokeai")
    else:
        raise RuntimeError("Failed to create InvokeAI deployment")


# For running directly (e.g., python serve_invokeai.py)
if __name__ == "__main__":
    create_app()
    # Keep the script running
    import time
    while True:
        time.sleep(1)

# Usage:
# 1. Run with serve CLI (recommended):
#    serve run serve_invokeai.py:invoke_ai_deployment --route-prefix /invokeai
#
# 2. Or run directly (alternative):
#    python serve_invokeai.py
#
# The InvokeAI API will be available at http://localhost:8000/invokeai/...
# The FastAPI docs will be at http://localhost:8000/invokeai/docs

Ray Serve lacks support for SocketIO, and as far as I know there is no interop between SocketIO and plain WebSockets. Open to contributions to add this support.
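
For context: plain WebSockets do work through the Serve FastAPI ingress in recent Ray versions, so the gap is specific to the Socket.IO protocol, which layers its own handshake and polling/upgrade transport on top of HTTP and WebSocket. A minimal sketch of the pattern that does work today (names are illustrative):

from fastapi import FastAPI, WebSocket, WebSocketDisconnect
from ray import serve

app = FastAPI()

@serve.deployment
@serve.ingress(app)
class EchoDeployment:
    # Plain WebSocket endpoint served through Ray Serve's FastAPI ingress.
    @app.websocket("/ws")
    async def echo(self, ws: WebSocket):
        await ws.accept()
        try:
            while True:
                text = await ws.receive_text()
                await ws.send_text(text)
        except WebSocketDisconnect:
            pass

echo_app = EchoDeployment.bind()

A Socket.IO client cannot talk to an endpoint like this directly, so InvokeAI's frontend would need either native Socket.IO support in Serve or a plain-WebSocket fallback.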

On user affinity (routing requests from the same user to the same replica): it is not natively supported, but in the next release of Ray developers can implement it with a custom request router. I recommend looking at the release notes for 2.47.
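
To make the affinity idea concrete, here is a small sketch of what such a router boils down to (this is not the actual 2.47 router API; the names are purely illustrative): hash a stable user or session key and map it deterministically onto the available replicas, so repeat requests from the same user land on the same one.

import hashlib

def pick_replica(user_id: str, replica_ids: list[str]) -> str:
    # Deterministic choice: the same user_id always maps to the same replica
    # while the replica list is unchanged. This is a plain modulo scheme, not
    # consistent hashing, so scaling up or down reshuffles the assignments.
    digest = hashlib.sha256(user_id.encode()).digest()
    index = int.from_bytes(digest[:8], "big") % len(replica_ids)
    return replica_ids[index]

# Example: pick_replica("alice", ["replica-0", "replica-1", "replica-2"])
# returns the same entry for "alice" on every call.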