We have simulations running in Matlab and would like to serve predictions using Ray Serve. The simulations are triggered from Python code using the Matlab Engine API for Python, which allows them to be executed in parallel with Ray. Function calls from Python into Matlab code through the Matlab Engine can return data structures, but in this case we would like to make queries to Ray Serve from within the simulations and then continue with the next simulation step. With a Python simulator we could pass the Serve handle into the simulation function and serving would work seamlessly. For Matlab simulators ("external environments") the main way to serve predictions would be HTTP requests. However, for short simulations (sub-second) HTTP requests add too much overhead relative to the simulation time.
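For reference, the handle-passing pattern for a Python simulator looks roughly like this. To keep the sketch self-contained it uses a plain local stand-in for a Serve handle; with real Ray Serve, a deployment handle's `.remote()` call (returning a future) would take its place. All names here (`LocalHandle`, `simulate`) are illustrative, not from the original:

```python
# Minimal sketch of the "pass the handle into the simulation" pattern.
# LocalHandle stands in for a Ray Serve deployment handle; a real handle's
# .remote() call would go through the Serve router and return a future.

class LocalHandle:
    """Stand-in for a Serve handle: wraps a predict function."""
    def __init__(self, predict_fn):
        self._predict_fn = predict_fn

    def remote(self, obs):
        # A real Serve handle returns a reference to resolve; here we
        # return the prediction directly for simplicity.
        return self._predict_fn(obs)

def simulate(handle, n_steps=5):
    """Run a toy simulation that queries the model at every step."""
    state = 0.0
    for _ in range(n_steps):
        action = handle.remote(state)  # prediction query per step
        state += action                # advance the simulation
    return state

handle = LocalHandle(lambda obs: 1.0)  # trivial "model"
final_state = simulate(handle)
print(final_state)  # 5.0
```

The point of the pattern is that the simulation loop never knows whether predictions come from a local function or a remote deployment, which is exactly what is hard to replicate from inside Matlab code.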
Are there any alternative solutions to HTTP requests? Apache Arrow seems like a candidate, but its Matlab support is currently limited. Perhaps we could do a "zero copy" operation between the Matlab process and a Serve "Router Actor"? Using the Matlab Engine API is also a limiting factor for what functionality is supported. Any ideas around this are appreciated.
To clarify, is this your setup:
- Python Serve deployments that call Matlab simulation code
- The Matlab simulation code then makes calls to other Serve deployments
Is the problem that there’s no clear way to call Python code from Matlab in step 2?
- Initiate a Python Serve deployment.
- Python actors execute Matlab simulation code.
- The Matlab simulation code then calls a Python Serve deployment.
Yes, you could say the problem is that there is no clear way to call Python code from Matlab, the way you can pass a Serve deployment handle into a function in Python and let Ray handle passing values between machines in the cluster. Triggering a million HTTP requests on a Kubernetes cluster does not seem sustainable.
there is no clear way to call Python code from Matlab
I'm not too familiar with Matlab, but if this cannot be done then you can't use the Serve handle to proceed here.
Triggering millions of requests should be fine as long as the number of concurrent queries per second is kept under control. A single Serve HTTP server can handle 1.5k queries per second, and it can be scaled horizontally.
Would need to do proper profiling, but when running similar simulations written in Python and passing the Serve handle for prediction queries, the total simulation time is 10-20% of the time taken when making the same prediction queries via HTTP requests. It seems the HTTP path adds overhead that substantially increases total simulation time.
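The kind of gap described above is easy to reproduce with a micro-benchmark (a rough sketch, not real profiling; the dummy `predict` model is made up): an in-process function call per prediction versus a localhost HTTP round trip per prediction.

```python
import threading
import time
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

def predict(x):
    return x * 2  # dummy "model"

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = str(predict(1)).encode()
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence per-request logging

# Port 0 lets the OS pick a free port.
server = HTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_port}/"

N = 200
t0 = time.perf_counter()
for _ in range(N):
    predict(1)                         # direct in-process call
direct = time.perf_counter() - t0

t0 = time.perf_counter()
for _ in range(N):
    urllib.request.urlopen(url).read() # full HTTP round trip
http = time.perf_counter() - t0
server.shutdown()

print(f"direct: {direct:.6f}s  http: {http:.6f}s")
```

Even against localhost, the HTTP loop is orders of magnitude slower per call; for sub-second simulations that per-step cost dominates.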
Are there any alternatives to HTTP requests supported? A custom solution is possible, but it would be nice to leverage existing Ray components.
We do support hosting any server (HTTP, TCP, gRPC, Arrow Flight) within the Serve replica to handle these! After all, a replica can run arbitrary Python code, including hosting a server.
Thanks for the quick answer, I will have a look at hosting alternative servers to see how that impacts total simulation time.