Log statements missing in docker logs since v1.12

ts-cfield · June 22, 2022, 2:46pm

How severe does this issue affect your experience of using Ray?

Medium-High. I cannot upgrade to latest version of Ray, but I am able to continue with an older version for the time being.

Hello, I am using Ray and Ray Serve with FastAPI. I have been on Ray v1.9.1 and wanted to upgrade to the latest version, Ray v1.13. I am using the Ray docker image with Python v3.8. While using Ray v1.9.1, I was able to write DEBUG, INFO, WARN, ERROR, and FATAL log statements using Python’s logging module in the standard library along with using print() for the occasional debugging message during application development and view the output in the terminal and the docker logs.

After upgrading to Ray v1.13, once the Ray Serve deployment is started, no log or print statements appear in the docker logs or in the terminal. The log statements do appear in the logs from the Ray Dashboard for the various actors, specifically the Ray Serve deployment actor that implements a FastAPI-based HTTP API. However, these log and print statements never appear in the docker logs or the terminal.

I reverted back to Ray v1.12. I observed the same behavior with the missing log and print statements. I reverted back to Ray v1.11 and the log and print statements re-appeared in the docker logs and terminal.

I also tried changing the logging level of the statements to see if it was a logging configuration issue, as many of the “missing log statements” in this forum and on Ray’s GitHub were related to log level configuration. Regardless of the log level and how it is configured, no statements appear for Ray v1.12 or newer.

Something changed between Ray v1.11 and Ray v1.12 with regard to Ray Serve and logging, but I am kind of lost with the changelogs, release notes, and PRs on what exactly changed that would cause the log and print statements to not appear in the docker logs and terminal. Maybe there is some configuration for logging introduced in Ray v1.12 that I am missing and need to change?

I originally posted some additional information in the Workers not sending logs to driver soon enough. Can’t use docker logs topic as that sounded very similar to my issue.

Any information or direction would be greatly appreciated. I recognize the Ray Serve component is under heavy and continued development. Please keep up the great work.

Chen_Shen · June 22, 2022, 7:58pm

hi @ts-cfield this sounds like a regression. To confirm the symptom, this bug only happens after Ray Server is started? cc @eoakes

eoakes · June 22, 2022, 8:18pm

Hi @ts-cfield sorry to hear this, it sounds very frustrating to debug. We did have some minor changes to logging to add an access logger recently, but it isn’t a known change/limitation that logs should no longer show up in the terminal.

Would you mind filing an issue on GitHub including a minimal example that works in 1.11 and doesn’t work in 1.13 (including where you expect the logs to show up)? We will then prioritize diagnosing and fixing the issue. Definitely don’t want you to be stuck on Ray 1.11 or older!

ts-cfield · June 23, 2022, 1:12pm

Log and print statements appear in the terminal and docker logs until the Deployment, i.e. Api.deploy() is called. Once our deployment actor, implemented with FastAPI ingress, is started, the log and print statements disappear.

Example terminal output with Ray v1.11.1 or older (working):

❯ docker logs api
2022-06-23 13:00:16,998 INFO scripts.py:600 -- Local node IP: 192.168.1.98
2022-06-23 13:00:18,747 INFO services.py:1412 -- View the Ray dashboard at http://192.168.1.98:8265
2022-06-23 13:00:19,066 SUCC scripts.py:640 -- --------------------
2022-06-23 13:00:19,066 SUCC scripts.py:641 -- Ray runtime started.
2022-06-23 13:00:19,066 SUCC scripts.py:642 -- --------------------
2022-06-23 13:00:19,066 INFO scripts.py:644 -- Next steps
2022-06-23 13:00:19,066 INFO scripts.py:645 -- To connect to this Ray runtime from another node, run
2022-06-23 13:00:19,066 INFO scripts.py:649 --   ray start --address='192.168.1.98:6379' --redis-password='5241590000000000'
2022-06-23 13:00:19,066 INFO scripts.py:654 -- Alternatively, use the following Python code:
2022-06-23 13:00:19,066 INFO scripts.py:656 -- import ray
2022-06-23 13:00:19,066 INFO scripts.py:660 -- ray.init(address='auto', _redis_password='5241590000000000')
2022-06-23 13:00:19,067 INFO scripts.py:670 -- To connect to this Ray runtime from outside of the cluster, for example to
2022-06-23 13:00:19,067 INFO scripts.py:672 -- connect to a remote cluster from your laptop directly, use the following
2022-06-23 13:00:19,067 INFO scripts.py:674 -- Python code:
2022-06-23 13:00:19,067 INFO scripts.py:676 -- import ray
2022-06-23 13:00:19,067 INFO scripts.py:677 -- ray.init(address='ray://<head_node_ip_address>:10001')
2022-06-23 13:00:19,067 INFO scripts.py:682 -- If connection fails, check your firewall settings and network configuration.
2022-06-23 13:00:19,067 INFO scripts.py:687 -- To terminate the Ray runtime, run
2022-06-23 13:00:19,067 INFO scripts.py:688 --   ray stop
(ServeController pid=224) 2022-06-23 13:00:20,562       INFO checkpoint_path.py:16 -- Using RayInternalKVStore for controller checkpoint and recovery.
(ServeController pid=224) 2022-06-23 13:00:20,677       INFO http_state.py:98 -- Starting HTTP proxy with name 'SERVE_CONTROLLER_ACTOR:SERVE_PROXY_ACTOR-node:192.168.1.98-0' on node 'node:192.168.1.98-0' listening on '127.0.0.1:7777'
2022-06-23 13:00:21,329 INFO api.py:521 -- Started detached Serve instance in namespace 'serve'.
(HTTPProxyActor pid=258) INFO:     Started server process [258]
2022-06-23 13:00:24,572 DEBUG deploy:run -- Total Resources = {'node:192.168.1.98': 1.0, 'accelerator_type:RTX': 1.0, 'CPU': 32.0, 'object_store_memory': 40384510771.0, 'memory': 84230525133.0, 'GPU': 2.0}
2022-06-23 13:00:24,573 DEBUG deploy:run -- Available Resources = {'node:192.168.1.98': 0.98, 'GPU': 2.0, 'memory': 84230525133.0, 'CPU': 32.0, 'object_store_memory': 40384510771.0, 'accelerator_type:RTX': 1.0}
2022-06-23 13:00:25,503 INFO api.py:262 -- Updating deployment 'Api'. component=serve deployment=Api
(ServeController pid=224) 2022-06-23 13:00:25,602       INFO deployment_state.py:920 -- Adding 1 replicas to deployment 'Api'. component=serve deployment=Api
(Api pid=335) 2022-06-23 13:00:27,181 DEBUG api:most_recent_field_of_view -- records count=0
(Api pid=335) 2022-06-23 13:00:27,181 INFO  api:most_recent_field_of_view -- Creating a default Field of View (FOV)
(Api pid=335) 2022-06-23 13:00:27,197 DEBUG api:most_recent_scale_bar -- records count=0
(Api pid=335) 2022-06-23 13:00:27,197 INFO  api:most_recent_scale_bar -- Creating a default scale bar
(Api pid=335) 2022-06-23 13:00:27,202 DEBUG api:__init__ -- yolo_v5_weights_dir=/mnt/nvme/theia/YOLO/v5
(Api pid=335) 2022-06-23 13:00:27,202 DEBUG api:__init__ -- file_read_chunk_size=1024
2022-06-23 13:00:27,254 INFO api.py:274 -- Deployment 'Api' is ready at `http://127.0.0.1:7777/api`. component=serve deployment=Api
2022-06-23 13:00:27,276 DEBUG deploy:run -- Deployments = {'Api': Deployment(name=Api,version=None,route_prefix=/api)}

Example terminal output with Ray v1.12.0 or newer (problem):

❯ docker logs api
...
2022-06-23 12:54:38,725 INFO usage_lib.py:320 -- Usage stats collection is enabled by default without user confirmation because this stdin is detected to be non-interactively. To disable this, add `--disable-usage-stats` to the command that starts the cluster, or run the following command: `ray disable-usage-stats` before starting the cluster. See https://docs.ray.io/en/master/cluster/usage-stats.html for more details.
2022-06-23 12:54:38,725 INFO scripts.py:715 -- Local node IP: 192.168.1.98
2022-06-23 12:54:40,577 INFO services.py:1470 -- View the Ray dashboard at http://192.168.1.98:8265
2022-06-23 12:54:40,898 SUCC scripts.py:757 -- --------------------
2022-06-23 12:54:40,899 SUCC scripts.py:758 -- Ray runtime started.
2022-06-23 12:54:40,899 SUCC scripts.py:759 -- --------------------
2022-06-23 12:54:40,899 INFO scripts.py:761 -- Next steps
2022-06-23 12:54:40,899 INFO scripts.py:762 -- To connect to this Ray runtime from another node, run
2022-06-23 12:54:40,899 INFO scripts.py:765 --   ray start --address='192.168.1.98:6379'
2022-06-23 12:54:40,899 INFO scripts.py:770 -- Alternatively, use the following Python code:
2022-06-23 12:54:40,899 INFO scripts.py:772 -- import ray
2022-06-23 12:54:40,899 INFO scripts.py:776 -- ray.init(address='auto')
2022-06-23 12:54:40,899 INFO scripts.py:788 -- To connect to this Ray runtime from outside of the cluster, for example to
2022-06-23 12:54:40,899 INFO scripts.py:792 -- connect to a remote cluster from your laptop directly, use the following
2022-06-23 12:54:40,899 INFO scripts.py:796 -- Python code:
2022-06-23 12:54:40,900 INFO scripts.py:798 -- import ray
2022-06-23 12:54:40,900 INFO scripts.py:799 -- ray.init(address='ray://<head_node_ip_address>:10001')
2022-06-23 12:54:40,900 INFO scripts.py:808 -- If connection fails, check your firewall settings and network configuration.
2022-06-23 12:54:40,900 INFO scripts.py:816 -- To terminate the Ray runtime, run
2022-06-23 12:54:40,900 INFO scripts.py:817 --   ray stop
(ServeController pid=224) INFO 2022-06-23 12:54:42,497 controller 224 checkpoint_path.py:17 - Using RayInternalKVStore for controller checkpoint and recovery.
(ServeController pid=224) INFO 2022-06-23 12:54:42,600 controller 224 http_state.py:112 - Starting HTTP proxy with name 'SERVE_CONTROLLER_ACTOR:SERVE_PROXY_ACTOR-node:192.168.1.98-0' on node 'node:192.168.1.98-0' listening on '127.0.0.1:7777'
(HTTPProxyActor pid=256) INFO:     Started server process [256]
2022-06-23 12:54:46,307 DEBUG deploy:run -- Total Resources = {'accelerator_type:RTX': 1.0, 'object_store_memory': 39170103296.0, 'node:192.168.1.98': 1.0, 'CPU': 32.0, 'GPU': 2.0, 'memory': 85425603584.0}
2022-06-23 12:54:46,308 DEBUG deploy:run -- Available Resources = {'CPU': 32.0, 'node:192.168.1.98': 0.98, 'GPU': 2.0, 'accelerator_type:RTX': 1.0, 'object_store_memory': 39170103296.0, 'memory': 85425603584.0}
2022-06-23 12:54:49,501 DEBUG deploy:run -- Deployments = {'Api': Deployment(name=Api,version=None,route_prefix=/api)}

The container/service is named, “api”. Terminal output, i.e. STDOUT, is the same but there are statements from other services and containers in the multi-container application. The following statements are missing from the output in Ray v1.12 or newer compared to Ray v1.11.1 or older:

(Api pid=335) 2022-06-23 13:00:27,181 DEBUG api:most_recent_field_of_view -- records count=0
(Api pid=335) 2022-06-23 13:00:27,181 INFO  api:most_recent_field_of_view -- Creating a default Field of View (FOV)
(Api pid=335) 2022-06-23 13:00:27,197 DEBUG api:most_recent_scale_bar -- records count=0
(Api pid=335) 2022-06-23 13:00:27,197 INFO  api:most_recent_scale_bar -- Creating a default scale bar
(Api pid=335) 2022-06-23 13:00:27,202 DEBUG api:__init__ -- yolo_v5_weights_dir=/mnt/nvme/theia/YOLO/v5
(Api pid=335) 2022-06-23 13:00:27,202 DEBUG api:__init__ -- file_read_chunk_size=1024

These missing statements are just part of the missing log statements. It is not just the Api actor/deployment. After the Api.deploy() is called, the log statements from any other task or actor are missing from the terminal output and docker logs. These still appear in separate logs in the Ray Dashboard for each task and actor.

The last three DEBUG statements in the Ray v1.12 or newer output from deploy:run are log statements before the Api.deploy() function is called. The deploy.py file:

logger = logging.get_logger(__name__)

def run(context: Configuration):
    logger.debug("Total Resources = %s", ray.cluster_resources())
    logger.debug("Available Resources = %s", ray.available_resources())
    Api.deploy(context)
    logger.debug("Deployments = %s", ray.serve.list_deployments())


if __name__ == "__main__":
    run(server.start())

and the server.py module for the server.start() process. There is a separate deploy.py module so I can setup a file watch and automatically refresh/redeploy/reload on code changes during development.

def start():
    context = cli.run()
    ray.init(address="auto", namespace="serve")
    return context


def run():
    deploy.run(start())

    while True:
        time.sleep(5)


if __name__ == "__main__":
    run()

FYI, I have PYTHONUNBUFFERED=1 in the Dockerfile based on the Ray docker image, regardless of the Ray version.

ts-cfield · June 23, 2022, 1:14pm

Will do. I will create an issue and try to create minimal working example, but it will take me a moment to put together.

ts-cfield · June 23, 2022, 2:14pm

@eoakes I have created the GitHub issue: [Core][Serve] Log Statements Missing in Ray v1.12 or newer · Issue #26039 · ray-project/ray · GitHub with a minimal example that reproduces the problem.

Topic		Replies	Views
Workers not sending logs to driver soon enough. Can't use docker logs Ray Core	7	607	June 22, 2022
Ray is creating hundreds of logs files under /tmp/ray/session_latest/logs/ causing disk space issue and I/O Spikes Ray Serve	7	994	December 17, 2024
[Low] Logging system messages and custom messages to the console Ray Serve	2	462	November 9, 2023
Log inside function in class decorated by deployment does not appear in console Ray Serve	2	15	March 12, 2025
Ray logs - log.debug and log.info don't display Dashboard, Monitoring & Debugging	1	722	March 18, 2022

Log statements missing in docker logs since v1.12

Related topics