1. Severity of the issue: (select one): MEDIUM
2. Environment:
- Ray version: 2.10
- Python version: 3.11
- OS: Ubuntu 18
- Cloud/Infrastructure: Native K8s on AWS EC2
3. What happened vs. what you expected:
- Expected:
- When running in THP madvise mode, Ray should be able to use hugepages for its large memory regions (plasma object store, Arrow/mmapped buffers) and get the same system CPU and page-fault reductions observed in always mode.
- This would allow us to run Ray jobs safely in multi-tenant environments, keeping the system default (madvise) with no performance penalty.
- Actual:
- With THP in madvise mode, less than 1% of the plasma object store and Arrow/mmapped buffers was backed by huge pages.
4. My ask
Please add MADV_HUGEPAGE advice (i.e., madvise(addr, length, MADV_HUGEPAGE)) to the plasma object store and other large mmap regions in Ray/Arrow; a minimal sketch of the call is shown below.
- This change would make Ray memory regions hugepage-eligible under THP madvise, letting us use production-safe THP settings and still get substantial system CPU and performance wins.
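For illustration, here is a minimal sketch of the kind of hint being requested. This is not Ray's or Arrow's actual allocation code; the region size, flags, and names below are my own assumptions, and the point is only the madvise(MADV_HUGEPAGE) call on a large mmap-backed region.

```cpp
// Hypothetical sketch: mark a large anonymous mmap region (a stand-in for a
// plasma object-store arena) as hugepage-eligible so THP=madvise can back it
// with 2 MiB pages. Region size and error handling are illustrative only.
#include <sys/mman.h>
#include <cstdio>

int main() {
  const size_t kRegionSize = 1ULL << 30;  // 1 GiB, illustrative object-store size

  // Anonymous private mapping, similar in spirit to how an object store
  // reserves its memory region.
  void* region = mmap(nullptr, kRegionSize, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (region == MAP_FAILED) {
    perror("mmap");
    return 1;
  }

  // The requested change: without this hint, THP=madvise leaves the region on
  // 4 KiB pages; with it, khugepaged can collapse the range into huge pages.
  if (madvise(region, kRegionSize, MADV_HUGEPAGE) != 0) {
    perror("madvise(MADV_HUGEPAGE)");
  }

  // ... object allocations would happen inside this region ...

  munmap(region, kRegionSize);
  return 0;
}
```

Applying the same hint where Ray/Arrow create their large mmap regions would make those regions eligible for huge pages under madvise, without requiring THP=always system-wide.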
Supporting results:
- With THP=always: object store system CPU dropped ~50% (e.g. 8 vCPUs to 4 vCPUs), page faults fell from ~2M/sec to ~100K/sec, and AnonHugePages reached 150,000,000 kB (over 150 GB) for plasma.
- With THP=madvise: system CPU dropped only ~5% (P95: 65% to 60%), faults only fell to ~1.5M/sec, and AnonHugePages was 18,432 kB (18 MB), i.e. almost no plasma coverage.
- Root cause: Ray and PyArrow do not call madvise(MADV_HUGEPAGE). Only heap allocations (not the plasma/Arrow mmap regions) benefit from hugepages in madvise mode; a sketch of how this AnonHugePages coverage can be checked follows this list.
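One way to check per-process AnonHugePages coverage is sketched below. It assumes a Linux kernel new enough to expose /proc/&lt;pid&gt;/smaps_rollup, and is offered as a way to reproduce the numbers above rather than the exact tooling originally used.

```cpp
// Sketch: print the AnonHugePages line from /proc/<pid>/smaps_rollup for a
// given pid (or the current process), showing how much of that process's
// anonymous memory is currently backed by transparent huge pages.
#include <fstream>
#include <iostream>
#include <string>

int main(int argc, char** argv) {
  // Default to the current process if no pid is given on the command line.
  std::string pid = (argc > 1) ? argv[1] : "self";
  std::ifstream rollup("/proc/" + pid + "/smaps_rollup");
  if (!rollup) {
    std::cerr << "could not open smaps_rollup for pid " << pid << "\n";
    return 1;
  }
  std::string line;
  while (std::getline(rollup, line)) {
    // Expected format: "AnonHugePages:      18432 kB"
    if (line.rfind("AnonHugePages:", 0) == 0) {
      std::cout << line << "\n";
    }
  }
  return 0;
}
```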
This fix would:
- Let us run Ray (even for 100 GB–200 GB object store jobs) in production using madvise, with large CPU and cost savings, and without risky global THP toggling or complex node isolation.
- Align Ray behavior with other data frameworks and emerging best practices for memory-mapped workloads in Linux.