Skip to content

Conversation

@SteNicholas
Copy link
Member

@SteNicholas SteNicholas commented Oct 28, 2025

What changes were proposed in this pull request?

Support native kqueue transport on BSD/MacOS for celeborn.<module>.io.mode.

Backport:

Why are the changes needed?

Netty provides the following platform specific JNI transports for native transports:

  • Linux (since 4.0.16)
  • MacOS/BSD (since 4.1.11)

These JNI transports add features specific to a particular platform, generate less garbage, and generally improve performance when compared to the NIO based transport.

Does this PR introduce any user-facing change?

Change the default value of celeborn.<module>.io.mode from NIO to EPOLL if epoll mode is available, from NIO to KQUEUE if kqueue mode is available, falling back to NIO otherwise.

How was this patch tested?

CI.

@SteNicholas
Copy link
Member Author

SteNicholas commented Oct 28, 2025

Ping @pan3793, @turboFei, @cxzl25.

@SteNicholas SteNicholas requested a review from pan3793 October 28, 2025 08:28
@pan3793
Copy link
Member

pan3793 commented Oct 28, 2025

Code change LGTM, could you check the packaging side? IIRC, we excluded kqueue libs from the shaded client jars.

Copy link
Member

@pan3793 pan3793 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, only a nits.

But it's better to have a manual test (package the shaded client and run a real Spark/Flink jobs, on macOS) before getting in.

@SteNicholas
Copy link
Member Author

@pan3793, thanks for review. I have verified the packaging via ./build/make-distribution.sh -Pspark-3.5, which generates the native folder for celeborn-client-spark-3-shaded_2.12-0.7.0-SNAPSHOT.jar.

$ jar -xvf celeborn-client-spark-3-shaded_2.12-0.7.0-SNAPSHOT.jar
...
$ ls META-INF/native/
libnetty_transport_native_epoll_riscv64.so					liborg_apache_celeborn_shaded_netty_transport_native_epoll_aarch_64.so		liborg_apache_celeborn_shaded_netty_transport_native_kqueue_x86_64.jnilib
liborg_apache_celeborn_shaded_netty_resolver_dns_native_macos_aarch_64.jnilib	liborg_apache_celeborn_shaded_netty_transport_native_epoll_x86_64.so
liborg_apache_celeborn_shaded_netty_resolver_dns_native_macos_x86_64.jnilib	liborg_apache_celeborn_shaded_netty_transport_native_kqueue_aarch_64.jnilib
25/10/28 20:14:18,582 INFO [main] SignalUtils: Registering signal handler for TERM
25/10/28 20:14:18,625 INFO [main] SignalUtils: Registering signal handler for HUP
25/10/28 20:14:18,626 INFO [main] SignalUtils: Registering signal handler for INT
25/10/28 20:14:24,231 WARN [main] JVMSource: Add gauge jvm.thread.deadlocks failed, the value type class java.util.Collections$EmptySet is not a number
25/10/28 20:14:24,297 INFO [main] Dispatcher: Celeborn dispatcher numThreads: 64
25/10/28 20:14:24,330 INFO [main] TransportContext: SSL not enabled for module = rpc_service
25/10/28 20:14:25,319 INFO [main] TransportClientFactory: Module rpc_service mode KQUEUE threads 64
25/10/28 20:14:25,364 INFO [main] NettyRpcEnvFactory: Starting RPC Server [Master] on 30.177.49.196:9097 with advertised endpoint 30.177.49.196:9097
25/10/28 20:14:30,480 INFO [main] Utils: Successfully started service 'Master' on port 9097.
25/10/28 20:14:30,690 INFO [main] Master: Metrics system enabled.
25/10/28 20:14:30,724 INFO [main] log: Logging initialized @18339ms to org.eclipse.jetty.util.log.Slf4jLog
25/10/28 20:14:31,908 INFO [main] ContextHandler: Started o.e.j.s.ServletContextHandler@29a23c3d{/,null,AVAILABLE}
25/10/28 20:14:31,910 WARN [main] ContextHandler: o.e.j.s.ServletContextHandler@4fdca00a{/,null,STOPPED} contextPath ends with /
25/10/28 20:14:31,911 INFO [main] ContextHandler: Started o.e.j.s.ServletContextHandler@4fdca00a{/swagger-static,null,AVAILABLE}
25/10/28 20:14:31,911 INFO [main] ContextHandler: Started o.e.j.s.ServletContextHandler@5a8c93{/swagger,null,AVAILABLE}
25/10/28 20:14:31,912 INFO [main] ContextHandler: Started o.e.j.s.ServletContextHandler@119b0892{/help,null,AVAILABLE}
25/10/28 20:14:31,912 INFO [main] ContextHandler: Started o.e.j.s.ServletContextHandler@4ed4a7e4{/docs,null,AVAILABLE}
25/10/28 20:14:31,919 INFO [main] Master: Adding metrics servlet handler with path /metrics/prometheus
25/10/28 20:14:31,920 INFO [main] ContextHandler: Started o.e.j.s.ServletContextHandler@4086d8fb{/metrics/prometheus,null,AVAILABLE}
25/10/28 20:14:31,920 INFO [main] Master: Adding metrics servlet handler with path /metrics/json
25/10/28 20:14:31,920 INFO [main] ContextHandler: Started o.e.j.s.ServletContextHandler@66236a0a{/metrics/json,null,AVAILABLE}
25/10/28 20:14:31,921 INFO [main] Server: jetty-9.4.56.v20240826; built: 2024-08-26T17:15:05.868Z; git: ec6782ff5ead824dabdcf47fa98f90a4aedff401; jvm 1.8.0_202-b08
25/10/28 20:14:31,931 INFO [main] Server: Started @19548ms
25/10/28 20:14:31,942 INFO [main] AbstractConnector: Started ServerConnector@1ee47d9e{HTTP/1.1, (http/1.1)}{30.177.49.196:9098}
25/10/28 20:14:31,943 INFO [main] HttpServer: master: HttpServer started on 30.177.49.196:9098.
25/10/28 20:14:31,944 INFO [main] Master: Master started.
25/10/28 20:15:35,740 INFO [celeborn-dispatcher-7] Master: Registered worker 
Host: 30.177.49.196
RpcPort: 55893
PushPort: 55894
FetchPort: 55896
ReplicatePort: 55895
InternalPort: 55893
SlotsUsed: 0
LastHeartbeat: 0
HeartbeatElapsedSeconds: 1761653735
Disks: 
  DiskInfo0: DiskInfo(maxSlots: 7366, availableSlots: 4166, committed shuffles 0, running applications 0, mountPoint: /, usableSpace: 260.4 GiB, totalSpace: 460.4 GiB, avgFlushTime: 0 ns, avgFetchTime: 0 ns, activeSlots: 0, storageType: HDD) status: HEALTHY dirs  shuffleAllocations: Map()
UserResourceConsumption: empty
WorkerRef: null
WorkerStatus: WorkerStatus{state=Normal, stateStartTime=1761653735704}
NetworkLocation: /default-rack
NextInterruptionNotice: None
.
25/10/28 20:19:30,693 INFO [master-partition-size-updater] Master: Cluster estimate partition size 64.0 MiB
25/10/28 20:29:30,711 INFO [master-partition-size-updater] Master: Cluster estimate partition size 64.0 MiB
25/10/28 20:36:51,379 INFO [celeborn-dispatcher-28] Master: Unregister shuffle 1761654853643-ee5683af89d367555a3089ff88f0bdd8-0
25/10/28 20:36:51,392 INFO [celeborn-dispatcher-29] Master: Unregister shuffle 1761654853643-ee5683af89d367555a3089ff88f0bdd8-1
25/10/28 20:39:30,733 INFO [master-partition-size-updater] Master: Cluster estimate partition size 64.0 MiB
25/10/28 20:42:55,494 INFO [celeborn-dispatcher-27] Master: Unregister shuffle 1761655214105-5f7c7a8bb9eb8aea1f39663cb8423189-0
25/10/28 20:42:55,506 INFO [celeborn-dispatcher-28] Master: Unregister shuffle 1761655214105-5f7c7a8bb9eb8aea1f39663cb8423189-1
25/10/28 20:43:55,496 INFO [celeborn-dispatcher-37] Master: Unregister shuffle 1761655214105-5f7c7a8bb9eb8aea1f39663cb8423189-2
25/10/28 20:43:55,511 INFO [celeborn-dispatcher-38] Master: Unregister shuffle 1761655214105-5f7c7a8bb9eb8aea1f39663cb8423189-3
25/10/28 20:44:20,785 WARN [celeborn-dispatcher-45] TagsManager: No tags provided
25/10/28 20:44:20,813 INFO [celeborn-dispatcher-45] Master: Successfully offered slots for 1 reducers of 1761655214105-5f7c7a8bb9eb8aea1f39663cb8423189-4 on 1 workers.
25/10/28 20:44:30,809 WARN [celeborn-dispatcher-49] Master: Application 1761654853643-ee5683af89d367555a3089ff88f0bdd8 timeout, trigger applicationLost event.
25/10/28 20:44:30,815 INFO [master-noneager-handler-0] Master: Removed application 1761654853643-ee5683af89d367555a3089ff88f0bdd8
25/10/28 20:45:55,520 INFO [celeborn-dispatcher-61] Master: Unregister shuffle 1761655214105-5f7c7a8bb9eb8aea1f39663cb8423189-4
25/10/28 20:49:30,748 INFO [master-partition-size-updater] Master: Cluster estimate partition size 64.0 MiB
25/10/28 20:50:12,364 INFO [celeborn-dispatcher-40] Master: Unregister shuffle 1761655631769-7cb65d4ac219c0d27851bc7fdddd0245-0
25/10/28 20:52:00,826 WARN [celeborn-dispatcher-54] Master: Application 1761655214105-5f7c7a8bb9eb8aea1f39663cb8423189 timeout, trigger applicationLost event.
25/10/28 20:52:00,835 INFO [master-noneager-handler-1] Master: Removed application 1761655214105-5f7c7a8bb9eb8aea1f39663cb8423189
25/10/28 20:57:00,840 WARN [celeborn-dispatcher-3] Master: Application 1761655631769-7cb65d4ac219c0d27851bc7fdddd0245 timeout, trigger applicationLost event.
25/10/28 20:57:00,845 INFO [master-noneager-handler-2] Master: Removed application 1761655631769-7cb65d4ac219c0d27851bc7fdddd0245
25/10/28 20:59:30,762 INFO [master-partition-size-updater] Master: Cluster estimate partition size 64.0 MiB
25/10/28 21:00:44,393 WARN [celeborn-dispatcher-42] TagsManager: No tags provided
25/10/28 21:00:44,396 INFO [celeborn-dispatcher-42] Master: Successfully offered slots for 1 reducers of 1761656290656-706ab0b05e029b488a48ca824a735696-3 on 1 workers.
25/10/28 21:00:47,892 INFO [celeborn-dispatcher-43] Master: Unregister shuffle 1761656290656-706ab0b05e029b488a48ca824a735696-0
25/10/28 21:01:47,889 INFO [celeborn-dispatcher-54] Master: Unregister shuffle 1761656290656-706ab0b05e029b488a48ca824a735696-1
25/10/28 21:01:47,911 INFO [celeborn-dispatcher-55] Master: Unregister shuffle 1761656290656-706ab0b05e029b488a48ca824a735696-2
25/10/28 21:01:47,922 INFO [celeborn-dispatcher-53] Master: Unregister shuffle 1761656290656-706ab0b05e029b488a48ca824a735696-3
25/10/28 21:09:30,777 INFO [master-partition-size-updater] Master: Cluster estimate partition size 64.0 MiB
25/10/28 21:09:30,865 WARN [celeborn-dispatcher-23] Master: Application 1761656290656-706ab0b05e029b488a48ca824a735696 timeout, trigger applicationLost event.
25/10/28 21:09:30,884 INFO [master-noneager-handler-3] Master: Removed application 1761656290656-706ab0b05e029b488a48ca824a735696
tHandler@2dd8ff1d{/,null,STOPPED} contextPath ends with /
25/10/28 20:15:35,443 INFO [main] ContextHandler: Started o.e.j.s.ServletContextHandler@2dd8ff1d{/swagger-static,null,AVAILABLE}
25/10/28 20:15:35,443 INFO [main] ContextHandler: Started o.e.j.s.ServletContextHandler@36b9cb99{/swagger,null,AVAILABLE}
25/10/28 20:15:35,443 INFO [main] ContextHandler: Started o.e.j.s.ServletContextHandler@2bfaba70{/help,null,AVAILABLE}
25/10/28 20:15:35,444 INFO [main] ContextHandler: Started o.e.j.s.ServletContextHandler@9301672{/docs,null,AVAILABLE}
25/10/28 20:15:35,451 INFO [main] Worker: Adding metrics servlet handler with path /metrics/prometheus
25/10/28 20:15:35,451 INFO [main] ContextHandler: Started o.e.j.s.ServletContextHandler@7fd987ef{/metrics/prometheus,null,AVAILABLE}
25/10/28 20:15:35,451 INFO [main] Worker: Adding metrics servlet handler with path /metrics/json
25/10/28 20:15:35,451 INFO [main] ContextHandler: Started o.e.j.s.ServletContextHandler@7209ffb5{/metrics/json,null,AVAILABLE}
25/10/28 20:15:35,452 INFO [main] Server: jetty-9.4.56.v20240826; built: 2024-08-26T17:15:05.868Z; git: ec6782ff5ead824dabdcf47fa98f90a4aedff401; jvm 1.8.0_202-b08
25/10/28 20:15:35,464 INFO [main] Server: Started @19824ms
25/10/28 20:15:35,479 INFO [main] AbstractConnector: Started ServerConnector@57bbebe7{HTTP/1.1, (http/1.1)}{30.177.49.196:9096}
25/10/28 20:15:35,483 INFO [main] HttpServer: worker: HttpServer started on 30.177.49.196:9096.
25/10/28 20:15:35,485 INFO [main] Worker: Starting Worker 30.177.49.196:55894:55896:55895 with {/=DiskInfo(maxSlots: 0, availableSlots: 0, committed shuffles 0, running applications 0, mountPoint: /, usableSpace: 260.4 GiB, totalSpace: 460.4 GiB, avgFlushTime: 0 ns, avgFetchTime: 0 ns, activeSlots: 0, storageType: HDD) status: HEALTHY dirs /private/tmp/celeborn_worker/celeborn-worker/shuffle_data shuffleAllocations: Map()} slots.
25/10/28 20:15:35,642 INFO [main] MasterClient: connect to master 30.177.49.196:9097.
25/10/28 20:15:35,763 INFO [main] Worker: Register worker successfully.
25/10/28 20:15:35,771 INFO [main] Worker: Worker started.
2025-10-28 21:10:50,855 INFO  org.apache.celeborn.plugin.flink.RemoteShuffleMaster         [] - Register job: ec0e0012f46b3f03a5371d637584be00.
2025-10-28 21:10:50,856 INFO  org.apache.celeborn.plugin.flink.RemoteShuffleMaster         [] - CelebornAppId: 1761657018509-ec0e0012f46b3f03a5371d637584be00
2025-10-28 21:10:56,005 INFO  org.apache.celeborn.common.rpc.netty.Dispatcher              [] - Celeborn dispatcher numThreads: 32
2025-10-28 21:10:56,021 INFO  org.apache.celeborn.common.network.TransportContext          [] - SSL not enabled for module = rpc_app_lifecyclemanager
2025-10-28 21:10:56,749 INFO  org.apache.celeborn.common.network.client.TransportClientFactory [] - Module rpc_app_lifecyclemanager mode KQUEUE threads 14
2025-10-28 21:10:56,783 INFO  org.apache.celeborn.common.rpc.netty.NettyRpcEnvFactory      [] - Starting RPC Server [LifecycleManager] on 30.177.49.196:0 with advertised endpoint 30.177.49.196:0
2025-10-28 21:11:01,877 INFO  org.apache.celeborn.common.util.Utils                        [] - Successfully started service 'LifecycleManager' on port 59322.
2025-10-28 21:11:01,881 INFO  org.apache.celeborn.client.LifecycleManager                  [] - Starting LifecycleManager on 30.177.49.196:59322
2025-10-28 21:11:01,894 INFO  org.apache.celeborn.common.client.StaticMasterEndpointResolver [] - masterEndpoints = List(30.177.49.196:9097)
2025-10-28 21:11:01,961 INFO  org.apache.celeborn.client.ApplicationHeartbeater            [] - Send app heartbeat with written: 0.0 B, file count: 0, shuffle count: 0, shuffle fallback counts: Map(), application count: 1, application fallback counts: Map()

@SteNicholas SteNicholas requested a review from pan3793 October 28, 2025 13:17
@SteNicholas
Copy link
Member Author

Ping @cxzl25.

@SteNicholas SteNicholas requested a review from cxzl25 October 29, 2025 03:14
@SteNicholas
Copy link
Member Author

Merged to main(v0.7.0).

@cfmcgrady
Copy link
Contributor

Code change LGTM, could you check the packaging side? IIRC, we excluded kqueue libs from the shaded client jars.

@SteNicholas kqueue has also been excluded in the sbt build, so the sbt configuration should be updated accordingly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants