@wangyum wangyum commented Mar 27, 2025

What changes were proposed in this pull request?

This change makes the shuffle data files readable by other users (mode -rw-r--r--), which fixes shuffle fetch failures caused by "Permission denied" errors.
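As an illustration of the permission change described above, here is a minimal Python sketch that sets a file to rw-r--r-- (0o644). It is not the code in this patch: the helper names `make_world_readable` and `demo` are hypothetical, and a temporary file stands in for a shuffle data file. It assumes a POSIX file system, where `tempfile.mkstemp` creates files as rw-------.

```python
import os
import stat
import tempfile

# Target mode rw-r--r-- (0o644): owner read/write, group and others read-only.
SHUFFLE_FILE_MODE = stat.S_IRUSR | stat.S_IWUSR | stat.S_IRGRP | stat.S_IROTH


def make_world_readable(path: str) -> str:
    """Set `path` to rw-r--r-- and return its mode string."""
    os.chmod(path, SHUFFLE_FILE_MODE)
    return stat.filemode(os.stat(path).st_mode)


def demo() -> tuple[str, str]:
    """Return the mode strings of a temporary file before and after the change."""
    # Hypothetical stand-in for a shuffle data file; mkstemp creates it
    # readable and writable by the owner only.
    fd, path = tempfile.mkstemp(suffix=".data")
    os.close(fd)
    try:
        before = stat.filemode(os.stat(path).st_mode)
        after = make_world_readable(path)
        return before, after
    finally:
        os.remove(path)
```

A process running as a different user, which is presumably how the NodeManager-side shuffle service reads these files, can open the file only when the others-read bit (or a matching group-read bit) is set.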

The error message on the NodeManager:

2025-03-25 07:18:11,497 ERROR org.apache.spark.network.server.ChunkFetchRequestHandler: Error sending result ChunkFetchSuccessWithExtraInfo[streamChunkId=StreamChunkId[streamId=1416440634155,chunkIndex=0],buffer=FileSegmentManagedBuffer[file=/hadoop/4/yarn/local/usercache/user/appcache/application_1736396393732_77665/blockmgr-4ce81eec-e4a1-4d47-a3c5-392fa718fc18/33/shuffle_3_2281_0.data,offset=10784763,length=78581]] to /10.18.80.16:49944
java.io.FileNotFoundException: /hadoop/4/yarn/local/usercache/user/appcache/application_1736396393732_77665/blockmgr-4ce81eec-e4a1-4d47-a3c5-392fa718fc18/33/shuffle_3_2281_0.data (Permission denied)
        at java.io.RandomAccessFile.open0(Native Method)
        at java.io.RandomAccessFile.open(RandomAccessFile.java:316)
        at java.io.RandomAccessFile.<init>(RandomAccessFile.java:243)
        at org.apache.spark.network.protocol.ChunkableFileRegion.open(ChunkableFileRegion.java:106)
        at org.apache.spark.network.protocol.ChunkableFileRegion.transferTo(ChunkableFileRegion.java:146)

(Fixes: #9148)

How was this patch tested?

Manual tests

  1. Shuffle fetch succeeds.
  2. The file permission is correct (-rw-r--r--):
      -rw-r--r-- 1 user user 14M Mar 26 02:00 /hadoop/1/yarn/local/usercache/user/appcache/application_1736396393732_97531/blockmgr-21a22e1a-1c16-4fb3-99d4-97d0dd8600b4/05/shuffle_3_6348_0.data
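
The manual permission check above could also be scripted. The following is a rough sketch, not part of this patch: the function name `unreadable_shuffle_files` is hypothetical, and it assumes shuffle data files match the `shuffle_*.data` naming seen in the paths above.

```python
import stat
from pathlib import Path


def unreadable_shuffle_files(root: str) -> list[str]:
    """Return names of shuffle_*.data files under `root` lacking the others-read bit."""
    missing = []
    # rglob walks all subdirectories (e.g. blockmgr-*/05/) under root.
    for path in Path(root).rglob("shuffle_*.data"):
        if not path.stat().st_mode & stat.S_IROTH:
            missing.append(path.name)
    return sorted(missing)
```

Pointing it at an application's local directory should return an empty list once the fix is in place.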
    

@github-actions github-actions bot added the VELOX label Mar 27, 2025
@github-actions
#9148

wangyum commented Mar 27, 2025

Thanks @zhouyuan and @marin-m for providing the fix.

Successfully merging this pull request may close these issues.

Shuffle file permission issue when using ColumnarShuffleManager
