@@ -565,8 +565,9 @@ private[spark] class BlockManager(
// Give up trying anymore locations. Either we've tried all of the original locations,
// or we've refreshed the list of locations from the master, and have still
// hit failures after trying locations from the refreshed list.
throw new BlockFetchException(s"Failed to fetch block after" +
s" ${totalFailureCount} fetch failures. Most recent failure cause:", e)
logError(s"Failed to fetch block after $totalFailureCount fetch failures." +
Contributor:
If a block is not found at a particular location, I take it that's a non-fatal error? In that case, should we really be logging "error" if the block is not found? Or do we expect the block to be found somewhere remotely during the normal course of execution?

Contributor (author):
This log statement corresponds to the case where we've tried to fetch the block from all known locations (or maxFetchFailures of them) and all of those fetches have failed with errors. Failures to fetch from only one location are logged at warning level a few lines down from here.

This is probably worth downgrading to a warning to prevent users from becoming confused: I've noticed that people sometimes report the first error that they find in a log as the cause of a job failure even when the real failure occurs elsewhere. Warning level will also make it easier to filter these messages out when automated log analysis searches for errors.

s"Most recent failure cause:", e)
return None
}

logWarning(s"Failed to fetch remote block $blockId " +
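The retry logic being changed is easier to see outside the diff. Below is a minimal, self-contained Scala sketch of the control flow described by the patch and the comments above: try each known location, refresh the location list from the master once after a configurable number of failures, and, when every attempt has failed, log the final failure and return None rather than throwing BlockFetchException. The names fetchFrom, refreshLocations, and maxFailuresBeforeLocationRefresh are illustrative placeholders rather than Spark APIs, and println stands in for Spark's logWarning/logError.

// Sketch only -- not Spark's actual BlockManager code.
import scala.util.{Failure, Success, Try}

object RemoteFetchSketch {
  type Location = String

  def getRemoteBytes(
      blockId: String,
      initialLocations: Seq[Location],
      maxFailuresBeforeLocationRefresh: Int,
      fetchFrom: Location => Array[Byte],
      refreshLocations: String => Seq[Location]): Option[Array[Byte]] = {
    var queue = initialLocations.toList
    var totalFailureCount = 0
    var refreshed = false
    var lastError: Option[Throwable] = None

    while (queue.nonEmpty) {
      val location = queue.head
      queue = queue.tail
      Try(fetchFrom(location)) match {
        case Success(bytes) =>
          return Some(bytes)
        case Failure(e) =>
          lastError = Some(e)
          totalFailureCount += 1
          // A failure at a single location is logged at warning level only.
          println(s"WARN Failed to fetch remote block $blockId from $location " +
            s"(failed attempt $totalFailureCount)")
          if (!refreshed && totalFailureCount >= maxFailuresBeforeLocationRefresh) {
            // One-time refresh of the location list from the master.
            queue = refreshLocations(blockId).toList
            refreshed = true
          }
      }
    }

    // All known locations (original and refreshed) have failed: log once and
    // return None instead of throwing. The review discussion above is about
    // whether this final message should be error- or warning-level.
    println(s"WARN Failed to fetch block $blockId after $totalFailureCount fetch failures. " +
      s"Most recent failure cause: ${lastError.map(_.toString).getOrElse("unknown")}")
    None
  }
}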
@@ -513,10 +513,8 @@ class BlockManagerSuite extends SparkFunSuite with Matchers with BeforeAndAfterE
assert(store.getRemoteBytes("list1").isDefined, "list1Get expected to be fetched")
store3.stop()
store3 = null
// exception throw because there is no locations
intercept[BlockFetchException] {
store.getRemoteBytes("list1")
}
// Should return None instead of throwing an exception:
assert(store.getRemoteBytes("list1").isEmpty)
}

test("SPARK-14252: getOrElseUpdate should still read from remote storage") {
@@ -1186,9 +1184,7 @@ class BlockManagerSuite extends SparkFunSuite with Matchers with BeforeAndAfterE
new MockBlockTransferService(conf.getInt("spark.block.failures.beforeLocationRefresh", 5))
store = makeBlockManager(8000, "executor1", transferService = Option(mockBlockTransferService))
store.putSingle("item", 999L, StorageLevel.MEMORY_ONLY, tellMaster = true)
intercept[BlockFetchException] {
store.getRemoteBytes("item")
}
assert(store.getRemoteBytes("item").isEmpty)
}

test("SPARK-13328: refresh block locations (fetch should succeed after location refresh)") {
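The updated tests above assert on Option emptiness instead of intercepting BlockFetchException. For illustration, here is a hedged sketch of how a caller handles the Option-returning API when every fetch fails; it reuses the RemoteFetchSketch placeholder from earlier rather than Spark's BlockManager, and the locations and failure mode are invented for the example.

// Sketch only: caller-side handling of the Option result.
object CallerSketch extends App {
  val result: Option[Array[Byte]] = RemoteFetchSketch.getRemoteBytes(
    blockId = "list1",
    initialLocations = Seq("executor2", "executor3"),
    maxFailuresBeforeLocationRefresh = 5,
    // Every fetch attempt fails in this example.
    fetchFrom = _ => throw new java.io.IOException("connection refused"),
    refreshLocations = _ => Seq.empty)

  result match {
    case Some(bytes) => println(s"Fetched ${bytes.length} bytes")
    case None => println("Block could not be fetched from any location; the caller decides how to recover")
  }
}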