@LantaoJin
Contributor

@LantaoJin LantaoJin commented Jun 25, 2019

What changes were proposed in this pull request?

This is very similar to #23590.

`ByteBuffer.allocate` may throw `OutOfMemoryError` when the response is large but not enough memory is available. However, when this happens, `TransportClient.sendRpcSync` will just hang forever if the timeout is set to unlimited.

This PR catches `Throwable` and uses the error to complete the `SettableFuture`.

How was this patch tested?

I tested in my IDE by setting the value of `size` to -1 at runtime. Without this patch, the call does not finish until the timeout expires (it may hang forever if the timeout is set to MAX_INT); with the patch, the expected `IllegalArgumentException` is caught and used to complete the future.

```java
@Override
public void onSuccess(ByteBuffer response) {
  try {
    int size = response.remaining();
    ByteBuffer copy = ByteBuffer.allocate(size); // set size to -1 at runtime when debugging
    copy.put(response);
    // flip "copy" to make it readable
    copy.flip();
    result.set(copy);
  } catch (Throwable t) {
    result.setException(t);
  }
}
```
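To make the failure mode concrete, here is a minimal, self-contained sketch of the same situation. It is not the Spark code: it uses the JDK's `CompletableFuture` in place of Guava's `SettableFuture`, and the class name, the method names, and the "I/O thread" that swallows the callback failure are all hypothetical stand-ins. It only illustrates why a callback that throws before completing the future leaves the waiting caller stuck until its timeout.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class CallbackHangSketch {

  /** Unpatched shape: if allocation throws, the future is never completed. */
  static void onSuccessUnguarded(ByteBuffer response, int size, CompletableFuture<ByteBuffer> result) {
    ByteBuffer copy = ByteBuffer.allocate(size); // throws IllegalArgumentException for size < 0
    copy.put(response);
    copy.flip();
    result.complete(copy);
  }

  /** Patched shape: any Throwable is used to complete the future exceptionally. */
  static void onSuccessGuarded(ByteBuffer response, int size, CompletableFuture<ByteBuffer> result) {
    try {
      ByteBuffer copy = ByteBuffer.allocate(size);
      copy.put(response);
      copy.flip();
      result.complete(copy);
    } catch (Throwable t) {
      result.completeExceptionally(t);
    }
  }

  /** Runs one callback on another thread with size forced to -1 and reports what the waiter sees. */
  static String outcome(boolean guarded) {
    CompletableFuture<ByteBuffer> result = new CompletableFuture<>();
    ByteBuffer response = ByteBuffer.wrap(new byte[] {1, 2, 3});
    Thread io = new Thread(() -> {
      try {
        if (guarded) {
          onSuccessGuarded(response, -1, result);
        } else {
          onSuccessUnguarded(response, -1, result);
        }
      } catch (Throwable ignored) {
        // Assumption: stands in for a transport layer that does not report callback failures.
      }
    });
    try {
      io.start();
      io.join();
      result.get(200, TimeUnit.MILLISECONDS); // short timeout so the sketch terminates
      return "completed";
    } catch (TimeoutException e) {
      return "timed out";
    } catch (ExecutionException e) {
      return "failed: " + e.getCause().getClass().getSimpleName();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return "interrupted";
    }
  }

  public static void main(String[] args) {
    System.out.println("unguarded -> " + outcome(false)); // unguarded -> timed out
    System.out.println("guarded   -> " + outcome(true));  // guarded   -> failed: IllegalArgumentException
  }
}
```

With an unlimited timeout, the unguarded case would never return, which matches the hang described above.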

@LantaoJin
Contributor Author

cc @zsxwing

Member

@srowen srowen left a comment

Seems OK to me; are there other callbacks that might have the same problem?

@LantaoJin
Contributor Author

LantaoJin commented Jun 26, 2019

@srowen, I found only one other place, `ExternalShuffleClient.removeBlocks`, that may have a similar problem (not OOM, just an uncaught runtime exception):

```java
public Future<Integer> removeBlocks(
    String host,
    int port,
    String execId,
    String[] blockIds) throws IOException, InterruptedException {
  checkInit();
  CompletableFuture<Integer> numRemovedBlocksFuture = new CompletableFuture<>();
  ByteBuffer removeBlocksMessage = new RemoveBlocks(appId, execId, blockIds).toByteBuffer();
  final TransportClient client = clientFactory.createClient(host, port);
  client.sendRpc(removeBlocksMessage, new RpcResponseCallback() {
    @Override
    public void onSuccess(ByteBuffer response) {
      BlockTransferMessage msgObj = BlockTransferMessage.Decoder.fromByteBuffer(response);
      numRemovedBlocksFuture.complete(((BlocksRemoved)msgObj).numRemovedBlocks);
      client.close();
    }
    // (the rest of the method is not shown in this comment)
```

I would prefer to change it to the code below, since `fromByteBuffer` could throw `IllegalArgumentException`:

```java
@Override
public void onSuccess(ByteBuffer response) {
  try {
    BlockTransferMessage msgObj = BlockTransferMessage.Decoder.fromByteBuffer(response);
    numRemovedBlocksFuture.complete(((BlocksRemoved) msgObj).numRemovedBlocks);
  } catch (Exception e) {
    logger.warn("Error trying to remove RDD blocks " + Arrays.toString(blockIds) +
      " via external shuffle service from executor: " + execId, e);
    numRemovedBlocksFuture.complete(0);
  } finally {
    client.close();
  }
}
```
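The shape of this change can be tried in isolation with the sketch below. Everything in it is an illustrative assumption rather than Spark's API: `decodeNumRemoved` is a hypothetical decoder standing in for `BlockTransferMessage.Decoder.fromByteBuffer`, and an `AtomicBoolean` stands in for the client so that "closed" can be observed. It shows two properties of the proposed callback: a decoding failure still completes the future (with 0), and the cleanup in `finally` runs on both paths.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicBoolean;

class RemoveBlocksCallbackSketch {

  /** Hypothetical decoder: accepts only a 4-byte big-endian count, rejects anything else. */
  static int decodeNumRemoved(ByteBuffer response) {
    if (response.remaining() != Integer.BYTES) {
      throw new IllegalArgumentException("Unexpected message length: " + response.remaining());
    }
    return response.getInt();
  }

  /** Same try/catch/finally shape as the proposed onSuccess. */
  static void onSuccess(ByteBuffer response,
                        CompletableFuture<Integer> numRemoved,
                        AtomicBoolean clientClosed) {
    try {
      numRemoved.complete(decodeNumRemoved(response));
    } catch (Throwable t) {
      System.err.println("Error decoding response: " + t);
      numRemoved.complete(0); // report "nothing removed" instead of leaving the future pending
    } finally {
      clientClosed.set(true); // stands in for client.close()
    }
  }

  /** Feeds one payload through the callback and summarizes the observable state. */
  static String run(byte[] payload) {
    CompletableFuture<Integer> numRemoved = new CompletableFuture<>();
    AtomicBoolean clientClosed = new AtomicBoolean(false);
    onSuccess(ByteBuffer.wrap(payload), numRemoved, clientClosed);
    return "removed=" + numRemoved.join() + " closed=" + clientClosed.get();
  }

  public static void main(String[] args) {
    System.out.println(run(new byte[] {0, 0, 0, 3})); // removed=3 closed=true
    System.out.println(run(new byte[] {1}));          // removed=0 closed=true
  }
}
```

Completing with 0 on failure mirrors the fallback value used in the snippet above; whether to complete with a default or complete exceptionally is a design choice that depends on what callers expect.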

@LantaoJin
Contributor Author

Should I fix the code above in this PR, or file a new one?

@srowen
Member

srowen commented Jun 26, 2019

I think you can fix the similar issue here, and update the title/description.

@LantaoJin LantaoJin changed the title from "[SPARK-28160][Core] Fix a bug that TransportClient.sendRpcSync may hang forever" to "[SPARK-28160][CORE] Fix a bug that callback function may hang when unchecked exception missed" on Jun 26, 2019

try {
BlockTransferMessage msgObj = BlockTransferMessage.Decoder.fromByteBuffer(response);
numRemovedBlocksFuture.complete(((BlocksRemoved) msgObj).numRemovedBlocks);
} catch (Exception e) {
Member

Do you want to catch Throwable here?

Contributor Author

I don't see any Error that could occur here, so I used Exception. Throwable is of course fine too.
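For readers weighing the two options, the difference is that `catch (Exception e)` does not intercept subclasses of `Error` such as `OutOfMemoryError`, while `catch (Throwable t)` does. The sketch below demonstrates this with a deliberately constructed `OutOfMemoryError`; the class and method names are hypothetical and nothing here is taken from the patch itself.

```java
class CatchScopeSketch {

  /** Runs the body under catch (Exception); the outer catch only reports what got past it. */
  static String withCatchException(Runnable body) {
    try {
      try {
        body.run();
        return "completed normally";
      } catch (Exception e) {
        return "handled by catch (Exception)";
      }
    } catch (Throwable escaped) {
      return "escaped catch (Exception): " + escaped.getClass().getSimpleName();
    }
  }

  /** Runs the body under catch (Throwable), which also covers Error subclasses. */
  static String withCatchThrowable(Runnable body) {
    try {
      body.run();
      return "completed normally";
    } catch (Throwable t) {
      return "handled by catch (Throwable)";
    }
  }

  public static void main(String[] args) {
    Runnable runtimeFailure = () -> { throw new IllegalArgumentException("bad size"); };
    // Simulated: a real OutOfMemoryError would come from the JVM, not from user code.
    Runnable errorFailure = () -> { throw new OutOfMemoryError("simulated"); };

    System.out.println(withCatchException(runtimeFailure)); // handled by catch (Exception)
    System.out.println(withCatchException(errorFailure));   // escaped catch (Exception): OutOfMemoryError
    System.out.println(withCatchThrowable(errorFailure));   // handled by catch (Throwable)
  }
}
```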

// flip "copy" to make it readable
copy.flip();
result.set(copy);
} catch (Throwable t) {
Member

Do you want to log the throwable here just for completeness?

Contributor Author

I will add a warning log.

@LantaoJin
Contributor Author

Gentle ping @srowen

@SparkQA

SparkQA commented Jun 30, 2019

Test build #4813 has finished for PR 24964 at commit d2330cc.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

srowen pushed a commit that referenced this pull request Jun 30, 2019
…checked exception missed

This is very similar to #23590.

`ByteBuffer.allocate` may throw `OutOfMemoryError` when the response is large but not enough memory is available. However, when this happens, `TransportClient.sendRpcSync` will just hang forever if the timeout is set to unlimited.

This PR catches `Throwable` and uses the error to complete `SettableFuture`.

I tested in my IDE by setting the value of `size` to -1 to verify the result. Without this patch, the call does not finish until the timeout expires (it may hang forever if the timeout is set to MAX_INT); with the patch, the expected `IllegalArgumentException` is caught.
```java
      @Override
      public void onSuccess(ByteBuffer response) {
        try {
          int size = response.remaining();
          ByteBuffer copy = ByteBuffer.allocate(size); // set size to -1 in runtime when debug
          copy.put(response);
          // flip "copy" to make it readable
          copy.flip();
          result.set(copy);
        } catch (Throwable t) {
          result.setException(t);
        }
      }
```

Closes #24964 from LantaoJin/SPARK-28160.

Lead-authored-by: LantaoJin <[email protected]>
Co-authored-by: lajin <[email protected]>
Signed-off-by: Sean Owen <[email protected]>
(cherry picked from commit 0e42100)
Signed-off-by: Sean Owen <[email protected]>
@srowen srowen closed this in 0e42100 Jun 30, 2019
srowen pushed a commit that referenced this pull request Jun 30, 2019
…checked exception missed

@srowen
Member

srowen commented Jun 30, 2019

Merged to master/2.4/2.3

Tonix517 pushed a commit to Tonix517/spark that referenced this pull request Jul 3, 2019
…checked exception missed

rluta pushed a commit to rluta/spark that referenced this pull request Sep 17, 2019
…checked exception missed

kai-chi pushed a commit to kai-chi/spark that referenced this pull request Sep 26, 2019
…checked exception missed

cfmcgrady pushed a commit to apache/celeborn that referenced this pull request Feb 23, 2024
…d exception missed

### What changes were proposed in this pull request?
Refer: [SPARK-28160](https://issues.apache.org/jira/browse/SPARK-28160) / apache/spark#24964
`ByteBuffer.allocate` may throw `OutOfMemoryError` when the response is large but not enough memory is available. However, when this happens, `TransportClient.sendRpcSync` will just hang forever if the timeout is set to unlimited.

### Why are the changes needed?
To catch exceptions thrown by `ByteBuffer.allocate` in corner cases.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Quoting the local test from apache/spark#24964:
```
I tested in my IDE by setting the value of size to -1 to verify the result. Without this patch, the call does not finish until the timeout expires (it may hang forever if the timeout is set to MAX_INT); with the patch, the expected IllegalArgumentException is caught.

      @Override
      public void onSuccess(ByteBuffer response) {
        try {
          int size = response.remaining();
          ByteBuffer copy = ByteBuffer.allocate(size); // set size to -1 in runtime when debug
          copy.put(response);
          // flip "copy" to make it readable
          copy.flip();
          result.set(copy);
        } catch (Throwable t) {
          result.setException(t);
        }
      }
```

Closes #2316 from turboFei/fix_transport_client_onsucess.

Authored-by: Fei Wang <[email protected]>
Signed-off-by: chenfu <[email protected]>
cfmcgrady pushed a commit to apache/celeborn that referenced this pull request Feb 23, 2024
…d exception missed

Authored-by: Fei Wang <[email protected]>
Signed-off-by: chenfu <[email protected]>
(cherry picked from commit 387bffc)
Signed-off-by: chenfu <[email protected]>
cfmcgrady pushed a commit to apache/celeborn that referenced this pull request Feb 23, 2024
…d exception missed
