Call commitNull in SimpleFunctionAdapter on exceptions (take 2) #10418
Closed
kevinwilfong wants to merge 1 commit into facebookincubator:main
This pull request was exported from Phabricator. Differential Revision: D59473869
Force-pushed from 3bfbdb4 to 8852625.
…bookincubator#10418)
Summary: Pull Request resolved: facebookincubator#10418
Reviewed By: weijiadeng-uber
Differential Revision: D59473869
kgpai approved these changes on Jul 9, 2024.
This pull request has been merged in 77589a9.
Conbench analyzed 1 benchmark run on the commit and found no benchmark performance regressions. The full Conbench report has more details.
Summary:
The way VectorWriters work today, if a writer is writing a variable-length type and the value is
not committed (e.g. because an exception was thrown), the next value written starts from the state
of the previous value rather than a clean slate. This can result in, for example, strings that start
with the contents written for the previous string.
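
To make the failure mode concrete, here is a minimal sketch. `ToyStringWriter` and `writeWithSkippedCommit` are hypothetical names invented for illustration; this is not Velox's VectorWriter, only a toy that assumes the in-progress value lives in a buffer that is reset on commit.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a variable-length writer (not Velox's VectorWriter).
class ToyStringWriter {
 public:
  // Appends to the in-progress value for the current row.
  void append(const std::string& s) { buffer_ += s; }

  // Publishes the in-progress value and resets the buffer.
  void commit() {
    rows_.push_back(buffer_);
    buffer_.clear();
  }

  const std::vector<std::string>& rows() const { return rows_; }

 private:
  std::string buffer_;             // state carried over until commit is called
  std::vector<std::string> rows_;  // committed output
};

// Writes two rows. The first row "fails" after a partial write and is never
// committed, so the second row starts from the leftover buffer.
inline std::vector<std::string> writeWithSkippedCommit() {
  ToyStringWriter writer;
  writer.append("partial");  // row 0: imagine an exception here, no commit
  writer.append("ok");       // row 1: begins with row 0's leftover contents
  writer.commit();
  return writer.rows();
}
```

In this toy, the single committed row is `"partialok"` rather than `"ok"`, which mirrors the kind of contamination described above.
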
SimpleFunctionAdapter tried to compensate for this by making a local copy of the top-level
VectorWriter and copying it back into the original only if processing the current row succeeded.
This does nothing for nested writers, and it was never implemented for strings.
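
One way to picture why copying only the top-level writer falls short is sketched below. This rests on an assumption that is not taken from the Velox source: that nested state is shared between the copy and the original rather than duplicated. `ToyArrayWriter`, `ToyElementState`, and `leftoverAfterDiscardedCopy` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical nested state: elements written for the current array that have
// not been committed yet.
struct ToyElementState {
  std::vector<int> pending;
};

// Hypothetical top-level array writer. Copying it copies `length_` but shares
// `elements_`; that sharing is the assumption this sketch relies on.
class ToyArrayWriter {
 public:
  explicit ToyArrayWriter(std::shared_ptr<ToyElementState> elements)
      : elements_(std::move(elements)) {}

  void addItem(int value) {
    elements_->pending.push_back(value);
    ++length_;
  }

  std::size_t length() const { return length_; }
  std::size_t pendingElements() const { return elements_->pending.size(); }

 private:
  std::shared_ptr<ToyElementState> elements_;
  std::size_t length_ = 0;
};

// Simulates a row that fails after writing two items through a local copy.
// Returns how many elements are left behind in the shared nested state.
inline std::size_t leftoverAfterDiscardedCopy() {
  ToyArrayWriter original(std::make_shared<ToyElementState>());
  {
    ToyArrayWriter localCopy = original;  // work on a copy of the top level
    localCopy.addItem(1);
    localCopy.addItem(2);
    // The row "fails" here, so localCopy is dropped without copying back.
  }
  // original.length() is still 0, but the nested state kept both items.
  return original.pendingElements();
}
```

Discarding the copy restores the top-level length, yet the two pending elements remain in the nested state for the next row to inherit.
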
To fix this, I've added an optional lambda to applyToSelectedNoThrow that is invoked when an
exception is caught. We can use it to call commitNull on the writer, which should reset the state
of all writers (top-level and nested). Note that if we're catching exceptions and not throwing anything,
we must be in a try, so committing null is safe and reasonable to do.
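
The shape of the fix can be sketched as follows. This is an illustrative toy, not the actual signature of applyToSelectedNoThrow or of the writer's commitNull; `applyToRowsNoThrow`, `ToyStringWriter`, and `runWithErrorHook` are hypothetical, and the error hook is modeled as an optional `std::function`.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <optional>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical writer for illustration only (not Velox's VectorWriter).
class ToyStringWriter {
 public:
  void append(const std::string& s) { buffer_ += s; }

  // Publishes the in-progress value and resets the buffer.
  void commit() {
    rows_.emplace_back(buffer_);
    buffer_.clear();
  }

  // Publishes a null for the current row and discards any partial state.
  void commitNull() {
    rows_.emplace_back(std::nullopt);
    buffer_.clear();
  }

  const std::vector<std::optional<std::string>>& rows() const { return rows_; }

 private:
  std::string buffer_;
  std::vector<std::optional<std::string>> rows_;
};

// Hypothetical analogue of a no-throw row loop with an optional error hook.
template <typename Func>
void applyToRowsNoThrow(
    std::size_t numRows,
    Func&& func,
    const std::function<void(std::size_t)>& onError = nullptr) {
  for (std::size_t row = 0; row < numRows; ++row) {
    try {
      func(row);
    } catch (const std::exception&) {
      if (onError) {
        onError(row);  // gives the caller a chance to reset writer state
      }
    }
  }
}

// Processes three rows; row 1 throws after a partial write.
inline std::vector<std::optional<std::string>> runWithErrorHook() {
  ToyStringWriter writer;
  applyToRowsNoThrow(
      3,
      [&](std::size_t row) {
        writer.append("row" + std::to_string(row));
        if (row == 1) {
          throw std::runtime_error("bad input");
        }
        writer.commit();
      },
      [&](std::size_t) { writer.commitNull(); });
  return writer.rows();
}
```

In this toy run, row 1 is committed as null and row 2 starts from a clean buffer, so the output is `"row0"`, null, `"row2"`.
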
This shouldn't impact the performance of the path without exceptions (I ran
ArrayWriterBenchmark to confirm this). I also don't need to make this change in the fast path:
the fast path is only invoked when the output type is primitive and fixed-width, and in that case
there is no state other than the value in the Vector, so failing to commit does not cause issues.
This, combined with #10376, addresses the issue identified in #10162.
Differential Revision: D59473869