StringWriter needs to implement finalizeNull#10376
Closed
kevinwilfong wants to merge 1 commit intofacebookincubator:mainfrom
Closed
StringWriter needs to implement finalizeNull#10376kevinwilfong wants to merge 1 commit intofacebookincubator:mainfrom
kevinwilfong wants to merge 1 commit intofacebookincubator:mainfrom
Conversation
✅ Deploy Preview for meta-velox canceled.
|
Contributor
|
This pull request was exported from Phabricator. Differential Revision: D59291362 |
mbasmanova
approved these changes
Jul 2, 2024
Contributor
mbasmanova
left a comment
There was a problem hiding this comment.
Thank you! Appreciate detailed explanation.
Contributor
There was a problem hiding this comment.
nit: you can use std::string instead of StringView; this makes code a bit easier to read
Contributor
|
This pull request was exported from Phabricator. Differential Revision: D59291362 |
b45c60a to
ae0d011
Compare
Contributor
|
This pull request was exported from Phabricator. Differential Revision: D59291362 |
ae0d011 to
074b34c
Compare
Summary: Pull Request resolved: facebookincubator#10376 When StringWriter commits it calls proxy_.prepareForReuse to update the state of the proxy, either advancing it if a value is being written or resetting it if a null is being written. StringWriter needs to implement finalizeNull to do the same in case its parent commits a null. When a writer for a complex type commits a null, it invokes finalizeNull on its children to reset their state. StringWriter does not implement this today which means, e.g. if it is the writer for the elements of an Array, if the code has written an uncommitted String to the Array when the Array is committed as null, the next Array that gets written will end up with its StringWriter still holding the state from that last string in the previous Array, meaning its contents could appear at the beginning of the first string in that Array. We were also not calling proxy_.prepareForReuse in commitNull (only commit(false)) which resulted in a similar issue, this change also fixes that. This partially addresses the issue identified in facebookincubator#10162 Reviewed By: mbasmanova Differential Revision: D59291362
Contributor
|
This pull request was exported from Phabricator. Differential Revision: D59291362 |
074b34c to
07e7798
Compare
kevinwilfong
added a commit
to kevinwilfong/velox
that referenced
this pull request
Jul 2, 2024
…ator#10377) Summary: Pull Request resolved: facebookincubator#10377 The way the VectorWriters work today, if its writing a variable length type and it is not committed (e.g. because an exception was thrown) when the next value is written it will start with the state of the previous value rather than a clean slate. This can result in e.g. strings starting with the contents that were written for the previous string. SimpleFunctionAdapter tried to compensate for this by making a local copy of the top level VectorWriter and only copying back into the original if processing the current row succeeds. This does nothing for nested writers (it also wasn't implemented for Strings). To fix this, I've added an optional lambda to applyToSelectedNoThrow that gets invoked when an exception is caught. We can use this to call commitNull on the writer which should reset the state of all writers (top level and nested). Note that if we're catching exceptions and not throwing anything we must be in a try so committing null is safe and reasonable to do. This shouldn't impact the performance of the path without exceptions (I ran the ArrayWriterBenchmark to confirm this). I also do not need to make this change in the fast path as the fast path is only invoked if the output type is primitive and fixed width, and in this there is no state other than the value in the Vector so failing to commit does not cause issues. This combined with facebookincubator#10376 addresses the issue identified in facebookincubator#10162 Reviewed By: mbasmanova Differential Revision: D59292285
facebook-github-bot
pushed a commit
that referenced
this pull request
Jul 3, 2024
Summary: Pull Request resolved: #10377 The way the VectorWriters work today, if its writing a variable length type and it is not committed (e.g. because an exception was thrown) when the next value is written it will start with the state of the previous value rather than a clean slate. This can result in e.g. strings starting with the contents that were written for the previous string. SimpleFunctionAdapter tried to compensate for this by making a local copy of the top level VectorWriter and only copying back into the original if processing the current row succeeds. This does nothing for nested writers (it also wasn't implemented for Strings). To fix this, I've added an optional lambda to applyToSelectedNoThrow that gets invoked when an exception is caught. We can use this to call commitNull on the writer which should reset the state of all writers (top level and nested). Note that if we're catching exceptions and not throwing anything we must be in a try so committing null is safe and reasonable to do. This shouldn't impact the performance of the path without exceptions (I ran the ArrayWriterBenchmark to confirm this). I also do not need to make this change in the fast path as the fast path is only invoked if the output type is primitive and fixed width, and in this there is no state other than the value in the Vector so failing to commit does not cause issues. This combined with #10376 addresses the issue identified in #10162 Reviewed By: mbasmanova Differential Revision: D59292285 fbshipit-source-id: 00c000626bb12451b7c95c24460eb62816532403
Contributor
|
This pull request has been merged in a033968. |
kevinwilfong
added a commit
to kevinwilfong/velox
that referenced
this pull request
Jul 9, 2024
…bookincubator#10418) Summary: Pull Request resolved: facebookincubator#10418 The way the VectorWriters work today, if its writing a variable length type and it is not committed (e.g. because an exception was thrown) when the next value is written it will start with the state of the previous value rather than a clean slate. This can result in e.g. strings starting with the contents that were written for the previous string. SimpleFunctionAdapter tried to compensate for this by making a local copy of the top level VectorWriter and only copying back into the original if processing the current row succeeds. This does nothing for nested writers (it also wasn't implemented for Strings). To fix this, I've added an optional lambda to applyToSelectedNoThrow that gets invoked when an exception is caught. We can use this to call commitNull on the writer which should reset the state of all writers (top level and nested). Note that if we're catching exceptions and not throwing anything we must be in a try so committing null is safe and reasonable to do. This shouldn't impact the performance of the path without exceptions (I ran the ArrayWriterBenchmark to confirm this). I also do not need to make this change in the fast path as the fast path is only invoked if the output type is primitive and fixed width, and in this there is no state other than the value in the Vector so failing to commit does not cause issues. This combined with facebookincubator#10376 addresses the issue identified in facebookincubator#10162 Reviewed By: weijiadeng-uber Differential Revision: D59473869
facebook-github-bot
pushed a commit
that referenced
this pull request
Jul 9, 2024
Summary: Pull Request resolved: #10418 The way the VectorWriters work today, if its writing a variable length type and it is not committed (e.g. because an exception was thrown) when the next value is written it will start with the state of the previous value rather than a clean slate. This can result in e.g. strings starting with the contents that were written for the previous string. SimpleFunctionAdapter tried to compensate for this by making a local copy of the top level VectorWriter and only copying back into the original if processing the current row succeeds. This does nothing for nested writers (it also wasn't implemented for Strings). To fix this, I've added an optional lambda to applyToSelectedNoThrow that gets invoked when an exception is caught. We can use this to call commitNull on the writer which should reset the state of all writers (top level and nested). Note that if we're catching exceptions and not throwing anything we must be in a try so committing null is safe and reasonable to do. This shouldn't impact the performance of the path without exceptions (I ran the ArrayWriterBenchmark to confirm this). I also do not need to make this change in the fast path as the fast path is only invoked if the output type is primitive and fixed width, and in this there is no state other than the value in the Vector so failing to commit does not cause issues. This combined with #10376 addresses the issue identified in #10162 Reviewed By: kgpai, weijiadeng-uber Differential Revision: D59473869 fbshipit-source-id: 13190428688afc266e5b0c5d8129e424e2a9b866
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary:
When StringWriter commits it calls proxy_.prepareForReuse to update the state of the proxy, either
advancing it if a value is being written or resetting it if a null is being written.
StringWriter needs to implement finalizeNull to do the same in case its parent commits a null. When
a writer for a complex type commits a null, it invokes finalizeNull on its children to reset their state.
StringWriter does not implement this today which means, e.g. if it is the writer for the elements of
an Array, if the code has written an uncommitted String to the Array when the Array is committed as
null, the next Array that gets written will end up with its StringWriter still holding the state from that
last string in the previous Array, meaning its contents could appear at the beginning of the first string
in that Array.
We were also not calling proxy_.prepareForReuse in commitNull (only commit(false)) which resulted
in a similar issue, this change also fixes that.
This partially addresses the issue identified in #10162
Differential Revision: D59291362