Skip to content

StringWriter needs to implement finalizeNull#10376

Closed
kevinwilfong wants to merge 1 commit intofacebookincubator:mainfrom
kevinwilfong:export-D59291362
Closed

StringWriter needs to implement finalizeNull#10376
kevinwilfong wants to merge 1 commit intofacebookincubator:mainfrom
kevinwilfong:export-D59291362

Conversation

@kevinwilfong
Copy link
Copy Markdown
Contributor

Summary:
When StringWriter commits it calls proxy_.prepareForReuse to update the state of the proxy, either
advancing it if a value is being written or resetting it if a null is being written.

StringWriter needs to implement finalizeNull to do the same in case its parent commits a null. When
a writer for a complex type commits a null, it invokes finalizeNull on its children to reset their state.
StringWriter does not implement this today which means, e.g. if it is the writer for the elements of
an Array, if the code has written an uncommitted String to the Array when the Array is committed as
null, the next Array that gets written will end up with its StringWriter still holding the state from that
last string in the previous Array, meaning its contents could appear at the beginning of the first string
in that Array.

We were also not calling proxy_.prepareForReuse in commitNull (only commit(false)) which resulted
in a similar issue, this change also fixes that.

This partially addresses the issue identified in #10162

Differential Revision: D59291362

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Jul 2, 2024
@netlify
Copy link
Copy Markdown

netlify bot commented Jul 2, 2024

Deploy Preview for meta-velox canceled.

Name Link
🔨 Latest commit 07e7798
🔍 Latest deploy log https://app.netlify.com/sites/meta-velox/deploys/66847a37b99ae10008f42601

@facebook-github-bot
Copy link
Copy Markdown
Contributor

This pull request was exported from Phabricator. Differential Revision: D59291362

Copy link
Copy Markdown
Contributor

@mbasmanova mbasmanova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! Appreciate detailed explanation.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: you can use std::string instead of StringView; this makes code a bit easier to read

@facebook-github-bot
Copy link
Copy Markdown
Contributor

This pull request was exported from Phabricator. Differential Revision: D59291362

@facebook-github-bot
Copy link
Copy Markdown
Contributor

This pull request was exported from Phabricator. Differential Revision: D59291362

Summary:
Pull Request resolved: facebookincubator#10376

When StringWriter commits it calls proxy_.prepareForReuse to update the state of the proxy, either
advancing it if a value is being written or resetting it if a null is being written.

StringWriter needs to implement finalizeNull to do the same in case its parent commits a null.  When
a writer for a complex type commits a null, it invokes finalizeNull on its children to reset their state.
StringWriter does not implement this today which means, e.g. if it is the writer for the elements of
an Array, if the code has written an uncommitted String to the Array when the Array is committed as
null, the next Array that gets written will end up with its StringWriter still holding the state from that
last string in the previous Array, meaning its contents could appear at the beginning of the first string
in that Array.

We were also not calling proxy_.prepareForReuse in commitNull (only commit(false)) which resulted
in a similar issue, this change also fixes that.

This partially addresses the issue identified in facebookincubator#10162

Reviewed By: mbasmanova

Differential Revision: D59291362
@facebook-github-bot
Copy link
Copy Markdown
Contributor

This pull request was exported from Phabricator. Differential Revision: D59291362

kevinwilfong added a commit to kevinwilfong/velox that referenced this pull request Jul 2, 2024
…ator#10377)

Summary:
Pull Request resolved: facebookincubator#10377

The way the VectorWriters work today, if its writing a variable length type and it is not committed
(e.g. because an exception was thrown) when the next value is written it will start with the state of
the previous value rather than a clean slate.  This can result in e.g. strings starting with the contents
that were written for the previous string.

SimpleFunctionAdapter tried to compensate for this by making a local copy of the top level
VectorWriter and only copying back into the original if processing the current row succeeds.  This
does nothing for nested writers (it also wasn't implemented for Strings).

To fix this, I've added an optional lambda to applyToSelectedNoThrow that gets invoked when an
exception is caught.  We can use this to call commitNull on the writer which should reset the state
of all writers (top level and nested).  Note that if we're catching exceptions and not throwing anything
we must be in a try so committing null is safe and reasonable to do.

This shouldn't impact the performance of the path without exceptions (I ran the ArrayWriterBenchmark to confirm this).  I also do not need to make this change in the fast path as
the fast path is only invoked if the output type is primitive and fixed width, and in this there is no
state other than the value in the Vector so failing to commit does not cause issues.

This combined with facebookincubator#10376 addresses the issue
identified in facebookincubator#10162

Reviewed By: mbasmanova

Differential Revision: D59292285
facebook-github-bot pushed a commit that referenced this pull request Jul 3, 2024
Summary:
Pull Request resolved: #10377

The way the VectorWriters work today, if its writing a variable length type and it is not committed
(e.g. because an exception was thrown) when the next value is written it will start with the state of
the previous value rather than a clean slate.  This can result in e.g. strings starting with the contents
that were written for the previous string.

SimpleFunctionAdapter tried to compensate for this by making a local copy of the top level
VectorWriter and only copying back into the original if processing the current row succeeds.  This
does nothing for nested writers (it also wasn't implemented for Strings).

To fix this, I've added an optional lambda to applyToSelectedNoThrow that gets invoked when an
exception is caught.  We can use this to call commitNull on the writer which should reset the state
of all writers (top level and nested).  Note that if we're catching exceptions and not throwing anything
we must be in a try so committing null is safe and reasonable to do.

This shouldn't impact the performance of the path without exceptions (I ran the ArrayWriterBenchmark to confirm this).  I also do not need to make this change in the fast path as
the fast path is only invoked if the output type is primitive and fixed width, and in this there is no
state other than the value in the Vector so failing to commit does not cause issues.

This combined with #10376 addresses the issue
identified in #10162

Reviewed By: mbasmanova

Differential Revision: D59292285

fbshipit-source-id: 00c000626bb12451b7c95c24460eb62816532403
@facebook-github-bot
Copy link
Copy Markdown
Contributor

This pull request has been merged in a033968.

kevinwilfong added a commit to kevinwilfong/velox that referenced this pull request Jul 9, 2024
…bookincubator#10418)

Summary:
Pull Request resolved: facebookincubator#10418

The way the VectorWriters work today, if its writing a variable length type and it is not committed
(e.g. because an exception was thrown) when the next value is written it will start with the state of
the previous value rather than a clean slate.  This can result in e.g. strings starting with the contents
that were written for the previous string.

SimpleFunctionAdapter tried to compensate for this by making a local copy of the top level
VectorWriter and only copying back into the original if processing the current row succeeds.  This
does nothing for nested writers (it also wasn't implemented for Strings).

To fix this, I've added an optional lambda to applyToSelectedNoThrow that gets invoked when an
exception is caught.  We can use this to call commitNull on the writer which should reset the state
of all writers (top level and nested).  Note that if we're catching exceptions and not throwing anything
we must be in a try so committing null is safe and reasonable to do.

This shouldn't impact the performance of the path without exceptions (I ran the
ArrayWriterBenchmark to confirm this).  I also do not need to make this change in the fast path as
the fast path is only invoked if the output type is primitive and fixed width, and in this there is no
state other than the value in the Vector so failing to commit does not cause issues.

This combined with facebookincubator#10376 addresses the issue
identified in facebookincubator#10162

Reviewed By: weijiadeng-uber

Differential Revision: D59473869
facebook-github-bot pushed a commit that referenced this pull request Jul 9, 2024
Summary:
Pull Request resolved: #10418

The way the VectorWriters work today, if its writing a variable length type and it is not committed
(e.g. because an exception was thrown) when the next value is written it will start with the state of
the previous value rather than a clean slate.  This can result in e.g. strings starting with the contents
that were written for the previous string.

SimpleFunctionAdapter tried to compensate for this by making a local copy of the top level
VectorWriter and only copying back into the original if processing the current row succeeds.  This
does nothing for nested writers (it also wasn't implemented for Strings).

To fix this, I've added an optional lambda to applyToSelectedNoThrow that gets invoked when an
exception is caught.  We can use this to call commitNull on the writer which should reset the state
of all writers (top level and nested).  Note that if we're catching exceptions and not throwing anything
we must be in a try so committing null is safe and reasonable to do.

This shouldn't impact the performance of the path without exceptions (I ran the
ArrayWriterBenchmark to confirm this).  I also do not need to make this change in the fast path as
the fast path is only invoked if the output type is primitive and fixed width, and in this there is no
state other than the value in the Vector so failing to commit does not cause issues.

This combined with #10376 addresses the issue
identified in #10162

Reviewed By: kgpai, weijiadeng-uber

Differential Revision: D59473869

fbshipit-source-id: 13190428688afc266e5b0c5d8129e424e2a9b866
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. fb-exported Merged

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants