-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-12421][SQL] Fixed copy() method of GenericRow. #10374
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Closed
Closed
Changes from all commits
Commits
Show all changes
3 commits
Select commit
Hold shift + click to select a range
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GenericRowis supposed to be immutable. So there we really shouldn't need to copy its values. Could you explain why this is needed?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes it is immutable, but if you want to extract the values of a GenericRows, change them and create a new row, then it's better to not change the original values.
So you would first want to receive a copy of the original row.
Also the method name copy() implies that it returns a real copy and not the reference on the original object.
Other rows like GenericInternalRow implement a correct copy method:
override def copy(): InternalRow = new GenericInternalRow(values.clone())
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You cannot a put value back in the
GenericRowwhen you have taken it out, unless it is a mutable object. Could you provide an example for this? Which is also a good basis for a unit test.copyis merely a contract all rows need to adhere to. If a row is inmutable, why copy it? It also avoids a lot object allocations. I thinkGenericInternalRowshould also return reference to it self oncopy.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I try to provide a simple example and I could not try the code below right now. Imagine you want to query a row, copy it, change a value and then build a new one. Then you want to build a DF out of the new and the old row. I know that this is a strange example and there are a lot of better method to implement it, but you could implement it this way:
If you don't want that someone uses the copy method, because it does not do what it implies, then i think we should provide another trait/contract which does not include this method. It's just confusing if the copy method does not copy the object.
row.copy()and justroware identical so far!I don't see the point, why we should'nt provide this copy method for the users who want to use it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have been doing some digging. Using an adapted version of your code (only made it work):
newDF.showyields the following result:Which is wrong. What happens is that the
firstRowCopied.toSeqwraps the value array in aArrayOpsobject, this object returns wrapped backing array (instead of a copy) when the you invoketoArray. This really shouldn't happen, because now you mutable gain access to a structure which is supposed to be immutable. I think we should change thetoSeqmethod instead.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what's the best solution. Maybe we need some help from more experienced committers here. I think we are agreed that we need a change here, but we differ how the change should look like.
Anyway, it's not urgent, because the wrong behave should only affect just a few developers/users.