fix: Fix the MergeSource data lost issue #11772
✅ Deploy Preview for meta-velox canceled.
@pedroerp Could you help review this PR? Thanks.
struct State {
  bool atEnd = false;
  RowVectorPtr data;
  std::queue<RowVectorPtr> dataQueue;
In which case do we have more than one output?
The issue arises when running TPC-H Query 21 on a 1 GB dataset with multiple executors and multiple cores.
Closing this PR because the current code does not have any data loss issues.
When running TPC-H Q21, we found that performing a left semi join followed by a left anti join in the same stage produced incorrect results. Upon investigation, we discovered that MergeSource was losing data in such complex join scenarios. This PR addresses the issue by placing the data from MergeSource into a queue, so that the `next` method only finishes once the queue is empty.
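The queue-based approach described above can be sketched roughly as follows. This is a simplified, single-threaded model and not the actual Velox `MergeSource`: `QueuedSource`, `BatchPtr`, `enqueue`, `finished`, and `drainRowCount` are hypothetical names chosen for illustration, and `BatchPtr` merely stands in for `RowVectorPtr`. The sketch also leaves out the blocking and future-based signaling a real implementation would need.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical stand-in for RowVectorPtr; not the real Velox type.
using BatchPtr = std::shared_ptr<std::vector<int>>;

// Simplified model of a source that buffers every incoming batch in a queue
// instead of holding a single pending batch.
class QueuedSource {
 public:
  // Producer side: a null batch signals end of input.
  void enqueue(BatchPtr batch) {
    assert(!atEnd_ && "enqueue called after end of input");
    if (batch == nullptr) {
      atEnd_ = true;
      return;
    }
    dataQueue_.push(std::move(batch));
  }

  // Consumer side: returns the oldest buffered batch, or nullptr when
  // nothing is currently buffered.
  BatchPtr next() {
    if (dataQueue_.empty()) {
      return nullptr;
    }
    BatchPtr batch = std::move(dataQueue_.front());
    dataQueue_.pop();
    return batch;
  }

  // The source is finished only when end of input has been seen AND the
  // queue has been fully drained.
  bool finished() const {
    return atEnd_ && dataQueue_.empty();
  }

 private:
  bool atEnd_ = false;
  std::queue<BatchPtr> dataQueue_;
};

// Drains all buffered batches and returns the total number of rows seen.
std::size_t drainRowCount(QueuedSource& source) {
  std::size_t rows = 0;
  while (auto batch = source.next()) {
    rows += batch->size();
  }
  return rows;
}
```

In this model, a producer that enqueues several batches before the consumer calls `next` does not overwrite anything, and `finished()` stays false until every buffered batch has been handed out.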