Skip to content

Conversation

@huaxingao
Copy link
Contributor

@huaxingao huaxingao commented May 8, 2025

In order to keep all the git history on Spark4.0, we need to follow the proposal in this discussion.

I will reverts #12494 and redo the 4.0 integration PR. I will have three commits:

  • [Move 3.5 as 4.0]
  • [Copy back 4.0 as 3.5]
  • [initial support for Spark 4.0]

This PR reverted everything of #12494 except the Comet version bump. The 4.0 changes that were made in 809a232#diff-ba3f1e91e786949eb31ca8e396d7101ce78adc569350c0a1a0deb76ab9aa3074 were manually remove.

@huaxingao
Copy link
Contributor Author

@huaxingao
Copy link
Contributor Author

@pan3793 I think this over, actually, the key to preserve history is what happens in step 1, right? Ideally, I should have 3 commits, but as long as I have 1 and 2, it's OK to have more than 3 commits?

  1. [Move 3.5 as 4.0]
  2. [Copy back 4.0 as 3.5]
  3. [initial support for Spark 4.0]

@pan3793
Copy link
Member

pan3793 commented May 8, 2025

commit and revert may mess the commit history worse, do you have permission to force push(with dropping the previous commit) the main branch directly? (this might affect a few contributors) otherwise I would suggest just keeping it as-is.

@amogh-jahagirdar
Copy link
Contributor

amogh-jahagirdar commented May 8, 2025

I'd prefer to just keep things as is so we don't complicate history.

It's a miss on my part to not catch the breakdown of the move/copy/initial support and miss preserving history there but I think it's ultimately OK to have the single commit that we have now.

@huaxingao
Copy link
Contributor Author

Thanks @amogh-jahagirdar @pan3793 . I will close this PR for now

@huaxingao huaxingao closed this May 8, 2025
@wypoon
Copy link
Contributor

wypoon commented May 8, 2025

@pan3793 I think this over, actually, the key to preserve history is what happens in step 1, right? Ideally, I should have 3 commits, but as long as I have 1 and 2, it's OK to have more than 3 commits?

  1. [Move 3.5 as 4.0]
  2. [Copy back 4.0 as 3.5]
  3. [initial support for Spark 4.0]

I know this is abandoned, but for future reference, why do you want to have more than 3 commits? If you have 1 and 2 for sure, then to have more than 3 commits means you want to spread the initial support for the new version over several commits? What benefit would that bring? For a PR itself, the initial support may well occur over many commits due to iterations, but for merging into the repo, we'd want a single commit for 3.

@huaxingao
Copy link
Contributor Author

@wypoon right, we should always have 3 commits. I forgot to squash all the remaining commits into one and was trying to figure out whether the commit history was still preserved.

@huaxingao
Copy link
Contributor Author

I had an offline discussion with @amogh-jahagirdar. We have decided to revert the integration PR and redo it, so we can keep all the commit history.

@huaxingao huaxingao reopened this May 8, 2025
@wypoon
Copy link
Contributor

wypoon commented May 8, 2025

I had an offline discussion with @amogh-jahagirdar. We have decided to revert the integration PR and redo it, so we can keep all the commit history.

Thanks for the heads up!

@wypoon
Copy link
Contributor

wypoon commented May 8, 2025

Perhaps revert 809a232 first, then this, followed by redo of Spark 4.0 support and redo of 809a232.

@huaxingao
Copy link
Contributor Author

Thanks @wypoon for the suggestion! I will try to resolve the conflicts on my side and see how it goes

@huaxingao huaxingao force-pushed the revert-12494-spark4.0 branch from 21d1459 to 11e1873 Compare May 8, 2025 22:38
@huaxingao
Copy link
Contributor Author

@amogh-jahagirdar @pan3793 @wypoon @szehon-ho
All checks have passed. Could you please check the PR when you have a min?

Copy link
Contributor

@amogh-jahagirdar amogh-jahagirdar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, and it looks like the reason this is not exactly the inverse of the original is because the revert also includes the 4.0 changes that were made in 809a232#diff-ba3f1e91e786949eb31ca8e396d7101ce78adc569350c0a1a0deb76ab9aa3074 right @huaxingao ? I'll leave it up for a bit before merging so that others can also take a look

@wypoon
Copy link
Contributor

wypoon commented May 9, 2025

I see that in #12494, there are 584 changed files, while in this PR, there are 580. Are the 4 files that are not reverted the build.gradle and CometColumnReader.java in spark/v3.4 and spark/v3.5? and that is because you just want to keep the Comet version at 0.8.1? (I checked gradle/libs.versions.toml in this PR and I see that the Comet version is not reverted.)
Aside from that, are the only files that are manually changed the 4 files in spark/v4.0 changed in 809a232?
Assuming the above, it looks good to me.

@huaxingao
Copy link
Contributor Author

@amogh-jahagirdar @wypoon

in #12494, there are 584 changed files, while in this PR, there are 580.

The following 4 files are changed in #12494, but not reverted.
Screenshot 2025-05-08 at 6 08 21 PM
We want to keep the Comet version changes.

Besides the above 4 files, the only files that are manually changed are the 4 files in spark/v4.0 changed in 809a232.

@manuzhang
Copy link
Member

@huaxingao Please update the PR description to summarize all the changes.

@huaxingao
Copy link
Contributor Author

@manuzhang

Please update the PR description to summarize all the changes.

Updated. Thanks!

@amogh-jahagirdar amogh-jahagirdar merged commit a5bcacd into main May 9, 2025
43 checks passed
@amogh-jahagirdar amogh-jahagirdar deleted the revert-12494-spark4.0 branch May 9, 2025 15:51
@huaxingao
Copy link
Contributor Author

Thank you all very much! @amogh-jahagirdar @wypoon @pan3793 @manuzhang

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants