
HADOOP-19193. Create orphan commit for website deployment #6864

Merged

steveloughran merged 1 commit into apache:trunk from pan3793:HADOOP-19193 on Jun 5, 2024

Conversation

@pan3793 (Member) commented Jun 5, 2024

Description of PR

Currently, the gh-pages deployment always creates new commits on top of the previous one, which makes the git repository grow quickly.

According to https://github.com/peaceiris/actions-gh-pages?tab=readme-ov-file#%EF%B8%8F-force-orphan-force_orphan, we can set force_orphan: true to create an orphan commit instead. That is sufficient for website deployments: the published content is compilation output, so there is no need to keep the full commit history.
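
For reference, a minimal sketch of the deploy step with this option. The force_orphan line is the change in question; the other inputs (step name, token wiring, publish_dir value) are illustrative, not the exact contents of .github/workflows/website.yml:

```yaml
# Sketch of a peaceiris/actions-gh-pages deploy step using the orphan option.
# With force_orphan: true, each deployment becomes a single history-less commit,
# so the gh-pages branch no longer accumulates old site snapshots.
- name: Deploy to GitHub Pages
  uses: peaceiris/actions-gh-pages@v3
  with:
    github_token: ${{ secrets.GITHUB_TOKEN }}  # illustrative token wiring
    publish_dir: ./staging                     # illustrative output directory
    force_orphan: true                         # the added line: one orphan commit per deploy
```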

How was this patch tested?

Review.

For code changes:

  • Does the title of this PR start with the corresponding JIRA issue id (e.g. 'HADOOP-17799. Your PR title ...')?
  • Object storage: have the integration tests been executed and the endpoint declared according to the connector-specific documentation?
  • If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
  • If applicable, have you updated the LICENSE, LICENSE-binary, NOTICE-binary files?

@pan3793 (Member Author) commented Jun 5, 2024

cc @steveloughran @ayushtkn

@hadoop-yetus

🎊 +1 overall

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:-------:|:--------|
| +0 🆗 | reexec | 17m 33s | | Docker mode activated. |
| | | | | _ Prechecks _ |
| +1 💚 | dupname | 0m 0s | | No case conflicting files found. |
| +0 🆗 | codespell | 0m 0s | | codespell was not available. |
| +0 🆗 | detsecrets | 0m 0s | | detect-secrets was not available. |
| +0 🆗 | yamllint | 0m 0s | | yamllint was not available. |
| +1 💚 | @author | 0m 0s | | The patch does not contain any @author tags. |
| | | | | _ trunk Compile Tests _ |
| +1 💚 | shadedclient | 41m 45s | | branch has no errors when building and testing our client artifacts. |
| | | | | _ Patch Compile Tests _ |
| +1 💚 | blanks | 0m 0s | | The patch has no blanks issues. |
| +1 💚 | shadedclient | 32m 34s | | patch has no errors when building and testing our client artifacts. |
| | | | | _ Other Tests _ |
| +1 💚 | asflicense | 0m 40s | | The patch does not generate ASF License warnings. |
| | | 97m 47s | | |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.45 ServerAPI=1.45 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6864/1/artifact/out/Dockerfile |
| GITHUB PR | #6864 |
| Optional Tests | dupname asflicense codespell detsecrets yamllint |
| uname | Linux 9830ca9f9a0a 5.15.0-106-generic #116-Ubuntu SMP Wed Apr 17 09:17:56 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / 9bdde6b |
| Max. process+thread count | 552 (vs. ulimit of 5500) |
| modules | C: . U: . |
| Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6864/1/console |
| versions | git=2.25.1 maven=3.6.3 |
| Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |

This message was automatically generated.

steveloughran merged commit d8d3d53 into apache:trunk on Jun 5, 2024
@steveloughran (Contributor)

merged. do we need to do it for other branches?

@pan3793 (Member Author) commented Jun 5, 2024

The GitHub Actions workflow is only triggered by pushes to trunk, so no backport is required.
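
For context, a minimal sketch of the kind of trigger that makes this trunk-only; the exact workflow file may differ, the branch name follows the comment above:

```yaml
# Sketch: the website workflow runs only on pushes to trunk,
# so release branches never execute the gh-pages deployment.
on:
  push:
    branches:
      - trunk
```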

@pan3793 (Member Author) commented Jun 5, 2024

@steveloughran I tested a trunk sync and a fresh repo clone; the download size is much more reasonable now.

```
$ git pull
remote: Enumerating objects: 5627, done.
remote: Counting objects: 100% (5623/5623), done.
remote: Compressing objects: 100% (975/975), done.
remote: Total 5627 (delta 4630), reused 5563 (delta 4576), pack-reused 4
Receiving objects: 100% (5627/5627), 7.82 MiB | 2.73 MiB/s, done.
Resolving deltas: 100% (4630/4630), completed with 3 local objects.
From github.com:apache/hadoop
   f92a8ab8ae54..2ee0bf953492  trunk      -> apache/trunk
 + 6a5fbc022450...75704ed4e4e7 gh-pages   -> apache/gh-pages  (forced update)
Updating f92a8ab8ae54..2ee0bf953492
Fast-forward
 .github/workflows/website.yml                                                    | 1 +
 LICENSE-binary                                                                   | 6 +++---
 hadoop-cloud-storage-project/hadoop-cos/src/site/markdown/cloud-storage/index.md | 2 +-
 hadoop-project/pom.xml                                                           | 2 +-
 4 files changed, 6 insertions(+), 5 deletions(-)

$ git clone git@github.com:apache/hadoop.git
Cloning into 'hadoop'...
remote: Enumerating objects: 1602835, done.
remote: Counting objects: 100% (5880/5880), done.
remote: Compressing objects: 100% (1147/1147), done.
remote: Total 1602835 (delta 4703), reused 5705 (delta 4613), pack-reused 1596955
Receiving objects: 100% (1602835/1602835), 566.57 MiB | 4.18 MiB/s, done.
Resolving deltas: 100% (805225/805225), done.
Updating files: 100% (15505/15505), done.
```

@steveloughran (Contributor)

thanks. This is the key cause of the really big downloads, isn't it?

@steveloughran (Contributor)

(I still think we should cull old branches, FWIW)

@pan3793 (Member Author) commented Jun 7, 2024

> This is the key cause of the really big downloads, isn't it?

It should be, although I haven't analyzed the git repository's blobs; I'm asserting this based on experience.

> I still think we should cull old branches

I saw your post on the mailing list; that said, deleting old release branches (<2.6) might be too aggressive.

Release branches should contribute a negligible amount to the total size of the git repository, so I would suggest keeping them but deleting feature branches like HADOOP-XXXX and YARN-XXXX.

Additionally, it seems that Hadoop creates a branch for each release; it is not as clean as Spark, which only cuts branches for minor versions and creates release tags on those branches:

```
- branch-3.5 ------- v3.5.0-rc0 ------- v3.5.0-rc1(v3.5.0) ------- v3.5.1-rc0 --- ...
                     ^                  ^          ^               ^
                     tag                tag        tag             tag
```

@steveloughran (Contributor)

we do a branch for things like 3.4.0, 3.4.1 so that we can do release stuff there which isn't needed on the main branch (pom changes, references, diffs). Not sure what we could do differently there

@pan3793 (Member Author) commented Jun 7, 2024

Take Spark's branch-3.4 as an example: branch-3.4 should always be in a release-ready state, and when a patch release is called, the RM (release manager) just performs two commits:

  • Bump the version from 3.4.0-SNAPSHOT to 3.4.0 and create a tag v3.4.0-rc0 on this commit; the "pom changes, references, diffs" should happen here, via scripts.
  • Bump the version from 3.4.0 to 3.4.1-SNAPSHOT, similar to the above.

If something goes wrong and the RC fails, repeat the process until the release is out.
This branch model is adopted by many other Apache projects, such as Apache Kyuubi and Apache Celeborn. For Kyuubi, we have a release guide: https://kyuubi.readthedocs.io/en/master/contributing/code/release.html

pan3793 deleted the HADOOP-19193 branch on April 22, 2026