
HADOOP-19193. Create orphan commit for website deployment #6864

Merged

steveloughran merged 1 commit into apache:trunk from pan3793:HADOOP-19193 on Jun 5, 2024

Conversation

@pan3793 (Member) commented Jun 5, 2024

Description of PR

Currently, the gh-pages deployment always creates new commits on top of the previous one, which makes the git repository grow quickly.

According to https://github.com/peaceiris/actions-gh-pages?tab=readme-ov-file#%EF%B8%8F-force-orphan-force_orphan, we can set force_orphan: true to create an orphan commit instead. That is sufficient for website deployments: the published content is compilation output, so there is no need to keep the full commit history.
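
For reference, a minimal sketch of the deploy step with this option. The force_orphan line is the change in question; the other inputs (step name, token wiring, publish_dir value) are illustrative, not the exact contents of .github/workflows/website.yml:

```yaml
# Sketch of a peaceiris/actions-gh-pages deploy step using the orphan option.
# With force_orphan: true, each deployment becomes a single history-less commit,
# so the gh-pages branch no longer accumulates old site snapshots.
- name: Deploy to GitHub Pages
  uses: peaceiris/actions-gh-pages@v3
  with:
    github_token: ${{ secrets.GITHUB_TOKEN }}  # illustrative token wiring
    publish_dir: ./staging                     # illustrative output directory
    force_orphan: true                         # the added line: one orphan commit per deploy
```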

How was this patch tested?

Review.

For code changes:

  • Does the title of this PR start with the corresponding JIRA issue id (e.g. 'HADOOP-17799. Your PR title ...')?
  • Object storage: have the integration tests been executed and the endpoint declared according to the connector-specific documentation?
  • If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
  • If applicable, have you updated the LICENSE, LICENSE-binary, NOTICE-binary files?

@pan3793 (Member Author) commented Jun 5, 2024

cc @steveloughran @ayushtkn

@hadoop-yetus

🎊 +1 overall

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:-------:|:--------|
| +0 🆗 | reexec | 17m 33s | | Docker mode activated. |
| | | | | _ Prechecks _ |
| +1 💚 | dupname | 0m 0s | | No case conflicting files found. |
| +0 🆗 | codespell | 0m 0s | | codespell was not available. |
| +0 🆗 | detsecrets | 0m 0s | | detect-secrets was not available. |
| +0 🆗 | yamllint | 0m 0s | | yamllint was not available. |
| +1 💚 | @author | 0m 0s | | The patch does not contain any @author tags. |
| | | | | _ trunk Compile Tests _ |
| +1 💚 | shadedclient | 41m 45s | | branch has no errors when building and testing our client artifacts. |
| | | | | _ Patch Compile Tests _ |
| +1 💚 | blanks | 0m 0s | | The patch has no blanks issues. |
| +1 💚 | shadedclient | 32m 34s | | patch has no errors when building and testing our client artifacts. |
| | | | | _ Other Tests _ |
| +1 💚 | asflicense | 0m 40s | | The patch does not generate ASF License warnings. |
| | | 97m 47s | | |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.45 ServerAPI=1.45 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6864/1/artifact/out/Dockerfile |
| GITHUB PR | #6864 |
| Optional Tests | dupname asflicense codespell detsecrets yamllint |
| uname | Linux 9830ca9f9a0a 5.15.0-106-generic #116-Ubuntu SMP Wed Apr 17 09:17:56 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / 9bdde6b |
| Max. process+thread count | 552 (vs. ulimit of 5500) |
| modules | C: . U: . |
| Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6864/1/console |
| versions | git=2.25.1 maven=3.6.3 |
| Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |

This message was automatically generated.

steveloughran merged commit d8d3d53 into apache:trunk on Jun 5, 2024
@steveloughran (Contributor)

merged. do we need to do it for other branches?

@pan3793 (Member Author) commented Jun 5, 2024

The GitHub Actions workflow is only triggered by pushes to trunk, so no backport is required.
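
For context, a minimal sketch of the kind of trigger that makes this trunk-only; the exact workflow file may differ, the branch name follows the comment above:

```yaml
# Sketch: the website workflow runs only on pushes to trunk,
# so release branches never execute the gh-pages deployment.
on:
  push:
    branches:
      - trunk
```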

@pan3793 (Member Author) commented Jun 5, 2024

@steveloughran I tested a trunk sync and a fresh repo clone; the download size is much more reasonable now.

```
$ git pull
remote: Enumerating objects: 5627, done.
remote: Counting objects: 100% (5623/5623), done.
remote: Compressing objects: 100% (975/975), done.
remote: Total 5627 (delta 4630), reused 5563 (delta 4576), pack-reused 4
Receiving objects: 100% (5627/5627), 7.82 MiB | 2.73 MiB/s, done.
Resolving deltas: 100% (4630/4630), completed with 3 local objects.
From github.com:apache/hadoop
   f92a8ab8ae54..2ee0bf953492  trunk      -> apache/trunk
 + 6a5fbc022450...75704ed4e4e7 gh-pages   -> apache/gh-pages  (forced update)
Updating f92a8ab8ae54..2ee0bf953492
Fast-forward
 .github/workflows/website.yml                                                    | 1 +
 LICENSE-binary                                                                   | 6 +++---
 hadoop-cloud-storage-project/hadoop-cos/src/site/markdown/cloud-storage/index.md | 2 +-
 hadoop-project/pom.xml                                                           | 2 +-
 4 files changed, 6 insertions(+), 5 deletions(-)

$ git clone git@github.com:apache/hadoop.git
Cloning into 'hadoop'...
remote: Enumerating objects: 1602835, done.
remote: Counting objects: 100% (5880/5880), done.
remote: Compressing objects: 100% (1147/1147), done.
remote: Total 1602835 (delta 4703), reused 5705 (delta 4613), pack-reused 1596955
Receiving objects: 100% (1602835/1602835), 566.57 MiB | 4.18 MiB/s, done.
Resolving deltas: 100% (805225/805225), done.
Updating files: 100% (15505/15505), done.
```

@steveloughran (Contributor)

thanks. This is the key cause of the really big downloads, isn't it?

@steveloughran (Contributor)

(I still think we should cull old branches, FWIW)

@pan3793 (Member Author) commented Jun 7, 2024

> This is the key cause of the really big downloads, isn't it?

It should be, although I haven't analyzed the git repository's blobs; I'm asserting this based on experience.

> I still think we should cull old branches

I saw your post on the mailing list; that said, deleting old release branches (<2.6) might be too aggressive.

Release branches should contribute a negligible amount to the total size of the git repository, so I would suggest keeping them but deleting feature branches like HADOOP-XXXX and YARN-XXXX.

Additionally, it seems that Hadoop creates a branch for each release; it is not as clean as Spark, which only cuts branches for minor versions and creates release tags on those branches:

```
- branch-3.5 ------- v3.5.0-rc0 ------- v3.5.0-rc1(v3.5.0) ------- v3.5.1-rc0 --- ...
                     ^                  ^          ^               ^
                     tag                tag        tag             tag
```

@steveloughran (Contributor)

we do a branch for things like 3.4.0, 3.4.1 so that we can do release stuff there which isn't needed on the main branch (pom changes, references, diffs). Not sure what we could do differently there

@pan3793 (Member Author) commented Jun 7, 2024

Take Spark's branch-3.4 as an example: branch-3.4 should always be in a release-ready state, and when a patch release is called, the RM (release manager) just performs two commits:

  • Bump the version from 3.4.0-SNAPSHOT to 3.4.0 and create a tag v3.4.0-rc0 on this commit; the "pom changes, references, diffs" should happen here, via scripts.
  • Bump the version from 3.4.0 to 3.4.1-SNAPSHOT, similar to the above.

If something goes wrong and the RC fails, repeat the process until the release is out.
This branch model is adopted by many other Apache projects, such as Apache Kyuubi and Apache Celeborn. For Kyuubi, we have a release guide: https://kyuubi.readthedocs.io/en/master/contributing/code/release.html

pan3793 deleted the HADOOP-19193 branch on April 22, 2026