Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ci(markdownlint): use GitHub API to get changed files #10390

Merged
merged 4 commits into from
Dec 6, 2022

Conversation

yin1999
Copy link
Member

@yin1999 yin1999 commented Dec 2, 2022

Description

fix(ci): use GitHub API to get changed files

Motivation

The tj-actions/changed-files@v34 action in Markdownlint (PR files) may failed to get changed files. And as a test for #8952.

I've made a test before to get changed via GitHub API. It works, but I'm not sure whether it's always ok. I just find out a limitation in gh cli as here before, so I've tried another way to get changed files (by commit compare REST API).

And for the API I used for now, it could list 30 commits per REST API request, so the files limitation maybe resolved, see below:

PR info: https://api.github.com/repos/mdn/translated-content/pulls/9513
Compare info (all the 183 files in in the single rest response, see field files): https://api.github.com/repos/mdn/translated-content/compare/a9df62bb4885e5d9cdccd3834d5a24c921ef7cfa...16f5fc0db4ab1f5cb3858ab7878c428c19e9a79e

Is there any other things I've ignored? @nschonni @OnkarRuikar

Related issues and pull requests

more information can be found in those PR:

@github-actions github-actions bot added the system Infrastructure and configuration for the project label Dec 2, 2022
@OnkarRuikar
Copy link
Contributor

This will be efficient cos unlike the third party action this uses GitHub APIs instead of git(requires to fetch a lot of history).
This also reduces dependency on third party action(which has to handle a lot of use cases, our requirement is simple).


@yin1999 can we use the List pull requests files API instead of the diff API? It will make the logic simple and reduce a few lines of code.

gh api /repos/OWNER/REPO/pulls/PULL_NUMBER/files

For testing you'll have to do the same change in yin1999#8 . After that wait till there are some commits to the main branch then manually rerun the worflows(without rebasing).
Better if you create a new test PR for this in your fork.


Nitpick, we can work without using /tmp/changed-files.txt:

DIFF_DOCUMENTS=$(gh api ...)
DIFF_DOCUMENTS=$(echo ${DIFF_DOCUMENTS} | egrep -i "^files/.*\.md$" | xargs)
...

@yin1999
Copy link
Member Author

yin1999 commented Dec 2, 2022

This will be efficient cos unlike the third party action this uses GitHub APIs instead of git(requires to fetch a lot of history). This also reduces dependency on third party action(which has to handle a lot of use cases, our requirement is simple).

@yin1999 can we use the List pull requests files API instead of the diff API? It will make the logic simple and reduce a few lines of code.

gh api /repos/OWNER/REPO/pulls/PULL_NUMBER/files

Hi @OnkarRuikar. I've tried this before. But there is a limitation of 100 files per_page. So for some large PR. We may failed to fetch all the changed files (use gh api with --paginate). (You can try it locally, for this PR #9513, sometimes we can only fetch 100 changed files, I just could not figure out the reason.)

And if we use compare api, we can make sure to only list out changed file for the single workflow (that means, we may push a new commit, so the files listed by pulls api may be different with the corresponding workflow run (head and base sha may changed))

Nitpick, we can work without using /tmp/changed-files.txt:

DIFF_DOCUMENTS=$(gh api ...)
DIFF_DOCUMENTS=$(echo ${DIFF_DOCUMENTS} | egrep -i "^files/.*\.md$" | xargs)
...

Thanks for the suggestion, I'll apply this.

Co-authored-by: Onkar Ruikar <[email protected]>
@yin1999 yin1999 marked this pull request as ready for review December 2, 2022 14:12
@yin1999 yin1999 requested a review from a team as a code owner December 2, 2022 14:12
@yin1999
Copy link
Member Author

yin1999 commented Dec 2, 2022

also /cc @SphinxKnight for any suggestions :)

@nschonni
Copy link
Contributor

nschonni commented Dec 2, 2022

Cool, if you've found a better solution, it might be worth upstreaming to that tj action, or create it as an action in mdn/workflows later for easier maintenance

@yin1999
Copy link
Member Author

yin1999 commented Dec 3, 2022

Cool, if you've found a better solution, it might be worth upstreaming to that tj action, or create it as an action in mdn/workflows later for easier maintenance

Thanks for suggestions, tj action metioned No extra API calls., so this maybe not a good idea to upstreaming this. And I'm not sure if this always works well, so this might be a good idea to have a test on this workflow first, then we could create it as an action in mdn/workflows later :)

@OnkarRuikar
Copy link
Contributor

@yin1999

But there is a limitation of 100 files per_page. So for some large PR. We may failed to fetch all the changed files (use gh api with --paginate).

The Compare two commits API also has the same limitations. The --pageinate option works perfectly with both the APIs.

You can try it locally, for this PR #9513, sometimes we can only fetch 100 changed files, I just could not figure out the reason.

It works in my environment:

gh api /repos/mdn/translated-content/pulls/9513/files --paginate \
  --jq '.[] | select(.status|IN("added", "modified", "renamed", "copied", "changed")) |.filename' | wc -l

93

I tried it multiple times and it gave me 93 count every time which is correct: 93(added) + 90(deleted) = 183. Note, we are excluding deleted files in the query.
So we can use the List pull request files API without any issues.

And if we use compare api, we can make sure to only list out changed file for the single workflow (that means, we may push a new commit, so the files listed by pulls api may be different with the corresponding workflow run (head and base sha may changed))

Head always points to the latest commit and the base remains the same every time. So with the current compare API code we always get files modified from first commit to the latest commit. So how is it different from the list PR files api?


@nschonni

Cool, if you've found a better solution, it might be worth upstreaming to that tj action, or create it as an action in mdn/workflows later for easier maintenance

There is already another third party action available that uses the same strategy (of using the Github compare commits API): https://github.com/jitterbit/get-changed-files/blob/d06c756e3609dd3dd5d302dde8d1339af3f790f2/src/main.ts#L62-L67
We could simply replace tj-action with that one or have our own in-house solution(mdn/workflow) using this PR's code.

Third party workflows provide a lot of features that we don't use. I am in favor of having our own small, simple, customized solution to avoid any dependency.

@yin1999
Copy link
Member Author

yin1999 commented Dec 3, 2022

I tried it multiple times and it gave me 93 count every time which is correct: 93(added) + 90(deleted) = 183. Note, we are excluding deleted files in the query.

I'm not really sure, I tried it some time ago.

So how is it different from the list PR files api?

image

But we can make sure that it's ok for the final workflow run.

@OnkarRuikar, we can try to move this part as the first step, to avoid some of the special cases. And yes, I prefer List pull requests files API, for we only need file changes info, if we would not be affected by those special cases.

@OnkarRuikar
Copy link
Contributor

@caugner caugner merged commit ac8dc2c into mdn:main Dec 6, 2022
@caugner caugner changed the title fix(ci): use GitHub API to get changed files ci(markdownlint): use GitHub API to get changed files Dec 6, 2022
@yin1999 yin1999 deleted the fix-get-changed-files branch December 6, 2022 14:05
@yin1999
Copy link
Member Author

yin1999 commented Dec 6, 2022

Thanks @caugner, @OnkarRuikar and @nschonni. Let's see would it works fine. So we can go ahead for mdn/content and another workflow :)

yin1999 added a commit to yin1999/translated-content that referenced this pull request Dec 6, 2022
@OnkarRuikar
Copy link
Contributor

OnkarRuikar commented Dec 6, 2022

@yin1999 to test this quickly, rerun workflows on an old PR without rebasing it.

@yin1999
Copy link
Member Author

yin1999 commented Dec 6, 2022

@yin1999 to test this quickly, rerun workflows on an old PR without rebasing it.

Yes, we have some workflow runs here: https://github.com/mdn/translated-content/actions/workflows/pr-check_markdownlint.yml (with a merge commit in this PR: #10508)

@OnkarRuikar
Copy link
Contributor

OnkarRuikar commented Dec 6, 2022

No, we want base branch few commits ahead of the feature branch. That means we need to let the branch sit idle till main has more commits. After that without updating the branch by any means do Re-run jobs in Actions tab:
guidepages

I am sure this won't fail but because of this test case we got rid of the tj-action.

@yin1999
Copy link
Member Author

yin1999 commented Dec 6, 2022

No, we want base branch few commits ahead of the feature branch. That means we need to let the branch sit idle till main has more commits. After that without updating the branch by any means do Re-run jobs in Actions tab: guidepages

I am sure this won't fail but because of this test case we got rid of the tj-action.

edit: seems ok for this workflow run: https://github.com/mdn/translated-content/actions/runs/3630538397/jobs/6125489914

@OnkarRuikar
Copy link
Contributor

There is a huge performance improvement.
Execution time went down from ~150sec to ~40sec!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
system Infrastructure and configuration for the project
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants