Skip to content
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
12 changes: 10 additions & 2 deletions .github/workflows/build_and_test.yml
Original file line number Diff line number Diff line change
Expand Up @@ -56,10 +56,14 @@ jobs:
GITHUB_PREV_SHA: ${{ github.event.before }}
outputs:
required: ${{ steps.set-outputs.outputs.required }}
# For branch3.2, branch3.3, still use previous image tag.
# If infra image has changes, use dynamic infra image URL.
# If infra image hasn't changed, use pre-build image directly.
image_url: >-
${{
(inputs.branch == 'master' && steps.infra-image-outputs.outputs.image_url)
|| 'dongjoon/apache-spark-github-action-image:20220207'
(inputs.branch == 'branch-3.2' || inputs.branch == 'branch-3.3') && 'dongjoon/apache-spark-github-action-image:20220207'
|| (fromJson(steps.set-outputs.outputs.required).infra-image == 'true' && steps.infra-image-outputs.outputs.image_url)
|| format('ghcr.io/apache/spark/apache-spark-github-action-image-cache:{0}-static', inputs.branch)
}}
steps:
- name: Checkout Spark repository
Expand Down Expand Up @@ -87,6 +91,7 @@ jobs:
sparkr=`./dev/is-changed.py -m sparkr`
tpcds=`./dev/is-changed.py -m sql`
docker=`./dev/is-changed.py -m docker-integration-tests`
infra_image=`./dev/is-changed.py -m infra-image`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmmm .. actually quick question. would that still build the base image when there are other changes together with Dockerfile? Do we return true when root module is detected?

Copy link
Member Author

@Yikun Yikun Sep 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would that still build the base image when there are other changes together with Dockerfile?

Yes, module_names contains all changed module's name, test_modules will be ['infra-image'], this will return true if infra-image module is detected:

    elif len(set(test_modules).intersection(module_names)) == 0:
        print("false")
        if opts.fail:
            sys.exit(1)
    else:
        print("true")

BTW, If the fork repository is not synchronized to the latest, if the dockerfile changes are included in the synchronized code, it will also return true. You could see test Yikun#170 , infra-image return true, then build and use the base image in lint, sparkr and pyspark. In this case, if GHCR is unstable, only a rebase needed in fork repo.

Do we return true when root module is detected?

No, we only return true when infra dockerfile changes.

If only root module is detected, will first skip this root check

    # `./dev/is-changed.py -m infra-image` == True only when changing the infra dockerfile
    elif ("root" in test_modules or modules.root in changed_modules) and (
        ["infra-image"] != test_modules
    ):

and then do continue check as above (elif and else) return result according infra-image module is detected. The current PR is an real example to show this case, root dectected (because we change the yaml), but didn't change the dockerfile, infra-image return false, then use the master-static image.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could merge this in the morning of some day (such as tmr), I'll keep watching.

fi
# 'build', 'scala-213', and 'java-11-17' are always true for now.
# It does not save significant time and most of PRs trigger the build.
Expand All @@ -97,6 +102,7 @@ jobs:
\"sparkr\": \"$sparkr\",
\"tpcds-1g\": \"$tpcds\",
\"docker-integration-tests\": \"$docker\",
\"infra-image\": \"$infra_image\",
\"scala-213\": \"true\",
\"java-11-17\": \"true\",
\"lint\" : \"true\",
Expand All @@ -116,6 +122,7 @@ jobs:
fi
- name: Generate infra image URL
id: infra-image-outputs
if: fromJson(steps.set-outputs.outputs.required).infra-image == 'true'
run: |
# Convert to lowercase to meet Docker repo name requirement
REPO_OWNER=$(echo "${{ github.repository_owner }}" | tr '[:upper:]' '[:lower:]')
Expand Down Expand Up @@ -270,6 +277,7 @@ jobs:
needs: precondition
# Currently, only enable docker build from cache for `master` branch jobs
if: >-
fromJson(needs.precondition.outputs.required).infra-image == 'true' &&
(fromJson(needs.precondition.outputs.required).pyspark == 'true' ||
fromJson(needs.precondition.outputs.required).lint == 'true' ||
fromJson(needs.precondition.outputs.required).sparkr == 'true') &&
Expand Down
5 changes: 4 additions & 1 deletion dev/is-changed.py
Original file line number Diff line number Diff line change
Expand Up @@ -71,7 +71,10 @@ def main():
print("false")
if opts.fail:
sys.exit(1)
elif "root" in test_modules or modules.root in changed_modules:
# `./dev/is-changed.py -m infra-image` == True only when changing the infra dockerfile
elif ("root" in test_modules or modules.root in changed_modules) and (
["infra-image"] != test_modules
):
print("true")
elif len(set(test_modules).intersection(module_names)) == 0:
print("false")
Expand Down
8 changes: 8 additions & 0 deletions dev/sparktestsupport/modules.py
Original file line number Diff line number Diff line change
Expand Up @@ -781,6 +781,14 @@ def __hash__(self):
test_tags=["org.apache.spark.tags.DockerTest"],
)

infra_image = Module(
name="infra-image",
dependencies=[],
source_file_regexes=[
"dev/infra/",
],
)

# The root module is a dummy module which is used to run all of the tests.
# No other modules should directly depend on this module.
root = Module(
Expand Down