-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-32246][BUILD][INFRA] Enable streaming-kinesis-asl tests in Github Action CI
#43736
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
.github/workflows/build_and_test.yml
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
?
.github/workflows/build_and_test.yml
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you touch k8s-integration-tests in kinesis PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed.
connector/kinesis-asl/pom.xml
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we need this for testing, this looks like an independent issue. Could you spin off from this GitHub Action PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test dependency is only needed for those tests that would incur billing cost, which are not enabled by default. I removed it from this PR.
pom.xml
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can spin off this together with https://github.com/apache/spark/pull/43736/files#r1395282370
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, @junyuc25 . This PR seems to have multiple themes in a single PR. I'd recommend to split them.
- A PR for test dependency change (jaxb-api)
- A PR for adding a new GitHub Action pipeline
.github/workflows/build_and_test.yml
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW, do we need to add a new pipeline? If this is a small test, we can append this to the existing pipeline.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @dongjoon-hyun, what do you mean by add a new pipeline? Wonder if you could elaborate a bit more?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are extremely careful about the number of jobs because there is a limit. It would be great if this PR can achieve the goal without changing the number of jobs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Appended the tests to an existing pipeline.
…K17" This reverts commit 063386c4202f93e945c6decd0b6340349cd768a3.
93386ec to
fbb3f48
Compare
If those 22 test cases are ignored, can we safely change the kinesis module after this PR is merged? I still have some concerns about this. |
Hi @LuciferYang , just trying to double check, are you suggesting having a separate JIRA ticket to enable all tests by default? |
Yes |
|
Hi @dongjoon-hyun , I updated the PR as per your comments. Could you take a look when you get a chance? Thanks. |
dongjoon-hyun
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for update, @junyuc25 .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code looks working. I manually verified KinesisCheckpointerSuite and other tests from the CI logs of this PR.
2023-11-20T06:29:03.8869208Z �[0m[�[0m�[0minfo�[0m] �[0m�[0m�[32mKinesisCheckpointerSuite:�[0m�[0m
2023-11-20T06:29:03.8962864Z �[0m[�[0m�[0minfo�[0m] �[0m�[0m�[32m- checkpoint is not called twice for the same sequence number (4 milliseconds)�[0m�[0m
2023-11-20T06:29:03.8994512Z �[0m[�[0m�[0minfo�[0m] �[0m�[0m�[32m- checkpoint is called after sequence number increases (2 milliseconds)�[0m�[0m
2023-11-20T06:29:03.9037111Z �[0m[�[0m�[0minfo�[0m] �[0m�[0m�[32m- should checkpoint if we have exceeded the checkpoint interval (3 milliseconds)�[0m�[0m
2023-11-20T06:29:03.9061448Z �[0m[�[0m�[0minfo�[0m] �[0m�[0m�[32m- shouldn't checkpoint if we have not exceeded the checkpoint interval (1 millisecond)�[0m�[0m
2023-11-20T06:29:03.9094827Z �[0m[�[0m�[0minfo�[0m] �[0m�[0m�[32m- should not checkpoint for the same sequence number (1 millisecond)�[0m�[0m
2023-11-20T06:29:03.9118840Z �[0m[�[0m�[0minfo�[0m] �[0m�[0m�[32m- removing checkpointer checkpoints one last time (1 millisecond)�[0m�[0m
2
As a next step, please make the PR title and description up-to-date, @junyuc25 . Currently, it's not matched with the AS-IS code.
Thanks for confirming. Here is the ticket: https://issues.apache.org/jira/browse/SPARK-46242 |
Thanks for the suggestions @dongjoon-hyun. Updated the descriptions. |
streaming-kinesis-asl tests in Github Action CI
dongjoon-hyun
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, LGTM.
|
Thank you, @junyuc25 and @LuciferYang . |
…ithub Action CI ### What changes were proposed in this pull request? This PR attempts to set up Kinesis tests in one of the existing Github Actions. Note that currently there are totally 57 tests in the Kinesis-asl module, and this PR enabled 35 of them. The remaining tests requires interaction with Amazon Kinesis service which would incur billing costs to users. Hence they are not included in the Github Action. ### Why are the changes needed? Addressing the comments in this PR: apache#42581 (comment) which attempts to upgrade the AWS SDK to v2 for Spark Kinesis connector. Since Kinesis tests are not being run in the Github Actions, there is no automated mechanism to verify the SDK v2 upgrade changes in this module. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? 1. All existing Github Actions passed. 2. All Kinesis tests passed when running locally: `export ENABLE_KINESIS_TESTS=1 && mvn test -Pkinesis-asl -pl connector/kinesis-asl` ``` Tests: succeeded 57, failed 0, canceled 0, ignored 0, pending 0 All tests passed. [INFO] ------------------------------------------------------------------------ [INFO] BUILD SUCCESS [INFO] ------------------------------------------------------------------------ [INFO] Total time: 13:25 min [INFO] Finished at: 2023-11-12T00:15:49+08:00 [INFO] ------------------------------------------------------------------------ ``` ### Was this patch authored or co-authored using generative AI tooling? No. Closes apache#43736 from junyuc25/junyuc25/kinesis-test. Authored-by: Junyu Chen <[email protected]> Signed-off-by: Dongjoon Hyun <[email protected]>
| "-Pkinesis-asl", | ||
| ], | ||
| environ={"ENABLE_KINESIS_TESTS": "1"}, | ||
| environ={"ENABLE_KINESIS_TESTS": "0"}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding tests for the streaming-kinesis-asl module seems to cause daily tests on branch-3.x to fail. It seems that we need to backport this patch to branch-3.x? Do you have any other better suggestions? @HyukjinKwon @dongjoon-hyun
https://github.com/apache/spark/actions/runs/7097271047/job/19317138417#logs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am testing a solution that does not require backporting this patch: #44204
…LES_TO_TEST` for branch-3.x daily tests ### What changes were proposed in this pull request? After the merge of #43736, the master branch began testing the `streaming-kinesis-asl` module. At the same time, because the daily test will reuse `build_and_test.yml`, the daily test of branch-3.x also began testing `streaming-kinesis-asl`. However, in branch-3.x, the env `ENABLE_KINESIS_TESTS` is hard-coded as 1 in `dev/sparktestsupport/modules.py`: https://github.com/apache/spark/blob/1321b4e64deaa1e58bf297c25b72319083056568/dev/sparktestsupport/modules.py#L332-L346 which leads to the failure of the daily test of branch-3.x: - branch-3.3: https://github.com/apache/spark/actions/runs/7111246311 - branch-3.4: https://github.com/apache/spark/actions/runs/7098435892 - branch-3.5: https://github.com/apache/spark/actions/runs/7099811235 ``` [info] org.apache.spark.streaming.kinesis.WithoutAggregationKinesisStreamSuite *** ABORTED *** (1 second, 14 milliseconds) [info] java.lang.Exception: Kinesis tests enabled using environment variable ENABLE_KINESIS_TESTS [info] but could not find AWS credentials. Please follow instructions in AWS documentation [info] to set the credentials in your system such that the DefaultAWSCredentialsProviderChain [info] can find the credentials. [info] at org.apache.spark.streaming.kinesis.KinesisTestUtils$.getAWSCredentials(KinesisTestUtils.scala:258) [info] at org.apache.spark.streaming.kinesis.KinesisTestUtils.kinesisClient$lzycompute(KinesisTestUtils.scala:58) [info] at org.apache.spark.streaming.kinesis.KinesisTestUtils.kinesisClient(KinesisTestUtils.scala:57) [info] at org.apache.spark.streaming.kinesis.KinesisTestUtils.describeStream(KinesisTestUtils.scala:168) [info] at org.apache.spark.streaming.kinesis.KinesisTestUtils.findNonExistentStreamName(KinesisTestUtils.scala:181) [info] at org.apache.spark.streaming.kinesis.KinesisTestUtils.createStream(KinesisTestUtils.scala:84) [info] at org.apache.spark.streaming.kinesis.KinesisStreamTests.$anonfun$beforeAll$1(KinesisStreamSuite.scala:61) [info] at org.apache.spark.streaming.kinesis.KinesisFunSuite.runIfTestsEnabled(KinesisFunSuite.scala:41) [info] at org.apache.spark.streaming.kinesis.KinesisFunSuite.runIfTestsEnabled$(KinesisFunSuite.scala:39) [info] at org.apache.spark.streaming.kinesis.KinesisStreamTests.runIfTestsEnabled(KinesisStreamSuite.scala:42) [info] at org.apache.spark.streaming.kinesis.KinesisStreamTests.beforeAll(KinesisStreamSuite.scala:59) [info] at org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:212) [info] at org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210) [info] at org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208) [info] at org.apache.spark.streaming.kinesis.KinesisStreamTests.org$scalatest$BeforeAndAfter$$super$run(KinesisStreamSuite.scala:42) [info] at org.scalatest.BeforeAndAfter.run(BeforeAndAfter.scala:273) [info] at org.scalatest.BeforeAndAfter.run$(BeforeAndAfter.scala:271) [info] at org.apache.spark.streaming.kinesis.KinesisStreamTests.run(KinesisStreamSuite.scala:42) [info] at org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:321) [info] at org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:517) [info] at sbt.ForkMain$Run.lambda$runTest$1(ForkMain.java:414) [info] at java.util.concurrent.FutureTask.run(FutureTask.java:266) [info] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [info] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [info] at java.lang.Thread.run(Thread.java:750) [info] Test run org.apache.spark.streaming.kinesis.JavaKinesisInputDStreamBuilderSuite started [info] Test org.apache.spark.streaming.kinesis.JavaKinesisInputDStreamBuilderSuite.testJavaKinesisDStreamBuilderOldApi started [info] Test org.apache.spark.streaming.kinesis.JavaKinesisInputDStreamBuilderSuite.testJavaKinesisDStreamBuilder started [info] Test run org.apache.spark.streaming.kinesis.JavaKinesisInputDStreamBuilderSuite finished: 0 failed, 0 ignored, 2 total, 0.244s [info] ScalaTest [info] Run completed in 8 seconds, 542 milliseconds. [info] Total number of tests run: 31 [info] Suites: completed 4, aborted 4 [info] Tests: succeeded 31, failed 0, canceled 0, ignored 0, pending 0 [info] *** 4 SUITES ABORTED *** [error] Error: Total 37, Failed 0, Errors 4, Passed 33 [error] Error during tests: [error] org.apache.spark.streaming.kinesis.WithoutAggregationKinesisBackedBlockRDDSuite [error] org.apache.spark.streaming.kinesis.WithAggregationKinesisBackedBlockRDDSuite [error] org.apache.spark.streaming.kinesis.WithAggregationKinesisStreamSuite [error] org.apache.spark.streaming.kinesis.WithoutAggregationKinesisStreamSuite [error] (streaming-kinesis-asl / Test / test) sbt.TestsFailedException: Tests unsuccessful [error] Total time: 13 s, completed Dec 5, 2023 12:03:53 PM ``` This PR adds a conditional judgment for the build task: ```shell if [[ "$MODULES_TO_TEST" == *"streaming-kinesis-asl"* ]] && [[ "${{ inputs.branch }}" =~ ^branch-3 ]]; then MODULES_TO_TEST=${MODULES_TO_TEST//streaming-kinesis-asl, /} fi ``` When `MODULES_TO_TEST` contains `streaming-kinesis-asl` and the test branch is not master, it removes the `streaming-kinesis-asl, ` substring from `MODULES_TO_TEST` and reassigns it to `MODULES_TO_TEST`. This avoids the testing of the `streaming-kinesis-asl` module in branch-3.x. ### Why are the changes needed? Prevent the daily tests of branch-3.x from testing the streaming-kinesis-asl module. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? - Monitor GA after merged ### Was this patch authored or co-authored using generative AI tooling? No Closes #44204 from LuciferYang/SPARK-46283. Lead-authored-by: yangjie01 <[email protected]> Co-authored-by: YangJie <[email protected]> Signed-off-by: Dongjoon Hyun <[email protected]>
…ithub Action CI ### What changes were proposed in this pull request? This PR attempts to set up Kinesis tests in one of the existing Github Actions. Note that currently there are totally 57 tests in the Kinesis-asl module, and this PR enabled 35 of them. The remaining tests requires interaction with Amazon Kinesis service which would incur billing costs to users. Hence they are not included in the Github Action. ### Why are the changes needed? Addressing the comments in this PR: apache#42581 (comment) which attempts to upgrade the AWS SDK to v2 for Spark Kinesis connector. Since Kinesis tests are not being run in the Github Actions, there is no automated mechanism to verify the SDK v2 upgrade changes in this module. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? 1. All existing Github Actions passed. 2. All Kinesis tests passed when running locally: `export ENABLE_KINESIS_TESTS=1 && mvn test -Pkinesis-asl -pl connector/kinesis-asl` ``` Tests: succeeded 57, failed 0, canceled 0, ignored 0, pending 0 All tests passed. [INFO] ------------------------------------------------------------------------ [INFO] BUILD SUCCESS [INFO] ------------------------------------------------------------------------ [INFO] Total time: 13:25 min [INFO] Finished at: 2023-11-12T00:15:49+08:00 [INFO] ------------------------------------------------------------------------ ``` ### Was this patch authored or co-authored using generative AI tooling? No. Closes apache#43736 from junyuc25/junyuc25/kinesis-test. Authored-by: Junyu Chen <[email protected]> Signed-off-by: Dongjoon Hyun <[email protected]>
…LES_TO_TEST` for branch-3.x daily tests ### What changes were proposed in this pull request? After the merge of apache#43736, the master branch began testing the `streaming-kinesis-asl` module. At the same time, because the daily test will reuse `build_and_test.yml`, the daily test of branch-3.x also began testing `streaming-kinesis-asl`. However, in branch-3.x, the env `ENABLE_KINESIS_TESTS` is hard-coded as 1 in `dev/sparktestsupport/modules.py`: https://github.com/apache/spark/blob/1321b4e64deaa1e58bf297c25b72319083056568/dev/sparktestsupport/modules.py#L332-L346 which leads to the failure of the daily test of branch-3.x: - branch-3.3: https://github.com/apache/spark/actions/runs/7111246311 - branch-3.4: https://github.com/apache/spark/actions/runs/7098435892 - branch-3.5: https://github.com/apache/spark/actions/runs/7099811235 ``` [info] org.apache.spark.streaming.kinesis.WithoutAggregationKinesisStreamSuite *** ABORTED *** (1 second, 14 milliseconds) [info] java.lang.Exception: Kinesis tests enabled using environment variable ENABLE_KINESIS_TESTS [info] but could not find AWS credentials. Please follow instructions in AWS documentation [info] to set the credentials in your system such that the DefaultAWSCredentialsProviderChain [info] can find the credentials. [info] at org.apache.spark.streaming.kinesis.KinesisTestUtils$.getAWSCredentials(KinesisTestUtils.scala:258) [info] at org.apache.spark.streaming.kinesis.KinesisTestUtils.kinesisClient$lzycompute(KinesisTestUtils.scala:58) [info] at org.apache.spark.streaming.kinesis.KinesisTestUtils.kinesisClient(KinesisTestUtils.scala:57) [info] at org.apache.spark.streaming.kinesis.KinesisTestUtils.describeStream(KinesisTestUtils.scala:168) [info] at org.apache.spark.streaming.kinesis.KinesisTestUtils.findNonExistentStreamName(KinesisTestUtils.scala:181) [info] at org.apache.spark.streaming.kinesis.KinesisTestUtils.createStream(KinesisTestUtils.scala:84) [info] at org.apache.spark.streaming.kinesis.KinesisStreamTests.$anonfun$beforeAll$1(KinesisStreamSuite.scala:61) [info] at org.apache.spark.streaming.kinesis.KinesisFunSuite.runIfTestsEnabled(KinesisFunSuite.scala:41) [info] at org.apache.spark.streaming.kinesis.KinesisFunSuite.runIfTestsEnabled$(KinesisFunSuite.scala:39) [info] at org.apache.spark.streaming.kinesis.KinesisStreamTests.runIfTestsEnabled(KinesisStreamSuite.scala:42) [info] at org.apache.spark.streaming.kinesis.KinesisStreamTests.beforeAll(KinesisStreamSuite.scala:59) [info] at org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:212) [info] at org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210) [info] at org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208) [info] at org.apache.spark.streaming.kinesis.KinesisStreamTests.org$scalatest$BeforeAndAfter$$super$run(KinesisStreamSuite.scala:42) [info] at org.scalatest.BeforeAndAfter.run(BeforeAndAfter.scala:273) [info] at org.scalatest.BeforeAndAfter.run$(BeforeAndAfter.scala:271) [info] at org.apache.spark.streaming.kinesis.KinesisStreamTests.run(KinesisStreamSuite.scala:42) [info] at org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:321) [info] at org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:517) [info] at sbt.ForkMain$Run.lambda$runTest$1(ForkMain.java:414) [info] at java.util.concurrent.FutureTask.run(FutureTask.java:266) [info] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [info] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [info] at java.lang.Thread.run(Thread.java:750) [info] Test run org.apache.spark.streaming.kinesis.JavaKinesisInputDStreamBuilderSuite started [info] Test org.apache.spark.streaming.kinesis.JavaKinesisInputDStreamBuilderSuite.testJavaKinesisDStreamBuilderOldApi started [info] Test org.apache.spark.streaming.kinesis.JavaKinesisInputDStreamBuilderSuite.testJavaKinesisDStreamBuilder started [info] Test run org.apache.spark.streaming.kinesis.JavaKinesisInputDStreamBuilderSuite finished: 0 failed, 0 ignored, 2 total, 0.244s [info] ScalaTest [info] Run completed in 8 seconds, 542 milliseconds. [info] Total number of tests run: 31 [info] Suites: completed 4, aborted 4 [info] Tests: succeeded 31, failed 0, canceled 0, ignored 0, pending 0 [info] *** 4 SUITES ABORTED *** [error] Error: Total 37, Failed 0, Errors 4, Passed 33 [error] Error during tests: [error] org.apache.spark.streaming.kinesis.WithoutAggregationKinesisBackedBlockRDDSuite [error] org.apache.spark.streaming.kinesis.WithAggregationKinesisBackedBlockRDDSuite [error] org.apache.spark.streaming.kinesis.WithAggregationKinesisStreamSuite [error] org.apache.spark.streaming.kinesis.WithoutAggregationKinesisStreamSuite [error] (streaming-kinesis-asl / Test / test) sbt.TestsFailedException: Tests unsuccessful [error] Total time: 13 s, completed Dec 5, 2023 12:03:53 PM ``` This PR adds a conditional judgment for the build task: ```shell if [[ "$MODULES_TO_TEST" == *"streaming-kinesis-asl"* ]] && [[ "${{ inputs.branch }}" =~ ^branch-3 ]]; then MODULES_TO_TEST=${MODULES_TO_TEST//streaming-kinesis-asl, /} fi ``` When `MODULES_TO_TEST` contains `streaming-kinesis-asl` and the test branch is not master, it removes the `streaming-kinesis-asl, ` substring from `MODULES_TO_TEST` and reassigns it to `MODULES_TO_TEST`. This avoids the testing of the `streaming-kinesis-asl` module in branch-3.x. ### Why are the changes needed? Prevent the daily tests of branch-3.x from testing the streaming-kinesis-asl module. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? - Monitor GA after merged ### Was this patch authored or co-authored using generative AI tooling? No Closes apache#44204 from LuciferYang/SPARK-46283. Lead-authored-by: yangjie01 <[email protected]> Co-authored-by: YangJie <[email protected]> Signed-off-by: Dongjoon Hyun <[email protected]>


What changes were proposed in this pull request?
This PR attempts to set up Kinesis tests in one of the existing Github Actions. Note that currently there are totally 57 tests in the Kinesis-asl module, and this PR enabled 35 of them. The remaining tests requires interaction with Amazon Kinesis service which would incur billing costs to users. Hence they are not included in the Github Action.
Why are the changes needed?
Addressing the comments in this PR: #42581 (comment) which attempts to upgrade the AWS SDK to v2 for Spark Kinesis connector. Since Kinesis tests are not being run in the Github Actions, there is no automated mechanism to verify the SDK v2 upgrade changes in this module.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
export ENABLE_KINESIS_TESTS=1 && mvn test -Pkinesis-asl -pl connector/kinesis-aslWas this patch authored or co-authored using generative AI tooling?
No.