
Conversation

@Ngone51
Member

@Ngone51 Ngone51 commented May 28, 2024

This PR backports #46706 to branch 3.5.

What changes were proposed in this pull request?

This PR cleans up `mapIdToMapIndex` when the corresponding mapstatus is unregistered in three places:

  • `removeMapOutput`
  • `removeOutputsByFilter`
  • `addMapOutput` (old mapstatus overwritten)

Why are the changes needed?

There is only one valid mapstatus for the same `mapIndex` at the same time in Spark. `mapIdToMapIndex` should also follow the same rule to avoid chaos.
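The invariant above can be illustrated with a small sketch. This is hypothetical, simplified code: `MapStatus` and `ShuffleStatusSketch` here are stand-ins for illustration, not Spark's actual `MapOutputTracker` internals.

```scala
import scala.collection.mutable

// Stand-in for Spark's MapStatus: each map-task attempt gets a unique mapId.
final case class MapStatus(mapId: Long)

final class ShuffleStatusSketch(numPartitions: Int) {
  // One slot per map partition; at most one valid mapstatus per mapIndex.
  private val mapStatuses = Array.fill[Option[MapStatus]](numPartitions)(None)
  // Reverse lookup that must stay in sync with mapStatuses.
  private val mapIdToMapIndex = mutable.HashMap.empty[Long, Int]

  def addMapOutput(mapIndex: Int, status: MapStatus): Unit = {
    // If an old mapstatus is overwritten, drop its stale mapId entry first.
    mapStatuses(mapIndex).foreach(old => mapIdToMapIndex.remove(old.mapId))
    mapStatuses(mapIndex) = Some(status)
    mapIdToMapIndex(status.mapId) = mapIndex
  }

  def removeMapOutput(mapIndex: Int): Unit = {
    // Unregistering a mapstatus also removes its mapIdToMapIndex entry.
    mapStatuses(mapIndex).foreach(old => mapIdToMapIndex.remove(old.mapId))
    mapStatuses(mapIndex) = None
  }

  def indexForMapId(mapId: Long): Option[Int] = mapIdToMapIndex.get(mapId)
}
```

Without the cleanup, overwriting or removing a mapstatus would leave a dangling `mapId -> mapIndex` entry pointing at a slot whose status is gone or now belongs to a different task attempt.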

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Unit tests.

Was this patch authored or co-authored using generative AI tooling?

No.

@LuciferYang
Contributor

Traceback (most recent call last):
  File "/home/runner/work/spark/spark/./dev/run-tests.py", line 674, in <module>
    main()
  File "/home/runner/work/spark/spark/./dev/run-tests.py", line 547, in main
    changed_files = identify_changed_files_from_git_commits(
  File "/home/runner/work/spark/spark/dev/sparktestsupport/utils.py", line 86, in identify_changed_files_from_git_commits
    raw_output = subprocess.check_output(
  File "/usr/lib/python3.10/subprocess.py", line 421, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
  File "/usr/lib/python3.10/subprocess.py", line 526, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['git', 'diff', '--name-only', '07db4e5871cc083cd0178f5772b6884fe1b0dc04', 'c9d94ef8e7c7d35e3f2995ffb63596a993a766c8']' returned non-zero exit status 128.

It seems `git diff` failed to execute.
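For context, exit status 128 is git's fatal-error code, which `git diff` returns when one of the commits is not present in the clone (e.g. after a force-push made the old head unreachable, or a shallow checkout never fetched it). A hedged, self-contained reproduction (not taken from the CI logs):

```shell
# In a fresh repo, diff against a commit that was never fetched.
repo=$(mktemp -d)
git init -q "$repo"
git -C "$repo" -c user.name=demo -c [email protected] \
  commit -q --allow-empty -m init
git -C "$repo" diff --name-only \
  07db4e5871cc083cd0178f5772b6884fe1b0dc04 HEAD 2>/dev/null
status=$?
echo "exit status: $status"   # 128: fatal error (unknown revision)
```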

@Ngone51 Ngone51 force-pushed the SPARK-48394-3.5 branch from 07db4e5 to 7d57953 Compare May 30, 2024 08:27
@Ngone51
Member Author

Ngone51 commented May 31, 2024

OracleIntegrationSuite seems to be broken in branch-3.5. @yaooqinn Do you have any insight into this?

[info] OracleIntegrationSuite:
[info] org.apache.spark.sql.jdbc.v2.OracleIntegrationSuite *** ABORTED *** (49 seconds, 935 milliseconds)
[info]   java.sql.SQLSyntaxErrorException: ORA-00933: SQL command not properly ended
[info]   at oracle.jdbc.driver.T4CTTIoer11.processError(T4CTTIoer11.java:702)
[info]   at oracle.jdbc.driver.T4CTTIoer11.processError(T4CTTIoer11.java:608)
[info]   at oracle.jdbc.driver.T4C8Oall.processError(T4C8Oall.java:1248)
[info]   at oracle.jdbc.driver.T4CTTIfun.receive(T4CTTIfun.java:1041)
[info]   at oracle.jdbc.driver.T4CTTIfun.doRPC(T4CTTIfun.java:443)
[info]   at oracle.jdbc.driver.T4C8Oall.doOALL(T4C8Oall.java:518)
[info]   at oracle.jdbc.driver.T4CPreparedStatement.doOall8(T4CPreparedStatement.java:251)
[info]   at oracle.jdbc.driver.T4CPreparedStatement.executeForRows(T4CPreparedStatement.java:1181)
[info]   at oracle.jdbc.driver.OracleStatement.executeSQLStatement(OracleStatement.java:1571)
[info]   at oracle.jdbc.driver.OracleStatement.doExecuteWithTimeout(OracleStatement.java:1345)
[info]   at oracle.jdbc.driver.OraclePreparedStatement.executeInternal(OraclePreparedStatement.java:3728)
[info]   at oracle.jdbc.driver.OraclePreparedStatement.executeLargeUpdate(OraclePreparedStatement.java:3905)
[info]   at oracle.jdbc.driver.OraclePreparedStatement.executeUpdate(OraclePreparedStatement.java:3880)
[info]   at oracle.jdbc.driver.OraclePreparedStatementWrapper.executeUpdate(OraclePreparedStatementWrapper.java:993)
[info]   at org.apache.spark.sql.jdbc.v2.DockerJDBCIntegrationV2Suite.dataPreparation(DockerJDBCIntegrationV2Suite.scala:43)
[info]   at org.apache.spark.sql.jdbc.DockerJDBCIntegrationSuite.$anonfun$beforeAll$1(DockerJDBCIntegrationSuite.scala:171)
[info]   at org.apache.spark.sql.jdbc.DockerIntegrationFunSuite.runIfTestsEnabled(DockerIntegrationFunSuite.scala:49)
[info]   at org.apache.spark.sql.jdbc.DockerIntegrationFunSuite.runIfTestsEnabled$(DockerIntegrationFunSuite.scala:47)
[info]   at org.apache.spark.sql.jdbc.DockerJDBCIntegrationSuite.runIfTestsEnabled(DockerJDBCIntegrationSuite.scala:95)
[info]   at org.apache.spark.sql.jdbc.DockerJDBCIntegrationSuite.beforeAll(DockerJDBCIntegrationSuite.scala:118)
[info]   at org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:212)
[info]   at org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210)
[info]   at org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208)
[info]   at org.apache.spark.SparkFunSuite.run(SparkFunSuite.scala:69)
[info]   at org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:321)
[info]   at org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:517)
[info]   at sbt.ForkMain$Run.lambda$runTest$1(ForkMain.java:414)
[info]   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
[info]   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
[info]   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
[info]   at java.lang.Thread.run(Thread.java:750)
[info]   Cause: oracle.jdbc.OracleDatabaseException: ORA-00933: SQL command not properly ended
[info]   at oracle.jdbc.driver.T4CTTIoer11.processError(T4CTTIoer11.java:710)
[info]   at oracle.jdbc.driver.T4CTTIoer11.processError(T4CTTIoer11.java:608)
[info]   at oracle.jdbc.driver.T4C8Oall.processError(T4C8Oall.java:1248)
[info]   at oracle.jdbc.driver.T4CTTIfun.receive(T4CTTIfun.java:1041)
[info]   at oracle.jdbc.driver.T4CTTIfun.doRPC(T4CTTIfun.java:443)
[info]   at oracle.jdbc.driver.T4C8Oall.doOALL(T4C8Oall.java:518)
[info]   at oracle.jdbc.driver.T4CPreparedStatement.doOall8(T4CPreparedStatement.java:251)
[info]   at oracle.jdbc.driver.T4CPreparedStatement.executeForRows(T4CPreparedStatement.java:1181)
[info]   at oracle.jdbc.driver.OracleStatement.executeSQLStatement(OracleStatement.java:1571)
[info]   at oracle.jdbc.driver.OracleStatement.doExecuteWithTimeout(OracleStatement.java:1345)
[info]   at oracle.jdbc.driver.OraclePreparedStatement.executeInternal(OraclePreparedStatement.java:3728)
[info]   at oracle.jdbc.driver.OraclePreparedStatement.executeLargeUpdate(OraclePreparedStatement.java:3905)
[info]   at oracle.jdbc.driver.OraclePreparedStatement.executeUpdate(OraclePreparedStatement.java:3880)
[info]   at oracle.jdbc.driver.OraclePreparedStatementWrapper.executeUpdate(OraclePreparedStatementWrapper.java:993)
[info]   at org.apache.spark.sql.jdbc.v2.DockerJDBCIntegrationV2Suite.dataPreparation(DockerJDBCIntegrationV2Suite.scala:43)
[info]   at org.apache.spark.sql.jdbc.DockerJDBCIntegrationSuite.$anonfun$beforeAll$1(DockerJDBCIntegrationSuite.scala:171)
[info]   at org.apache.spark.sql.jdbc.DockerIntegrationFunSuite.runIfTestsEnabled(DockerIntegrationFunSuite.scala:49)
[info]   at org.apache.spark.sql.jdbc.DockerIntegrationFunSuite.runIfTestsEnabled$(DockerIntegrationFunSuite.scala:47)
[info]   at org.apache.spark.sql.jdbc.DockerJDBCIntegrationSuite.runIfTestsEnabled(DockerJDBCIntegrationSuite.scala:95)
[info]   at org.apache.spark.sql.jdbc.DockerJDBCIntegrationSuite.beforeAll(DockerJDBCIntegrationSuite.scala:118)
[info]   at org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:212)
[info]   at org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210)
[info]   at org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208)
[info]   at org.apache.spark.SparkFunSuite.run(SparkFunSuite.scala:69)
[info]   at org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:321)
[info]   at org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:517)
[info]   at sbt.ForkMain$Run.lambda$runTest$1(ForkMain.java:414)
[info]   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
[info]   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
[info]   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
[info]   at java.lang.Thread.run(Thread.java:750)

@yaooqinn
Member

Hi @Ngone51, #46807 is about to fix this.

Closes apache#46706 from Ngone51/SPARK-43043-followup.

Lead-authored-by: Yi Wu <[email protected]>
Co-authored-by: wuyi <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
@Ngone51 Ngone51 force-pushed the SPARK-48394-3.5 branch from 7d57953 to 15ed5a0 Compare May 31, 2024 13:34
Contributor

@mridulm mridulm left a comment


LGTM

@yaooqinn
Member

yaooqinn commented Jun 3, 2024

Merged to 3.5, thank you all

yaooqinn pushed a commit that referenced this pull request Jun 3, 2024
This PR backports #46706 to branch 3.5.

### What changes were proposed in this pull request?

This PR cleans up `mapIdToMapIndex` when the corresponding mapstatus is unregistered in three places:
* `removeMapOutput`
* `removeOutputsByFilter`
* `addMapOutput` (old mapstatus overwritten)

### Why are the changes needed?

There is only one valid mapstatus for the same `mapIndex` at the same time in Spark. `mapIdToMapIndex` should also follow the same rule to avoid chaos.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Unit tests.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #46768 from Ngone51/SPARK-48394-3.5.

Authored-by: Yi Wu <[email protected]>
Signed-off-by: Kent Yao <[email protected]>
@yaooqinn yaooqinn closed this Jun 3, 2024
@Ngone51
Member Author

Ngone51 commented Jun 3, 2024

Thanks all!

turboFei pushed a commit to turboFei/spark that referenced this pull request Nov 6, 2025
Closes apache#46768 from Ngone51/SPARK-48394-3.5.

Authored-by: Yi Wu <[email protected]>
Signed-off-by: Kent Yao <[email protected]>