
[SPARK-53183][SQL] Use Java Files.readString instead of o.a.s.sql.catalyst.util.fileToString #51911

Closed
dongjoon-hyun wants to merge 1 commit into apache:master from dongjoon-hyun:SPARK-53183

Conversation

@dongjoon-hyun
Member

@dongjoon-hyun dongjoon-hyun commented Aug 7, 2025

What changes were proposed in this pull request?

This PR aims to use the Java 11+ java.nio.file.Files.readString instead of o.a.s.sql.catalyst.util.fileToString. In other words, this PR removes Spark's fileToString method from the Spark code base.

Why are the changes needed?

Files.readString has existed since Java 11, so we no longer need to maintain the fileToString method. Note that Apache Spark always uses the default value of encoding, UTF-8.

def fileToString(file: File, encoding: Charset = UTF_8): String = {
  val inStream = new FileInputStream(file)
  try {
    new String(ByteStreams.toByteArray(inStream), encoding)
  } finally {
    inStream.close()
  }
}
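A minimal, self-contained sketch of the replacement call (the temp file and its contents are illustrative, not part of the PR):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class ReadStringExample {
    public static void main(String[] args) throws IOException {
        // Illustrative temp file; the PR itself only rewrites call sites.
        Path path = Files.createTempFile("spark-53183-", ".txt");
        Files.writeString(path, "hello, spark");

        // Files.readString (Java 11+) reads the whole file and decodes it as
        // UTF-8 by default, matching fileToString's default Charset of UTF_8.
        String content = Files.readString(path);
        System.out.println(content); // prints "hello, spark"

        Files.delete(path);
    }
}
```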

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Pass the CIs.

BEFORE

$ git grep fileToString | wc -l
      22

AFTER

$ git grep fileToString | wc -l
       0

Was this patch authored or co-authored using generative AI tooling?

No.

@github-actions github-actions bot added the SQL label Aug 7, 2025
@dongjoon-hyun dongjoon-hyun force-pushed the SPARK-53183 branch 2 times, most recently from 0a1c44a to e13c675 on August 7, 2025 at 22:02
@dongjoon-hyun
Member Author

Hi, @zhengruifeng . Could you review this method-replacement PR if you have some time, please?

@dongjoon-hyun
Member Author

Could you review this PR when you have some time, @yaooqinn ?

Member

@yaooqinn yaooqinn left a comment

LGTM

@dongjoon-hyun
Member Author

Thank you so much. Roughly, on the first run, the Java version is faster.

$ bin/spark-shell --driver-memory 12G
WARNING: Using incubator modules: jdk.incubator.vector
Using Spark's default log4j profile: org/apache/spark/log4j2-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 4.1.0-preview1
      /_/

Using Scala version 2.13.16 (OpenJDK 64-Bit Server VM, Java 21.0.8)
Type in expressions to have them evaluated.
Type :help for more information.
25/08/07 19:59:09 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Spark context Web UI available at http://localhost:4040
Spark context available as 'sc' (master = local[*], app id = local-1754621949360).
Spark session available as 'spark'.

scala> spark.time(org.apache.spark.sql.catalyst.util.fileToString(new java.io.File("/tmp/1G.bin")).length)
Time taken: 523 ms
val res0: Int = 1073741824
$ bin/spark-shell --driver-memory 12G
WARNING: Using incubator modules: jdk.incubator.vector
Using Spark's default log4j profile: org/apache/spark/log4j2-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 4.1.0-preview1
      /_/

Using Scala version 2.13.16 (OpenJDK 64-Bit Server VM, Java 21.0.8)
Type in expressions to have them evaluated.
Type :help for more information.
25/08/07 19:59:01 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Spark context Web UI available at http://localhost:4040
Spark context available as 'sc' (master = local[*], app id = local-1754621942077).
Spark session available as 'spark'.

scala> spark.time(java.nio.file.Files.readString(java.nio.file.Path.of("/tmp/1G.bin")).length)
Time taken: 339 ms
val res0: Int = 1073741824
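The same comparison can be sketched in plain Java, without a Spark shell. Since fileToString depends on Guava's ByteStreams, the stream-and-decode path is approximated here with InputStream.readAllBytes; the 10 MB temp file is an assumption for a quick local run, and timings will vary by machine:

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ReadBenchmark {
    // Mirrors the removed fileToString: read all bytes via a stream, then decode.
    static String streamToString(Path path) throws IOException {
        try (FileInputStream in = new FileInputStream(path.toFile())) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        Path path = Files.createTempFile("bench-", ".txt");
        Files.writeString(path, "x".repeat(10_000_000)); // 10 MB ASCII sample

        long t0 = System.nanoTime();
        int a = streamToString(path).length();
        long t1 = System.nanoTime();
        int b = Files.readString(path).length();
        long t2 = System.nanoTime();

        System.out.println("stream: " + a + " chars in " + (t1 - t0) / 1_000_000 + " ms");
        System.out.println("readString: " + b + " chars in " + (t2 - t1) / 1_000_000 + " ms");
        Files.delete(path);
    }
}
```

Both paths should report the same character count; only the timings differ.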

@dongjoon-hyun
Member Author

The test failure is a known flaky one. Merged to master.

@dongjoon-hyun dongjoon-hyun deleted the SPARK-53183 branch August 8, 2025 03:01
@zhengruifeng
Contributor

Late LGTM

@dongjoon-hyun
Member Author

Thank you, @zhengruifeng .

baibaichen added a commit to baibaichen/gluten that referenced this pull request Dec 22, 2025
baibaichen added a commit to baibaichen/gluten that referenced this pull request Dec 23, 2025
baibaichen added a commit to baibaichen/gluten that referenced this pull request Dec 23, 2025
baibaichen added a commit to baibaichen/gluten that referenced this pull request Dec 24, 2025
baibaichen added a commit to baibaichen/gluten that referenced this pull request Dec 25, 2025
baibaichen added a commit to baibaichen/gluten that referenced this pull request Dec 25, 2025
baibaichen added a commit to baibaichen/gluten that referenced this pull request Dec 29, 2025
…reading in GlutenSQLQueryTestSuite

see apache/spark#51911 which removes Spark's fileToString method from Spark code base.
baibaichen added a commit to apache/incubator-gluten that referenced this pull request Dec 31, 2025
* [Scala 2.13][IntelliJ] Remove suppression for lint-multiarg-infix warnings in pom.xml

see apache/spark#43332

* [Scala 2.13][IntelliJ] Suppress warning for `ContentFile::path`

* [Scala 2.13][IntelliJ] Suppress warning for ContextAwareIterator initialization

* [Scala 2.13][IntelliJ] Refactor to use Symbol for column references to fix compilation error in Scala 2.13 with IntelliJ compiler: symbol literal is deprecated; use Symbol("i")

* [Fix] Replace deprecated fileToString with Files.readString for file reading in GlutenSQLQueryTestSuite

see apache/spark#51911 which removes Spark's fileToString method from Spark code base.

* [Scala 2.13][IntelliJ] Update the Java compiler release version from 8 to `${java.version}` in the Scala 2.13 profiler to align it with `maven.compiler.target`

* [Refactor] Replace usage of `Symbol` with `col` for column references to align with Spark API best practices

---------

Co-authored-by: Chang chen <chenchang@apache.com>
QCLyu pushed a commit to QCLyu/incubator-gluten that referenced this pull request Jan 8, 2026