Skip to content

Commit bbc887b

Browse files
yuningzh-dbHyukjinKwon
authored andcommitted
[SPARK-33089][SQL] make avro format propagate Hadoop config from DS options to underlying HDFS file system
### What changes were proposed in this pull request? In `AvroUtils`'s `inferSchema()`, propagate Hadoop config from DS options to underlying HDFS file system. ### Why are the changes needed? There is a bug that when running: ```scala spark.read.format("avro").options(conf).load(path) ``` The underlying file system will not receive the `conf` options. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? unit test added Closes #29971 from yuningzh-db/avro_options. Authored-by: Yuning Zhang <[email protected]> Signed-off-by: HyukjinKwon <[email protected]>
1 parent 39510b0 commit bbc887b

File tree

2 files changed

+11
-1
lines changed

2 files changed

+11
-1
lines changed

external/avro/src/main/scala/org/apache/spark/sql/avro/AvroUtils.scala

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -43,7 +43,7 @@ private[sql] object AvroUtils extends Logging {
4343
spark: SparkSession,
4444
options: Map[String, String],
4545
files: Seq[FileStatus]): Option[StructType] = {
46-
val conf = spark.sessionState.newHadoopConf()
46+
val conf = spark.sessionState.newHadoopConfWithOptions(options)
4747
val parsedOptions = new AvroOptions(options, conf)
4848

4949
if (parsedOptions.parameters.contains(ignoreExtensionKey)) {

external/avro/src/test/scala/org/apache/spark/sql/avro/AvroSuite.scala

Lines changed: 10 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1802,6 +1802,16 @@ abstract class AvroSuite extends QueryTest with SharedSparkSession with NestedDa
18021802
assert(version === SPARK_VERSION_SHORT)
18031803
}
18041804
}
1805+
1806+
test("SPARK-33089: should propagate Hadoop config from DS options to underlying file system") {
1807+
withSQLConf(
1808+
"fs.file.impl" -> classOf[FakeFileSystemRequiringDSOption].getName,
1809+
"fs.file.impl.disable.cache" -> "true") {
1810+
val conf = Map("ds_option" -> "value")
1811+
val path = "file:" + testAvro.stripPrefix("file:")
1812+
spark.read.format("avro").options(conf).load(path)
1813+
}
1814+
}
18051815
}
18061816

18071817
class AvroV1Suite extends AvroSuite {

0 commit comments

Comments
 (0)