[SPARK-30490][SQL] Eliminate compiler warnings in Avro datasource #27174
MaxGekk wants to merge 3 commits into apache:master from
|
Test build #116528 has finished for PR 27174 at commit
|
   * If the option is not set, the Hadoop's config `avro.mapred.ignore.inputs.without.extension`
   * is taken into account. If the former one is not set too, file extensions are ignored.
   */
  @deprecated("Use the general data source option pathGlobFilter for filtering file names", "3.0")
Why remove this if it's really deprecated? I get that it will remove some compiler warnings, but, that's not super important, or can be worked around as you do elsewhere by deprecating the test methods too?
|
Sean, deprecating the value doesn't make any sense because it is not used by users.
|
|
OK, the class appears public though, it's definitely not meant to be
accessed for other reasons?
|
AvroOptions (and other options like CSVOptions) shouldn’t be accessible to users. Deprecating any values inside of AvroOptions seems similar to deprecating config entries inside of SQLConf - the values are not visible to users, and they are not aware of compiler warnings.
|
|
@gengliangwang @HyukjinKwon Could you take a look at the PR?
   */
  @deprecated("Use the general data source option pathGlobFilter for filtering file names", "3.0")
  val ignoreExtension: Boolean = {
    def warn(s: String): Unit = logWarning(
Why do we define a separate method?
hmm, to reuse the same code in 2 places.
I don't feel strongly, but I think it's fine not to do it ...
      .getOrElse(!ignoreFilesWithoutExtension)
      .map { ignoreExtensionOption =>
        if (ignoreExtensionOption != !ignoreFilesWithoutExtensionByDefault) {
          warn(s"The Avro option '${AvroOptions.ignoreExtensionKey}'")
@MaxGekk, from a cursory look, this warning can be shown for every file, which I think is noisy.
Do you mind double-checking this?
@HyukjinKwon I will check that, but my general thoughts are:
- The log warning is printed only if a user sets non-default config values.
- I don't think AvroOptions should be created (initialized from scratch) for each file, if that is what the current implementation does. It is not necessary to initialize AvroOptions again and again. After all, AvroOptions should be the same for all files/partitions.
- And the noise in the logs will push people to stop using the deprecated options ;-)
@HyukjinKwon you are right, it prints the warning once per partition. I have confirmed that with this test:
// Needs: scala.collection.mutable.ArrayBuffer, org.apache.log4j.AppenderSkeleton,
// org.apache.log4j.spi.LoggingEvent
test("count deprecation log events") {
  val partitionNum = 3
  // Appender that collects every logging event emitted while it is attached.
  val logAppender = new AppenderSkeleton {
    val loggingEvents = new ArrayBuffer[LoggingEvent]()
    override def append(loggingEvent: LoggingEvent): Unit = loggingEvents.append(loggingEvent)
    override def close(): Unit = {}
    override def requiresLayout(): Boolean = false
  }
  withTempPath { dir =>
    // Write the data as `partitionNum` Avro files.
    Seq(("a", 1, 2), ("b", 1, 2), ("c", 2, 1), ("d", 2, 1))
      .toDF("value", "p1", "p2")
      .repartition(partitionNum)
      .write
      .format("avro")
      .option("header", true)
      .save(dir.getCanonicalPath)
    // Read it back with the deprecated option set while capturing the logs.
    withLogAppender(logAppender) {
      val df = spark
        .read
        .format("avro")
        .schema("value STRING, p1 INTEGER, p2 INTEGER")
        .option(AvroOptions.ignoreExtensionKey, false)
        .option("header", true)
        .load(dir.getCanonicalPath)
      df.count()
    }
    // One deprecation warning per partition.
    val deprecatedEvents = logAppender.loggingEvents
      .map(_.getRenderedMessage)
      .filter(_.contains(AvroOptions.ignoreExtensionKey))
    assert(deprecatedEvents.size === partitionNum)
  }
}
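A minimal sketch of why the count in the test above matches the number of partitions, assuming the options object is constructed once per partition and the warning is emitted from its initializer. `OptionsSketch` and the `warnings` buffer are hypothetical stand-ins, not Spark's `AvroOptions` or its logger, and the Hadoop configuration fallback is left out:

```scala
import scala.collection.mutable.ArrayBuffer

object WarnPerInstance {
  // Hypothetical stand-in for a logger: collects messages instead of printing them.
  val warnings = ArrayBuffer.empty[String]

  // Hypothetical options holder (not Spark's AvroOptions).
  // The warning lives in the initializer, so it fires on every construction.
  class OptionsSketch(parameters: Map[String, String]) {
    val ignoreExtension: Boolean = parameters.get("ignoreExtension") match {
      case Some(value) =>
        warnings += "The option 'ignoreExtension' is deprecated"
        value.toBoolean
      case None => true // assumption: extensions are ignored when nothing is set
    }
  }

  def main(args: Array[String]): Unit = {
    val userOptions = Map("ignoreExtension" -> "false")
    val partitionNum = 3
    // Simulate one construction per partition.
    (1 to partitionNum).foreach(_ => new OptionsSketch(userOptions))
    println(warnings.size) // prints 3
  }
}
```

Constructing the options once and sharing the instance would emit the warning a single time in this sketch.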
It is interesting that rewriting supportsColumnar as:
override val supportsColumnar: Boolean = {
  val factory = readerFactory
  require(partitions.forall(factory.supportColumnarReads) ||
    !partitions.exists(factory.supportColumnarReads),
    "Cannot mix row-based and columnar input partitions.")
  partitions.exists(factory.supportColumnarReads)
}
does not help either, because DataSourceV2ScanExecBase is initialized twice:
First time:

Second time in TreeNode.makeCopy:

Making supportsColumnar a lazy val doesn't help either, because supportsColumnar is invoked on two different objects.
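The point about lazy val can be illustrated with a small sketch. A lazy val is cached per instance, so a copied node evaluates it again. `ScanNode` here is a hypothetical stand-in, not Spark's `DataSourceV2ScanExecBase`, and `copy()` only imitates what `TreeNode.makeCopy` does to a plan node:

```scala
object LazyValPerInstance {
  // Counts how many times the lazy initializer actually runs.
  var evaluations = 0

  // Hypothetical stand-in for a plan node.
  final case class ScanNode(partitions: Seq[Int]) {
    lazy val supportsColumnar: Boolean = {
      evaluations += 1
      partitions.forall(_ % 2 == 0)
    }
  }

  def main(args: Array[String]): Unit = {
    val first = ScanNode(Seq(2, 4))
    first.supportsColumnar
    first.supportsColumnar // cached: no second evaluation on the same object

    val second = first.copy() // a new object with its own lazy val state
    second.supportsColumnar   // evaluated again

    println(evaluations) // prints 2
  }
}
```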
I think it is not nice that we construct some classes twice when it is not necessary. WDYT? /cc @cloud-fan @dongjoon-hyun
Yea, we shouldn't instantiate twice, but it's not a big problem. I'm more worried that we instantiate it for every partition.
@MaxGekk, even if we fix this, it will still show the warning twice for schema inference and reading path at the very least. It's okay as long as we show the warning and document. Let's just go simple in this PR. This warning will be removed very soon, too.
      options: Map[String, String],
      files: Seq[FileStatus]): Option[StructType] = {
    val conf = spark.sessionState.newHadoopConf()
    if (options.contains("ignoreExtension")) {
@MaxGekk, let's just remove this option after branch-3.0 is cut out.
Shouldn't it be deprecated explicitly for users before removing? It should be mentioned in docs at least if we don't want to output a warning like in the PR.
I think it still shows the warning properly although it only shows during schema inference. Yeah, can you simply fix the doc and say it's deprecated at docs/sql-data-sources-avro.md?
+1 with @HyukjinKwon
Let's remove the option and document it in the future, instead of creating such changes. If we merge this one, then there might be some other options we have to do the same thing.



What changes were proposed in this pull request?
- Remove the @deprecated annotation for AvroOptions.ignoreExtension
- Log a warning when avro.mapred.ignore.inputs.without.extension is set to non-default value - true
- Log a warning when ignoreExtension is set to non-default value - true

Why are the changes needed?
AvroOptions.ignoreExtension is not used by users directly. As a result, users are not aware of the deprecated Hadoop conf and Avro option.

Does this PR introduce any user-facing change?
Yes

How was this patch tested?
By AvroSuite