Skip to content

[SPARK-24772][SQL] Avro: support logical date type#21984

Closed
gengliangwang wants to merge 2 commits intoapache:masterfrom
gengliangwang:avro_date
Closed

[SPARK-24772][SQL] Avro: support logical date type#21984
gengliangwang wants to merge 2 commits intoapache:masterfrom
gengliangwang:avro_date

Conversation

@gengliangwang
Copy link
Copy Markdown
Member

What changes were proposed in this pull request?

Support Avro logical date type:
https://avro.apache.org/docs/1.8.2/spec.html#Date

How was this patch tested?

Unit test

@SparkQA
Copy link
Copy Markdown

SparkQA commented Aug 3, 2018

Test build #94132 has finished for PR 21984 at commit 16e0357.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

(getter, ordinal) => ByteBuffer.wrap(getter.getBinary(ordinal))
case DateType =>
(getter, ordinal) => getter.getInt(ordinal) * DateTimeUtils.MILLIS_PER_DAY
(getter, ordinal) => getter.getInt(ordinal)
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the write path, let's drop the previous conversion to Long

Copy link
Copy Markdown
Member

@HyukjinKwon HyukjinKwon Aug 4, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this cause a behaviour change comparing to the third party one?

Copy link
Copy Markdown
Member Author

@gengliangwang gengliangwang Aug 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it is a real behavior change. The only concern is that the Avro file with date type column is written with this built-in package, and read by third party one with user specify schema. The case should be very trivial and we can ignore that.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are 2 kinds of compatibilities:

  1. the file written by old avro data source can be read by the new avro data source
  2. the file written by new avro data source can be read by the old avro data source

I think we should focus on 1) and ignore 2)

s"Cannot convert Avro logical type ${other} to Catalyst Timestamp type.")
}

// Before we upgrade Avro to 1.8 for logical type support, spark-avo converts Long to Date.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: spark-avo -> spark-avro.

@SparkQA
Copy link
Copy Markdown

SparkQA commented Aug 6, 2018

Test build #94260 has finished for PR 21984 at commit c6b997b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Copy Markdown
Contributor

LGTM, merging to master!

@asfgit asfgit closed this in 819c4de Aug 7, 2018
@HyukjinKwon
Copy link
Copy Markdown
Member

LGTM too

otterc pushed a commit to linkedin/spark that referenced this pull request Mar 22, 2023
In PR apache#21984 and apache#21935 , the related test cases are using binary files created by Python scripts.

Generate the binary files in test suite to make it more transparent.  Also we can

Also move the related test cases to a new file `AvroLogicalTypeSuite.scala`.

Unit test.

Closes apache#22091 from gengliangwang/logicalType_suite.

Authored-by: Gengliang Wang <gengliang.wang@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>

RB=2651977
BUG=LIHADOOP-59243
G=spark-reviewers
R=ekrogen
A=ekrogen
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants