Skip to content
Closed
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -306,13 +306,14 @@ object JdbcUtils extends Logging {
}
val metadata = new MetadataBuilder()
// SPARK-33888
// - include scale in metadata for only DECIMAL & NUMERIC
// - include scale in metadata for only DECIMAL & NUMERIC as well as ARRAY (for Postgres)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we always include the scale metadata if it's present? cc @saikocat

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can do that for simplification. But we need to fix the tests for the rest of the dialects. Cos it adds {"scale": 0} to all metadata, then existing tests failed (previously metadata didn't get build in getSchema(), so the JSON meta didn't generated)

That's why I choose to set it to only decimal and numeric. The problem also eluded me cos the test for array in Postgresql dialects call the toCalaystType directly instead of going through the code path of using metadata. Sorry on phone so it's hard for me to link the line.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, let me elaborate more so you two (cc: @skestle) can decide on which approach to go for. Though I'm kind of favor the current approach of adding data type matching for adding scale metadata cos fixing the failing tests will be more difficult and it makes the JDBCSuite test "jdbc data source shouldn't have unnecessary metadata in its schema" test slightly lose its meaning.

So in order to push "logical_time_type" to metadata, I have to force metadata to be built in the field type as of here:
skestle@0b647fe#diff-c3859e97335ead4b131263565c987d877bea0af3adbd6c5bf2d3716768d2e083R323 whereas previously, metadata can be built by the dialect or completely ignored (as default)

This will cause 3 tests in JDBCSuite to fail cause of schema mismatch (extra {"scale": 0} in the metadata always present) [1. "jdbc API support custom schema", 2. "jdbc API custom schema DDL-like strings.", 3. "jdbc data source shouldn't have unnecessary metadata in its schema"].

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whereas previously, metadata can be built by the dialect or completely ignored (as default)

If it was dialect's responsibility to put things into metadata before, I think this PR should put the fix in PostgresDialect.

Copy link
Contributor

@saikocat saikocat Jan 20, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But the only way fieldScale can make it into the dialect is by the field metadata. So it is a very chicken and egg problem.

EDIT: Postgresql utilize the metadatabuilder to get the scale for array[][] of type numeric for example - cos dataType is ARRAY but the typeName is _numeric - note the underscore specific for Postgresql. Whereas MySQL dialect is putting more info into the metadata (like put("binarylong")). The use cases differ.

Might have to change the interface somehow to let the ResultSetMetadata to be passed or init-ed to the dialect.

// - include TIME type metadata
// - always build the metadata
dataType match {
// scalastyle:off
case java.sql.Types.NUMERIC => metadata.putLong("scale", fieldScale)
case java.sql.Types.DECIMAL => metadata.putLong("scale", fieldScale)
case java.sql.Types.ARRAY => metadata.putLong("scale", fieldScale) // PostgresDialect.scala wants this information
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this issue is only for postgresql, could you add this fix in PostgresDialect?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for @maropu 's advice.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only way fieldScale can make it into the dialect is by the field metadata.
It was always added prior to the previous commit (which I agree with on a fundamental level)
skestle@0b647fe#diff-c3859e97335ead4b131263565c987d877bea0af3adbd6c5bf2d3716768d2e083

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've explained a few more rationale regarding this change proposed by @skestle in the comment below #31252 (comment)

case java.sql.Types.TIME => metadata.putBoolean("logical_time_type", true)
case _ =>
// scalastyle:on
Expand Down