Skip to content

Conversation

@maropu
Copy link
Member

@maropu maropu commented Feb 7, 2020

What changes were proposed in this pull request?

This pr intends to add a document for the ANSI mode;

Screen Shot 2020-02-13 at 8 08 52

Screen Shot 2020-02-13 at 8 09 13

Screen Shot 2020-02-13 at 8 09 26

Screen Shot 2020-02-13 at 8 09 38

Why are the changes needed?

For better document coverage and usability.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

N/A

@maropu
Copy link
Member Author

maropu commented Feb 7, 2020

WIP

@SparkQA
Copy link

SparkQA commented Feb 7, 2020

Test build #118026 has finished for PR 27489 at commit ed76c3e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

### Arithmetic Operations

In Spark SQL, arithmetic operations performed on numeric types (with the exception of decimal) are not checked for overflow by default.
This means that in case an operation causes an overflow, the result is the same that the same operation returns in a Java/Scala program (eg. if the sum of 2 integers is higher than the maximum value representable, the result is a negative number).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: eg. -> e.g.

When `spark.sql.ansi.enabled` is set to `true` (`false` by default), Spark SQL follows the standard in basic behaviours (e.g., arithmetic operations, type conversion, and SQL parsing).
Moreover, Spark SQL has an independent option to control implicit casting behaviours when inserting rows in a table.
The casting behaviours are defined as store assignment rules in the standard.
When `spark.sql.storeAssignmentPolicy` is set to `ANSI`, Spark SQL complies with the ANSI store assignment rules and this setting is enabled by default.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it better to write default value?

@SparkQA
Copy link

SparkQA commented Feb 7, 2020

Test build #118034 has finished for PR 27489 at commit a8681de.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu maropu changed the title [WIP][SPARK-30703][SQL][DOCS] Add a document for the ANSI mode [SPARK-30703][SQL][DOCS] Add a document for the ANSI mode Feb 7, 2020
@dongjoon-hyun
Copy link
Member

cc @gatorsmile

@maropu
Copy link
Member Author

maropu commented Feb 8, 2020

also cc: @gengliangwang @cloud-fan

@maropu
Copy link
Member Author

maropu commented Feb 8, 2020

I wrote a basic part of the ANSI mode. Please tell me if you have more we should explicitly describe here for the mode.

When `spark.sql.ansi.enabled` is set to `true` (`false` by default), Spark SQL follows the standard in basic behaviours (e.g., arithmetic operations, type conversion, and SQL parsing).
Moreover, Spark SQL has an independent option to control implicit casting behaviours when inserting rows in a table.
The casting behaviours are defined as store assignment rules in the standard.
When `spark.sql.storeAssignmentPolicy` is set to `ANSI`, Spark SQL complies with the ANSI store assignment rules and this setting is a default value.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we summarize as a table like https://spark.apache.org/docs/3.0.0-preview2/configuration.html#application-properties ? That will be more easier to explain and to understand and to maintain(add new conf later or change the default value).

For example, the default value of spark.sql.storeAssignmentPolicy is not mentioned in this paragraph.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see. But, when I was writing this doc, I assumed the basic info about spark.sql.storeAssignmentPolicy would appear in the SQL Conf doc: #27459
Better to describe the two options in a tabular format here, too?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am +1 with showing a table here, which can be more clear.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

In Spark SQL, arithmetic operations performed on numeric types (with the exception of decimal) are not checked for overflow by default.
This means that in case an operation causes an overflow, the result is the same that the same operation returns in a Java/Scala program (e.g., if the sum of 2 integers is higher than the maximum value representable, the result is a negative number).
On the other hand, Spark SQL returns null for decimal overflow.
When `spark.sql.ansi.enabled` is set to `true` and overflow occurs in numeric and interval arithmetic operations, it throws an arithmetic exception at runtime.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

overflow -> an overflow?

### Type Conversion

Spark SQL has three kinds of type conversions: explicit casting, type coercion, and store assignment casting.
When `spark.sql.ansi.enabled` is set to `true`, explicit castings by `CAST` syntax throws a number-format exception at runtime for illegal cast patterns defined in the standard, e.g. casts from a string to an integer.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

throws -> throw?


Spark SQL has three kinds of type conversions: explicit casting, type coercion, and store assignment casting.
When `spark.sql.ansi.enabled` is set to `true`, explicit castings by `CAST` syntax throws a number-format exception at runtime for illegal cast patterns defined in the standard, e.g. casts from a string to an integer.
On the other hand, `INSERT INTO` syntax throws an analysis exception when the ANSI mode enabled via `spark.sql.storeAssignmentPolicy=ANSI`.
Copy link
Member

@dongjoon-hyun dongjoon-hyun Feb 8, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a question, shall we mention CTAS(CREATE TABLE AS SELECT) together?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, good suggestion! I totally missed that syntax. I'll check.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've checked CTAS (CTAS cannot exist with schema definition);

scala> sql("create table t1 (v string)")
scala> sql("create table t2 (v int) as select * from t1")
org.apache.spark.sql.catalyst.parser.ParseException:
Operation not allowed: Schema may not be specified in a Create Table As Select (CTAS) statement(line 1, pos 0)

Any other concern?


java.lang.NumberFormatException: invalid input syntax for type numeric: a

-- `spark.sql.ansi.enabled=false` (This is a legacy behaviour until Spark 2.x)
Copy link
Member

@dongjoon-hyun dongjoon-hyun Feb 8, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

legacy can mislead the users. We may need to point out again this is the default behavior in 3.0.0, too.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ur, I see. I'll update.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed it to This is a default behaviour. I dropped the in 3.0.0 phrase in the statement because to I thought update it in future releases (4.0, 5.0, ...) looks a bit annoying.

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, @maropu . Only minor comments from me.

@SparkQA
Copy link

SparkQA commented Feb 9, 2020

Test build #118083 has finished for PR 27489 at commit 6b0d5d6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

### Type Conversion

Spark SQL has three kinds of type conversions: explicit casting, type coercion, and store assignment casting.
When `spark.sql.ansi.enabled` is set to `true`, explicit casting by `CAST` syntax throws a number-format exception at runtime for illegal cast patterns defined in the standard, e.g. casts from a string to an integer.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

number-format exception is not the only runtime exception for ANSI mode

spark.sql("select cast(2147483648L as int)").show()
java.lang.ArithmeticException: Casting 2147483648 to int causes overflow

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, right. I'll update.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated.

@gengliangwang
Copy link
Member

@maropu Thanks for the work!

@SparkQA
Copy link

SparkQA commented Feb 12, 2020

Test build #118327 has finished for PR 27489 at commit 64b42b4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Member

@gengliangwang gengliangwang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@maropu
Copy link
Member Author

maropu commented Feb 13, 2020

Thanks for the check, @gengliangwang !

gengliangwang pushed a commit that referenced this pull request Feb 13, 2020
### What changes were proposed in this pull request?

This pr intends to add a document for the ANSI mode;

<img width="600" alt="Screen Shot 2020-02-13 at 8 08 52" src="https://user-images.githubusercontent.com/692303/74386041-5934f780-4e38-11ea-8162-26e524e11c65.png">
<img width="600" alt="Screen Shot 2020-02-13 at 8 09 13" src="https://user-images.githubusercontent.com/692303/74386040-589c6100-4e38-11ea-8a64-899788eaf55f.png">
<img width="600" alt="Screen Shot 2020-02-13 at 8 09 26" src="https://user-images.githubusercontent.com/692303/74386039-5803ca80-4e38-11ea-949f-049208d2203d.png">
<img width="600" alt="Screen Shot 2020-02-13 at 8 09 38" src="https://user-images.githubusercontent.com/692303/74386036-563a0700-4e38-11ea-9ec3-87a8f6771cf0.png">

### Why are the changes needed?

For better document coverage and usability.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

N/A

Closes #27489 from maropu/SPARK-30703.

Authored-by: Takeshi Yamamuro <[email protected]>
Signed-off-by: Gengliang Wang <[email protected]>
(cherry picked from commit 3c4044e)
Signed-off-by: Gengliang Wang <[email protected]>
@gengliangwang
Copy link
Member

Merged to master/branch-3.0

@dongjoon-hyun
Copy link
Member

Thank you, @maropu and @gengliangwang !

@maropu
Copy link
Member Author

maropu commented Feb 13, 2020

Thanks, again, @gengliangwang and @dongjoon-hyun ~

HyukjinKwon pushed a commit that referenced this pull request Feb 17, 2020
…tions as experimental

### What changes were proposed in this pull request?

This is a follow-up of #27489.
It declares the ANSI SQL compliance options as experimental in the documentation.

### Why are the changes needed?

The options are experimental. There can be new features/behaviors in future releases.

### Does this PR introduce any user-facing change?

No

### How was this patch tested?

Generating doc

Closes #27590 from gengliangwang/ExperimentalAnsi.

Authored-by: Gengliang Wang <[email protected]>
Signed-off-by: HyukjinKwon <[email protected]>
(cherry picked from commit da2ca85)
Signed-off-by: HyukjinKwon <[email protected]>
HyukjinKwon pushed a commit that referenced this pull request Feb 17, 2020
…tions as experimental

### What changes were proposed in this pull request?

This is a follow-up of #27489.
It declares the ANSI SQL compliance options as experimental in the documentation.

### Why are the changes needed?

The options are experimental. There can be new features/behaviors in future releases.

### Does this PR introduce any user-facing change?

No

### How was this patch tested?

Generating doc

Closes #27590 from gengliangwang/ExperimentalAnsi.

Authored-by: Gengliang Wang <[email protected]>
Signed-off-by: HyukjinKwon <[email protected]>
Copy link
Contributor

@cloud-fan cloud-fan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

late LGTM!

### Arithmetic Operations

In Spark SQL, arithmetic operations performed on numeric types (with the exception of decimal) are not checked for overflows by default.
This means that in case an operation causes overflows, the result is the same that the same operation returns in a Java/Scala program (e.g., if the sum of 2 integers is higher than the maximum value representable, the result is a negative number).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the result is the same that -> the result is the same with

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, yes. I'll fix later. (If other developers plan to open a PR for typo fixes, it would be helpful to include this fix as well.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With legacy policy, Spark allows the type coercion as long as it is a valid Cast, which is very loose.
e.g. converting string to int or double to boolean is allowed.
It is also the only behavior in Spark 2.x and it is compatible with Hive.
With strict policy, Spark doesn't allow any possible precision loss or data truncation in type coercion,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if we should remove the STRICT mode. It's not ANSI compliant and no other SQL system has this behavior.

cc @rdblue @brkyvz @rxin

@HyukjinKwon
Copy link
Member

Looks good. Thanks, @maropu.

sjincho pushed a commit to sjincho/spark that referenced this pull request Apr 15, 2020
### What changes were proposed in this pull request?

This pr intends to add a document for the ANSI mode;

<img width="600" alt="Screen Shot 2020-02-13 at 8 08 52" src="https://user-images.githubusercontent.com/692303/74386041-5934f780-4e38-11ea-8162-26e524e11c65.png">
<img width="600" alt="Screen Shot 2020-02-13 at 8 09 13" src="https://user-images.githubusercontent.com/692303/74386040-589c6100-4e38-11ea-8a64-899788eaf55f.png">
<img width="600" alt="Screen Shot 2020-02-13 at 8 09 26" src="https://user-images.githubusercontent.com/692303/74386039-5803ca80-4e38-11ea-949f-049208d2203d.png">
<img width="600" alt="Screen Shot 2020-02-13 at 8 09 38" src="https://user-images.githubusercontent.com/692303/74386036-563a0700-4e38-11ea-9ec3-87a8f6771cf0.png">

### Why are the changes needed?

For better document coverage and usability.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

N/A

Closes apache#27489 from maropu/SPARK-30703.

Authored-by: Takeshi Yamamuro <[email protected]>
Signed-off-by: Gengliang Wang <[email protected]>
sjincho pushed a commit to sjincho/spark that referenced this pull request Apr 15, 2020
…tions as experimental

### What changes were proposed in this pull request?

This is a follow-up of apache#27489.
It declares the ANSI SQL compliance options as experimental in the documentation.

### Why are the changes needed?

The options are experimental. There can be new features/behaviors in future releases.

### Does this PR introduce any user-facing change?

No

### How was this patch tested?

Generating doc

Closes apache#27590 from gengliangwang/ExperimentalAnsi.

Authored-by: Gengliang Wang <[email protected]>
Signed-off-by: HyukjinKwon <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants