Commit e4f4886
[SPARK-2096][SQL] Correctly parse dot notations
First, let me write down the current `projections` grammar of Spark SQL:
expression : orExpression
orExpression : andExpression {"or" andExpression}
andExpression : comparisonExpression {"and" comparisonExpression}
comparisonExpression : termExpression | termExpression "=" termExpression | termExpression ">" termExpression | ...
termExpression : productExpression {"+"|"-" productExpression}
productExpression : baseExpression {"*"|"/"|"%" baseExpression}
baseExpression : expression "[" expression "]" | ... | ident | ...
ident : identChar {identChar | digit} | delimiters | ...
identChar : letter | "_" | "."
delimiters : "," | ";" | "(" | ")" | "[" | "]" | ...
projection : expression [["AS"] ident]
projections : projection { "," projection}
For something like `a.b.c[1]`, it will be parsed as:
<img src="http://img51.imgspice.com/i/03008/4iltjsnqgmtt_t.jpg" border=0>
But the current grammar can't correctly parse something like `a[1].b`.
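To see why, below is a toy Python lexer that implements only the `ident`/`identChar` rules quoted above. It is not Spark's actual lexer, and the handling of digits and delimiters is simplified to just what these examples need. Because `.` is an `identChar`, it can both continue and start an identifier:

```python
import re

# Toy lexer for the *current* lexical rules only:
#   ident     : identChar {identChar | digit}
#   identChar : letter | "_" | "."
# Integer literals and a few delimiters are included so the examples tokenize.
CURRENT_TOKEN = re.compile(r"[A-Za-z_.][A-Za-z_.0-9]*|[0-9]+|[,;()\[\]]")


def tokenize_current(text):
    """Split text into tokens, treating "." as part of identifiers."""
    return CURRENT_TOKEN.findall(text)


print(tokenize_current("a.b.c[1]"))  # ['a.b.c', '[', '1', ']']
print(tokenize_current("a[1].b"))    # ['a', '[', '1', ']', '.b']
```

In the second case `.b` is emitted as a single identifier token after `]`. No separate `.` operator remains for the grammar to interpret as a field access on `a[1]`.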
A simple solution already exists in `ParquetQuerySuite#NestedSqlParser`. Its grammar changes are:
delimiters : "." | "," | ";" | "(" | ")" | "[" | "]" | ...
identChar : letter | "_"
baseExpression : expression "[" expression "]" | expression "." ident | ... | ident | ...
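Below is a minimal Python sketch of these `NestedSqlParser`-style rules. It is a hypothetical recursive-descent parser, not the Scala code in `ParquetQuerySuite`. The node classes `Unresolved`, `Literal`, `GetField` and `GetItem` are stand-ins named after the expressions used in this description. The left-recursive `baseExpression` alternatives are implemented as a loop over postfix `[...]` and `.ident` operators:

```python
import re
from dataclasses import dataclass


# Hypothetical AST nodes named after the expressions used in this description.
@dataclass(frozen=True)
class Unresolved:
    name: str


@dataclass(frozen=True)
class Literal:
    value: int


@dataclass(frozen=True)
class GetField:
    child: object
    field: str


@dataclass(frozen=True)
class GetItem:
    child: object
    ordinal: object


# "." is now a delimiter and no longer an identChar; \S catches anything else.
TOKEN = re.compile(r"[A-Za-z_][A-Za-z_0-9]*|[0-9]+|[.,;()\[\]]|\S")
IDENT = re.compile(r"[A-Za-z_][A-Za-z_0-9]*")
NUMBER = re.compile(r"[0-9]+")


class NestedParser:
    def __init__(self, text):
        self.tokens = TOKEN.findall(text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def ident(self):
        tok = self.advance()
        if tok is None or not IDENT.fullmatch(tok):
            raise SyntaxError(f"expected identifier, got {tok!r}")
        return tok

    def primary(self):
        if self.peek() is not None and NUMBER.fullmatch(self.peek()):
            return Literal(int(self.advance()))
        return Unresolved(self.ident())

    def base_expression(self):
        # expression "[" expression "]" | expression "." ident, applied
        # left to right as postfix operators on the primary expression.
        expr = self.primary()
        while self.peek() in ("[", "."):
            if self.advance() == "[":
                expr = GetItem(expr, self.base_expression())
                if self.advance() != "]":
                    raise SyntaxError("expected ']'")
            else:
                expr = GetField(expr, self.ident())
        return expr


def parse_nested(text):
    parser = NestedParser(text)
    expr = parser.base_expression()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return expr
```

With `.` as a delimiter, both `a[1].b` and `a.b.c[1]` now parse into chains of field and item accesses built on a single-name `Unresolved` root.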
This works well, but it can't handle some corner cases, such as `select t.a.b from table as t`:
<img src="http://img51.imgspice.com/i/03008/v2iau3hoxoxg_t.jpg" border=0>
With this new grammar, `t.a.b` is parsed as `GetField(GetField(UnResolved("t"), "a"), "b")` instead of `GetField(UnResolved("t.a"), "b")`.
However, we can't resolve `t` on its own, because it is not a field but the whole table. (If we could, then `select t from table as t` would be legal, which is not intended.)
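To illustrate why a lone `t` is a dead end, here is a toy resolver. It is entirely hypothetical: the alias `t`, the column `a`, and `resolve_attribute` are made up for this example, and this is not how Catalyst resolves attributes. It accepts a column name with or without the table qualifier, but it rejects the bare alias:

```python
# Toy schema (hypothetical): a relation aliased as "t" whose only column "a"
# is a struct with a single field "b".
ALIAS = "t"
COLUMNS = {"a": {"b": "int"}}


def resolve_attribute(name):
    """Resolve a name such as "t.a" or "a" to (column, column type).

    Only "column" or "alias.column" is accepted; deeper field access is left
    to GetField nodes applied on top of the resolved column.
    """
    parts = name.split(".")
    if parts[0] == ALIAS:
        parts = parts[1:]  # drop the table qualifier
    if not parts:
        # The bare alias names the whole table, not a field. Accepting it
        # would make `select t from table as t` legal.
        raise LookupError(f"cannot resolve {name!r}: it names the whole table")
    if len(parts) != 1 or parts[0] not in COLUMNS:
        raise LookupError(f"cannot resolve {name!r}")
    return parts[0], COLUMNS[parts[0]]
```

Under this model, `GetField(UnResolved("t.a"), "b")` has a resolvable root. `GetField(GetField(UnResolved("t"), "a"), "b")` does not, because its innermost node is the bare alias.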
My solution is:
dotExpressionHeader : ident "." ident
baseExpression : expression "[" expression "]" | expression "." ident | ... | dotExpressionHeader | ident | ...
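Here is the same kind of toy parser extended with `dotExpressionHeader`. It is a hypothetical Python sketch of the grammar above, not the actual Spark parser change, and the node classes are the same illustrative stand-ins. An `ident "." ident` prefix is tried before a bare `ident` and becomes a single `Unresolved("t.a")`. Any further `.ident` or `[...]` is then applied on top of it:

```python
import re
from dataclasses import dataclass


# Hypothetical AST nodes named after the expressions used in this description.
@dataclass(frozen=True)
class Unresolved:
    name: str


@dataclass(frozen=True)
class Literal:
    value: int


@dataclass(frozen=True)
class GetField:
    child: object
    field: str


@dataclass(frozen=True)
class GetItem:
    child: object
    ordinal: object


# "." is a delimiter, not an identChar; \S catches anything unexpected.
TOKEN = re.compile(r"[A-Za-z_][A-Za-z_0-9]*|[0-9]+|[.,;()\[\]]|\S")
IDENT = re.compile(r"[A-Za-z_][A-Za-z_0-9]*")
NUMBER = re.compile(r"[0-9]+")


class HeaderParser:
    def __init__(self, text):
        self.tokens = TOKEN.findall(text)
        self.pos = 0

    def peek(self, offset=0):
        i = self.pos + offset
        return self.tokens[i] if i < len(self.tokens) else None

    def advance(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def is_ident(self, tok):
        return tok is not None and IDENT.fullmatch(tok) is not None

    def ident(self):
        tok = self.advance()
        if not self.is_ident(tok):
            raise SyntaxError(f"expected identifier, got {tok!r}")
        return tok

    def primary(self):
        if self.peek() is not None and NUMBER.fullmatch(self.peek()):
            return Literal(int(self.advance()))
        first = self.ident()
        # dotExpressionHeader : ident "." ident  (tried before a bare ident)
        if self.peek() == "." and self.is_ident(self.peek(1)):
            self.advance()  # consume "."
            return Unresolved(first + "." + self.ident())
        return Unresolved(first)

    def base_expression(self):
        # Postfix loop for  expression "[" expression "]" | expression "." ident
        expr = self.primary()
        while self.peek() in ("[", "."):
            if self.advance() == "[":
                expr = GetItem(expr, self.base_expression())
                if self.advance() != "]":
                    raise SyntaxError("expected ']'")
            else:
                expr = GetField(expr, self.ident())
        return expr


def parse_with_header(text):
    parser = HeaderParser(text)
    expr = parser.base_expression()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return expr
```

The header consumes only the first two identifiers. As a result, `t.a.b` yields `GetField(Unresolved("t.a"), "b")` as intended, and `a.b.c[1]` yields `GetItem(GetField(Unresolved("a.b"), "c"), Literal(1))`. `a[1].b` has no `ident "." ident` prefix, so it still falls back to a plain `ident` followed by the postfix operators.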
All test cases under `sql` pass locally, and I added a more complex test case.
Using `arrayOfStruct.field1` to access all values of `field1` is not supported yet. Since this PR already changes a lot of code, I will open a separate PR for it.
I'm not familiar with the later optimization phase, so please correct me if I missed something.
Author: Wenchen Fan
Author: Michael Armbrust
Closes apache#2230 from cloud-fan/dot and squashes the following commits:
e1a8898 [Wenchen Fan] remove support for arbitrary nested arrays
ee8a724 [Wenchen Fan] rollback LogicalPlan, support dot operation on nested array type
a58df40 [Michael Armbrust] add regression test for doubly nested data
16bc4c6 [Wenchen Fan] some enhance
95d733f [Wenchen Fan] split long line
dc31698 [Wenchen Fan] SPARK-2096 Correctly parse dot notations
1 parent 1f4a648, commit e4f4886
File tree
6 files changed, +88 -90 lines changed
- sql
  - catalyst/src/main/scala/org/apache/spark/sql/catalyst
    - plans/logical
  - core/src/test/scala/org/apache/spark/sql
    - json
    - parquet
  - hive/src/test/scala/org/apache/spark/sql/hive/execution