@GuoPhilipse
Member

@GuoPhilipse GuoPhilipse commented Jul 6, 2020

What changes were proposed in this pull request?

Update the REGEXP usage and examples in sql-ref-syntax-qry-select-like.md

Why are the changes needed?

Make the usage of REGEXP known to more users.

Does this PR introduce any user-facing change?

No

How was this patch tested?

No tests

@maropu
Member

maropu commented Jul 6, 2020

I think you don't need to file jira for this kind of minor doc fixes. Btw, any other systems supporting REGEXP for regular expressions other than Hive? I think we might be able to assign an alias name to it if it is a general name.

* `ACOS(n)` If n < -1 or n > 1, Hive returns null, Spark SQL returns NaN.
* `ASIN(n)` If n < -1 or n > 1, Hive returns null, Spark SQL returns NaN.
* `CAST(n AS TIMESTAMP)` If n is integral numbers, Hive treats n as milliseconds, Spark SQL treats n as seconds.
* `REGEXP(str, pattern)` Hive support this function, Spark SQL use RLIKE instead.
Member

nit: support -> supports and use -> uses

@maropu
Member

maropu commented Jul 6, 2020

cc: @HyukjinKwon

@HyukjinKwon HyukjinKwon changed the title [SPARK-32193][SQL]update migrate guide docs on regexp function [SPARK-32193][SQL] Update migrate guide docs on regexp function Jul 6, 2020
@GuoPhilipse
Member Author

GuoPhilipse commented Jul 6, 2020

> I think you don't need to file jira for this kind of minor doc fixes. Btw, any other systems supporting REGEXP for regular expressions other than Hive? I think we might be able to assign an alias name to it if it is a general name.

As far as I know, MySQL and Hive support this keyword. But I think an alias name seems like a good idea.
MySQL example:
SELECT 'abc' REGEXP '([a-z]+)';
Result:
1

@HyukjinKwon
Member

Thanks @maropu. @GuoPhilipse, does MySQL also support REGEXP as a function?

From reading the doc in Hive at https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF, it's the same as RLIKE, which Spark supports. Maybe we should alias it.

FWIW, there are many expressions we explicitly do not support:

// List of functions we are explicitly not supporting are:
// compute_stats, context_ngrams, create_union,
// current_user, ewah_bitmap, ewah_bitmap_and, ewah_bitmap_empty, ewah_bitmap_or, field,
// in_file, index, matchpath, ngrams, noop, noopstreaming, noopwithmap,

* `ACOS(n)` If n < -1 or n > 1, Hive returns null, Spark SQL returns NaN.
* `ASIN(n)` If n < -1 or n > 1, Hive returns null, Spark SQL returns NaN.
* `CAST(n AS TIMESTAMP)` If n is integral numbers, Hive treats n as milliseconds, Spark SQL treats n as seconds.
* `REGEXP(str, pattern)` Hive support this function, Spark SQL use RLIKE instead.
Member

Also, technically this isn't in "the scenarios in which Hive and Spark generate different results".

@GuoPhilipse
Member Author

> Thanks @maropu. @GuoPhilipse, does MySQL also support REGEXP as a function?

Yes, I posted the REGEXP usage in MySQL in my last comment.

@HyukjinKwon
Member

No, @GuoPhilipse, I meant whether it's supported in the function form, i.e. SELECT REGEXP('abc', '([a-z]+)');

@maropu
Member

maropu commented Jul 6, 2020

I'm a bit confused. Does Hive really support REGEXP as a function?

// REGEXP case in hive (3.1.1)
hive> SELECT 'abc' REGEXP '([a-z]+)';
OK
true

hive> select REGEXP('abc','([a-z]+)');
NoViableAltException(251@[])
	at org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectClause(HiveParser_SelectClauseParser.java:962)
    ...
FAILED: ParseException line 1:7 cannot recognize input near 'REGEXP' '(' ''abc'' in select clause

// RLIKE in hive
hive> SELECT 'abc' RLIKE '([a-z]+)';
OK
true

hive> select RLIKE('abc','([a-z]+)');
NoViableAltException(265@[])
	at org.apache.hadoop.hive.ql.parse.HiveParser_SelectClauseParser.selectClause(HiveParser_SelectClauseParser.java:962)
	...
FAILED: ParseException line 1:7 cannot recognize input near 'RLIKE' '(' ''abc'' in select clause

// REGEXP in mysql
mysql> SELECT 'abc' REGEXP '([a-z]+)';
+-------------------------+
| 'abc' REGEXP '([a-z]+)' |
+-------------------------+
|                       1 |
+-------------------------+

mysql> select REGEXP('abc','([a-z]+)');
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'REGEXP('abc','([a-z]+)')' at line 1

// RLIKE in mysql
mysql> SELECT 'abc' RLIKE '([a-z]+)';
+------------------------+
| 'abc' RLIKE '([a-z]+)' |
+------------------------+
|                      1 |
+------------------------+

mysql> select RLIKE('abc','([a-z]+)');
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'RLIKE('abc','([a-z]+)')' at line 1
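Every session above accepts the operator form `str REGEXP pattern` / `str RLIKE pattern`, and both engines agree on its result. A minimal Python sketch of those semantics follows, built around a hypothetical `rlike` helper (not code from Spark, Hive or MySQL). It assumes that the engines treat the predicate as an unanchored search, meaning a match anywhere in the string as with Java's `Matcher.find()`, and that a NULL operand yields NULL. Python's `re.search` stands in for each engine's own regex library, and those dialects differ in details.

```python
import re
from typing import Optional


def rlike(value: Optional[str], pattern: Optional[str]) -> Optional[bool]:
    """Hypothetical model of `value RLIKE pattern` / `value REGEXP pattern`.

    None stands in for SQL NULL: a NULL operand yields NULL.
    The match is unanchored: it succeeds if the pattern matches any
    substring (like Java's Matcher.find(), not Matcher.matches()).
    """
    if value is None or pattern is None:
        return None
    return re.search(pattern, value) is not None


# The query from the sessions above: true in Hive, shown as 1 in MySQL.
print(rlike("abc", "([a-z]+)"))
# Unanchored: a match in the middle of the string is enough...
print(rlike("xabcx", "abc"))
# ...so a whole-string match needs explicit ^ and $ anchors.
print(rlike("xabcx", "^abc$"))
# NULL in, NULL out.
print(rlike(None, "([a-z]+)"))
```

Case sensitivity is one place this sketch may diverge from the engines. MySQL's REGEXP follows the collation of its arguments, and the default collations for non-binary strings are case-insensitive.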

@GuoPhilipse
Member Author

GuoPhilipse commented Jul 6, 2020

I tried it in Hive 1.1, and it works. It seems newer Hive versions are incompatible with the old one here.

//RLIKE
hive> select RLIKE('abc','([a-z]+)');
true

hive> select 'abc' RLIKE'([a-z]+)';
true

//REGEXP
hive> select REGEXP('abc','([a-z]+)');
true

hive> select 'abc' REGEXP'([a-z]+)';
true

@maropu
Member

maropu commented Jul 6, 2020

Yea, I checked the doc @HyukjinKwon posted above, and it seems the current Hive supports REGEXP only in the SQL operator syntax.

@GuoPhilipse GuoPhilipse changed the title [SPARK-32193][SQL] Update migrate guide docs on regexp function [SPARK-32193][SQL] Add alias for function rlike Jul 9, 2020

```sql
[ NOT ] { LIKE search_pattern [ ESCAPE esc_char ] | RLIKE regex_pattern }
[ NOT ] { LIKE search_pattern [ ESCAPE esc_char ] | RLIKE regex_pattern | REGEXP regex_pattern}
```
Member

Maybe `[ RLIKE | REGEXP ] regex_pattern`? cc: @huaxingao

Member

I think it's okay for this PR to just update this doc.

Member Author

I have updated it and added examples for this usage.

Member

Btw, this reminded me that some keywords are still missing from the SQL docs: https://issues.apache.org/jira/browse/SPARK-31753
It would be very helpful if you took this over.

Member Author

I am glad to take this over; I will work on it later :)

Member

Thanks! That's very helpful.
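In the grammar discussed in this thread, `RLIKE` and `REGEXP` are interchangeable spellings of one regex predicate, optionally negated by `NOT`. Below is a minimal Python sketch of that reading. The `regex_predicate` function and its keyword table are hypothetical illustrations, not Spark's parser or `RLike` implementation. The sketch assumes unanchored matching and SQL NULL propagation, where `NOT NULL` is still NULL.

```python
import re
from typing import Optional

# Hypothetical keyword table: RLIKE and REGEXP name the same predicate.
REGEX_KEYWORDS = frozenset({"RLIKE", "REGEXP"})


def regex_predicate(value: Optional[str], keyword: str,
                    pattern: Optional[str],
                    negated: bool = False) -> Optional[bool]:
    """Evaluate `value [NOT] {RLIKE | REGEXP} pattern` (illustrative only)."""
    if keyword.upper() not in REGEX_KEYWORDS:
        raise ValueError(f"not a regex keyword: {keyword!r}")
    if value is None or pattern is None:
        return None  # NULL input gives NULL, and NOT NULL is still NULL
    matched = re.search(pattern, value) is not None  # unanchored search
    return (not matched) if negated else matched


# Both spellings agree, with and without NOT.
for kw in ("RLIKE", "REGEXP"):
    print(kw,
          regex_predicate("abc", kw, "([a-z]+)"),
          regex_predicate("abc", kw, "([a-z]+)", negated=True))
```

Routing both keywords to a single evaluation function keeps their behavior identical by construction. This is the same idea as the alias proposed earlier in the thread.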


- In Spark 3.1, `from_unixtime`, `unix_timestamp`, `to_unix_timestamp`, `to_timestamp` and `to_date` will fail if the specified datetime pattern is invalid. In Spark 3.0 or earlier, they result in `NULL`.

- In Spark 3.1, we can use the regexp function; it is an alias of the rlike function and behaves the same as rlike.
Member

We don't need this.

expression[StringReplace]("replace"),
expression[Overlay]("overlay"),
expression[RLike]("rlike"),
expression[RLike]("regexp", true),
Member

Do we need this? It seems Hive and MySQL don't support this, though.

Member Author

Yes, I will remove the function usage.

@GuoPhilipse GuoPhilipse changed the title [SPARK-32193][SQL] Add alias for function rlike [SPARK-32193][DOCS] Update regexp usage in docs Jul 9, 2020
@HyukjinKwon
Member

ok to test

@SparkQA

SparkQA commented Jul 9, 2020

Test build #125427 has finished for PR 29009 at commit 06cfd5d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member

@HyukjinKwon HyukjinKwon left a comment

Looks okay but let me leave it to @maropu

@maropu maropu changed the title [SPARK-32193][DOCS] Update regexp usage in docs [SPARK-32193][SQL][DOCS] Update regexp usage in SQL docs Jul 9, 2020
@maropu maropu closed this in 09cc6c5 Jul 9, 2020
maropu pushed a commit that referenced this pull request Jul 9, 2020
### What changes were proposed in this pull request?
Update REGEXP usage and examples in sql-ref-syntax-qry-select-like.md

### Why are the changes needed?
make the usage of REGEXP known to more users

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

No tests

Closes #29009 from GuoPhilipse/update-migrate-guide.

Lead-authored-by: GuoPhilipse <[email protected]>
Co-authored-by: GuoPhilipse <[email protected]>
Signed-off-by: Takeshi Yamamuro <[email protected]>
(cherry picked from commit 09cc6c5)
Signed-off-by: Takeshi Yamamuro <[email protected]>
@maropu
Member

maropu commented Jul 9, 2020

Thanks! Merged to master/3.0.
