Skip to content

Commit

Permalink
2.2.0 (#5)
Browse files Browse the repository at this point in the history
* add some new functions.
  • Loading branch information
aaronshan committed Feb 2, 2019
1 parent cd0cf6b commit d9fca5a
Show file tree
Hide file tree
Showing 115 changed files with 2,315 additions and 254 deletions.
2 changes: 0 additions & 2 deletions .travis.yml
Original file line number Diff line number Diff line change
Expand Up @@ -17,7 +17,5 @@ install:
- mvn install:install-file -DgroupId=javax.jdo -DartifactId=jdo2-api -Dversion=2.3-ec -Dpackaging=jar -Dfile=$HOME/jdo2-api-2.3-ec.jar

script:
- jdk_switcher use openjdk7
- mvn clean package
- jdk_switcher use oraclejdk8
- mvn clean package
150 changes: 94 additions & 56 deletions README-zh.md
Original file line number Diff line number Diff line change
Expand Up @@ -10,7 +10,8 @@
hive-third-functions 包含了一些很有用的hive udf函数,特别是数组和json函数.

> 注意:
> hive-third-functions支持hive-0.11.0或更高版本.
> 1. hive-third-functions支持hive-0.11.0或更高版本.
> 2. 运行`3.0.0`及以上版本需要Java8及以上
## 编译

Expand Down Expand Up @@ -40,7 +41,7 @@ mvn clean package -DskipTests

你也可以直接在发布页下载打包好了最新版本 [发布页](https://github.com/aaronshan/hive-third-functions/releases).

> 当前最新的版本是 `2.1.3`
> 当前最新的版本是 `3.0.0`
## 函数

Expand Down Expand Up @@ -71,6 +72,10 @@ mvn clean package -DskipTests
|array_value_count(array<E>, E) -> long | 统计数组中包含给定元素的个数.|
|array_slice(array, start, length) -> array | 对数组进行分片操作,start为正数从前开始分片, start为负数从后开始分片, 长度为指定的长度.|
|array_element_at(array&lt;E&gt;, index) -> E | 返回指定位置的数组元素. 如果索引位置 < 0, 则从尾部开始计数并返回.|
|array_shuffle(array) -> array | 对数组shuffle.|
|sequence(start, end) -> array<Long> | 生成数组序列.|
|sequence(start, end, step) -> array<Long> | 生成数组序列.|
|sequence(start_date_string, end_data_string, step) -> array<String> | 生成日期数组序列.|

### 3. map函数
| 函数| 描述 |
Expand Down Expand Up @@ -145,66 +150,90 @@ mvn clean package -DskipTests
|url_encode(value) -> string | escapes value by encoding it so that it can be safely included in URL query parameter names and values|
|url_decode(value) -> string | unescape the URL encoded value. This function is the inverse of `url_encode`. |

### 10. 数学函数

| function| description |
|:--|:--|
|infinity() -> double | 获取正无穷常数|
|is_finite(x) -> boolean | 判断x是否为有限数值|
|is_infinite(x) -> boolean |判断x是否为无穷数值|
|is_nan(x) -> boolean | 判断x是否不是一个数值类型的变量|
|nan() -> double | 获取一个表示NAN(not-a-number)的常数 |
|from_base(string, radix) -> bigint | 获取字面量的值,该值的基数为radix|
|to_base(x, radix) -> varchar | 返回x以radix为基数的字面量|
|cosine_similarity(x, y) -> double | 返回两个稀疏向量的余弦相似度|


## 用法

将下面这些内容写入 `${HOME}/.hiverc` 文件, 或者也可以按需在hive命令行环境中执行.

```
add jar ${jar_location_dir}/hive-third-functions-${version}-shaded.jar
create temporary function array_contains as 'cc.shanruifeng.functions.array.UDFArrayContains';
create temporary function array_equals as 'cc.shanruifeng.functions.array.UDFArrayEquals';
create temporary function array_intersect as 'cc.shanruifeng.functions.array.UDFArrayIntersect';
create temporary function array_max as 'cc.shanruifeng.functions.array.UDFArrayMax';
create temporary function array_min as 'cc.shanruifeng.functions.array.UDFArrayMin';
create temporary function array_join as 'cc.shanruifeng.functions.array.UDFArrayJoin';
create temporary function array_distinct as 'cc.shanruifeng.functions.array.UDFArrayDistinct';
create temporary function array_position as 'cc.shanruifeng.functions.array.UDFArrayPosition';
create temporary function array_remove as 'cc.shanruifeng.functions.array.UDFArrayRemove';
create temporary function array_reverse as 'cc.shanruifeng.functions.array.UDFArrayReverse';
create temporary function array_sort as 'cc.shanruifeng.functions.array.UDFArraySort';
create temporary function array_concat as 'cc.shanruifeng.functions.array.UDFArrayConcat';
create temporary function array_value_count as 'cc.shanruifeng.functions.array.UDFArrayValueCount';
create temporary function array_slice as 'cc.shanruifeng.functions.array.UDFArraySlice';
create temporary function array_element_at as 'cc.shanruifeng.functions.array.UDFArrayElementAt';
create temporary function bit_count as 'cc.shanruifeng.functions.bitwise.UDFBitCount';
create temporary function bitwise_and as 'cc.shanruifeng.functions.bitwise.UDFBitwiseAnd';
create temporary function bitwise_not as 'cc.shanruifeng.functions.bitwise.UDFBitwiseNot';
create temporary function bitwise_or as 'cc.shanruifeng.functions.bitwise.UDFBitwiseOr';
create temporary function bitwise_xor as 'cc.shanruifeng.functions.bitwise.UDFBitwiseXor';
create temporary function map_build as 'cc.shanruifeng.functions.map.UDFMapBuild';
create temporary function map_concat as 'cc.shanruifeng.functions.map.UDFMapConcat';
create temporary function map_element_at as 'cc.shanruifeng.functions.map.UDFMapElementAt';
create temporary function map_equals as 'cc.shanruifeng.functions.map.UDFMapEquals';
create temporary function day_of_week as 'cc.shanruifeng.functions.date.UDFDayOfWeek';
create temporary function day_of_year as 'cc.shanruifeng.functions.date.UDFDayOfYear';
create temporary function type_of_day as 'cc.shanruifeng.functions.date.UDFTypeOfDay';
create temporary function zodiac_cn as 'cc.shanruifeng.functions.date.UDFZodiacSignCn';
create temporary function zodiac_en as 'cc.shanruifeng.functions.date.UDFZodiacSignEn';
create temporary function pinyin as 'cc.shanruifeng.functions.string.UDFChineseToPinYin';
create temporary function md5 as 'cc.shanruifeng.functions.string.UDFMd5';
create temporary function sha256 as 'cc.shanruifeng.functions.string.UDFSha256';
create temporary function json_array_get as 'cc.shanruifeng.functions.json.UDFJsonArrayGet';
create temporary function json_array_length as 'cc.shanruifeng.functions.json.UDFJsonArrayLength';
create temporary function json_array_extract as 'cc.shanruifeng.functions.json.UDFJsonArrayExtract';
create temporary function json_array_extract_scalar as 'cc.shanruifeng.functions.json.UDFJsonArrayExtractScalar';
create temporary function json_extract as 'cc.shanruifeng.functions.json.UDFJsonExtract';
create temporary function json_extract_scalar as 'cc.shanruifeng.functions.json.UDFJsonExtractScalar';
create temporary function json_size as 'cc.shanruifeng.functions.json.UDFJsonSize';
create temporary function id_card_province as 'cc.shanruifeng.functions.card.UDFChinaIdCardProvince';
create temporary function id_card_city as 'cc.shanruifeng.functions.card.UDFChinaIdCardCity';
create temporary function id_card_area as 'cc.shanruifeng.functions.card.UDFChinaIdCardArea';
create temporary function id_card_birthday as 'cc.shanruifeng.functions.card.UDFChinaIdCardBirthday';
create temporary function id_card_gender as 'cc.shanruifeng.functions.card.UDFChinaIdCardGender';
create temporary function is_valid_id_card as 'cc.shanruifeng.functions.card.UDFChinaIdCardValid';
create temporary function id_card_info as 'cc.shanruifeng.functions.card.UDFChinaIdCardInfo';
create temporary function wgs_distance as 'cc.shanruifeng.functions.geo.UDFGeoWgsDistance';
create temporary function gcj_to_bd as 'cc.shanruifeng.functions.geo.UDFGeoGcjToBd';
create temporary function bd_to_gcj as 'cc.shanruifeng.functions.geo.UDFGeoBdToGcj';
create temporary function wgs_to_gcj as 'cc.shanruifeng.functions.geo.UDFGeoWgsToGcj';
create temporary function gcj_to_wgs as 'cc.shanruifeng.functions.geo.UDFGeoGcjToWgs';
create temporary function gcj_extract_wgs as 'cc.shanruifeng.functions.geo.UDFGeoGcjExtractWgs';
create temporary function url_encode as 'cc.shanruifeng.functions.url.UDFUrlEncode';
create temporary function url_decode as 'cc.shanruifeng.functions.url.UDFUrlDecode';
create temporary function array_contains as 'com.github.aaronshan.functions.array.UDFArrayContains';
create temporary function array_equals as 'com.github.aaronshan.functions.array.UDFArrayEquals';
create temporary function array_intersect as 'com.github.aaronshan.functions.array.UDFArrayIntersect';
create temporary function array_max as 'com.github.aaronshan.functions.array.UDFArrayMax';
create temporary function array_min as 'com.github.aaronshan.functions.array.UDFArrayMin';
create temporary function array_join as 'com.github.aaronshan.functions.array.UDFArrayJoin';
create temporary function array_distinct as 'com.github.aaronshan.functions.array.UDFArrayDistinct';
create temporary function array_position as 'com.github.aaronshan.functions.array.UDFArrayPosition';
create temporary function array_remove as 'com.github.aaronshan.functions.array.UDFArrayRemove';
create temporary function array_reverse as 'com.github.aaronshan.functions.array.UDFArrayReverse';
create temporary function array_sort as 'com.github.aaronshan.functions.array.UDFArraySort';
create temporary function array_concat as 'com.github.aaronshan.functions.array.UDFArrayConcat';
create temporary function array_value_count as 'com.github.aaronshan.functions.array.UDFArrayValueCount';
create temporary function array_slice as 'com.github.aaronshan.functions.array.UDFArraySlice';
create temporary function array_element_at as 'com.github.aaronshan.functions.array.UDFArrayElementAt';
create temporary function bit_count as 'com.github.aaronshan.functions.bitwise.UDFBitCount';
create temporary function bitwise_and as 'com.github.aaronshan.functions.bitwise.UDFBitwiseAnd';
create temporary function array_shuffle as 'com.github.aaronshan.functions.array.UDFArrayShuffle';
create temporary function sequence as 'com.github.aaronshan.functions.array.UDFSequence';
create temporary function bitwise_not as 'com.github.aaronshan.functions.bitwise.UDFBitwiseNot';
create temporary function bitwise_or as 'com.github.aaronshan.functions.bitwise.UDFBitwiseOr';
create temporary function bitwise_xor as 'com.github.aaronshan.functions.bitwise.UDFBitwiseXor';
create temporary function map_build as 'com.github.aaronshan.functions.map.UDFMapBuild';
create temporary function map_concat as 'com.github.aaronshan.functions.map.UDFMapConcat';
create temporary function map_element_at as 'com.github.aaronshan.functions.map.UDFMapElementAt';
create temporary function map_equals as 'com.github.aaronshan.functions.map.UDFMapEquals';
create temporary function day_of_week as 'com.github.aaronshan.functions.date.UDFDayOfWeek';
create temporary function day_of_year as 'com.github.aaronshan.functions.date.UDFDayOfYear';
create temporary function type_of_day as 'com.github.aaronshan.functions.date.UDFTypeOfDay';
create temporary function zodiac_cn as 'com.github.aaronshan.functions.date.UDFZodiacSignCn';
create temporary function zodiac_en as 'com.github.aaronshan.functions.date.UDFZodiacSignEn';
create temporary function pinyin as 'com.github.aaronshan.functions.string.UDFChineseToPinYin';
create temporary function md5 as 'com.github.aaronshan.functions.string.UDFMd5';
create temporary function sha256 as 'com.github.aaronshan.functions.string.UDFSha256';
create temporary function json_array_get as 'com.github.aaronshan.functions.json.UDFJsonArrayGet';
create temporary function json_array_length as 'com.github.aaronshan.functions.json.UDFJsonArrayLength';
create temporary function json_array_extract as 'com.github.aaronshan.functions.json.UDFJsonArrayExtract';
create temporary function json_array_extract_scalar as 'com.github.aaronshan.functions.json.UDFJsonArrayExtractScalar';
create temporary function json_extract as 'com.github.aaronshan.functions.json.UDFJsonExtract';
create temporary function json_extract_scalar as 'com.github.aaronshan.functions.json.UDFJsonExtractScalar';
create temporary function json_size as 'com.github.aaronshan.functions.json.UDFJsonSize';
create temporary function id_card_province as 'com.github.aaronshan.functions.card.UDFChinaIdCardProvince';
create temporary function id_card_city as 'com.github.aaronshan.functions.card.UDFChinaIdCardCity';
create temporary function id_card_area as 'com.github.aaronshan.functions.card.UDFChinaIdCardArea';
create temporary function id_card_birthday as 'com.github.aaronshan.functions.card.UDFChinaIdCardBirthday';
create temporary function id_card_gender as 'com.github.aaronshan.functions.card.UDFChinaIdCardGender';
create temporary function is_valid_id_card as 'com.github.aaronshan.functions.card.UDFChinaIdCardValid';
create temporary function id_card_info as 'com.github.aaronshan.functions.card.UDFChinaIdCardInfo';
create temporary function wgs_distance as 'com.github.aaronshan.functions.geo.UDFGeoWgsDistance';
create temporary function gcj_to_bd as 'com.github.aaronshan.functions.geo.UDFGeoGcjToBd';
create temporary function bd_to_gcj as 'com.github.aaronshan.functions.geo.UDFGeoBdToGcj';
create temporary function wgs_to_gcj as 'com.github.aaronshan.functions.geo.UDFGeoWgsToGcj';
create temporary function gcj_to_wgs as 'com.github.aaronshan.functions.geo.UDFGeoGcjToWgs';
create temporary function gcj_extract_wgs as 'com.github.aaronshan.functions.geo.UDFGeoGcjExtractWgs';
create temporary function url_encode as 'com.github.aaronshan.functions.url.UDFUrlEncode';
create temporary function url_decode as 'com.github.aaronshan.functions.url.UDFUrlDecode';
create temporary function infinity as 'com.github.aaronshan.functions.math.UDFMathInfinity';
create temporary function is_finite as 'com.github.aaronshan.functions.math.UDFMathIsFinite';
create temporary function is_infinite as 'com.github.aaronshan.functions.math.UDFMathIsInfinite';
create temporary function is_nan as 'com.github.aaronshan.functions.math.UDFMathIsNaN';
create temporary function nan as 'com.github.aaronshan.functions.math.UDFMathIsNaN';
create temporary function from_base as 'com.github.aaronshan.functions.math.UDFMathFromBase';
create temporary function to_base as 'com.github.aaronshan.functions.math.UDFMathToBase';
create temporary function cosine_similarity as 'com.github.aaronshan.functions.math.UDFMathCosineSimilarity';
```

你可以在hive的命令杭中使用下面的语句来查看函数的细节.
Expand Down Expand Up @@ -257,6 +286,11 @@ select array_concat(array(16,12,18,9,null), array(14,9,6,18,null)) => [16,12,18,
select array_value_count(array(16,13,12,13,18,16,9,18), 13) => 2
select array_slice(array(16,13,12,13,18,16,9,18), -2, 3) => [9,18]
select array_element_at(array(16,13,12,13,18,16,9,18), -1) => 18
select array_shuffle(array(16,12,18,9))
select sequence(1, 5) => [1, 2, 3, 4, 5]
select sequence(5, 1) => [5, 4, 3, 2, 1]
select sequence(1, 9, 4) => [1, 5, 9]
select sequence('2016-04-12 00:00:00', '2016-04-14 00:00:00', 24*3600*1000) => ['2016-04-12 00:00:00', '2016-04-13 00:00:00', '2016-04-14 00:00:00']
```

```
Expand Down Expand Up @@ -302,3 +336,7 @@ select gcj_extract_wgs(39.915, 116.404) => {"lng":116.39775549316407,"lat":39.91
```
select url_encode('http://shanruifeng.cc/') => http%3A%2F%2Fshanruifeng.cc%2F
```

```
select cosine_similarity(map_build(array['a'], array[1.0]), map_build(array['a'], array[2.0])); => 1.0
```
Loading

0 comments on commit d9fca5a

Please sign in to comment.