array_intersect bug #4

Closed
melin opened this issue Jun 2, 2018 · 6 comments


@melin

melin commented Jun 2, 2018

array_intersect has an array index out-of-bounds bug, and the code is also overly complex. Here is a simplified version:
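```java
import com.google.common.collect.Sets;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import org.apache.hadoop.hive.ql.metadata.HiveException;

// Inside the class that extends GenericUDF; leftArrayOI and rightArrayOI are
// the ListObjectInspector fields set up in initialize().
@Override
public Object evaluate(DeferredObject[] arguments) throws HiveException {
    Object leftArray = arguments[0].get();
    Object rightArray = arguments[1].get();

    // Return null if either array is null (checked before using the inspectors)
    if (leftArray == null || rightArray == null) {
        return null;
    }

    int leftArrayLength = leftArrayOI.getListLength(leftArray);
    int rightArrayLength = rightArrayOI.getListLength(rightArray);
    if (leftArrayLength < 0 || rightArrayLength < 0) {
        return null;
    }

    // The intersection with an empty array is empty
    if (leftArrayLength == 0) {
        return leftArray;
    }
    if (rightArrayLength == 0) {
        return rightArray;
    }

    List<?> leftList = leftArrayOI.getList(leftArray);
    List<?> rightList = rightArrayOI.getList(rightArray); // right inspector for the right array
    Set<Object> resultSet = Sets.newHashSet(leftList);
    resultSet.retainAll(rightList);

    return new ArrayList<>(resultSet);
}
```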
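The simplified code reduces the problem to a set intersection. As a rough, Hive-free illustration of the same idea on plain Java lists (the class name `SetIntersectSketch` is hypothetical, and keeping the left array's order via `LinkedHashSet` is an assumption, since a plain `HashSet` gives no ordering guarantee):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch, not the project's UDF code.
class SetIntersectSketch {
    // Returns the distinct elements of left that also occur in right,
    // in order of their first appearance in left.
    static <T> List<T> intersect(List<? extends T> left, List<? extends T> right) {
        Set<T> result = new LinkedHashSet<>(left); // dedupes, keeps left order
        result.retainAll(new HashSet<>(right));     // hash lookups into right
        return new ArrayList<>(result);
    }

    public static void main(String[] args) {
        System.out.println(intersect(Arrays.asList(0, 1, 2, 3, 4, 5), Arrays.asList(1, 1, 2, 2, 5, 5)));
        System.out.println(intersect(Arrays.asList(0, 1, 2, 3, 4, 4), Arrays.asList(1, 1, 2, 2, 3, 4)));
    }
}
```

With the two inputs from the reproduction queries in this thread, this prints `[1, 2, 5]` and `[1, 2, 3, 4]`. Because there is no index arithmetic, there is nothing that can run past the end of an array.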
melin changed the title from "array_intersect" to "array_intersect bug" on Jun 3, 2018
@aaronshan
Owner

@melin Could you share a test case that triggers the out-of-bounds bug? Thanks.

@melin
Author

melin commented Jun 5, 2018

The data volume is fairly large and I couldn't pin down which record triggered it, so I just replaced the implementation.

minhoryang added a commit to minhoryang/hive-third-functions that referenced this issue Jun 26, 2018
minhoryang added a commit to minhoryang/hive-third-functions that referenced this issue Jul 10, 2018
@minhoryang

minhoryang commented Jul 10, 2018

There are 2 bugs:

  1. If the duplicate value is at the end of the first array and the same value appears multiple times in the second array, leftPositions[++leftCurrentPosition] throws an ArrayIndexOutOfBoundsException.

```sql
select array_intersect(array(0,1,2,3,4,5), array(1,1,2,2,5,5))
select array_intersect(array(0,1,2,3,4,4), array(1,1,2,2,3,4))
```

```
java.lang.ArrayIndexOutOfBoundsException:
      at cc.shanruifeng.functions.array.UDFArrayIntersect.evaluate(UDFArrayIntersect.java:147)
```

You can reproduce this by setting INITIAL_SIZE = 1 (any value smaller than the input length works).

  2. The values of leftArrayElementTmp2 and rightArrayElementTmp2 are never updated when the position changes, so the remaining values are skipped without being compared.

```sql
select array_intersect(array(0,1,2,3,4,5), array(1,1,2,2,5,5))
```

```
expected: <[1, 2, 5]>, but was: <[1]>
```
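Both bugs come down to how the sorted arrays are walked. As a sketch only (this is not the project's `UDFArrayIntersect` code; the class name and sorting primitive `int` copies instead of Hive position arrays are assumptions for illustration), a two-pointer intersection that avoids both problems could look like this:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not the project's UDF code.
class SortedIntersectSketch {
    // Two-pointer intersection over sorted copies of the inputs;
    // each common value is emitted once.
    static List<Integer> intersect(int[] left, int[] right) {
        int[] a = left.clone();
        int[] b = right.clone();
        Arrays.sort(a);
        Arrays.sort(b);

        List<Integer> result = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < a.length && j < b.length) {
            // Re-read the current elements on every iteration instead of
            // caching them once, so no value is skipped without a comparison.
            int x = a[i];
            int y = b[j];
            if (x < y) {
                i++;
            } else if (x > y) {
                j++;
            } else {
                result.add(x);
                // Skip every copy of x on both sides; the bounds check runs
                // before the read, so the index never goes past the end.
                while (i < a.length && a[i] == x) {
                    i++;
                }
                while (j < b.length && b[j] == x) {
                    j++;
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(intersect(new int[] {0, 1, 2, 3, 4, 5}, new int[] {1, 1, 2, 2, 5, 5}));
        System.out.println(intersect(new int[] {0, 1, 2, 3, 4, 4}, new int[] {1, 1, 2, 2, 3, 4}));
    }
}
```

On the two reproduction inputs this prints `[1, 2, 5]` and `[1, 2, 3, 4]`. Checking `i < a.length` before reading `a[i]` in the duplicate-skipping loops is what keeps a trailing duplicate from causing an out-of-bounds access.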

minhoryang added a commit to minhoryang/hive-third-functions that referenced this issue Jul 10, 2018
minhoryang added a commit to minhoryang/hive-third-functions that referenced this issue Jul 12, 2018
aaronshan added a commit that referenced this issue Jul 18, 2018
@aaronshan
Owner

@minhoryang Thank you very much! It has been fixed!

@cceazj

cceazj commented Jul 10, 2023

https://github.com/aaronshan/hive-third-functions/blame/f98fef86d328882c85ea40b69b14375d90d44201/src/main/java/com/github/aaronshan/functions/array/UDFArrayIntersect.java#L160
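The logic of the compare method is wrong: the positions array is hard-coded to leftPositions. It should take an additional int[] positions parameter so that either leftPositions or rightPositions can be passed in, and the call sites need to be updated accordingly.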
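The suggested change can be sketched as a stand-alone helper. This is hypothetical: the real method's signature at the linked line is not reproduced in this thread, so the `compare(int[] values, int[] positions, int i, int j)` signature, the `int` element type, and the insertion sort are assumptions used only to show the positions array being passed in rather than hard-coded.

```java
import java.util.Arrays;

// Hypothetical sketch, not the project's UDF code.
class PositionsCompareSketch {
    // Compares the elements referenced by positions[i] and positions[j].
    // Because positions is a parameter, the same helper serves both the
    // left and the right positions arrays instead of hard-coding one.
    static int compare(int[] values, int[] positions, int i, int j) {
        return Integer.compare(values[positions[i]], values[positions[j]]);
    }

    // Insertion sort over an array of indices; values itself is not modified.
    static int[] sortedPositions(int[] values) {
        int[] positions = new int[values.length];
        for (int k = 0; k < positions.length; k++) {
            positions[k] = k;
        }
        for (int k = 1; k < positions.length; k++) {
            for (int m = k; m > 0 && compare(values, positions, m - 1, m) > 0; m--) {
                int tmp = positions[m];
                positions[m] = positions[m - 1];
                positions[m - 1] = tmp;
            }
        }
        return positions;
    }

    public static void main(String[] args) {
        int[] left = {5, 3, 4, 1};
        int[] right = {2, 5, 1};
        // Each side is sorted through its own positions array.
        System.out.println(Arrays.toString(sortedPositions(left)));
        System.out.println(Arrays.toString(sortedPositions(right)));
    }
}
```

This prints `[3, 1, 2, 0]` and `[2, 0, 1]`: each positions array lists indices into its own input in ascending order of value.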

@aaronshan
Owner


@cceazj Could you help fix it and open a PR?
