Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Java] Gandiva Tests are failing due to linking issues #63

Open
vibhatha opened this issue Aug 6, 2024 · 10 comments
Open

[Java] Gandiva Tests are failing due to linking issues #63

vibhatha opened this issue Aug 6, 2024 · 10 comments
Assignees
Labels

Comments

@vibhatha
Copy link
Contributor

vibhatha commented Aug 6, 2024

Describe the bug, including details regarding any error messages, version, and platform.

At the moment, java-jars CI in Arrow Java is failing due to a Gandiva issue. This CI in green is much needed to determine the accuracy of certain PRs to Java. In order to keep it green, as discussed we are disabling the failing Gandiva tests temporarily. But needs to be re-enabled once the LLVM related issue is resolved.

Component(s)

Java

@jinchengchenghh
Copy link
Contributor

apache/arrow#43639 (comment) This test also needs to be disabled.

@lidavidm
Copy link
Member

@praveenbingo @pravindra you seem to be the last committers to have worked on the Java part of Gandiva; do you have any idea about this issue?

@vibhatha vibhatha changed the title [Java] Gandiva Tests are failure related to Gandiva [Java] Gandiva Tests are failing due to linking issues Sep 6, 2024
@vibhatha
Copy link
Contributor Author

vibhatha commented Sep 6, 2024

Most of the failing tests are due to the Could not create LLJIT instance issue [1]. Also, the ProjectTest is mainly failing once. So it would be better to disable the overall test suite and re-enable it once this has been properly fixed.

[1] https://github.com/apache/arrow/actions/runs/10731577229/job/29761887531?pr=43978#step:7:20331

lidavidm referenced this issue in apache/arrow Sep 10, 2024
)

### Rationale for this change

Gandiva tests are failing due to a linking issue and it is failing the Java CIs. But for most of the made PRs, we cannot verify the overall workflow given that those PRs are independent of the Gandiva component. 

### What changes are included in this PR?

This PR disables such failing tests temporarily such that once the Gandiva issue is fixed, re-enabling the tests. 
Re-enabling task will be tracked using https://github.com/apache/arrow/issues/43981

### Are these changes tested?

Yes, by existing CIs and tests.

### Are there any user-facing changes?

No
* GitHub Issue: #43576

Authored-by: Vibhatha Lakmal Abeykoon <[email protected]>
Signed-off-by: David Li <[email protected]>
@lidavidm
Copy link
Member

@laurentgo do you have any colleagues who might know what's up with Gandiva?

khwilson referenced this issue in khwilson/arrow Sep 14, 2024
…che#43978)

### Rationale for this change

Gandiva tests are failing due to a linking issue and it is failing the Java CIs. But for most of the made PRs, we cannot verify the overall workflow given that those PRs are independent of the Gandiva component. 

### What changes are included in this PR?

This PR disables such failing tests temporarily such that once the Gandiva issue is fixed, re-enabling the tests. 
Re-enabling task will be tracked using https://github.com/apache/arrow/issues/43981

### Are these changes tested?

Yes, by existing CIs and tests.

### Are there any user-facing changes?

No
* GitHub Issue: #43576

Authored-by: Vibhatha Lakmal Abeykoon <[email protected]>
Signed-off-by: David Li <[email protected]>
zeroshade referenced this issue in zeroshade/arrow Sep 30, 2024
…che#43978)

### Rationale for this change

Gandiva tests are failing due to a linking issue and it is failing the Java CIs. But for most of the made PRs, we cannot verify the overall workflow given that those PRs are independent of the Gandiva component. 

### What changes are included in this PR?

This PR disables such failing tests temporarily such that once the Gandiva issue is fixed, re-enabling the tests. 
Re-enabling task will be tracked using https://github.com/apache/arrow/issues/43981

### Are these changes tested?

Yes, by existing CIs and tests.

### Are there any user-facing changes?

No
* GitHub Issue: #43576

Authored-by: Vibhatha Lakmal Abeykoon <[email protected]>
Signed-off-by: David Li <[email protected]>
@raulcd
Copy link
Member

raulcd commented Oct 9, 2024

I've moved this to 19.0.0, let me know if this should be a blocker for the release.

@lidavidm
Copy link
Member

lidavidm commented Oct 9, 2024

It's possible we're going to ship Gandiva in a known-broken state, so we should just highlight it in the release notes

@vibhatha
Copy link
Contributor Author

+1 for this. I think that makes sense and a sub optimal solution for the problem we have.

@lriggs
Copy link

lriggs commented Nov 2, 2024

Hi, I work with @laurentgo at Dremio and I've started looking into this issue. Let me know if there is a better format for this discussion (email vs GH issue, etc). I wanted to summarize some of the different threads and then talk about how I've worked around the issue here at Dremio.

The test errors seem to all be "Symbols not found: [ llvm_orc_registerEHFrameSectionWrapper ]" which is also tracked here, and a few causes and ideas for fixes are listed here and here for Python.

I also found this issue regarding the symbols in the llvm project.

The poster there believes that requiring the symbols is a bad design. I tend to agree but I'm not sure why it was done in the first place.

The ORC symbol issue seems to stem from apache/arrow#37848. We haven't had this problem at Dremio because we actually reverted that change in our local repo after I found a few llvm expression related test failures. I asked about that here https://lists.apache.org/thread/xzrwx1t37dc88vo5yo5337l16ypqwcrd. I spent considerable time looking into this issue but was never able to track it down.

I'm not sure what to do at this point. One idea is reverting the JIT upgrade while also following up with the llvm community on the symbol issue. @niyue Mentioned following up with them here, not sure if he found anything: apache/arrow#39695 (comment)

@vibhatha
Copy link
Contributor Author

vibhatha commented Nov 3, 2024

Right. I think we can discuss the technical aspect of the issue here.
Also I started a ML thread to get community support to resolve this issue as this is a bit out of scope for me.

FWIW, I would say it would be fine to get this to a release viable point by using an older version of an upgrade we have done so far. But I solely wouldn't be a judge for that.

Really appreciate your support to look into this matter.

@niyue
Copy link

niyue commented Nov 5, 2024

@lriggs Regarding the LLVM 17 symbol issue, I recall investigating it previously. It stems from a design change introduced in LLVM 17, and unfortunately, there doesn't appear to be an easy fix at that time. A related issue has been logged here: LLVM Project Issue #74671.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

7 participants