@abstractdog
Contributor

No description provided.

@pjfanning
Member

The test issues seem to be related to mockito

java.lang.NoClassDefFoundError: org/mockito/stubbing/Answer
at org.apache.tez.test.TestAM.setup(TestAM.java:66)
Caused by: java.lang.ClassNotFoundException: org.mockito.stubbing.Answer
at org.apache.tez.test.TestAM.setup(TestAM.java:66)

@ayushtkn
Member

You are checking the wrong build result. Mockito was also a problem, but we upgraded it and that got sorted out. Check the result of this test in the latest build:
https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-213/4/testReport/org.apache.tez.history/TestHistoryParser/testParserWithFailedJob/

There is an exception:

java.lang.AbstractMethodError: javax.ws.rs.core.UriBuilder.uri(Ljava/lang/String;)Ljavax/ws/rs/core/UriBuilder;
	at javax.ws.rs.core.UriBuilder.fromUri(UriBuilder.java:96)
	at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:911)
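
Errors such as NoClassDefFoundError and AbstractMethodError usually point at a missing or mismatched jar on the test classpath. As a minimal diagnostic sketch (not part of this PR), the helper below reports where a class is loaded from. The class name `WhichJar` and the method `locate` are hypothetical; the two class names queried in `main` are taken from the stack traces in this thread, and the output depends entirely on the classpath it is run with.

```java
import java.security.CodeSource;

// Hypothetical diagnostic helper: reports which jar or directory a class comes from.
class WhichJar {

    /** Returns the code-source location of a class, or a marker string if it cannot be determined. */
    static String locate(String className) {
        try {
            // Load without initializing, so static initializers do not run.
            Class<?> c = Class.forName(className, false, WhichJar.class.getClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            // Classes from the JDK's bootstrap loader typically have no code source.
            return src == null ? "<bootstrap or unknown>" : String.valueOf(src.getLocation());
        } catch (ClassNotFoundException | LinkageError e) {
            return "<not on classpath: " + e + ">";
        }
    }

    public static void main(String[] args) {
        String[] names = {"javax.ws.rs.core.UriBuilder", "org.mockito.stubbing.Answer"};
        for (String name : names) {
            System.out.println(name + " -> " + locate(name));
        }
    }
}
```

Running this with the same classpath as the failing test module would show whether the class is present at all and, if so, which artifact supplies it.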

@steveloughran

hadoop 3.3.4 cuts jax.rs from the dependency graph

Member

@ayushtkn ayushtkn left a comment

We now have conflicts with the master branch. We can try with 3.3.4, which is now released, to see if things work here.

@zhangbutao
Contributor

@abstractdog Can we upgrade hadoop to 3.3.4? I have tested hive master with hadoop 3.3.4 (apache/hive#3578), and if I change tez's hadoop dependency to 3.3.4, everything looks ok.

@abstractdog
Contributor Author

yes, let me check this PR and have some test coverage

@abstractdog abstractdog changed the title from "TEZ-4420: Upgrade to Hadoop 3.3.3" to "TEZ-4420: Upgrade to Hadoop 3.3.4" on Sep 8, 2022

Member

@ayushtkn ayushtkn left a comment

We got a clean build; the Jackson changes in hadoop got us sorted. :-)
LGTM

@zhangbutao
Contributor

Great! If this PR is merged into Tez 0.10.2, then we can continue to upgrade hadoop version in hive repo.

@pjfanning
Member

Can this be merged to unblock other upgrades?

@abstractdog
Contributor Author

> Can this be merged to unblock other upgrades?

let me take a look soon
what upgrade is this blocking at the moment? actually, hive is not blocked now AFAIK

@pjfanning
Member

It was Hive that I thought was blocked. It would still be nice to get this moved on but I guess if it's not blocking anything then it's not as urgent.

@steveloughran

steveloughran commented Oct 24, 2022

we are nearing code freeze for 3.3.5 btw; if there is some final blocker to upgrade, now is the time to identify it

@amanraj2520
Contributor

Hi @ayushtkn @abstractdog @pjfanning @steveloughran

Since this PR has been stale for a month, I am following up to check whether I can help validate any other test scenarios for the upgrade. If not, are we good to merge this PR?

@amanraj2520
Contributor

amanraj2520 commented Nov 21, 2022

Hi @ayushtkn @abstractdog @pjfanning @steveloughran,

I am driving the hive-3.2.0 release in Open Source, for which we have decided to upgrade to hadoop 3.3.4. But since Tez 0.10.2 is using hadoop-3.3.1, there can be some integration issues for this stack. I know that we are currently discussing the hadoop-3.3.5 release, but it would take at least a couple of months to get a stable release candidate, which would also require a lot of testing.

This will be a blocker for the 3.2.0 release. Since we have started to plan our tasks around this release, I propose making a Tez 0.10.3 release with the hadoop version upgraded to 3.3.4. This would ensure that we can start working on 3.2.0 without any blockers.

Please share your thoughts on the same. I am open to discussing this release further.

Thanks,
Aman.

@ayushtkn
Member

I was talking about apache/hive#3279. There we tried up to 3.3.3 and all the unit tests passed, but on an actual cluster things weren't working. So I meant that, apart from the unit tests, we need to test on actual clusters as well.

The failure was flagged here:
apache/hive#3279 (comment)

@amanraj2520
Contributor

Hi @ayushtkn

I am familiar with this error. It happens because Tez 0.10.1 uses Hadoop 3.1.3, which in turn uses jetty 9.3*, which does not have this method, whereas after the upgrade to Hadoop 3.3.4 this method is used. So if we had maintained the same version of Hadoop in Hive and Tez, this would not have been an issue. That said, Ayush, I will make sure that 0.10.3 is validated on clusters as well as with unit tests.

@steveloughran

3.3.4 is the one where we fixed the tez incompatibilities for you 🙂

i plan to cut the 3.3.5 rc0 this week but it's a test of the rc process (x86 and arm) rather than something we intend to bring to a vote.

it'd be good to test hive and tez with the rc to see if we have caused any regressions, as now will be the time to fix. there's been a lot of dependency updates to stop CVEs, but we've left alone the ones we know cause problems downstream (jackson is on a cve-fixed 2.12.x release)

@amanraj2520
Contributor

@steveloughran So do you think having a new Tez release with 3.3.4 should help or not?

@amanraj2520
Contributor

@abstractdog @steveloughran Can you please look into this?

@steveloughran

3.3.4 should be good; if not try with a 3.3.5 snapshot (build yourself) and see if that fixes things.

i'd suggest going with 3.3.4 if it works, so the 3.3.5 release isn't a blocker. you can do an upgrade after

@abstractdog
Contributor Author

I believe from tez->hadoop point of view, we usually rely only on precommit testing (including lots of minicluster tests)
if something is broken from hive -> tez -> hadoop, we should report it separately
I guess we should simply go on with hadoop 3.3.4 now, and resolve this ticket

TestRecovery passed locally, I'm restarting precommit tests

can someone approve this PR? @rbalamohan , @jteagles

@tez-yetus

💔 -1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 36m 14s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
-1 ❌ test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
_ master Compile Tests _
+0 🆗 mvndep 6m 32s Maven dependency ordering for branch
+1 💚 mvninstall 10m 4s master passed
+1 💚 compile 3m 12s master passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04
+1 💚 compile 3m 2s master passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08
+1 💚 javadoc 3m 9s master passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04
+1 💚 javadoc 2m 26s master passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08
_ Patch Compile Tests _
+0 🆗 mvndep 1m 17s Maven dependency ordering for patch
+1 💚 mvninstall 6m 9s the patch passed
+1 💚 compile 4m 7s the patch passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04
+1 💚 javac 4m 7s the patch passed
+1 💚 compile 3m 42s the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08
+1 💚 javac 3m 42s the patch passed
+1 💚 whitespace 0m 0s The patch has no whitespace issues.
+1 💚 xml 0m 2s The patch has no ill-formed XML file.
+1 💚 javadoc 3m 24s the patch passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04
+1 💚 javadoc 2m 53s the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08
_ Other Tests _
+1 💚 unit 40m 17s tez-tests in the patch passed.
+1 💚 unit 72m 24s root in the patch passed.
+1 💚 asflicense 1m 14s The patch does not generate ASF License warnings.
201m 22s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-213/8/artifact/out/Dockerfile
GITHUB PR #213
JIRA Issue TEZ-4420
Optional Tests dupname asflicense javac javadoc unit xml compile
uname Linux e4562a85b404 4.15.0-191-generic #202-Ubuntu SMP Thu Aug 4 01:49:29 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality personality/tez.sh
git revision master / 2fd7df4
Default Java Private Build-1.8.0_352-8u352-ga-1~20.04-b08
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_352-8u352-ga-1~20.04-b08
Test Results https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-213/8/testReport/
Max. process+thread count 1388 (vs. ulimit of 5500)
modules C: tez-tests . U: .
Console output https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-213/8/console
versions git=2.25.1 maven=3.6.3
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@amanraj2520
Contributor

@rbalamohan @jteagles Can you please approve this PR? Also, @abstractdog, are we going with the Tez 0.10.3 release, which will have Hadoop 3.3.4?

@amanraj2520
Contributor

@abstractdog The upgrade to Hadoop 3.3.4 is failing tests in Tez as follows:

[INFO] Running org.apache.tez.common.TestTezCommonUtils
[ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 13.61 s <<< FAILURE! - in org.apache.tez.common.TestTezCommonUtils
[ERROR] org.apache.tez.common.TestTezCommonUtils Time elapsed: 13.607 s <<< ERROR!
java.lang.RuntimeException: problem starting mini dfs cluster
at org.apache.tez.common.TestTezCommonUtils.setup(TestTezCommonUtils.java:71)
Caused by: java.io.IOException: Timed out waiting for Mini HDFS Cluster to start
at org.apache.tez.common.TestTezCommonUtils.setup(TestTezCommonUtils.java:66)

The test fails because it times out waiting for the Mini HDFS Cluster to start.

@steveloughran

Any reason in the logs for the cluster not coming up?

@amanraj2520
Contributor

@steveloughran I have not checked yet. I will check and let you know.

@amanraj2520
Contributor

amanraj2520 commented Feb 7, 2023

@steveloughran @abstractdog Apologies for getting back late; I was involved in Hive test case fixes for branch-3. It seems this issue was intermittent. The test now passes; I tried it 3-4 times. Attached the snippet.
image

We can merge this to master. +1 from my side. Tested it extensively on my local as well as on a local cluster.

@amanraj2520
Contributor

Sorry, I added the wrong snippet for the passing test. The correct one is attached below:
image

@amanraj2520
Contributor

image

I see these test failures in the tez-tests module and am currently fixing them, @abstractdog @steveloughran. This issue occurs when the PowerMock library conflicts with Mockito, as described here: https://stackoverflow.com/questions/71973762/java-lang-nosuchmethoderror-org-mockito-answers-getlorg-mockito-stubbing-answ

@amanraj2520
Contributor

Fixed all the tests now:
image

@abstractdog Can you please cherry-pick the commit from my forked branch, 944476e? It will fix the tests. I cannot push these changes to your branch.

@amanraj2520
Contributor

image

There are still more failures in further tests. I will fix them.

@amanraj2520
Contributor

Found the issue:
A new commit to TestLocalMode in TEZ-4447 added this assertion:
assertEquals(VertexStatus.State.SUCCEEDED, dagClient1.getVertexStatus(SleepProcessor.SLEEP_VERTEX_NAME, null).getState());

When I debugged this test, I found this:
image

When it tries to look up the status of the vertex named "sleep" in cachedVertexStatus (size: 0), it finds no entry and returns null. Hence dagClient1.getVertexStatus(SleepProcessor.SLEEP_VERTEX_NAME, null) returns null, and the subsequent getState() call throws a NullPointerException.
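
The failure mode described here can be reproduced in isolation. The sketch below is not Tez code: `VertexStatusLookup`, its `State` enum, the `cachedVertexStatus` map, and `waitForNonNull` are hypothetical stand-ins that only mimic the behavior reported above (an empty cache makes the lookup return null). It also shows one possible way a test could guard against the null, by polling until an entry appears or a timeout expires, instead of dereferencing the result immediately.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical stand-in for the lookup described above; not the Tez DAGClient API.
class VertexStatusLookup {
    enum State { RUNNING, SUCCEEDED }

    // Mimics a status cache that may still be empty when the test queries it.
    static final Map<String, State> cachedVertexStatus = new ConcurrentHashMap<>();

    /** Returns the cached state, or null when there is no entry for the vertex. */
    static State getVertexState(String vertexName) {
        return cachedVertexStatus.get(vertexName);
    }

    /** Polls the lookup until it yields a non-null value or the timeout expires; may return null. */
    static <T> T waitForNonNull(Supplier<T> lookup, long timeoutMs, long intervalMs) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (true) {
            T value = lookup.get();
            if (value != null || System.nanoTime() - deadline >= 0) {
                return value;
            }
            try {
                Thread.sleep(intervalMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return null;
            }
        }
    }

    public static void main(String[] args) {
        // Empty cache: the lookup yields null, so calling a method on it would throw an NPE.
        State missing = waitForNonNull(() -> getVertexState("sleep"), 100, 10);
        System.out.println("before entry exists: " + missing);   // prints "before entry exists: null"

        cachedVertexStatus.put("sleep", State.SUCCEEDED);
        State present = waitForNonNull(() -> getVertexState("sleep"), 100, 10);
        System.out.println("after entry exists: " + present);    // prints "after entry exists: SUCCEEDED"
    }
}
```

Whether polling is the right fix for TestLocalMode, or whether the cache should have been populated in the first place, is a separate question that this sketch does not answer.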

@abstractdog Can you please check this from your end? Maybe we need to add some config to have an entry for the sleep vertex name in cachedVertexStatus.

@steveloughran

where is the mockito ref coming from? I don't see why it should be exported; if it is, then that is something to cut back on

@abstractdog
Contributor Author

@amanraj2520: thanks for your work so far, if you have the bandwidth to drive this, please open a PR and assign TEZ-4420 to yourself
we'll find a way to cross-reference PRs as there is so much context here

btw: yeah, tez 0.10.3 can contain this change, I think once this is merged and we merge some other fixes, we can go for a release (in a couple of weeks/months)

@amanraj2520
Contributor

@abstractdog Sure, assigning TEZ-4420 to myself.

@abstractdog
Contributor Author

closing this as final PR was #272
