Skip to content
This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Revert "Numpy-compatible concatenate upstream" #15951

Closed
wants to merge 1 commit into from

Conversation

marcoabreu
Copy link
Contributor

Reverts #15894

This PR increased the CI timeout. I assume that it did not pass without increasing the limit, thus I'm reverting the PR with the request to improve the test duration.

@reminisce
Copy link
Contributor

@marcoabreu Could you elaborate what urges to revert the PR? I am expecting more time duration since we have been adding a lot of operators and unit tests. Thanks.

@haojin2
Copy link
Contributor

haojin2 commented Aug 20, 2019

I don't think that's a reasonable assumption. Firstly this PR passed CI several times already (since I rebase with master daily) until some day it gets timeout all the time. I remember for some reason there was a unit test in test_random (probably test_shuffle if I remember correctly) that was taking a very long time so CI was stuck on it. Obviously that was not related with my PR, which was the reason why I'm increasing the CI time limit since I want to unblock several summer interns on their tasks. On the other hand, the only unit test introduced in my PR only takes less than 3 seconds, so I don't think my PR is the cause for CI time regression:

nosetests tests/python/unittest/test_numpy_op.py:test_np_concat
.
----------------------------------------------------------------------
Ran 1 test in 2.998s

OK

@larroy
Copy link
Contributor

larroy commented Aug 20, 2019

This PR has an empty patch as they were several empty commits. @marcoabreu I think you should close this one.

@larroy
Copy link
Contributor

larroy commented Aug 20, 2019

@haojin2 increase on CI limit should be discussed first and be done in a separate PR. We have gone from 1:20 to 3h, we should find the root cause and not just increase the limit without bounds.

@haojin2
Copy link
Contributor

haojin2 commented Aug 20, 2019

@larroy So if you're really that actively in charge of CI stuff then why don't you root cause it while this very PR you guys are trying to revert only adds ~3 extra seconds to the whole unit test? Previous CI time limit increases were not caused by this commit, so simply reverting this PR does not help us find the root cause, and only blocks more people who depend on it (not to mention most of them are summer interns who are about to leave and desperately want to submit their PRs).

On the other hand, why were the previous increases of CI timeout reasonable while mine is not? I actually expect this kind of increase to happen as we add more things to MXNet, each of which requires one or more unit tests (which is the right thing to do). If you guys did already find a unit test that costs more than 40 mins, please deal with it instead of targeting at my PR.

Actually, I'm already aware of time costs of unit tests, and every single one of my unit tests try to reuse the generated mock test data while covering the minimal necessary test cases. Most of the tests in my PRs, including the one you guys are trying to revert here, cost less than 5 seconds to finish. I don't think I'm the right one to be blamed on the status quo here, neither should my commit be.

@marcoabreu
Copy link
Contributor Author

Hi Hao,

this is not a personal affront towards you or your PR and I understand your frustration. The maximum time acts as a forcing function since within the past 9 months, the test duration more than doubled. Thus, increasing the timeout is not an option since it worsens the contributor experience further and further - of course I understand that running into timeouts also does. But that's exactly the forcing function we'd like to achieve until the community improves the situation.

So it's quite unfortunate that it did hit your PR and that I only caught it now after the PR was merged.

@larroy
Copy link
Contributor

larroy commented Aug 20, 2019

@haojin2 I just helped Marco revert the PR since due the way it was merged it was not trivial, please don't kill the messenger. Increasing CI timeout should be at least done in a separate PR.

@haojin2
Copy link
Contributor

haojin2 commented Aug 20, 2019

@marcoabreu Maybe let's take a look at one unix-cpu build triggered 15 hours ago based on the latest master (cause I performed rebase).
BlueOcean shows that the whole test finished using about 2hr55min (< 180 mins obviously), now you can see that the CI infra itself could also has some ups and downs too, cause 6 days ago we sometimes even had difficulties getting CI machines up (which was why this increase was performed). With those being said, I think of your timeout more of a safe net, which means you should also account for the flaky CI infra performance when you set it.
Again as I said the total time for CI is only going to increase, cause we're only adding more to it. Putting a hard limit on the total time does not work and make sense, I think you should pay more attention to the time consumed by a single test, so that the "40-minute long unit test" you mentioned somewhere else could be detected and rejected by CI, not a 3-second one.

@haojin2
Copy link
Contributor

haojin2 commented Aug 20, 2019

@larroy Please don't avoid my question, in #15952 you claimed that I shall tag you on any changes to files under CI folder, so are you actively working on the CI? Would you help with root causing the CI issues? If you have increased the CI timeout once before from 2:00 to 3:00 because you "restarted 4 PRs on the issue", have you investigated the root cause back then? TBH I've restarted more than 4 PRs on this issue, does that justify my change here then? Please provide answers to those questions, thanks.

@larroy
Copy link
Contributor

larroy commented Aug 20, 2019

@haojin2 I think you are confusing me with @marcoabreu. I didn't make the change you mention. My handle is @larroy. Please followup with @marcoabreu on this topic or through the dev@ list. Thanks.

@larroy
Copy link
Contributor

larroy commented Aug 20, 2019

@marcoabreu I think you can close this empty PR, since the conversation here is not constructive and there's no code change.

@haojin2
Copy link
Contributor

haojin2 commented Aug 21, 2019

@larroy I think the author of #13818 is @larroy , isn't it?
I was quoting your comment: #13818 (comment)

@larroy
Copy link
Contributor

larroy commented Aug 21, 2019

Hi @haojin2, this was done 8 months ago as a mitigation to CI failures, the root causes are known, long running unit tests, integration tests that are running for a long time as unit tests. I suggest to discuss this in the mailing list to align the community to address the root cause of these issues as they concern the project as a whole. As you can also see, the change is neatly split in a separate PR.
Thanks.

@marcoabreu marcoabreu closed this Aug 22, 2019
@szha szha deleted the revert-15894-np_concatenate_master branch September 8, 2019 03:39
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants