[SPARK-25611][SPARK-25612][SQL][TESTS] Improve test run time of CompressionCodecSuite #22641
Conversation
@mgaido91 Could you please take a look? Thank you.
I am not sure about this @dilipbiswal. Taking only one element is a bit too risky, as there may be a working combination and a non-working one, and here you don't know which one you are picking. I am afraid this could create very unstable tests in case of regression, with the serious risk of a run failing to detect the regression.
@mgaido91 I am trying to understand the concern in your "working combination and a non-working one" comment. In my understanding, originally we were doing a cross join between two sets of codecs, so if the outer loop has 2 elements and the inner has 3, we would test 6 combinations. With the current change, a given run randomly picks one of those 6 combinations, in the hope that another run will pick a different one. In case of a failure in this test, we should run the full set of all 6 combinations to identify the issue. Please let me know what I could be missing here.
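The random sampling described here can be sketched as follows (a minimal illustration; the codec names are placeholders matching the 2 x 3 example above, not the suite's actual lists):

```scala
import scala.util.Random

// Illustrative table-level and session-level codec sets; the real suite
// defines its own lists.
val outerCodecs = Seq("SNAPPY", "GZIP")
val innerCodecs = Seq("SNAPPY", "GZIP", "NONE")

// Original behaviour: a cross join, testing all 2 * 3 = 6 combinations.
val allCombinations = for {
  o <- outerCodecs
  i <- innerCodecs
} yield (o, i)

// Proposed behaviour: each run randomly picks one of the 6 combinations.
val picked = Random.shuffle(allCombinations).head
```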
Test build #96991 has finished for PR 22641 at commit
@dilipbiswal you're perfectly right. The point is: let's assume we introduce a bug which causes SNAPPY to always be returned. Your test will pass about 2/3 times out of 6, so it is quite likely that the test automation would fail to detect the issue. In the case you referenced, I was taking 50 out of 600 possible values; 50 is quite a high number anyway, so it enforces well enough that we support multiple combinations. Here the cardinality is much lower, so we are not enforcing that we generalize well.
@mgaido91 I took another look at the test case. Let me outline some of my understanding first.
One thing to note is that the codecs being tested are not exhaustive: we pick a subset (perhaps the most popular ones). The other thing is that we have a 3-way loop over 1) isPartitioned 2) convertMetastore 3) useCTAS on top of the codec loop, so we call the codec loop 8 times in a test, once for each unique combination of (isPartitioned, convertMetastore, useCTAS). And we have changed the codec loop to randomly pick one combination of table-level and session-level codecs. Given this, I feel we are getting decent coverage, and that we should be able to catch a regression in some Jenkins run or other. If you still feel uncomfortable, should we take 2 codecs as opposed to 1? That would give a 32 (4 * 8) iteration loop as opposed to 72 (9 * 8).
-1. We definitely can't just randomly test a subset of the cases we need to test in order to make things faster. Worse, it makes failures nondeterministic. However, if there's a good argument that some of the combinations being tested really are superfluous, let's make a change that avoids those extra tests. For example: is there a good argument that testing combinations that cover all options at least once, instead of testing every combination, is probably sufficient? That is, if I want to test combinations of options A, B, C and x, y, then is it probably sufficient here to test, say, (A,x), (B,x), (C,y) rather than all 6 possibilities?
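One way to make the covering-combinations idea concrete (a sketch with illustrative option names, not code from the suite): pair the options positionally, cycling the shorter list, so every value of each option is exercised at least once.

```scala
val codecs = Seq("A", "B", "C")
val flags = Seq("x", "y")

// Full cross product: 3 * 2 = 6 tests.
val crossProduct = for (c <- codecs; f <- flags) yield (c, f)

// Covering set: each codec and each flag appears at least once -- 3 tests.
val covering = codecs.zipWithIndex.map { case (c, i) => (c, flags(i % flags.size)) }
// covering == Seq(("A", "x"), ("B", "y"), ("C", "x"))
```

Unlike random sampling, this trades exhaustiveness for speed deterministically: the same 3 pairs run on every build.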
@srowen Thank you for your comments. Actually, from a cursory look I would agree that it does not look that pretty, and I also agree that it looks like we are not testing as much as we used to. This test case was added as part of SPARK-21786, and what it is basically testing is the precedence of how the compression codec is chosen: if both table-level and session-level codecs are specified, which one wins? So we could have tested this with one value for the table codec (say snappy) and one value for the session codec (say gzip). But we are being extra cautious and testing a cross product of 3 * 3 combinations. It seems to me the 3 values we have chosen are probably the most commonly used ones, so I wanted to preserve this input set but decide randomly which combination to test. Also, as I mentioned above, we have an 8-way loop on top, which means that in one run we would probably pick 8 out of the 9 codec combinations, and over many Jenkins runs we would eventually test all the combinations we wanted to test in the first place, thereby catching any regression. Given the code under test, it would be extremely rare for it to work for one codec combination but fail for another, as the logic is codec-value agnostic and merely a precedence check. However, we are taking a hit on every run, both for developers running on their laptops and for all the automatic Jenkins runs. So we have a few options:
I will do whatever you guys prefer here... Please advise.
I would vote against randomly running tests that we think have any value. It's just the wrong way to solve the problem, as much as it would be to simply run 90% of our test suites each time on the theory that eventually we'd catch bugs. If there are, say, 3 codecs, and the point is to test whether one specified codec overrides another, does that really need more than 1 test? Is there any reason to believe that overriding works or doesn't work differently for different codecs? Or are 3 tests sufficient, one to test overriding of each? If not, I'd say do nothing: the maximum win here is about a minute of test time, which is not worth it. Or: can these cases be parallelized within the test suite?
OK.
@gatorsmile Would you have some input here? Can we reduce the codec input set, given that we are just testing precedence?
@srowen @gatorsmile Does this sound reasonable?
Test build #97071 has finished for PR 22641 at commit
Test build #97083 has finished for PR 22641 at commit
srowen left a comment:
This kind of thing looks like a much better approach to reducing the runtime while only skipping tests we think are not meaningful.
      tableCompressionCodecs = tableCompressCodecs) {
      case (tableCodec, sessionCodec, realCodec, tableSize) =>
-       val expectCodec = tableCodec.get
+       val expectCodec = if (tableCodec.isDefined) tableCodec.get else sessionCodec
I think this can be tableCodec.getOrElse(sessionCodec)
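The two forms are equivalent for any Option; a small standalone sketch (illustrative names, not code from the PR):

```scala
val sessionCodec = "GZIP"

// Explicit form, as in the diff above.
def pickExplicit(tableCodec: Option[String]): String =
  if (tableCodec.isDefined) tableCodec.get else sessionCodec

// Idiomatic form suggested in the review.
def pickIdiomatic(tableCodec: Option[String]): String =
  tableCodec.getOrElse(sessionCodec)
```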
Test build #97088 has finished for PR 22641 at commit
srowen left a comment:
Yes, I think this is a solid approach. This is more selective about what's important to run.
-   def checkForTableWithCompressProp(format: String, compressCodecs: List[String]): Unit = {
+   def checkForTableWithCompressProp(
+     format: String,
Tiny nit: I think these should be indented one more space.
Test build #97117 has finished for PR 22641 at commit
I think random tests are not a good solution. If the test does run too slowly, it may be much better to reduce the codecs. We have done unit tests on the codec priority in
@fjh100456 Yeah, I have removed the randomness and reduced the combinations of codecs we test with.
        format: String,
        tableCompressCodecs: List[String],
        sessionCompressCodecs: List[String]): Unit = {
      Seq(true, false).foreach { isPartitioned =>
Let's see, the two tests before were taking 2:20 and 0:47. Now they take about 0:43 each. That's clearly a win in the first case, not much in the second, as expected. I'm OK with this as an improvement, myself.
I wonder if we need all 8 combinations in this triply nested loop though. Incidentally it could be written more easily as such, but I know this was existing code:
for (isPartitioned <- Seq(true, false); convertMetastore <- Seq(true, false); usingCTAS <- Seq(true, false)) {
@fjh100456 What do you think? Is it important to test all combinations, or could we get away with setting all true and all false without much of any loss here?
These combinations provide different test scenarios; they are not quite the same in how data is written and how table properties are passed, all of which has the potential to affect the test results. So I think it's necessary to keep them.
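For reference, the triply nested boolean loop discussed above expands to 2 * 2 * 2 = 8 flag combinations, and the codec loop runs once for each:

```scala
// Expansion of the three Seq(true, false) loops into their 8 combinations,
// using the for-comprehension form suggested in the review.
val flagCombinations = for {
  isPartitioned <- Seq(true, false)
  convertMetastore <- Seq(true, false)
  usingCTAS <- Seq(true, false)
} yield (isPartitioned, convertMetastore, usingCTAS)
```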
Merged to master |
Thank you very much @srowen @fjh100456 |
…essionCodecSuite

What changes were proposed in this pull request?
Reduced the combination of codecs from 9 to 3 to improve the test runtime.

How was this patch tested?
This is a test fix.

Closes apache#22641 from dilipbiswal/SPARK-25611.
Authored-by: Dilip Biswal <[email protected]>
Signed-off-by: Sean Owen <[email protected]>