
Conversation

@dilipbiswal
Contributor

@dilipbiswal dilipbiswal commented Oct 5, 2018

What changes were proposed in this pull request?

Reduced the combination of codecs from 9 to 3 to improve the test runtime.

How was this patch tested?

This is a test fix.

@dilipbiswal
Contributor Author

@mgaido91 Could you please take a look? Thank you.

@dilipbiswal dilipbiswal changed the title from "[SPARK-25611][SPARK-25622][SQL][TESTS] Improve test run time of CompressionCodecSuite" to "[SPARK-25611][SPARK-25612][SQL][TESTS] Improve test run time of CompressionCodecSuite" Oct 5, 2018
@mgaido91
Contributor

mgaido91 commented Oct 5, 2018

I am not sure about this @dilipbiswal. Taking only one element is a bit too risky: there may be a working combination and a non-working one, and here you don't know which one you are picking. I am afraid this could create very unstable tests in case of a regression, with the serious risk of not detecting the regression in a given run.

@dilipbiswal
Contributor Author

@mgaido91 I am trying to understand the concern in the "working combination and a non-working one" comment. In my understanding, originally we were doing a cross join between two sets of codecs, so if the outer loop has 2 elements and the inner has 3, then we would test 6 combinations. With the current change, in a given run we would just randomly pick one out of those 6 combinations, with the hope that another run would pick a different one of the 6 possible combinations. In case of a failure in this test, we should run the full tests, i.e. all 6 combinations, to identify the issue. Please let me know what I could be missing here?
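The random-pick approach under discussion can be sketched as follows (a minimal standalone sketch; the codec names and variable names are illustrative, not the suite's actual code):

```scala
import scala.util.Random

// Hypothetical codec subsets, mirroring the suite's cross product.
val tableCodecs = Seq("UNCOMPRESSED", "SNAPPY", "GZIP")
val sessionCodecs = Seq("UNCOMPRESSED", "SNAPPY", "GZIP")

// The original test exercised the full cross product: 3 x 3 = 9 combinations.
val combinations = for (t <- tableCodecs; s <- sessionCodecs) yield (t, s)
assert(combinations.size == 9)

// The change under discussion: pick just one combination per run.
val (tableCodec, sessionCodec) = combinations(Random.nextInt(combinations.size))
println(s"Testing tableCodec=$tableCodec, sessionCodec=$sessionCodec")
```

This is what makes the failure mode nondeterministic: which combination a given run exercises depends on the random pick.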

@SparkQA

SparkQA commented Oct 5, 2018

Test build #96991 has finished for PR 22641 at commit 2d7ffa6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@mgaido91
Contributor

mgaido91 commented Oct 5, 2018

@dilipbiswal you're perfectly right. The point is: let's assume we introduce a bug which causes SNAPPY to always be returned. Your test would still pass about 2 or 3 times out of 6, so it is quite likely that the test automation would fail to detect the issue.

In the case you referenced, I was taking 50 out of 600 possible values. 50 is quite a high number anyway, so it enforces well enough that we support multiple combinations. Here the cardinality is much lower, so we are not enforcing that we generalize well.

@dilipbiswal
Contributor Author

dilipbiswal commented Oct 6, 2018

@mgaido91
Thanks for your input.

I took another look at the test case. Let me outline some of my understanding first.

  • The test validates the precedence rules that determine the resultant compression codec in the presence of session-level and table-level codecs.
  • It verifies that the correct compression is picked by reading the Parquet/ORC file metadata.
  • The accepted configurations for Parquet are: none, uncompressed, snappy, gzip, lzo, brotli, lz4, zstd
  • The accepted configurations for ORC are: none, uncompressed, snappy, zlib, lzo
  • The test case in question uses only a SUBSET of the allowable Parquet codecs:
    uncompressed, snappy, gzip
  • The test case in question uses only a SUBSET of the allowable ORC codecs:
    none, snappy, zlib

One thing to note is that the codecs being tested are not exhaustive; we pick a subset (perhaps the most popular ones). The other thing is that we have a three-way loop over 1) isPartitioned, 2) convertMetastore, and 3) useCTAS on top of the codec loop. So we call the codec loop 8 times in a test, once for each unique combination of (isPartitioned, convertMetastore, useCTAS). And we have changed the codec loop to randomly pick one combination of table-level and session-level codecs.

Given this, I feel we are getting decent coverage, and I also feel we should be able to catch a regression in some Jenkins run or another. If you still feel uncomfortable, should we take 2 codecs as opposed to 1? That would give a 32 (4 * 8) times loop as opposed to 72 (9 * 8).

@srowen
Member

srowen commented Oct 6, 2018

-1 We definitely can't just randomly test a subset of cases we need to test in order to make things faster. Worse, it makes the failure nondeterministic.

However, if there's a good argument that some combinations that are tested really are superfluous, let's make a change that avoids those extra tests.

For example: is there a good argument that testing combinations that cover all options at least once, instead of testing every combination, is probably sufficient?

That is, if I want to test combinations of options A, B, C and x, y, is it probably sufficient here to test, say, (A,x), (B,x), (C,y) rather than all 6 possibilities?
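The "cover every option at least once" idea can be sketched as below (a hedged illustration with made-up option names; it assumes the first list is at least as long as the second, as in this example):

```scala
val as = Seq("A", "B", "C")
val xs = Seq("x", "y")

// Full cross product: 3 x 2 = 6 combinations.
val full = for (a <- as; x <- xs) yield (a, x)
assert(full.size == 6)

// Covering set: pair the lists up, recycling the shorter one, so that
// every value from both lists appears at least once.
val covering = as.zipWithIndex.map { case (a, i) => (a, xs(i % xs.size)) }
// Yields Seq(("A","x"), ("B","y"), ("C","x")): 3 tests instead of 6.
assert(covering.map(_._1).distinct.size == as.size)
assert(covering.map(_._2).distinct.size == xs.size)
```

Unlike the random pick, this deterministically exercises every option value on every run, just not every pairing.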

@dilipbiswal
Contributor Author

dilipbiswal commented Oct 6, 2018

@srowen Thank you for your comments. Actually, from a cursory look, I would agree that it does not look that pretty. I also agree that it does look like we are not testing as much as we used to.

This test case was added as part of SPARK-21786, and if we look at what it is trying to test, it is basically verifying the following precedence for how the compression codec is chosen:

```scala
val codecName = parameters
  .get("compression")
  .orElse(parquetCompressionConf)
  .getOrElse(sqlConf.parquetCompressionCodec)
  .toLowerCase(Locale.ROOT)
```
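The precedence chain above can be exercised standalone with a minimal stand-in (the function name and parameters here are illustrative, not the actual Spark API):

```scala
import java.util.Locale

// Minimal stand-in for the quoted logic: table-level option first,
// then the session configuration, then the global default.
def resolveCodec(
    tableOptions: Map[String, String],
    sessionConf: Option[String],
    defaultCodec: String): String = {
  tableOptions.get("compression")
    .orElse(sessionConf)
    .getOrElse(defaultCodec)
    .toLowerCase(Locale.ROOT)
}

// A table-level option wins over the session conf:
assert(resolveCodec(Map("compression" -> "GZIP"), Some("SNAPPY"), "snappy") == "gzip")
// With no table-level option, the session conf wins over the default:
assert(resolveCodec(Map.empty, Some("ZLIB"), "snappy") == "zlib")
// Neither set: fall back to the default:
assert(resolveCodec(Map.empty, None, "SNAPPY") == "snappy")
```

Note the logic never inspects the codec's value, only whether each level is set, which is the basis of the "codec-value agnostic" argument below.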

Basically, what we are trying to test is: if both table-level and session-level codecs are specified, which one wins? So we could have just tested this with one value for the table codec (say snappy) and one value for the session codec (say gzip). But we are being extra cautious and testing a cross product of 3 * 3 combinations. It seems to me the 3 values that were chosen are probably the most commonly used ones. So I wanted to preserve this input set, but decide which combination to test randomly. Also, like I mentioned above, we have an 8-way loop on top, which means in one run we would probably pick 8 out of the 9 codec combinations. And over many Jenkins runs, we would eventually test all the combinations that we wanted to test in the first place, thereby catching any regression that occurs.

Given the code that we are trying to test, it would be extremely rare for it to work for one codec combination but fail for another, as the logic is codec-value agnostic and merely a precedence check. However, we are taking a hit on every run, both for developers who run the suite on their laptops and for all the Jenkins runs that happen automatically.

So we have a few options:

  1. Reduce the input codec list from 3 to 2 (or 1) => the number of test combinations goes down from 72 to 32
  2. Do nothing.
  3. Do what the PR is doing - pick 1 at random => the number of test combinations is 8 in a given run

I will do whatever you guys prefer here... Please advise.

@srowen
Member

srowen commented Oct 6, 2018

I would vote against running tests that we think have any value randomly. It's just the wrong way to solve problems, as much as it would be to simply run 90% of our test suites each time on the theory that eventually we'd catch bugs.

If there are, say, 3 codecs, and the point is to test whether one specified codec overrides another, does that really need more than 1 test? Is there any reason to believe that the override works/doesn't work differently for different codecs? Or are 3 tests sufficient, one to test the overriding of each?

If not, I'd say do nothing. The maximum win here is about a minute of test time? Not worth it.

Or: can these cases be parallelized within the test suite?

@dilipbiswal
Contributor Author

@srowen

> I would vote against running tests that we think have any value randomly. It's just the wrong way to solve problems, as much as it would be to simply run 90% of our test suites each time on the theory that eventually we'd catch bugs.

OK.

> If there are, say, 3 codecs, and the point is to test whether one specified codec overrides another, does that really need more than 1 test? Is there any reason to believe that the override works/doesn't work differently for different codecs? Or are 3 tests sufficient, one to test the overriding of each?

@gatorsmile Would you have some input here? Can we reduce the codec input set, given that we are just testing precedence?

@dilipbiswal
Contributor Author

dilipbiswal commented Oct 6, 2018

@srowen @gatorsmile
Let me see if we can target some low-hanging fruit first :-). Given that the same input set is used for the table and session codecs, we can skip the case where both have the same value. This way, we do the inner loop 6 times as opposed to 9, cutting the total test combinations from 72 to 48.

Does this sound reasonable?
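The skip-identical-values idea can be sketched as below (codec names illustrative; the guard in the second loop is the proposed change):

```scala
val codecs = Seq("UNCOMPRESSED", "SNAPPY", "GZIP")

// Original: full cross product of table-level and session-level codecs.
val all = for (t <- codecs; s <- codecs) yield (t, s)
assert(all.size == 9)

// Proposed: skip pairs where both levels name the same codec, since those
// cases cannot distinguish which level actually won the precedence check.
val distinctPairs = for (t <- codecs; s <- codecs if t != s) yield (t, s)
assert(distinctPairs.size == 6)

// Combined with the 8 surrounding (isPartitioned, convertMetastore, useCTAS)
// settings, the total drops from 9 * 8 = 72 to 6 * 8 = 48 combinations.
assert(distinctPairs.size * 8 == 48)
```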

@SparkQA

SparkQA commented Oct 7, 2018

Test build #97071 has finished for PR 22641 at commit 01f1f97.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Oct 7, 2018

Test build #97083 has finished for PR 22641 at commit 361f5f7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member

@srowen srowen left a comment


This kind of thing looks like a much better approach to reducing the runtime while only skipping tests we think are not meaningful.

```diff
         tableCompressionCodecs = tableCompressCodecs) {
           case (tableCodec, sessionCodec, realCodec, tableSize) =>
-            val expectCodec = tableCodec.get
+            val expectCodec = if (tableCodec.isDefined) tableCodec.get else sessionCodec
```
Member


I think this can be `tableCodec.getOrElse(sessionCodec)`
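The suggested simplification is behavior-preserving; a quick sketch showing the two forms are equivalent for `Option` (variable names illustrative):

```scala
// `if (o.isDefined) o.get else fallback` and `o.getOrElse(fallback)` agree
// for both the Some and None cases; getOrElse is the idiomatic form.
val some: Option[String] = Some("gzip")
val none: Option[String] = None
val session = "snappy"

assert((if (some.isDefined) some.get else session) == some.getOrElse(session))
assert((if (none.isDefined) none.get else session) == none.getOrElse(session))
assert(none.getOrElse(session) == "snappy")
```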

@SparkQA

SparkQA commented Oct 7, 2018

Test build #97088 has finished for PR 22641 at commit 3dbc5dc.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member

@srowen srowen left a comment


Yes, I think this is a solid approach. This is more selective about what's important to run.


```diff
-  def checkForTableWithCompressProp(format: String, compressCodecs: List[String]): Unit = {
+  def checkForTableWithCompressProp(
+      format: String,
```
Member


Tiny nit: I think these should be indented one more space.

@SparkQA

SparkQA commented Oct 8, 2018

Test build #97117 has finished for PR 22641 at commit fdab980.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@fjh100456
Contributor

I think random tests are not a good solution. If the test case really does run too slowly, it may be much better to reduce the codecs. We have already unit-tested the codec priority in ParquetCompressionCodecPrecedenceSuite.scala and OrcSourceSuite.scala; here we just want to test the correctness of the whole process, so it seems feasible to choose only 1 or 2 codec(s) for testing.

@dilipbiswal
Contributor Author

dilipbiswal commented Oct 9, 2018

@fjh100456 Yeah. I have removed the randomness and have reduced the combinations of codecs we test with.

```scala
      format: String,
      tableCompressCodecs: List[String],
      sessionCompressCodecs: List[String]): Unit = {
    Seq(true, false).foreach { isPartitioned =>
```
Member


Let's see, the two tests before were taking 2:20 and 0:47. Now they take about 0:43 each. That's clearly a win in the first case, not much in the second, as expected. I'm OK with this as an improvement, myself.

I wonder if we need all 8 combinations in this triply nested loop though. Incidentally it could be written more easily as such, but I know this was existing code:

```scala
for (isPartitioned <- Seq(true, false);
     convertMetastore <- Seq(true, false);
     usingCTAS <- Seq(true, false)) {
```

@fjh100456 what do you think? is it important to test all combinations, or could we get away with setting all true and all false without much of any loss here?
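The "all true and all false" alternative raised here can be sketched as (illustrative flag names, not the suite's code):

```scala
val flags = Seq(true, false)

// Full cross product of the three boolean settings: 2^3 = 8 combinations.
val allCombos = for (p <- flags; c <- flags; u <- flags) yield (p, c, u)
assert(allCombos.size == 8)

// The proposed cheaper alternative: just the two "corner" settings.
val corners = Seq((true, true, true), (false, false, false))
assert(corners.forall(allCombos.contains))
```

The trade-off is that the corners never exercise mixed settings, which is exactly the concern raised in the reply below.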

Contributor


These combinations provide different test scenarios; they are not quite the same in how data is written and table properties are passed, all of which have the potential to affect the test results. So I think it's necessary to keep them.

@srowen
Member

srowen commented Oct 10, 2018

Merged to master

@asfgit asfgit closed this in 3528c08 Oct 10, 2018
@dilipbiswal
Contributor Author

Thank you very much @srowen @fjh100456

jackylee-ch pushed a commit to jackylee-ch/spark that referenced this pull request Feb 18, 2019
…essionCodecSuite

## What changes were proposed in this pull request?
Reduced the combination of codecs from 9 to 3 to improve the test runtime.

## How was this patch tested?
This is a test fix.

Closes apache#22641 from dilipbiswal/SPARK-25611.

Authored-by: Dilip Biswal <[email protected]>
Signed-off-by: Sean Owen <[email protected]>