Skip to content

Conversation

@nsivabalan
Copy link
Contributor

@nsivabalan nsivabalan commented Jan 3, 2021

What is the purpose of the pull request

Enhancements and fixes to test suite framework.

Brief change log

  • Added ClusteringNode to perform inline clustering
  • Added ValidateAsyncOperationsNode to perform hoodie dataset sanity checks like file versions, clean ups and archivals.
  • Added few yaml files to assist in test runs. (clustering, long running w/ validate async operation)
  • Added some extra configs:
    validate_hive: optional config to include hive validation as part of ValidateDatasetNode
    execute_itr_count: Some nodes need to be executed only in one off iterations with a long running job. For eg: do clustering at 25th iteration among 50 iterations, etc.
    validate_archival: optional config to include to add archival validation in ValidateAsyncOperationsNode
    validate_clean: optional config to include to add clean up validation in ValidateAsyncOperationsNode

Verify this pull request

Manually ran test suite jobs to validate the changes.

Committer checklist

  • Has a corresponding JIRA in PR title & commit

  • Commit message is descriptive of the change

  • CI is green

  • Necessary doc changes done or have another open PR

  • For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

@codecov-io
Copy link

codecov-io commented Jan 3, 2021

Codecov Report

Merging #2400 (729c34c) into master (a38612b) will increase coverage by 18.75%.
The diff coverage is 0.00%.

Impacted file tree graph

@@              Coverage Diff              @@
##             master    #2400       +/-   ##
=============================================
+ Coverage     50.69%   69.44%   +18.75%     
+ Complexity     3059      357     -2702     
=============================================
  Files           419       53      -366     
  Lines         18810     1931    -16879     
  Branches       1924      230     -1694     
=============================================
- Hits           9535     1341     -8194     
+ Misses         8498      458     -8040     
+ Partials        777      132      -645     
Flag Coverage Δ Complexity Δ
hudicli ? ?
hudiclient ? ?
hudicommon ? ?
hudiflink ? ?
hudihadoopmr ? ?
hudisparkdatasource ? ?
hudisync ? ?
huditimelineservice ? ?
hudiutilities 69.44% <0.00%> (-0.04%) 0.00 <0.00> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ Complexity Δ
...apache/hudi/utilities/deltastreamer/DeltaSync.java 70.35% <0.00%> (-0.51%) 51.00 <0.00> (ø)
.../org/apache/hudi/io/storage/HoodieHFileReader.java
.../apache/hudi/hive/MultiPartKeysValueExtractor.java
...src/main/java/org/apache/hudi/QuickstartUtils.java
...org/apache/hudi/cli/commands/RollbacksCommand.java
...he/hudi/hadoop/SafeParquetRecordReaderWrapper.java
...e/hudi/common/table/log/block/HoodieDataBlock.java
.../apache/hudi/common/fs/TimedFSDataInputStream.java
...util/jvm/OpenJ9MemoryLayoutSpecification32bit.java
...org/apache/hudi/common/util/collection/Triple.java
... and 358 more

@n3nash
Copy link
Contributor

n3nash commented Jan 5, 2021

@nsivabalan Please work with @satishkotha before landing this since he is also looking to add similar tests around clustering

@nsivabalan nsivabalan force-pushed the testSuiteEnv branch 3 times, most recently from b405f25 to ab40bd6 Compare January 5, 2021 18:19
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no changes except adding this if condition

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these are part of reverting e33a8f7

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these are part of reverting e33a8f7

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all changes in this file are part of reverting e33a8f7

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all changes in this file are part of reverting e33a8f7

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these are part of reverting e33a8f7

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@n3nash @satishkotha : after clustering, when we call readFromSource in deltaStreamer, execution was going into lines 314 to 316. hence have added this condition to return empty checkpoint. Can you confirm this looks ok.

Copy link
Contributor

@n3nash n3nash Jan 10, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nsivabalan This should be fine since the commit timeline to read the CHECKPOINT from does not include the clustering instants. But I think this is already fixed with this PR -> #2400

Copy link
Contributor

@yihua yihua Dec 4, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be fine since the commit timeline to read the CHECKPOINT from does not include the clustering instants.

@n3nash @nsivabalan This is not correct. The commit timeline does have the clustering instants (replacecommit) based on the changes in #2048. So this breaks the logic of getting the last checkpoint, even if we have the walk-back logic in #4034, which is skipped.

@nsivabalan nsivabalan changed the title [WIP] Some fixes to test suite framework. Adding clustering node Some fixes to test suite framework. Adding clustering node Jan 5, 2021
@nsivabalan nsivabalan changed the title Some fixes to test suite framework. Adding clustering node Some fixes and enhancements to test suite framework Jan 5, 2021
@nsivabalan nsivabalan marked this pull request as ready for review January 5, 2021 18:20
@nsivabalan
Copy link
Contributor Author

@n3nash : Patch is ready for review.
@satishkotha : I have added clustering node. Do check it out.

@n3nash n3nash self-assigned this Jan 10, 2021
Copy link
Contributor

@n3nash n3nash left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nsivabalan high level dag looks good, will review code changes more deeply later.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Record size of 100 bytes ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

anyways, w/o complex schema, we are not generating records of size 7000 bytes. So, thought will keep it to some sane value.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nsivabalan Is this supposed to mean run this twice ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nope. execute this node only during iteration count 2.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we just rename this to REPEAT_COUNT and use it that way (if that's the intention) ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@n3nash : this refers to the iteration count/number where this node needs to be executed. Basically "execute at iteration count" config. I couldn't come up w/ a better name. If you have good suggestions, let me know.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

may be "itr_count_to_execute" to be explicit.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we just use "REPEAT_COUNT" ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as synced up offline, this refers to current iteration count. Have renamed this arg to "curItrCount".

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this because the checkpoint is not passed ? There is some code that passes the checkpoint even if write client is used.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have created a follow up task for satish to look into async clustering. I went through clustering test examples and added support for inline. haven't much time to get async clustering working.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, Satish has a PR ready as well.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to use the config to drive the itrCount and not have this signature change please..

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, this refers to current iteration count. Don't think we can get away with this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/maxCommitsRetailed/maxCommitsRetained.

Also, not sure what 11 means..

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

have fixed this. I instantiate writeClientConfig in HoodieTestSuiteWriter for both code paths(deltastreamer, writeClient) and use it here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's chat about all of this logic 1-1. This code is very difficult to read..may be there is a better way to achieve what you are trying

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure.

Copy link
Contributor

@n3nash n3nash left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nsivabalan left some comments. Also since you are reverting e33a8f7, does this mean we cannot generate correct timestamps again ?

@n3nash
Copy link
Contributor

n3nash commented Feb 4, 2021

@nsivabalan Is this ready ?

@vinothchandar
Copy link
Member

@nsivabalan can we create a JIRA for this. prioritize this higher and land it. We want to get the integ-suite into good shape asap.

@nsivabalan
Copy link
Contributor Author

@n3nash : Addressed all your comments.

@nsivabalan
Copy link
Contributor Author

@vinothchandar : yes, I synced up with nishith yesterday on some of the pending comments. we should land it sooner.

@nsivabalan nsivabalan changed the title Some fixes and enhancements to test suite framework [HUDI-1594] Some fixes and enhancements to test suite framework Feb 7, 2021
Copy link
Contributor

@n3nash n3nash left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Can you open a ticket to refactor the itr_count so we don't lost context ?

@n3nash n3nash merged commit d5f2028 into apache:master Feb 12, 2021
@vinothchandar
Copy link
Member

@n3nash @nsivabalan I would like for some of this to run on every commit. at least 1 test for each COW and MOR. Can one of you be able to add that to the CI?

@nsivabalan
Copy link
Contributor Author

@n3nash : https://issues.apache.org/jira/browse/HUDI-1616
@vinothchandar : sure. I will sync up with nishith on this and will take it up.

prashantwason pushed a commit to prashantwason/incubator-hudi that referenced this pull request Aug 5, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants