[GOBBLIN-1933] Change the logic in completeness verifier to support multi reference tier #3806
Changes from all commits
8007fb7
eea5e0e
1119f9c
e9f2620
0821b9f
63f35c4
e2a3491
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -156,12 +156,13 @@ private double calculateCompleteness(String datasetName, long beginInMillis, lon | |
|
|
||
| /** | ||
| * Compare source tier against reference tiers. For each reference tier, calculates percentage by srcCount/refCount. | ||
| * | ||
| * We will return the lowest value, which, in other words, we will wait until src tier catches up to all reference | ||
| * tiers (upto 99.9%) to mark that hour as completed. | ||
| * @param datasetName A dataset short name like 'PageViewEvent' | ||
| * @param beginInMillis Unix timestamp in milliseconds | ||
| * @param endInMillis Unix timestamp in milliseconds | ||
| * | ||
| * @return The highest percentage value | ||
| * @return The lowest percentage value | ||
| */ | ||
| private double calculateClassicCompleteness(String datasetName, long beginInMillis, long endInMillis, | ||
| Map<String, Long> countsByTier) throws IOException { | ||
|
|
@@ -171,16 +172,19 @@ private double calculateClassicCompleteness(String datasetName, long beginInMill | |
| for (String refTier: this.refTiers) { | ||
| long refCount = countsByTier.get(refTier); | ||
| long srcCount = countsByTier.get(this.srcTier); | ||
| double tmpPercent; | ||
|
|
||
| /* | ||
| If we have a case where an audit map is returned, however, one of the source tiers on another fabric is 0, | ||
| and the reference tiers from Kafka is reported to be 0, we can say that this hour is complete. | ||
| This needs to be added as a non-zero double value divided by 0 is infinity, but 0 divided by 0 is NaN. | ||
| */ | ||
| if (srcCount == 0 && refCount == 0) { | ||
| return 1.0; | ||
| tmpPercent = 1; | ||
| } else { | ||
| tmpPercent = (double) srcCount / (double) refCount; | ||
| } | ||
| percent = Double.max(percent, (double) srcCount / (double) refCount); | ||
| percent = percent < 0 ? tmpPercent : Double.min(percent, tmpPercent); | ||
|
Contributor
If the srcTiers are local cluster 1, local cluster 2, etc., then rather than using the lowest percentage completion, should it instead be srcTier / (summation of refTier counts)? The total records across all the refTiers (assuming local 1, local 2, etc.) should eventually equal the number of records on srcTier. Depending on the eventual conclusion, let's also update the javadocs to match the behavior.
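To make the two options in this comment concrete, here is a minimal sketch that computes both the lowest per-tier ratio (what the diff does) and the ratio against the summed reference counts (what the comment proposes). The class name, method names, tier names `local1`/`local2`, and the counts are hypothetical and chosen only for illustration; this is not the Gobblin implementation.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Gobblin: contrasts the two ways of
// comparing a source tier against several reference tiers.
final class TierRatioComparison {

  /** Lowest srcCount/refCount over all reference tiers (the approach taken in the diff). */
  static double lowestRatio(Map<String, Long> countsByTier, String srcTier, List<String> refTiers) {
    long srcCount = countsByTier.get(srcTier);
    // Starts at +Infinity, so an empty refTiers list returns +Infinity in this sketch.
    double lowest = Double.POSITIVE_INFINITY;
    for (String refTier : refTiers) {
      long refCount = countsByTier.get(refTier);
      // 0/0 would be NaN, so treat "nothing expected, nothing received" as complete.
      double ratio = (srcCount == 0 && refCount == 0) ? 1.0 : (double) srcCount / (double) refCount;
      lowest = Math.min(lowest, ratio);
    }
    return lowest;
  }

  /** srcCount / sum(refCounts), the alternative raised in the review comment. */
  static double ratioToSum(Map<String, Long> countsByTier, String srcTier, List<String> refTiers) {
    long srcCount = countsByTier.get(srcTier);
    long refSum = 0;
    for (String refTier : refTiers) {
      refSum += countsByTier.get(refTier);
    }
    return (srcCount == 0 && refSum == 0) ? 1.0 : (double) srcCount / (double) refSum;
  }

  public static void main(String[] args) {
    Map<String, Long> counts = new HashMap<>();
    counts.put("gobblin", 900L);  // source tier
    counts.put("local1", 600L);   // hypothetical reference tier
    counts.put("local2", 400L);   // hypothetical reference tier
    List<String> refs = Arrays.asList("local1", "local2");

    System.out.println(lowestRatio(counts, "gobblin", refs)); // 900/600 = 1.5
    System.out.println(ratioToSum(counts, "gobblin", refs));  // 900/1000 = 0.9
  }
}
```

With these made-up counts the two approaches disagree (1.5 versus 0.9), so which one is appropriate depends on whether each reference tier is expected to hold the full record set or only a share of it.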
||
| } | ||
|
|
||
| if (percent < 0) { | ||
|
|
||
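The code comment in the diff relies on how Java handles floating-point division by zero. A minimal sketch of that behavior follows; the class and method names are hypothetical, and the `0.999` comparison is only an assumption about how the 99.9% figure from the javadoc might be checked.

```java
// Hypothetical demo, not part of Gobblin: shows why the 0/0 case needs a guard.
final class ZeroCountDivision {

  /** Ratio for one reference tier, treating 0/0 as complete (1.0), as the diff does. */
  static double tierRatio(long srcCount, long refCount) {
    if (srcCount == 0 && refCount == 0) {
      return 1.0;
    }
    return (double) srcCount / (double) refCount;
  }

  public static void main(String[] args) {
    System.out.println(Double.isNaN(0.0 / 0.0));       // true: 0 divided by 0 is NaN
    System.out.println(Double.isInfinite(5.0 / 0.0));  // true: non-zero divided by 0 is infinity
    System.out.println(Double.NaN >= 0.999);           // false: any ordered comparison with NaN is false
    System.out.println(tierRatio(0, 0));               // 1.0: the guarded case
  }
}
```

Because every ordered comparison against NaN is false, an unguarded 0/0 result could never satisfy a threshold check, which is why the empty-hour case is mapped to 1.0 explicitly.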
Trying to get more clarity here
Current Scenario:
source tier = gobblin, ref tier = agg cluster
What are the refTiers in the new scenario: (agg cluster, local cluster1) or (agg cluster, local cluster1, local cluster2, local cluster3)? And what is the expected behavior?
It's not necessarily for the local tier; the change is meant to support multiple reference tiers in general. The local-tier details are internal, so I'll reply to you offline.
I think a better way to put it is: we wait until the src tier catches up to all reference tiers (up to 99.9%).