Skip to content

[GOBBLIN-1919] Rework a few more elements of MR-related job exec for reuse in Temporal-based execution#3879

Merged
Will-Lo merged 3 commits intoapache:masterfrom
phet:cleanup-mr-gen-wus
Feb 21, 2024
Merged

[GOBBLIN-1919] Rework a few more elements of MR-related job exec for reuse in Temporal-based execution#3879
Will-Lo merged 3 commits intoapache:masterfrom
phet:cleanup-mr-gen-wus

Conversation

@phet
Copy link
Copy Markdown
Contributor

@phet phet commented Feb 15, 2024

Dear Gobblin maintainers,

Please accept this PR. I understand that it will not be reviewed until I have checked off all the steps below!

JIRA

Description

  • Here are some details about my PR, including screenshots (if applicable):

Continue reworking MR code to enable reuse; continuation of #3784

Tests

  • My PR adds the following unit tests OR does not need testing for this extremely good reason:

existing unit tests

Commits

  • My commits all reference JIRA issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

@phet phet changed the title Rework a few more elements of MR-related job exec for reuse in Temporal-based execution [GOBBLIN-1919] Rework a few more elements of MR-related job exec for reuse in Temporal-based execution Feb 15, 2024
@phet phet force-pushed the cleanup-mr-gen-wus branch from 6a8a22a to 89145e8 Compare February 21, 2024 18:07
@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented Feb 21, 2024

Codecov Report

Attention: 22 lines in your changes are missing coverage. Please review.

Comparison is base (492fea2) 46.74% compared to head (2dc0155) 46.74%.

Files Patch % Lines
...main/java/org/apache/gobblin/runtime/JobState.java 0.00% 10 Missing ⚠️
...java/org/apache/gobblin/util/JobLauncherUtils.java 0.00% 7 Missing ⚠️
...rg/apache/gobblin/runtime/AbstractJobLauncher.java 69.23% 3 Missing and 1 partial ⚠️
...n/java/org/apache/gobblin/util/ParallelRunner.java 0.00% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master    #3879      +/-   ##
============================================
- Coverage     46.74%   46.74%   -0.01%     
- Complexity    11152    11154       +2     
============================================
  Files          2216     2216              
  Lines         87503    87523      +20     
  Branches       9617     9616       -1     
============================================
+ Hits          40904    40913       +9     
- Misses        42908    42920      +12     
+ Partials       3691     3690       -1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Comment on lines +90 to +93
public static int readConfigNumParallelRunnerThreads(Properties props) {
return Integer.parseInt(props.getProperty(PARALLEL_RUNNER_THREADS_KEY, Integer.toString(DEFAULT_PARALLEL_RUNNER_THREADS)));
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it make sense to have this config/function when a lot of other classes pass in a different config to control the number of threads in this class? Example: Metadata writer defines their own config for how many registration threads it wants when using this class. Seems to me that this class is just a generic parallel runner class and can be used in a lot of different areas, so it makes less sense for it to have its own dedicated config for number of threads unless it's defining a default.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can have it as getDefaultParallelRunnerConcurrency ?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the primary reason I moved it here is that MRJobRunner was referencing two configs both defined in this class.

we can certainly rename. I'd prefer not "default", just because the default value is only used as a fallback, in the case of no specific config.

as for multiple classes all leveraging this utility, while wanting their own setting, I recommend hierarchical naming, a la typesafe Configs -

Sting thisClassConfigPrefix = getClass().getName(); // potential convention...
Config thisClassConfigs = ConfigUtils.getConfig(myConfig, thisClassConfigPrefix, ConfigFactory.empty());
int numThreads = ParallelRunner.getNumThreadsConfig(thisClassConfigs);

at the moment I'd continue using Properties, since my objective was just to consolidate cross-class references... but if we find more widespread use, we could update to take Config instead.

@Will-Lo Will-Lo merged commit 436cf26 into apache:master Feb 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants