Generate partition scope DLO strategy and persist to DLO table #284
Please update the description with testing done in local docker.
...ark/src/main/java/com/linkedin/openhouse/jobs/spark/DataLayoutStrategyGeneratorSparkApp.java
...t/src/main/java/com/linkedin/openhouse/datalayout/generator/DataLayoutStrategyGenerator.java
libs/datalayout/src/main/java/com/linkedin/openhouse/datalayout/datasource/TableFileStats.java
libs/datalayout/src/main/java/com/linkedin/openhouse/datalayout/datasource/TableFileStats.java
...n/java/com/linkedin/openhouse/datalayout/generator/OpenHouseDataLayoutStrategyGenerator.java
...n/java/com/linkedin/openhouse/datalayout/generator/OpenHouseDataLayoutStrategyGenerator.java
...va/com/linkedin/openhouse/datalayout/generator/OpenHouseDataLayoutStrategyGeneratorTest.java
619844f to 9012af8
Thank you @jiang95-dev, overall this looks great. I have a few minor comments.
...src/test/java/com/linkedin/openhouse/datalayout/persistence/StrategiesDaoTablePropsTest.java
...ark/src/main/java/com/linkedin/openhouse/jobs/spark/DataLayoutStrategyGeneratorSparkApp.java
...n/java/com/linkedin/openhouse/datalayout/generator/OpenHouseDataLayoutStrategyGenerator.java
...atalayout/src/test/java/com/linkedin/openhouse/datalayout/datasource/TableFileStatsTest.java
4ef52f2 to 2ea8a2e
libs/datalayout/src/main/java/com/linkedin/openhouse/datalayout/datasource/TableFileStats.java
libs/datalayout/src/main/java/com/linkedin/openhouse/datalayout/datasource/TableFileStats.java
Great work @jiang95-dev, looking forward to running analysis on the collected data.
Ok, minor suggestion to reduce code duplication
...talayout/src/main/java/com/linkedin/openhouse/datalayout/datasource/TablePartitionStats.java
libs/datalayout/src/main/java/com/linkedin/openhouse/datalayout/datasource/TableFileStats.java
Will do the refactoring in a future PR.
Summary
Generate partition-scope DLO strategies and persist them to the partition-level DLO table. The new DLO table contains two new columns: partitionId and partitionColumns. The latter is used only for analysis and is not used as a filter in the execution app. At this stage, we generate both table-level and partition-level strategies for each table.
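As a rough illustration of the row shape described above, the sketch below builds one strategy row per partition and carries the partition columns along for analysis only. Everything in it is hypothetical (the class `PartitionStrategySketch`, the row type, the `toRows` helper, and the idea of passing in a precomputed file-count reduction per partition); it is not the code in this PR and does not use the OpenHouse API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: all names are hypothetical, not OpenHouse classes.
class PartitionStrategySketch {

  // One row of a partition-level strategy table, mirroring the two new columns.
  static final class PartitionStrategyRow {
    final String tableName;
    final String partitionId;       // identifies the partition the strategy targets
    final String partitionColumns;  // kept for analysis only, never used as a filter
    final long fileCountReduction;  // assumed to be estimated elsewhere

    PartitionStrategyRow(
        String tableName, String partitionId, String partitionColumns, long fileCountReduction) {
      this.tableName = tableName;
      this.partitionId = partitionId;
      this.partitionColumns = partitionColumns;
      this.fileCountReduction = fileCountReduction;
    }
  }

  // Emits exactly one row per partition id found in the input map.
  static List<PartitionStrategyRow> toRows(
      String tableName, List<String> partitionColumns, Map<String, Long> reductionByPartitionId) {
    String columns = String.join(",", partitionColumns);
    List<PartitionStrategyRow> rows = new ArrayList<>();
    for (Map.Entry<String, Long> entry : reductionByPartitionId.entrySet()) {
      rows.add(new PartitionStrategyRow(tableName, entry.getKey(), columns, entry.getValue()));
    }
    return rows;
  }
}
```

Calling `toRows` with a map of two partition ids would yield two rows, each with the same `partitionColumns` string. A table-level strategy would be generated separately, since both scopes are produced at this stage.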
Changes
Testing Done
Tested on the test cluster. The dlo_partition_strategies table was created and has one row per partition. The file reduction count for each partition is 1, and partition_id and partition_columns were set correctly.