Generate delete file statistics for DLO strategy #287
Thank you, Christian. Looks good! Left a few questions
strategy.getPosDeleteFileCount(),
strategy.getEqDeleteFileCount(),
strategy.getPosDeleteFileBytes(),
strategy.getEqDeleteFileBytes()));
Should it be a nested struct?
I see that these values are related, but one con is that a struct would add steps to ser/de.
Is there another pro?
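To make the trade-off in this thread concrete, here is a minimal sketch of both shapes. Every name in it (`DeleteFileStatsSketch`, `DeleteFileStats`, `toFlatRow`, `fromFlatRow`) is hypothetical and not taken from the PR; the only things borrowed from the diff are the four values being grouped. It assumes a flat row is simply an array of four longs.

```java
// Hypothetical sketch; none of these types or methods exist in the PR.
public class DeleteFileStatsSketch {

  /** Nested option: the four related values travel together as one unit. */
  static final class DeleteFileStats {
    final long posDeleteFileCount;
    final long eqDeleteFileCount;
    final long posDeleteFileBytes;
    final long eqDeleteFileBytes;

    DeleteFileStats(long posCount, long eqCount, long posBytes, long eqBytes) {
      this.posDeleteFileCount = posCount;
      this.eqDeleteFileCount = eqCount;
      this.posDeleteFileBytes = posBytes;
      this.eqDeleteFileBytes = eqBytes;
    }

    // Derived values live in one place instead of at every call site.
    long totalCount() {
      return posDeleteFileCount + eqDeleteFileCount;
    }

    long totalBytes() {
      return posDeleteFileBytes + eqDeleteFileBytes;
    }
  }

  /** Flat option: each value maps directly to one column (assumed layout). */
  static Object[] toFlatRow(DeleteFileStats s) {
    return new Object[] {
      s.posDeleteFileCount, s.eqDeleteFileCount, s.posDeleteFileBytes, s.eqDeleteFileBytes
    };
  }

  /** Rebuilding the nested value is the extra ser/de step raised above. */
  static DeleteFileStats fromFlatRow(Object[] row) {
    return new DeleteFileStats((Long) row[0], (Long) row[1], (Long) row[2], (Long) row[3]);
  }

  public static void main(String[] args) {
    DeleteFileStats stats = new DeleteFileStats(3, 1, 2048, 512);
    DeleteFileStats roundTrip = fromFlatRow(toFlatRow(stats));
    System.out.println(roundTrip.totalCount() + " files, " + roundTrip.totalBytes() + " bytes");
  }
}
```

One possible pro of the nested shape, under these assumptions, is that totals and any future delete-file fields are added in a single class rather than threaded through every constructor call; the cost is the explicit pack/unpack pair.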
...src/test/java/com/linkedin/openhouse/jobs/spark/DataLayoutStrategyGeneratorSparkAppTest.java
libs/datalayout/src/main/java/com/linkedin/openhouse/datalayout/datasource/FileStat.java
+ "PARTITIONED BY (days(timestamp))",
outputFqtn));
}
// TODO: is StrategiesDaoTableProps supposed to be here?
@jiang95-dev did we forget to call StrategiesDaoTableProps.save in the partitioned table condition here, or was it intentional?
@@ -0,0 +1,131 @@
package com.linkedin.openhouse.datalayout.persistence;
Can we do the DAO refactor in a separate PR and not mix it with the delete file stats PR?
In fact, let's do the delete file stats PR first and then the DAO refactor. I will need to think more about the refactor, and it will take longer to land.
OK, will take this out of this PR.
> will need to think more about it and will take longer for that to land.
I'd like to follow up with the refactor shortly after this PR deploys and works. What's the concern? The refactor is intended not to change behavior.
No major concern.
Reduce scope of this PR to not include DAO refactor.
Summary
problem: delete files are accumulating across tables due to Trino's default write behavior for UPDATE/MERGE/DELETE statements
solution:
outcome: databases accumulating delete files will be discovered and compacted appropriately to prevent query performance from degrading
✨ bonus content ✨ :
I noticed there is a gap in DLO testing because manual validation must be done to verify the final form and contents of the DLO table, e.g. #284
so:
this will help automate testing for future changes and prevent bad changes from slipping through because validation is done manually
Changes
Testing Done
Additional Information