HDDS-10568. When the ldb command is executed, it is output by line #6420
Conversation
ci: Can you help review this PR, @adoroszlai @xichen01?
xichen01 left a comment:
Thanks for working on this; a few comments to handle.
    return displayTable(iterator, dbColumnFamilyDef, out, schemaV3);
    while (iterator.get().isValid()) {
    try (PrintWriter out = new PrintWriter(new BufferedWriter(
        new PrintWriter(fileName + fileSuffix, UTF_8.name())))) {
If preFileRecords is not specified, we'd better keep the filename the same as before (without the fileSuffix).
    batch = new ArrayList<>(batchSize);
    sequenceId++;
    }
    if ((preFileRecords > -1) && (count >= preFileRecords)) {
It seems ldb will generate unlimited empty files if preFileRecords is zero.
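To illustrate the reviewer's concern, here is a minimal standalone sketch (hypothetical names modeled on the PR, not the actual DBScanner code): with the `count >= preFileRecords` check alone, a value of zero makes every file roll over before a single record is written, so only an external cap stops the loop.

```java
// Simulates the file-rollover loop from the PR with hypothetical names.
// filesCreated returns how many files get opened before `records` records
// are written, with `cap` as a safety stop for the runaway case.
public class EmptyFileSketch {
    static int filesCreated(long preFileRecords, long records, int cap) {
        int files = 0;
        long written = 0;
        while (written < records && files < cap) {
            files++;                              // open fileName + suffix
            long count = 0;
            while (written < records) {
                if (preFileRecords > -1 && count >= preFileRecords) {
                    break;                        // roll over to the next file
                }
                count++;
                written++;
            }
        }
        return files;
    }

    public static void main(String[] args) {
        // preFileRecords == 0: no record is ever written, so only the cap
        // stops file creation -- the "unlimited empty files" case.
        System.out.println(filesCreated(0, 5, 1000));
        // preFileRecords == 2: 5 records fit in 3 files, as expected.
        System.out.println(filesCreated(2, 5, 1000));
    }
}
```

Guarding with `preFileRecords > 0` instead (as the later revision does) treats zero the same as "splitting disabled" and avoids the spin.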
    @CommandLine.Option(names = {"--pre-file-records"},
        description = "The number of print records per file.",
        defaultValue = "-1")
    private long preFileRecords;
Nit: suggest --max-records-per-file
Thanks for your comment and review, @xichen01 .
I will update soon.
Please also rename preFileRecords to recordsPerFile.
(pre means "before")
Can you help review this PR again, @xichen01?
Can you help review this PR, @kerneltime @errose28?
    }
    fileSuffix++;
    }
    } else {
Perhaps we can simplify this if...else. Like:

    //...
    String fileNameXXX = preFileRecords > 0 ? fileName + fileSuffix++ : fileName;
    //...
    new PrintWriter(fileNameXXX, UTF_8.name())
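A runnable sketch of this suggestion, with hypothetical names standing in for the PR's fields (this is not the actual DBScanner implementation): the ternary collapses the duplicated writer setup into one filename choice.

```java
// Picks the output filename in one expression: when per-file splitting is
// enabled, append and advance the numeric suffix; otherwise reuse the plain
// filename for the single-output case.
public class FileNameSketch {
    static int fileSuffix = 0;

    static String nextFileName(String fileName, long preFileRecords) {
        return preFileRecords > 0 ? fileName + fileSuffix++ : fileName;
    }

    public static void main(String[] args) {
        System.out.println(nextFileName("result.txt", 2));   // result.txt0
        System.out.println(nextFileName("result.txt", 2));   // result.txt1
        System.out.println(nextFileName("result.txt", -1));  // result.txt
    }
}
```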
errose28 left a comment:
Thanks for the improvement @jianghuazhu. I think the idea is solid, since just using split on a stdout stream may produce individual files that are not valid JSON. Let's add some tests to TestLDBCli to make sure we have all the corner cases around various flag combinations working.
    private int threadCount;

    @CommandLine.Option(names = {"--max-records-per-file"},
        description = "The number of print records per file.",
Suggested change:

    - description = "The number of print records per file.",
    + description = "The number of records to print per file.",
    if ((preFileRecords > 0) && (count >= preFileRecords)) {
      break;
    }
What's the expected behavior when this new --max-records-per-file flag is used without --out? Right now it looks like stdout is treated as "one file", so this flag overrides the --length option:
# The DB here has many more than 3 entries
$ ./ozone debug ldb --db=om.db scan --column_family=fileTable -l3 --max-records-per-file=2 | jq '.[].keyName' | wc -l
2
$ ./ozone debug ldb --db=om.db scan --column_family=fileTable -l2 --max-records-per-file=3 | jq '.[].keyName' | wc -l
2
Maybe we should disallow --max-records-per-file without --out.
-l is also broken with this new option, and I got a bit of a surprise trying to test this 😄 I would have expected 5 files here, not 57 thousand.
$ ./ozone debug ldb --db=om.db scan --column_family=fileTable -l10 --max-records-per-file=2 --out=foo
^C
$ ls -l | grep foo | wc -l
57343
Thank you for your comment and review.
I will update soon.
> -l is also broken with this new option... I would have expected 5 files, not 57 thousand.
When --out is not set, all records are output to stdout.
When --max-records-per-file and -l are both specified, --max-records-per-file prevails.
/pending
Marking this issue as un-mergeable as requested.
Please use /ready comment when it's resolved.
Please note that the PR will be closed after 21 days of inactivity from now. (But can be re-opened anytime later...)
/pending
Sorry, I was occupied with some other work for a while.
https://github.com/apache/ozone/actions/runs/8849501009/job/24349282946?pr=6420
ozone/hadoop-ozone/tools/src/main/java/org/apache/hadoop/ozone/debug/DBScanner.java Lines 228 to 233 in c7012f3
ozone/hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/debug/TestLDBCli.java Lines 285 to 286 in c7012f3
ci: https://github.com/jianghuazhu/ozone/actions/runs/8876637070

/ready
Blocking review request is removed.
      private boolean withinLimit(long i) {
    -   return limit == -1L || i < limit;
    +   return recordsPerFile > 0 || limit == -1L || i < limit;
If recordsPerFile > 0 is true, the subsequent checks are short-circuited, including i < limit, so the limit is effectively invalidated. This is not the expected behavior.
Thanks @xichen01 for the comment and review.
When recordsPerFile>0, it means that --max-records-per-file has taken effect, and --limit should be ignored at this time.
--limit limits the total count of records, while --max-records-per-file limits the maximum number of records in a single file.
Such as:
ozone debug ldb ... --limit 10 --max-records-per-file 1 --out result.txt
This command should generate 10 files, like result.txt0, result.txt1, ..., each containing 1 record.
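Under this interpretation the total-record check must stay independent of the per-file setting. A minimal standalone sketch of that semantics (hypothetical names mirroring the PR, not the actual implementation): withinLimit only consults the total limit, and the driver loop rolls files based on recordsPerFile.

```java
// Demonstrates the intended interplay: --limit caps total records written,
// --max-records-per-file only controls when a new file starts.
public class WithinLimitSketch {
    static long limit = 10;         // --limit; -1 means unlimited
    static long recordsPerFile = 1; // --max-records-per-file; <= 0 disables split

    // The total limit applies regardless of per-file splitting.
    static boolean withinLimit(long i) {
        return limit == -1L || i < limit;
    }

    public static void main(String[] args) {
        long written = 0;
        int files = 0;
        while (withinLimit(written)) {
            files++;                       // open e.g. result.txt0, result.txt1, ...
            for (long n = 0; n < recordsPerFile && withinLimit(written); n++) {
                written++;                 // write one record
            }
        }
        // --limit 10 --max-records-per-file 1 => 10 files, 1 record each
        System.out.println(files + " files, " + written + " records");
    }
}
```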
I'll update later.
/pending "I'll update later"
Marking this issue as un-mergeable as requested.
Please use /ready comment when it's resolved.
Please note that the PR will be closed after 21 days of inactivity from now. (But can be re-opened anytime later...)
"I'll update later"
Thank you very much for the patch. I am closing this PR temporarily as there was no activity recently and it is waiting for response from its author. It doesn't mean that this PR is not important or ignored: feel free to reopen the PR at any time. It only means that attention of committers is not required. We prefer to keep the review queue clean. This ensures PRs in need of review are more visible, which results in faster feedback for all PRs. If you need ANY help to finish this PR, please contact the community on the mailing list or the slack channel.
Continued in #7467.


What changes were proposed in this pull request?
When executing the ldb command, if the data is very large, a very large file will be generated, which is not user-friendly. This PR adds a new option that controls the maximum number of records allowed in each output file.
Details:
HDDS-10568
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-10568
How was this patch tested?