fixes slow bulk import with many tablets and files #5044

Merged: 3 commits merged into apache:main from the slow_bulk_import branch on Nov 8, 2024

keith-turner (Contributor)

The bulk import code was reading every tablet in the overall bulk import
range once for each range being bulk imported. With N ranges each
re-reading up to N tablets, this resulted in O(N^2) metadata table reads,
which made very large bulk imports extremely slow.
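To make the O(N^2) behavior concrete, here is a hypothetical cost model in Java. It is a sketch under stated assumptions, not the code changed in this PR: the int-keyed tablet layout, the `naive`/`singlePass`/`tablets`/`ranges` names, and the forward-only cursor are all illustrative choices. `naive` re-reads every tablet in the overall bulk import range for each load range, as described above; `singlePass` assumes the load ranges are sorted by start row and walks the tablets once.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical cost model only; this is not Accumulo code.
// Tablet i covers rows (endRows[i-1], endRows[i]]; a load range {s, e} covers rows (s, e].
final class BulkLoadSketch {

    record Result(List<List<Integer>> assignment, long tabletsRead) {}

    static int prevEnd(int[] endRows, int i) {
        return i == 0 ? Integer.MIN_VALUE : endRows[i - 1];
    }

    // Half-open intervals (prevEnd, endRow] and (start, end] overlap.
    static boolean overlaps(int[] endRows, int i, int start, int end) {
        return endRows[i] > start && prevEnd(endRows, i) < end;
    }

    // Slow shape: for every load range, re-read all tablets in the whole bulk import range.
    static Result naive(int[] endRows, int[][] ranges) {
        int lo = Integer.MAX_VALUE, hi = Integer.MIN_VALUE;
        for (int[] r : ranges) {
            lo = Math.min(lo, r[0]);
            hi = Math.max(hi, r[1]);
        }
        List<List<Integer>> out = new ArrayList<>();
        long reads = 0;
        for (int[] r : ranges) {
            List<Integer> hits = new ArrayList<>();
            for (int i = 0; i < endRows.length; i++) {
                if (!overlaps(endRows, i, lo, hi)) continue; // outside the overall bulk range
                reads++; // simulated metadata read
                if (overlaps(endRows, i, r[0], r[1])) hits.add(i);
            }
            out.add(hits);
        }
        return new Result(out, reads);
    }

    // Fast shape: ranges sorted by start row, one forward-only cursor over the tablets.
    static Result singlePass(int[] endRows, int[][] sortedRanges) {
        List<List<Integer>> out = new ArrayList<>();
        long reads = 0;
        int pos = 0; // never moves backwards because range starts are non-decreasing
        for (int[] r : sortedRanges) {
            // Skip tablets that end at or before this range's start row.
            while (pos < endRows.length && endRows[pos] <= r[0]) {
                pos++;
                reads++;
            }
            // Every tablet from pos on ends after r's start; take those starting before r's end.
            List<Integer> hits = new ArrayList<>();
            for (int j = pos; j < endRows.length && prevEnd(endRows, j) < r[1]; j++) {
                reads++;
                hits.add(j);
            }
            out.add(hits);
        }
        return new Result(out, reads);
    }

    // n tablets split at rows 10, 20, ...; the last tablet extends to Integer.MAX_VALUE.
    static int[] tablets(int n) {
        int[] endRows = new int[n];
        for (int i = 0; i < n; i++) endRows[i] = (i == n - 1) ? Integer.MAX_VALUE : (i + 1) * 10;
        return endRows;
    }

    // One small load range inside each tablet, already sorted by start row.
    static int[][] ranges(int n) {
        int[][] ranges = new int[n][];
        for (int k = 0; k < n; k++) ranges[k] = new int[] {k * 10, k * 10 + 5};
        return ranges;
    }

    public static void main(String[] args) {
        int n = 1000;
        Result slow = naive(tablets(n), ranges(n));
        Result fast = singlePass(tablets(n), ranges(n));
        System.out.println("naive reads:       " + slow.tabletsRead());
        System.out.println("single-pass reads: " + fast.tabletsRead());
        System.out.println("same assignment:   " + slow.assignment().equals(fast.assignment()));
    }
}
```

With 1000 tablets and one load range per tablet, the naive shape performs n*n = 1,000,000 simulated tablet reads, while the single pass performs 2n - 1 = 1,999 (each boundary tablet is touched once when collected and once when skipped), and both produce the same range-to-tablet assignment.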

Added a new test that bulk imports thousands of files into thousands of
tablets. Without the fixes in this PR, running this test shows the
following time for the FATE step:

```
DEBUG: Running LoadFiles.isReady() FATE:USER:6320e73d-e661-4c66-bf25-c0c27a0a79d5 took 289521 ms and returned 0
```

With the fix in this PR, the new test shows the following time, down
from about 290s to 1.2s:

```
DEBUG: Running LoadFiles.isReady() FATE:USER:18e52fc2-5876-4b01-ba7b-3b3c099a82be took 1225 ms and returned 0
```

This bug does not seem to exist in 2.1 or 3.1. The test was not run
against those versions, though, so it may be worthwhile to backport it.

keith-turner added this to the 4.0.0 milestone on Nov 8, 2024
keith-turner merged commit cfae5e9 into apache:main (8 checks passed)
keith-turner deleted the slow_bulk_import branch on November 8, 2024 at 16:44