Ingesting a file causes high read pressure #3104

Closed
huachaohuang opened this issue Nov 1, 2017 · 16 comments

@huachaohuang
Contributor

In the current implementation, ingesting a file requires checking whether the file overlaps with each level. For each level, all files at that level are opened and prefetched (512K) to collect all the delete ranges at that level. Then each delete range at that level is checked to see whether it overlaps with the file being ingested.
For an instance with tens of thousands of files, ingesting a single non-overlapping file will open and prefetch all of those files, which causes tens of gigabytes of read traffic.
Do we really need to check all the delete ranges of all the files at a level to see whether the ingested file overlaps?
Is it possible to optimize it so that we just need to check some boundaries?
@ajkr
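
Below is a toy model of the scan described above, with hypothetical types (not the actual RocksDB code), to illustrate where the reads come from:

```cpp
#include <string>
#include <vector>

// Toy model of the per-level scan; hypothetical types, not RocksDB code.
// The point: every file at every level is opened and prefetched just to
// read its range tombstones.
struct RangeTombstone { std::string start, end; };
struct SstFile { std::vector<RangeTombstone> tombstones; };

// Stands in for opening a table reader; in RocksDB this triggers a ~512K
// prefetch per file, which is where the read pressure comes from.
const SstFile& OpenAndPrefetch(const SstFile& f) { return f; }

// Returns true if any tombstone at this level overlaps [lo, hi].
bool LevelOverlaps(const std::vector<SstFile>& level, const std::string& lo,
                   const std::string& hi) {
  for (const SstFile& f : level) {  // every file, even far-away ones
    const SstFile& opened = OpenAndPrefetch(f);
    for (const RangeTombstone& t : opened.tombstones) {
      if (t.start <= hi && t.end >= lo) return true;
    }
  }
  return false;
}
```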

@ajkr
Contributor

ajkr commented Nov 1, 2017

File ingestion has an optimization that prevented me from just checking file endpoints. That optimization allows files to be ingested into lower levels even if their endpoints overlap with files at higher levels, as long as the higher-level file does not have any keys in the ingested file's key-range. So even before range deletion, it would open the higher-level file to check whether this condition holds. I guess the problem is worse now.

Do you know where the tens of GBs are coming from? Is it only from opening the files and reading the range tombstone block? We can likely optimize it.
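
A minimal self-contained sketch of the distinction described here, using hypothetical types rather than RocksDB internals: endpoint overlap can be decided from in-memory metadata, while checking whether a higher-level file actually contains keys in the ingested range requires opening the file.

```cpp
#include <string>

// Hypothetical type; not RocksDB's internal metadata structure.
struct FileBounds {
  std::string smallest, largest;  // kept in the in-memory version metadata
};

// Cheap check: decided purely from metadata, no I/O.
bool EndpointsOverlap(const FileBounds& f, const std::string& lo,
                      const std::string& hi) {
  return f.smallest <= hi && f.largest >= lo;
}

// The optimization described above needs the stronger question "does this
// file contain any key in [lo, hi]?", which cannot be answered from the
// bounds alone; the file must be opened and seeked, which costs I/O.
```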

@ajkr ajkr self-assigned this Nov 1, 2017
@huachaohuang
Contributor Author

I captured a flame graph when the disk was reading at full speed here:
https://transfer.sh/ul9GV/kernel.svg

@huachaohuang
Contributor Author

Hi, any updates, or any ideas on how to optimize this? Is there anything we can help with?
@ajkr

@siddontang
Contributor

siddontang commented Nov 7, 2017

@ajkr

Can we cache all the range deletion blocks to avoid re-reading them from the SST files? Maybe, like pin-l0-filter-and-index-blocks, we can add a pin-range-del-blocks option.
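
As a sketch, this is roughly what the proposal would look like in the options. `pin_l0_filter_and_index_blocks_in_cache` is the existing `BlockBasedTableOptions` field; `pin_range_del_blocks` is hypothetical and does not exist in RocksDB:

```cpp
#include <rocksdb/options.h>
#include <rocksdb/table.h>

rocksdb::Options MakeOptions() {
  rocksdb::BlockBasedTableOptions table_opts;
  // Existing option: keeps L0 filter/index blocks pinned in the block cache.
  table_opts.pin_l0_filter_and_index_blocks_in_cache = true;
  // Proposed analogue for range deletion blocks (hypothetical; does not
  // exist in RocksDB):
  // table_opts.pin_range_del_blocks = true;
  rocksdb::Options options;
  options.table_factory.reset(rocksdb::NewBlockBasedTableFactory(table_opts));
  return options;
}
```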

@siddontang
Contributor

Maybe we can also use a map to track which files contain a range deletion block, to avoid checking all files.

@huachaohuang
Contributor Author

huachaohuang commented Nov 7, 2017

That optimization allows files to be ingested into lower levels even if their endpoints overlap with files at higher levels, as long as the higher-level file does not have any keys in the ingested file's key-range.

Can we just check the file endpoints if we don't mind disabling this optimization?

Or, since we haven't used delete range in production yet, can we add an option to disable delete range, so that we don't need to check the delete ranges during file ingestion?

huachaohuang added a commit to tikv/rocksdb that referenced this issue Nov 8, 2017
@zhangjinpeng87
Contributor

We only need to read delete range marks from the files which overlap with the SST file that is about to be ingested.
@ajkr Is there anything that can't be satisfied if we use this strategy?

@zhangjinpeng87
Contributor

We found some conflict when the compaction type is not universal: since we need to check overlap here, the previous checks are redundant.

@ajkr
Contributor

ajkr commented Nov 14, 2017

The previous check is needed in order to ingest files into the lowest possible level. For example,

  • L1 has file1 containing keys A, B, C, Z
  • ingested file has keys D, E, F
  • L2 has file2 containing keys A, B

What we do is:

  • at L1, IngestedFileOverlapWithLevel sets overlap_with_level=false
  • at L1, IngestedFileFitInLevel returns false so we don't update target_level
  • at L2, IngestedFileOverlapWithLevel sets overlap_with_level=false
  • at L2, IngestedFileFitInLevel returns true so we update target_level
  • then the file can be ingested in L2

Whether or not we should keep this behavior, I'm not sure. But I do think that the second commit in your PR would prevent the file from being ingested into L2.
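
The walk-through above can be paraphrased as code. The two function names come from the comment itself; the loop structure is a simplified sketch, not the actual implementation:

```cpp
// Stand-in declarations so the sketch compiles; these checks are named in
// the walk-through above.
bool IngestedFileOverlapWithLevel(int level);
bool IngestedFileFitInLevel(int level);

// Simplified sketch of the level-assignment logic; the real code lives in
// RocksDB's external file ingestion path.
int PickTargetLevel(int num_levels) {
  int target_level = 0;
  for (int level = 0; level < num_levels; ++level) {
    if (IngestedFileOverlapWithLevel(level)) {
      break;  // keys overlap at this level; the file can't go deeper
    }
    if (IngestedFileFitInLevel(level)) {
      target_level = level;  // L2 in the example; L1 fails this check
    }
    // When the fit check fails (at L1, keys A,B,C,Z bracket D..F) we keep
    // descending, since a deeper level (L2: A,B) may still fit.
  }
  return target_level;
}
```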

@ajkr
Contributor

ajkr commented Nov 14, 2017

It feels kind of surprising that we check more than one file at each level. cc @IslamAbdelRahman - can we just check the one that overlaps the ingested file? (If multiple files overlap, that implies there must be keys overlapping the ingested file's range, so there is no need to continue.)
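
A self-contained sketch of that idea, with hypothetical types: levels below L0 keep files sorted and non-overlapping, so the files whose endpoint ranges intersect the ingested range form a contiguous run that binary search can find, and a run longer than one file already proves a real key overlap.

```cpp
#include <algorithm>
#include <string>
#include <vector>

struct FileBounds { std::string smallest, largest; };  // hypothetical type

// Files at a sorted, non-overlapping level that intersect [lo, hi]. If the
// result holds more than one file, the boundary key between two of them
// lies inside [lo, hi], so there is a real key overlap and no file needs
// to be opened at all.
std::vector<FileBounds> OverlappingFiles(const std::vector<FileBounds>& level,
                                         const std::string& lo,
                                         const std::string& hi) {
  // First file whose largest key is >= lo.
  auto first = std::lower_bound(
      level.begin(), level.end(), lo,
      [](const FileBounds& f, const std::string& k) { return f.largest < k; });
  std::vector<FileBounds> out;
  for (auto it = first; it != level.end() && it->smallest <= hi; ++it) {
    out.push_back(*it);
  }
  return out;
}
```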

@mikhail-antonov
Contributor

I'll look at the code around it, but the first questions that come to my mind are:

  • if IngestExternalFile() is called with a long list of files, I expect from reading the above that we pay this price for every single file we ingest, and that it happens sequentially?
  • for people who don't use range deletes, does this I/O price still need to be paid? There's no flag to disable range delete checking during the bulk load, as an expert-user option ("don't check that range deletes conflict, as I know for sure there are no range deletions")?

@ajkr
Contributor

ajkr commented Nov 14, 2017

For the second point, I'd prefer to optimize it so range deletions don't add overhead when they're not used. I don't think there's anything fundamentally preventing it -- we already opened the file to check regular keys so we can easily see whether a range deletion meta-block exists without any extra I/O. We just need to find time to implement it.
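
One possible shape of that optimization, as a hypothetical sketch (`Table` and `HasRangeDeletionBlock` are stand-ins, not RocksDB APIs): once the table is already open for the regular-key check, its metaindex is in memory, so testing for a range-deletion meta-block costs no additional reads.

```cpp
// Hypothetical sketch; not RocksDB APIs.
struct Table {
  bool has_range_del_block;  // known once the metaindex is parsed on open
  bool HasRangeDeletionBlock() const { return has_range_del_block; }
};

bool NeedRangeDelOverlapCheck(const Table& table) {
  // The metaindex was already read when the table was opened for the
  // regular-key check, so this test costs no extra I/O.
  return table.HasRangeDeletionBlock();
}
```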

@ajkr
Contributor

ajkr commented Nov 17, 2017

Hi, want to try out #3179? In my testing it only reads range deletions from the relevant files.

@huachaohuang
Contributor Author

Awesome! We will test it out ASAP and give some feedback later.

@huachaohuang
Contributor Author

Hi, we have been testing #3179 for a while; so far so good, it's very stable now. Good job!

@ajkr
Contributor

ajkr commented Dec 4, 2017

Thanks @huachaohuang! Please let me know if you notice any more issues.

@ajkr ajkr closed this as completed Dec 4, 2017