Limit Cache Size on Disk #18045

Open
yyilong335 opened this issue Nov 20, 2024 · 6 comments
Labels
question Further information is requested

Comments

@yyilong335

Dear Developers,

I am using CodeQL to analyze my database. In the command, I pass --max-disk-cache=200000 to cap the disk space the cache may use for this query. However, when the query finishes, the cache takes up 1.3TB.

Does --max-disk-cache actually limit disk usage? Or is there another option that would resolve this issue?

Thank you so much.

@aibaars
Contributor

aibaars commented Nov 20, 2024

Thanks for reporting. I would have expected the disk cache to be limited to 200G. Could you share which folders of the CodeQL database and cache are larger than expected? If you are using Linux or macOS, the du command can print some reports. For example (replace my-database with the folder of your CodeQL database and, if needed, replace cpp with the language you are analyzing):

du -sh my-database/db-cpp/default/*
du -sh my-database/db-cpp/default/cache/*

@yyilong335
Author

Thank you for looking into this issue! The page directory is abnormally large.

I get:
2.0T default/cache/page

Thanks.

@aibaars
Contributor

aibaars commented Nov 20, 2024

Could you get a more detailed du report for the page folder? For example du -sh default/cache/page/* or, without the -s, du -h default/cache/page. This is just to figure out whether all page files are large or whether only a couple take up all the space.
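A minimal sketch of such a report, assuming a POSIX-like shell with du, sort, and head available. The function name largest_entries is made up for illustration; it sorts on KiB counts so it does not depend on the GNU-only sort -h option.

```shell
# Print the n largest entries directly under a directory, largest first.
# Sizes are reported in KiB. Hidden entries are not matched by the glob.
largest_entries() {
  local dir="$1" n="${2:-10}"
  # du -sk prints "<KiB><TAB><path>" per entry; sort numerically, descending.
  du -sk "$dir"/* | sort -rn | head -n "$n"
}
```

For instance, largest_entries default/cache/page 10 would list the ten biggest entries of the page folder.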

@yyilong335
Author

Sure.

ls -l | wc -l shows there are 579 items.

du -sh * | sort -hr | head -n 10 shows the 10 largest items:

8.3G    30
8.2G    fc
8.2G    f2
8.2G    ee
8.2G    e8
8.2G    e2
8.2G    e1
8.2G    df
8.2G    de
8.2G    d7

I checked 30 and fc. Each is a directory containing more than 500 items, organized just like the page directory itself. Could pages be getting created recursively?

@yyilong335
Author

I did a quick estimate myself. It seems that about half of the items under page/ are directories, which take up a lot of space, and the other half are .pack files, which are small. If one such directory is about 8GB and there are more than 200 of them, the total comes to nearly 2TB, which matches the 2TB size of the page directory. So I believe the space is being taken by the sheer number of these large directories.
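The estimate above could be checked directly rather than by sampling. The following is a rough sketch, assuming a bash shell with du and cut available; the function name summarize_page_dir is hypothetical and not part of CodeQL. It counts subdirectories and plain files separately and sums the disk space of each group.

```shell
# Summarize a directory: how many entries are subdirectories vs. other files,
# and how much disk space (in KiB) each group occupies.
# Hidden entries are not matched by the glob and are therefore not counted.
summarize_page_dir() {
  local dir="$1" entry size
  local dir_count=0 dir_kib=0 file_count=0 file_kib=0
  for entry in "$dir"/*; do
    [ -e "$entry" ] || continue          # skip the literal glob when the directory is empty
    size=$(du -sk "$entry" | cut -f1)    # first tab-separated field is the size in KiB
    if [ -d "$entry" ]; then
      dir_count=$((dir_count + 1))
      dir_kib=$((dir_kib + size))
    else
      file_count=$((file_count + 1))
      file_kib=$((file_kib + size))
    fi
  done
  echo "directories: $dir_count entries, $dir_kib KiB"
  echo "files: $file_count entries, $file_kib KiB"
}
```

Running summarize_page_dir default/cache/page would show whether the subdirectories really account for almost all of the space.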

@aibaars
Contributor

aibaars commented Nov 21, 2024

I checked with some of the CodeQL developers and they said:

The --max-disk-cache value is not really a hard limit, more a firm wish. The evaluator will try to stay under the indicated size by removing pages that were kept "because they may be useful later". However, if using more than that is the only way it can actually complete the evaluation, that's what it will do.

The word "cache" here is actually a bit of a misnomer -- in production analyses, this space is mostly used for intermediate results that we had to spill out from RAM because there were too many of them to fit there.

If the results take up that much space on disk, it's probably a symptom that CodeQL is doing far too much computation, so I'd presume the query also takes far too long.
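Because the limit is soft, one option is to watch the cache directory from outside while an analysis runs. Below is a minimal sketch of such a check, assuming a bash shell and assuming the --max-disk-cache value is given in megabytes (which matches reading 200000 as roughly 200G earlier in this thread). The function name cache_over_limit is hypothetical and not a CodeQL command.

```shell
# Report whether a directory's on-disk size exceeds a limit given in MB.
# Exit status is 0 when the directory is over the limit, 1 otherwise.
cache_over_limit() {
  local dir="$1" limit_mb="$2" used_kib used_mb
  used_kib=$(du -sk "$dir" | cut -f1)   # total size in KiB
  used_mb=$((used_kib / 1024))          # integer MB, rounded down
  echo "cache uses ${used_mb} MB (limit ${limit_mb} MB)"
  [ "$used_mb" -gt "$limit_mb" ]
}
```

For example, cache_over_limit my-database/db-cpp/default/cache 200000 could be run periodically, and the analysis stopped by hand if the directory grows far beyond the requested size.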

@yyilong335 If you are willing to share the database and the query, we can try to determine whether there's a performance problem with one of our own supported queries.
