non-trivial directory layout #22
Comments
How about a max_files_per_directory setting? Each process gets ceil(num_files/max_files_per_dir) directories, then randomly assigns each file to one of its directories.
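A rough sketch of what this proposal might look like, in Python for brevity. The function name `assign_capped`, the `seed` parameter, and the way the cap is enforced (a full directory is dropped from the candidate pool) are all assumptions made for illustration; the comment itself only specifies the directory count and random assignment.

```python
import math
import random


def assign_capped(num_files, max_files_per_dir, seed=0):
    """Assign each file to a random directory that still has room.

    Hypothetical helper: returns (assignment, counts), where
    assignment[i] is the directory index of file i and counts[d]
    is the number of files placed in directory d.
    """
    rng = random.Random(seed)
    # Directory count as proposed: ceil(num_files / max_files_per_dir).
    num_dirs = math.ceil(num_files / max_files_per_dir)
    counts = [0] * num_dirs
    open_dirs = list(range(num_dirs))  # directories below the cap
    assignment = []
    for _ in range(num_files):
        i = rng.randrange(len(open_dirs))
        d = open_dirs[i]
        assignment.append(d)
        counts[d] += 1
        if counts[d] == max_files_per_dir:
            # Assumed enforcement: swap-remove the full directory
            # so it can no longer be chosen.
            open_dirs[i] = open_dirs[-1]
            open_dirs.pop()
    return assignment, counts
```

Under these assumptions, when num_files is an exact multiple of the cap, every directory ends up exactly full, so only the order of placement is random, not the final per-directory counts.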
Then random is not really random, right? Say each process is responsible for 1M files and max_files_per_dir is 1000; by that calculation, you will create 1000 dirs and put 1000 files in each.
How about avg_files_per_dir = n instead? That would define how many sub-dirs you create.
That could work. It would also make analysis easier, since it adds another normalization factor.
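For comparison, a minimal sketch of the avg_files_per_dir idea, again in Python. The name `assign_by_average`, the `seed` parameter, and the choice of a uniform random pick per file are assumptions for illustration; the thread does not say which distribution would be used.

```python
import math
import random
from collections import Counter


def assign_by_average(num_files, avg_files_per_dir, seed=0):
    """Pick a directory uniformly at random for every file, with no cap.

    Hypothetical helper: returns (assignment, counts), where counts[d]
    is the number of files that landed in directory d.
    """
    rng = random.Random(seed)
    # avg_files_per_dir only fixes how many sub-dirs exist.
    num_dirs = max(1, math.ceil(num_files / avg_files_per_dir))
    assignment = [rng.randrange(num_dirs) for _ in range(num_files)]
    tally = Counter(assignment)
    counts = [tally.get(d, 0) for d in range(num_dirs)]
    return assignment, counts
```

Here individual directories can hold more or fewer than n files, while the mean per directory stays at num_files / num_dirs, which is the kind of normalization factor mentioned above.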
As far as directory layout goes, LCIO currently lays files out on a per-process basis: all files are created under p_rankid. This is overly naive and won't scale to billions of files, since it literally puts hundreds of millions of files under a SINGLE directory. In short, we need a non-trivial directory layout scheme, maybe one that is also drawn from a distribution.