instead of saving 1K issues per file, save issues with ID within a thousand per file #5
OpenSZZ stores issues in .csv files, with at most 1000 issues in each. A file <project_key>_0.csv is expected to hold the issues with IDs from 1 to 999, a file <project_key>_1.csv the issues with IDs from 1000 to 1999, and so on.
To fetch a given portion of the issues from the Jira API, OpenSZZ uses the following parameters in the JQL query:
– project=<project_key> ORDER BY key ASC
– tempMax=1000
– pager/start=<page_number*1000>
With <project_key> = OOZIE and <page_number> = 2, the query is interpreted as follows: from all issues of the OOZIE project, sorted by issue key in ascending order, return 1000 issues starting from the 2000th result.
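The paging parameters above can be sketched as follows. This is a minimal illustration, not OpenSZZ's actual code: the function name `build_query_params` and the dictionary key `"jql"` are hypothetical, and only the parameter values are taken from the description.

```python
PAGE_SIZE = 1000  # issues requested per page, as described above


def build_query_params(project_key: str, page_number: int) -> dict:
    """Return the paging parameters for one page of issues.

    The "jql" key is a hypothetical label for the JQL string; "tempMax" and
    "pager/start" are the parameter names quoted in the description.
    """
    return {
        "jql": f"project={project_key} ORDER BY key ASC",
        "tempMax": PAGE_SIZE,
        "pager/start": page_number * PAGE_SIZE,
    }


params = build_query_params("OOZIE", 2)
print(params["pager/start"])
```

Note that the offset depends only on the page number, so it counts positions in the sorted result list rather than issue IDs.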
When linking commits to issues, OpenSZZ extracts issue IDs from commit messages. It then looks up each extracted ID not in all files of fetched issues, but only in the file that is supposed to contain it. For example, OpenSZZ searches for the issue OOZIE-2222 only in OOZIE_2.csv.
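A rough sketch of this lookup rule, assuming the file index is the issue number divided by 1000 (integer division). The regular expression and both function names are assumptions made for illustration; OpenSZZ's own extraction logic may differ.

```python
import re

ISSUES_PER_FILE = 1000

# Assumed pattern for Jira-style keys such as OOZIE-2222 (illustrative only).
ISSUE_KEY = re.compile(r"\b([A-Z][A-Z0-9]*)-(\d+)\b")


def extract_issue_keys(commit_message: str) -> list:
    """Return all issue keys mentioned in a commit message."""
    return [f"{project}-{number}" for project, number in ISSUE_KEY.findall(commit_message)]


def expected_file(issue_key: str) -> str:
    """Return the name of the .csv file in which the issue is looked up."""
    project_key, number = issue_key.rsplit("-", 1)
    return f"{project_key}_{int(number) // ISSUES_PER_FILE}.csv"


keys = extract_issue_keys("OOZIE-2222 fix NPE in launcher")
print([expected_file(key) for key in keys])
```

The lookup is derived from the issue ID alone, so it is only correct if the files were also filled according to issue ID.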
This works correctly as long as no issues are deleted from the Jira project. Once issues are deleted, some of the remaining issues may not be found by OpenSZZ, because they end up stored in a different file from the one in which they are expected. For example, if any issue with an ID between 1000 and 2000 is deleted, then the query that skips the first 1000 results and returns the next 1000 returns issues with IDs from 1000 to 2001. Thus, the issue with ID 2000 is stored in <project_key>_1.csv and is not found in <project_key>_2.csv. Hence, even if a commit that references the issue with ID 2000 is a bug-fixing commit, it is not considered bug-fixing, because the issue linked in its commit message is not found.
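The effect of a deletion can be reproduced with a small simulation. It is a simplified model under stated assumptions: issues are plain integers, pager/start is treated as a zero-based offset into the sorted results, and all function names are hypothetical. `store_by_position` mimics the paging described above, while `store_by_id` mimics the approach named in the title of this pull request (each file holds the issues whose IDs fall within the same thousand).

```python
PAGE_SIZE = 1000


def store_by_position(issue_ids):
    """Paging scheme: file n holds the n-th page of the sorted results."""
    ordered = sorted(issue_ids)
    return {
        page: set(ordered[start:start + PAGE_SIZE])
        for page, start in enumerate(range(0, len(ordered), PAGE_SIZE))
    }


def store_by_id(issue_ids):
    """ID-based scheme: file n holds the issues with ID // 1000 == n."""
    files = {}
    for issue_id in issue_ids:
        files.setdefault(issue_id // PAGE_SIZE, set()).add(issue_id)
    return files


def is_found(files, issue_id):
    """Look up an issue only in the file that is supposed to contain it."""
    return issue_id in files.get(issue_id // PAGE_SIZE, set())


# Issues 1..3000 with issue 1500 deleted from the project.
remaining = [i for i in range(1, 3001) if i != 1500]
by_position = store_by_position(remaining)
by_id = store_by_id(remaining)

print(is_found(by_position, 2001), is_found(by_id, 2001))
```

In this model, deleting issue 1500 shifts issue 2001 into the second page, so the position-based lookup misses it, whereas the ID-based files are unaffected by deletions.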