instead of saving 1K issues per file, save issues with ID within a thousand per file #5

VladyslavBondarenko · 2021-05-28T23:43:23Z

OpenSZZ stores issues in .csv files with a maximum of 1000 issues each. As a result, file <project_key>_0.csv holds issues with IDs from 1 to 999, file <project_key>_1.csv holds issues with IDs from 1000 to 1999, and so on.
To fetch a given portion of issues from the Jira API, OpenSZZ sends a JQL query with the following parameters:
– project=<project_key> ORDER BY key ASC
– tempMax=1000
– pager/start=<page_number*1000>
With <project_key> = OOZIE and <page_number> = 2, the query is interpreted as follows: from all issues of the OOZIE project, sorted by issue key in ascending order, return 1000 issues starting from the 2000th result.
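The paging arithmetic above can be sketched as follows. This is an illustration only, not OpenSZZ's actual code: the function name `build_page_params` and the `jqlQuery` parameter name are assumptions, while `tempMax` and `pager/start` are taken from the description.

```python
from urllib.parse import urlencode

PAGE_SIZE = 1000  # maximum number of issues requested per page


def build_page_params(project_key: str, page_number: int) -> dict:
    """Return the query parameters for one page of issues (hypothetical helper)."""
    return {
        # "jqlQuery" is an assumed parameter name, used here for illustration.
        "jqlQuery": f"project={project_key} ORDER BY key ASC",
        "tempMax": PAGE_SIZE,
        # Offset into the sorted result list, not an issue ID.
        "pager/start": page_number * PAGE_SIZE,
    }


params = build_page_params("OOZIE", 2)
query_string = urlencode(params)
print(query_string)
```

Note that `pager/start` is a position in the result list. It matches an issue ID range only while no issue is missing from the project.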
When linking commits to issues, OpenSZZ extracts issue IDs from commit messages. It then looks up each extracted ID not in all files of fetched issues, but only in the file that is supposed to contain it. For example, OpenSZZ searches for issue OOZIE-2222 only in OOZIE_2.csv.
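A minimal sketch of this lookup rule, assuming issue keys have the form PROJECT-NUMBER. The regular expression and the helper names `extract_issue_keys` and `expected_file` are hypothetical and are not taken from the OpenSZZ code base.

```python
import re

BUCKET_SIZE = 1000

# Assumed key format: upper-case project key, a dash, then digits.
ISSUE_KEY = re.compile(r"\b([A-Z][A-Z0-9]*-\d+)\b")


def extract_issue_keys(commit_message: str) -> list:
    """Return all issue keys mentioned in a commit message."""
    return ISSUE_KEY.findall(commit_message)


def expected_file(issue_key: str) -> str:
    """Return the .csv file in which the issue is expected to be stored."""
    project_key, _, number = issue_key.rpartition("-")
    return f"{project_key}_{int(number) // BUCKET_SIZE}.csv"


keys = extract_issue_keys("OOZIE-2222 fix NPE in launcher")
files = [expected_file(key) for key in keys]
print(keys, files)
```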

The process works correctly as long as no issues are deleted from the Jira project. Once issues are deleted, OpenSZZ may fail to find some of the remaining issues, because they end up in a different file from the one where they are expected. For example, if any issue with an ID between 1000 and 2000 is deleted, the query that skips the first 1000 results and returns the next 1000 yields issues with IDs from 1000 up to and including 2000. The issue with ID 2000 is therefore stored in <project_key>_1.csv and is not found in <project_key>_2.csv. Hence, even if a commit that references issue 2000 is a bug-fixing commit, it is not treated as one, because the issue linked in its commit message cannot be found.
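The failure and the fix named in the title can be reproduced with a small simulation. This is a sketch under stated assumptions, not the code of this pull request: the function names are hypothetical, and IDs are modelled as starting at 0 so that the page boundaries line up with the description above.

```python
from collections import defaultdict

BUCKET = 1000


def store_by_page(issue_ids):
    """Old behaviour: the n-th page of 1000 sorted results goes to file n."""
    files = defaultdict(list)
    for position, issue_id in enumerate(sorted(issue_ids)):
        files[position // BUCKET].append(issue_id)
    return files


def store_by_id(issue_ids):
    """Proposed behaviour: file n holds the IDs from n*1000 to n*1000 + 999."""
    files = defaultdict(list)
    for issue_id in issue_ids:
        files[issue_id // BUCKET].append(issue_id)
    return files


def lookup(files, issue_id):
    """Search only the file that is supposed to contain the issue."""
    return issue_id in files.get(issue_id // BUCKET, [])


# Simplified project: IDs 0..2999 with issue 1500 deleted.
ids = [i for i in range(3000) if i != 1500]
paged = store_by_page(ids)
bucketed = store_by_id(ids)

print(paged[1][0], paged[1][-1])   # first and last ID stored in file 1
print(lookup(paged, 2000))         # issue 2000 is searched in file 2
print(lookup(bucketed, 2000))
```

With page-based storage, issue 2000 slides into file 1 and the lookup in file 2 misses it. With ID-based storage, a deleted issue only leaves a gap in its own file and every other issue stays where the lookup expects it.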

Commit eda808e: instead of saving 1K issues per file, save issues with ID within a thousand per file

VladyslavBondarenko mentioned this pull request May 29, 2021

Improve OpenSZZ-Cloud-Native clowee/OpenSZZ-Cloud-Native#37

Open
