-
Notifications
You must be signed in to change notification settings - Fork 29k
[Project Infra] SPARK-1684: Merge script should standardize SPARK-XXX prefix #5149
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 3 commits
8f4a7d1
042099d
48520ba
aa20a6e
25229c6
43b5aed
df73f6a
4f1ed46
8c195bb
7d5fa20
9b6b0a7
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -30,6 +30,7 @@ | |
| import subprocess | ||
| import sys | ||
| import urllib2 | ||
| from collections import deque | ||
|
|
||
| try: | ||
| import jira.client | ||
|
|
@@ -285,6 +286,56 @@ def resolve_jira_issues(title, merge_branches, comment): | |
| for jira_id in jira_ids: | ||
| resolve_jira_issue(merge_branches, comment, jira_id) | ||
|
|
||
| def standardize_jira_ref(text): | ||
| # Standardize the [MODULE] SPARK-XXXXX prefix | ||
| # Converts "[SPARK-XXX][mllib] Issue", "[MLLib] SPARK-XXX. Issue" or "SPARK XXX [MLLIB]: Issue" to "[MLLIB] SPARK-XXX: Issue" | ||
|
|
||
| #If the string is compliant, no need to process any further | ||
| if (re.search(r'\[[A-Z0-9_]+\] SPARK-[0-9]{3,5}: \S+', text)): | ||
| return text | ||
|
|
||
| # Extract JIRA ref(s): | ||
| jira_refs = deque() | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Any reason this and components can't just be normal Python lists? i.e.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If you do convert them to lists, you can remove the |
||
| pattern = re.compile(r'(SPARK[-\s]*[0-9]{3,5})', re.IGNORECASE) | ||
| while (pattern.search(text) is not None): | ||
| ref = pattern.search(text).groups()[0] | ||
| # Replace any whitespace with a dash & convert to uppercase | ||
| jira_refs.append(re.sub(r'\s', '-', ref.upper())) | ||
| text = text.replace(ref, '') | ||
|
|
||
| # Extract spark component(s): | ||
| components = deque() | ||
| # Look for alphanumeric chars, spaces, and/or commas | ||
| pattern = re.compile(r'(\[[\w\s,]+\])', re.IGNORECASE) | ||
| while (pattern.search(text) is not None): | ||
| component = pattern.search(text).groups()[0] | ||
| # Convert to uppercase | ||
| components.append(component.upper()) | ||
| text = text.replace(component, '') | ||
|
|
||
| # Cleanup remaining symbols: | ||
| pattern = re.compile(r'^\W+(.*)', re.IGNORECASE) | ||
| if (pattern.search(text) is not None): | ||
| text = pattern.search(text).groups()[0] | ||
|
|
||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Maybe I don't follow this totally, but could the rest of this function be written a bit more simply? |
||
| # Assemble full text (module(s), JIRA ref(s), remaining text) | ||
| if (len(components) < 1): | ||
| components = "" | ||
| component_text = ' '.join(components).strip() | ||
| if (len(jira_refs) < 1): | ||
| jira_ref_text = "" | ||
| jira_ref_text = ' '.join(jira_refs).strip() | ||
|
|
||
| if (len(jira_ref_text) < 1 and len(component_text) < 1): | ||
| clean_text = text.strip() | ||
| elif (len(jira_ref_text) < 1): | ||
| clean_text = component_text + ' ' + text.strip() | ||
| elif (len(component_text) < 1): | ||
| clean_text = jira_ref_text + ': ' + text.strip() | ||
| else: | ||
| clean_text = component_text + ' ' + jira_ref_text + ': ' + text.strip() | ||
|
|
||
| return clean_text | ||
|
|
||
| branches = get_json("%s/branches" % GITHUB_API_BASE) | ||
| branch_names = filter(lambda x: x.startswith("branch-"), [x['name'] for x in branches]) | ||
|
|
@@ -296,7 +347,7 @@ def resolve_jira_issues(title, merge_branches, comment): | |
| pr_events = get_json("%s/issues/%s/events" % (GITHUB_API_BASE, pr_num)) | ||
|
|
||
| url = pr["url"] | ||
| title = pr["title"] | ||
| title = standardize_jira_ref(pr["title"]) | ||
| body = pr["body"] | ||
| target_ref = pr["base"]["ref"] | ||
| user_login = pr["user"]["login"] | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be pretty cool to add a doctest here. I've generally found doctests to be pretty helpful when developing parsing / string manipulation stuff like this, such as the pull request title parser that I wrote for spark-prs.appspot.com: https://github.com/databricks/spark-pr-dashboard/blob/0ee9102b7bb30fa61e2f47c3d282c5a018a7d198/sparkprs/utils.py#L55
Unfortunately, this script isn't well-structured to support doctests because the main code isn't in a method. If you want to test this, we can move the lines below this function (starting on 291/341) into a new main() function, then call that from a `if name == 'main' block. It will blow up the size of the diff a bit, but that's not a huge deal since we don't backport changes to this file anyways.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great idea, Josh. Check out the latest diff and see if it's what you had in mind.
I like what you have in spark-pr-dashboard. Would it make more sense to reference your code from the merge script instead? It's a bit different & would require a bit of finagling, but I hate to have similar code duplicated in different places.