Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added --restriction flag #154

Merged
merged 6 commits into from
May 4, 2020
Merged

Added --restriction flag #154

merged 6 commits into from
May 4, 2020

Conversation

bonham79
Copy link
Contributor

@bonham79 bonham79 commented May 3, 2020

Added --restriction flag to limit 'big scrape' to individual languages. Can specify one or more languages to scrape by following flag with ISO codes of desired languages.

  • Updated Unreleased in CHANGELOG.md to reflect the changes in code or data.

@kylebgorman
Copy link
Collaborator

Cool! I'm going to add @lfashby to review; looks basically right but maybe some style issues.

Could you type args when it's passed to main? E.g.:

def main(args: argparse.Namespace) -> None:

Could you add "Closes #153" to the PR itself? That'll cause GitHub to automatically close the issue when/if this is merged.

@kylebgorman kylebgorman requested review from lfashby and jacksonllee May 3, 2020 12:08
@kylebgorman
Copy link
Collaborator

Also do add this to the CHANGELOG as detailed above, and check the check-box when you do.

@kylebgorman
Copy link
Collaborator

Probably should add a line to data/wikipron/src/README.md to indicate this feature is available and you don't need to make a copy of languages.json to run a partial scrape.

@kylebgorman
Copy link
Collaborator

  • Can you use a standard "# foo" comment style (space after #, just one per line, declarative sentence etc.)?
  • If you don't specify False as the default for --restriction you'll just get None as the default, which seems desirable. I think the relevant conditionals should still work, though.

@lfashby
Copy link
Collaborator

lfashby commented May 3, 2020

I think this looks alright. There is only one thing I would add in addition to Kyle's comments.

For the record, I'm not convinced that the big scrape should have this sort of functionality. I guess if it makes our lives easier (with respect to testing how WikiPron works on a subset of languages) then it is fine to have, but it doesn't address the currently awkward part of running a big scrape -- having to manually find the languages WikiPron failed on during the big scrape and then restricting languages.json to just those languages. In the future we could automate the the process of building a languages.json-esque object for languages that WikiPron failed on during the big scrape so we don't have to manually modify languages.json to retry failed languages.

@yeonju123 yeonju123 mentioned this pull request May 4, 2020
1 task
… --restriction arguments. Also fixed bug that would log error if argument began with whitespace. README updated to indicate for using subsets for 'big scrape.'
Copy link
Contributor Author

@bonham79 bonham79 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Closes #153

@kylebgorman
Copy link
Collaborator

Two nits left then we're ready. I am looking forward to having this around.

To reflow on this project we use black -l79 scrape.py, then check flake8 scrape.py (for linting) and mypy scrape.py (for static typing errors). All three---black, flake8, and mypy---are on PyPi).

Copy link
Collaborator

@kylebgorman kylebgorman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving (tho note note about out-of-date comment). Thanks for this!

@kylebgorman kylebgorman merged commit 04d1550 into CUNY-CL:master May 4, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants