-
Notifications
You must be signed in to change notification settings - Fork 73
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reorganization of cg
#396
Reorganization of cg
#396
Conversation
…es' files into 'data/phones/phones'. Moves relevant scripts from 'data/src' into 'data/phones/lib'. Updates paths in 'data/scrape/lib/codes.py'
…iption type (broad/narrow)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree we may want a covering grammar generated README too, why not. It'd look cool and encourage more submitted CGs maybe.
The error analysis code and README is still here: data/src/error_analysis
.
I just ruthlessly bikesheaded your make_test_file.py
, sorry about that.
|
||
|
||
def main(args: argparse.Namespace) -> None: | ||
with open(args.gold, "r") as gf, open(args.pred, "r") as pf: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know if this is a plus here or not but when you're doing a lot of file-opening, contextlib's ExitStack
helps a lot: https://docs.python.org/3/library/contextlib.html#contextlib.ExitStack
I don’t have much to add. I generally prefer more verbose variable names but that's just me. After this we just have to write up these temporary TSVs.
Are you saying that those file should stay in |
No, just that pace the description they're still in-repo. I trust that y'all will place them somewhere sensible. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Few minor comments below, looking pretty good.
Agreed, that's only really useful for long-running scripts and this is not
such a script. (There may be a model of that elsewhere.)
…On Mon, Apr 5, 2021 at 11:42 PM Aidan Malanoski ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In data/covering_grammar/lib/make_test_file.py
<https://github.com/kylebgorman/wikipron/pull/396#discussion_r607473769>:
> + p_data = p_line.split("\t")
+ # Make sure that gold data and predictions have the
+ # same words
+ assert g_data[0] == p_data[0]
+ word = g_data[0]
+ # Note that we use `strip` to remove the newline
+ g_pron = g_data[1].strip()
+ p_pron = p_data[1].strip()
+ line = "\t".join([word, g_pron, p_pron])
+ print(line, file=wf)
+ except AssertionError:
+ logging.warning(f"{g_data[0]} != {p_data[0]} (line {i})")
+
+
+if __name__ == "__main__":
+ parser = argparse.ArgumentParser(description=__doc__)
Sounds good. I think it would make sense drop the date/time from the
config though — it seems a bit overkill for this script
—
You are receiving this because your review was requested.
Reply to this email directly, view it on GitHub
<https://github.com/kylebgorman/wikipron/pull/396#discussion_r607473769>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AABG4OJD6EYQCTJDQ7DQBR3THJ7JJANCNFSM4Z37AF3A>
.
|
…akes minor stylistic changes to the same file
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM ++
Unreleased
inCHANGELOG.md
to reflect the changes in code or data.Closes #385
This PR changes the organization of the
data/cg
directory in a similar way to #394 and #395. Changes include the following:data/cg
todata/covering_grammar
and added foldersdata/covering_grammar/lib
anddata/covering_grammar/tsv
data/src
anddata/src/error_analysis
todata/covering_grammar/lib
data/covering_grammar/lib/error_analysis.py
(for some reason, this file wasn't in WikiPron, even though it existed at the time that Arundhati added the CG stuff)data/covering_grammar
I imagine we'll eventually want the README to be a table like the ones we have for
data/phones
anddata/scrape
, and that we'll probably want some sort ofgenerate_cg_summary.py
to make that table.