-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added make_mt_IHME.py to convert IHME flat files with ICD codes and morbidity data into mt and ht. #3
base: master
Are you sure you want to change the base?
Conversation
…orbidity data into mt and ht.
'must be present in the MatrixTable column fields.') | ||
|
||
|
||
parser = argparse.ArgumentParser() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I always put this inside the if __name__ == '__main__'
block. It probably doesn't matter, but its not really needed if you were to import from this script, so I think it makes sense there.
|
||
|
||
parser = argparse.ArgumentParser() | ||
parser.add_argument('--input', type=str, help='Location of the table containing the data for merging. Expects this to contain data for a single year.') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
type=str
is the default. fine to leave it if you want, but you can also omit
entry_fields: list, field_names: list): | ||
if not row_key in field_names: | ||
raise ValueError('Row key must be in set of fields.') | ||
if(len(row_other) == 0): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't need the outer parens here
row_other_l = row_other.split(',') | ||
if not all([row in field_names for row in row_other_l]): | ||
raise ValueError('All row fields must be in the columns of the read table.') | ||
return row_key, list(np.setdiff1d(list(set(row_other_l)),[row_key])) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You don't modify row_key
- is there a compelling reason to return it?
|
||
ht : Hail Table | ||
Should contain columns that will be transformed into the new struct. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you provide a simple pictoral example of what this function does? Hard to envision from just the docstring
age_groups = list(set(age_groups)) | ||
age_groups_not_in_vals = np.setdiff1d(age_groups,age_vals) | ||
if len(age_groups_not_in_vals) == 0: | ||
mt_fc = mt_f.filter_cols(hl.literal(age_groups).contains(mt_f.age_name)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is mt_f
?
args = parser.parse_args() | ||
|
||
# Import | ||
ht = hl.import_table(args.input, force=True, impute=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you're making an assumption that this is gzipped, then you must have a particular file in mind. If so, I would suggest making an entry in resources/phenotypes.py
like one of the other resources and read it from there rather than taking it as an input arg
mt = ht_m.to_matrix_table(row_key=[row_key], row_fields=row_other_fields, | ||
col_key=[column_key], col_fields=column_other_fields, | ||
n_partitions=100) | ||
mt = mt.annotate_globals(year=args.year) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this for tracking? Is the date in the filename? If so, filename might be better than a user-specified year - if a user picks a new file they might not notice this argument and not update it
mt_f = mt.annotate_rows(**modifications) | ||
|
||
# Output MatrixTable | ||
mt_f.write(args.output, overwrite=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
parameterize an args.overwrite
(action='store_true'
)
Upstream code to generate the compressed flat file is R code that I still have locally, and need to clean up.