Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added make_mt_IHME.py to convert IHME flat files with ICD codes and morbidity data into mt and ht. #3

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

rahulg603
Copy link

Upstream code to generate the compressed flat file is R code that I still have locally, and need to clean up.

'must be present in the MatrixTable column fields.')


parser = argparse.ArgumentParser()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I always put this inside the if __name__ == '__main__' block. It probably doesn't matter, but its not really needed if you were to import from this script, so I think it makes sense there.



parser = argparse.ArgumentParser()
parser.add_argument('--input', type=str, help='Location of the table containing the data for merging. Expects this to contain data for a single year.')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

type=str is the default. fine to leave it if you want, but you can also omit

entry_fields: list, field_names: list):
if not row_key in field_names:
raise ValueError('Row key must be in set of fields.')
if(len(row_other) == 0):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't need the outer parens here

row_other_l = row_other.split(',')
if not all([row in field_names for row in row_other_l]):
raise ValueError('All row fields must be in the columns of the read table.')
return row_key, list(np.setdiff1d(list(set(row_other_l)),[row_key]))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't modify row_key - is there a compelling reason to return it?


ht : Hail Table
Should contain columns that will be transformed into the new struct.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you provide a simple pictoral example of what this function does? Hard to envision from just the docstring

age_groups = list(set(age_groups))
age_groups_not_in_vals = np.setdiff1d(age_groups,age_vals)
if len(age_groups_not_in_vals) == 0:
mt_fc = mt_f.filter_cols(hl.literal(age_groups).contains(mt_f.age_name))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is mt_f?

args = parser.parse_args()

# Import
ht = hl.import_table(args.input, force=True, impute=True)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you're making an assumption that this is gzipped, then you must have a particular file in mind. If so, I would suggest making an entry in resources/phenotypes.py like one of the other resources and read it from there rather than taking it as an input arg

mt = ht_m.to_matrix_table(row_key=[row_key], row_fields=row_other_fields,
col_key=[column_key], col_fields=column_other_fields,
n_partitions=100)
mt = mt.annotate_globals(year=args.year)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this for tracking? Is the date in the filename? If so, filename might be better than a user-specified year - if a user picks a new file they might not notice this argument and not update it

mt_f = mt.annotate_rows(**modifications)

# Output MatrixTable
mt_f.write(args.output, overwrite=True)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

parameterize an args.overwrite (action='store_true')

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants