Tissue-Specific Genotype Expression Project using the GTEx Dataset

Team members: Nico Chaves, Noam Weinberger and Junjie (Jason) Zhu

We began this project in Spring 2016 as a course project for Stanford CS 341 (Project in Mining Massive Datasets).

Structure

/data: includes metadata of processed data and example datasets; full datasets are stored on the server

/preprocessing: includes Python scripts used to process the expression data downloaded from GTExPortal

/ipython_notebook: includes IPython notebooks that present the main results of this project interactively

Instructions

git pull
git add --all
git commit -m "MESSAGE"
git push

Data Preprocessing

We downloaded the transcript RPKM file (GTEx_Analysis_v6_RNA-seq_Flux1.6_transcript_rpkm.txt.gz) and the sample meta-information (GTEx_Data_V6_Annotations_SampleAttributesDS.txt) from GTExPortal. The former contains the expression values; the latter contains the donor ID and tissue type of each sample. We then filtered the transcripts in two steps:

Select the transcripts that map to genes in the GO database (list downloaded from Ensembl BioMart)
Select the 10,000 transcripts with the highest variance across all samples in the dataset

| | Downloaded transcript RPKM | After GO term filter | After variance filter |
|---|---|---|---|
| Number of variables (transcripts) | 195,747 | 67,344 | 10,000 |
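The two filtering steps above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the code in /preprocessing: the function name `filter_transcripts`, the row layout (transcript ID plus a list of expression values), and the representation of the GO list as a set of transcript IDs are all hypothetical. It uses only the standard library and streams over the rows, so it never holds more than `top_k` scored transcripts at once.

```python
import heapq
import statistics


def filter_transcripts(rows, go_transcript_ids, top_k):
    """Apply the GO filter, then keep the top_k highest-variance transcripts.

    rows: iterable of (transcript_id, [expression values across samples]).
    go_transcript_ids: set of transcript IDs mapped to GO-annotated genes
        (assumed format; the real list comes from Ensembl BioMart).
    Returns a list of (transcript_id, variance), highest variance first.
    """
    # Step 1: keep only transcripts mapped to a gene in the GO list.
    go_rows = (
        (tid, values) for tid, values in rows if tid in go_transcript_ids
    )
    # Step 2: score each remaining transcript by its variance across samples.
    scored = ((tid, statistics.pvariance(values)) for tid, values in go_rows)
    # nlargest keeps only top_k items in memory while streaming through rows.
    return heapq.nlargest(top_k, scored, key=lambda item: item[1])


# Toy data (made up for illustration, not GTEx values).
toy_rows = [
    ("T1", [0.0, 0.0, 0.0, 0.0]),    # in GO list, variance 0
    ("T2", [1.0, 3.0, 1.0, 3.0]),    # in GO list, variance 1
    ("T3", [0.0, 10.0, 0.0, 10.0]),  # not in GO list, dropped in step 1
    ("T4", [2.0, 6.0, 2.0, 6.0]),    # in GO list, variance 4
]
toy_go_ids = {"T1", "T2", "T4"}

selected = filter_transcripts(toy_rows, toy_go_ids, top_k=2)
print(selected)
```

The sketch uses population variance; because every transcript is measured over the same samples, ranking by sample variance would select the same transcripts. On the full dataset one would call it with `top_k=10000`.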

TODO: Write usage instructions

Contributing

Fork it!
Create your feature branch: git checkout -b my-new-feature
Commit your changes: git commit -am 'Add some feature'
Push to the branch: git push origin my-new-feature
Submit a pull request :D

History

TODO: Write history

Credits

TODO: Write credits

License

TODO: Write license
