Improve GWAS documentation #217

Merged
karenfeng merged 19 commits into projectglow:master from karenfeng:document-multiple-phenos
Jun 8, 2020

@karenfeng karenfeng commented May 19, 2020

What changes are proposed in this pull request?

Changes the GWAS documentation to read covariates and phenotypes from CSVs, which better matches a realistic workflow, and documents that users can pass literal numpy arrays, which #213 made possible.
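
For illustration, the CSV-to-numpy step described above might look roughly like the sketch below. It is not taken from the changed docs: the CSV contents, the column names, and the `read_matrix` helper are all hypothetical, and the intercept column is an assumption based on the "Intercept" commit in this PR. The Glow regression call itself is left out; the sketch only prepares arrays of the kind that could be passed as literals.

```python
import csv
import io

import numpy as np

# Hypothetical CSV contents standing in for real files; one row per sample.
COVARIATES_CSV = """sample_id,age,sex
s1,34,0
s2,51,1
s3,47,1
"""
PHENOTYPES_CSV = """sample_id,height,bmi
s1,170.5,22.1
s2,165.0,27.4
s3,180.2,24.8
"""


def read_matrix(text):
    """Parse a CSV with a header row and a leading sample-ID column.

    Returns (sample_ids, column_names, float matrix of shape samples x columns).
    """
    rows = list(csv.reader(io.StringIO(text)))
    header, body = rows[0], rows[1:]
    sample_ids = [row[0] for row in body]
    values = np.array([[float(v) for v in row[1:]] for row in body])
    return sample_ids, header[1:], values


cov_ids, cov_names, covariates = read_matrix(COVARIATES_CSV)
pheno_ids, pheno_names, phenotypes = read_matrix(PHENOTYPES_CSV)

# Both tables must list the samples in the same order.
assert cov_ids == pheno_ids

# Assumption: prepend an intercept column of ones to the covariates.
covariates = np.column_stack([np.ones(len(cov_ids)), covariates])

# One phenotype vector per trait, keyed by the CSV column name.
phenotype_by_trait = {name: phenotypes[:, i] for i, name in enumerate(pheno_names)}
```

Each entry of `phenotype_by_trait`, together with `covariates`, is a plain numpy array, so one regression per trait can be run by looping over the dictionary.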

Because the most recent release of PySpark 3.0 is incompatible with the current Spark 3.0.1 snapshot in a way that broke Pandas functionality, I also modified the build to create our own pip-installable .tar.gz from the latest Spark branch-3.0. This required creating our own setup.py file, as the existing one in Spark requires jars.

Building from branch-3.0 in turn required enabling the environment variable PYSPARK_ROW_FIELD_SORTING_ENABLED: apache/spark#26496 changed the ordering of row fields, and the Python tests broke when the fields were no longer sorted.
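
To make the ordering change concrete, here is a small pure-Python sketch. `row_field_names` is a hypothetical stand-in that only mimics how a row built from keyword arguments would order its field names with and without the legacy flag; it is not PySpark code, and the field names are made up. The last lines show one way to set the variable from Python, on the assumption that it has to be in the environment before `pyspark` is imported (and on the executors as well).

```python
import os

FLAG = "PYSPARK_ROW_FIELD_SORTING_ENABLED"


def row_field_names(environ, **kwargs):
    """Hypothetical stand-in: return field names in the order a row would keep them.

    With the legacy flag set to "true", names are sorted alphabetically;
    otherwise the keyword-argument (insertion) order is preserved.
    """
    sorting_enabled = environ.get(FLAG, "false").lower() == "true"
    names = list(kwargs)
    return sorted(names) if sorting_enabled else names


# Legacy behavior: fields come back sorted.
legacy_order = row_field_names({FLAG: "true"}, start=1, contigName="21")

# Flag unset: fields keep the order in which they were passed.
new_order = row_field_names({}, start=1, contigName="21")

# Assumption: set the flag before importing pyspark so the driver picks it up.
os.environ[FLAG] = "true"
```

Tests that compare rows positionally, or that rely on struct fields being in alphabetical order, are the kind that would see a difference between `legacy_order` and `new_order`.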

How is this patch tested?

  • Unit tests
  • Integration tests
  • Manual tests

karenfeng added 4 commits May 19, 2020 15:46
Signed-off-by: Karen Feng <karen.feng@databricks.com>
Signed-off-by: Karen Feng <karen.feng@databricks.com>
Signed-off-by: Karen Feng <karen.feng@databricks.com>
Signed-off-by: Karen Feng <karen.feng@databricks.com>
@karenfeng karenfeng requested a review from henrydavidge May 19, 2020 22:53

codecov bot commented May 19, 2020

Codecov Report

Merging #217 into master will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master     #217   +/-   ##
=======================================
  Coverage   93.63%   93.63%           
=======================================
  Files          88       88           
  Lines        4258     4258           
  Branches      397      397           
=======================================
  Hits         3987     3987           
  Misses        271      271           

karenfeng added 7 commits May 29, 2020 11:38
…ument-multiple-phenos

Signed-off-by: Karen Feng <karen.feng@databricks.com>
Signed-off-by: Karen Feng <karen.feng@databricks.com>
…ument-multiple-phenos

Signed-off-by: Karen Feng <karen.feng@databricks.com>
…ument-multiple-phenos

Signed-off-by: Karen Feng <karen.feng@databricks.com>
Signed-off-by: Karen Feng <karen.feng@databricks.com>
Signed-off-by: Karen Feng <karen.feng@databricks.com>
Signed-off-by: Karen Feng <karen.feng@databricks.com>
@henrydavidge henrydavidge left a comment

Small comments

Signed-off-by: Karen Feng <karen.feng@databricks.com>
@henrydavidge henrydavidge left a comment

LGTM

karenfeng added 7 commits June 4, 2020 12:43
Signed-off-by: Karen Feng <karen.feng@databricks.com>
Signed-off-by: Karen Feng <karen.feng@databricks.com>
Signed-off-by: Karen Feng <karen.feng@databricks.com>
Signed-off-by: Karen Feng <karen.feng@databricks.com>
Signed-off-by: Karen Feng <karen.feng@databricks.com>
Signed-off-by: Karen Feng <karen.feng@databricks.com>
Signed-off-by: Karen Feng <karen.feng@databricks.com>
@henrydavidge henrydavidge left a comment

LGTM

@karenfeng karenfeng merged commit 22f1a90 into projectglow:master Jun 8, 2020
@karenfeng karenfeng deleted the document-multiple-phenos branch June 8, 2020 21:23
henrydavidge pushed a commit to henrydavidge/glow that referenced this pull request Jun 22, 2020
* Improve GWAS docs

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Compress

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Gzip

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Intercept

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* lit

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* fixup binary phenos

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Add tip

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* lit

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* comments

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Set up PySpark from github

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* clone

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Don't specify rev

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Docs fixup

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Move setup

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* ordering of struct

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* enable legacy conf PYSPARK_ROW_FIELD_SORTING_ENABLED

Signed-off-by: Karen Feng <karen.feng@databricks.com>
Signed-off-by: Henry Davidge <hhd@databricks.com>
2 participants