Improve GWAS documentation by karenfeng · Pull Request #217 · projectglow/glow

karenfeng · 2020-05-19T22:53:22Z

What changes are proposed in this pull request?

Changes the GWAS documentation to read covariates and phenotypes from CSVs, which better matches a realistic workflow, and documents that users can pass literal numpy arrays as a result of #213.
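
For readers who want a feel for the CSV-based workflow, here is a minimal sketch using only the standard library. It is not the code from the updated docs: the helper `read_sample_matrix`, the column names, and the sample values are all hypothetical, and the final conversion to a numpy array (for example with `numpy.asarray`) is left out so the snippet has no third-party dependencies.

```python
import csv
import io


def read_sample_matrix(csv_text):
    """Parse CSV text whose first column is a sample ID and whose remaining
    columns are numeric. Returns (sample_ids, column_names, rows)."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    sample_ids, rows = [], []
    for record in reader:
        if not record:  # skip blank lines
            continue
        sample_ids.append(record[0])
        rows.append([float(value) for value in record[1:]])
    return sample_ids, header[1:], rows


# Hypothetical file contents; real data would be read from disk.
covariates_csv = "sample_id,age,sex\nHG00096,34,0\nHG00097,51,1\n"
phenotypes_csv = "sample_id,trait_1,trait_2\nHG00096,1.5,0.2\nHG00097,2.5,0.4\n"

cov_ids, cov_names, covariates = read_sample_matrix(covariates_csv)
pheno_ids, pheno_names, phenotypes = read_sample_matrix(phenotypes_csv)

# This sketch assumes both files list samples in the same order.
assert cov_ids == pheno_ids

# Pull out a single phenotype column as a flat list of values.
trait_1 = [row[0] for row in phenotypes]
```

The nested lists `covariates` and `phenotypes` are row-major (one row per sample), which is the shape one would typically hand to numpy before passing the arrays on.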

Because the most recent release of PySpark 3.0 is incompatible with the current Spark 3.0.1 snapshot, which broke Pandas functionality, I also modified the build to create our own pip-installable .tar.gz from the latest Spark branch-3.0. This required writing our own setup.py file, because the existing one in Spark requires jars.

This change required enabling the environment variable PYSPARK_ROW_FIELD_SORTING_ENABLED, because apache/spark#26496 changed the row field ordering, which broke the Python tests when the row elements were unsorted.
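
To illustrate the ordering difference, here is a small stand-in that does not import PySpark. It assumes, based on the Spark 3.0 migration notes, that rows built from keyword arguments keep insertion order by default and are sorted by field name only when the variable is set to `true`. The function `row_field_order` and the field names are hypothetical, and unlike a real Spark deployment (where the variable should be set for both driver and executors before PySpark starts) this sketch reads the variable at call time.

```python
import os

ENV_VAR = "PYSPARK_ROW_FIELD_SORTING_ENABLED"


def row_field_order(**fields):
    """Hypothetical stand-in for the field order of a keyword-built row.

    Sorted by name when the legacy flag is "true", insertion order otherwise.
    """
    legacy_sorting = os.environ.get(ENV_VAR, "false").lower() == "true"
    return sorted(fields) if legacy_sorting else list(fields)


# Default behaviour assumed for Spark 3.0: fields stay in the order given.
os.environ[ENV_VAR] = "false"
insertion_order = row_field_order(pValue=0.05, beta=1.2)

# Legacy behaviour: fields are sorted alphabetically by name.
os.environ[ENV_VAR] = "true"
sorted_order = row_field_order(pValue=0.05, beta=1.2)
```

Tests that compare rows positionally see different results under the two settings, which is consistent with the failures described above.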

How is this patch tested?

Unit tests
Integration tests
Manual tests

Signed-off-by: Karen Feng <karen.feng@databricks.com>

codecov · 2020-05-19T23:08:46Z

Codecov Report

Merging #217 into master will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master     #217   +/-   ##
=======================================
  Coverage   93.63%   93.63%           
=======================================
  Files          88       88           
  Lines        4258     4258           
  Branches      397      397           
=======================================
  Hits         3987     3987           
  Misses        271      271

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update ec7da93...c167cc2.

henrydavidge

Small comments

docs/source/tertiary/regression-tests.rst

henrydavidge

LGTM

henrydavidge

LGTM

* Improve GWAS docs
* Compress
* Gzip
* Intercept
* lit
* fixup binary phenos
* Add tip
* lit
* comments
* Set up PySpark from github
* clone
* Don't specify rev
* Docs fixup
* Move setup
* ordering of struct
* enable legacy conf PYSPARK_ROW_FIELD_SORTING_ENABLED

Signed-off-by: Karen Feng <karen.feng@databricks.com>
Signed-off-by: Henry Davidge <hhd@databricks.com>

karenfeng added 4 commits May 19, 2020 15:46

Improve GWAS docs

d0fc911

Signed-off-by: Karen Feng <karen.feng@databricks.com>

Compress

2b0fb80

Signed-off-by: Karen Feng <karen.feng@databricks.com>

Gzip

d5d527c

Signed-off-by: Karen Feng <karen.feng@databricks.com>

Intercept

3124717

Signed-off-by: Karen Feng <karen.feng@databricks.com>

karenfeng requested a review from henrydavidge May 19, 2020 22:53

karenfeng added 7 commits May 29, 2020 11:38

Merge branch 'master' of https://github.com/projectglow/glow into document-multiple-phenos

85f36d8

Signed-off-by: Karen Feng <karen.feng@databricks.com>

lit

2851a6c

Signed-off-by: Karen Feng <karen.feng@databricks.com>

Merge branch 'master' of https://github.com/projectglow/glow into document-multiple-phenos

12aea6e

Signed-off-by: Karen Feng <karen.feng@databricks.com>

Merge branch 'master' of https://github.com/projectglow/glow into document-multiple-phenos

da27f72

Signed-off-by: Karen Feng <karen.feng@databricks.com>

fixup binary phenos

3c053f7

Signed-off-by: Karen Feng <karen.feng@databricks.com>

Add tip

14b522a

Signed-off-by: Karen Feng <karen.feng@databricks.com>

lit

d20ec65

Signed-off-by: Karen Feng <karen.feng@databricks.com>

henrydavidge reviewed Jun 4, 2020

docs/source/tertiary/regression-tests.rst (outdated, resolved)

docs/source/tertiary/regression-tests.rst (outdated, resolved)

comments

efc2e00

Signed-off-by: Karen Feng <karen.feng@databricks.com>

henrydavidge approved these changes Jun 4, 2020

karenfeng added 7 commits June 4, 2020 12:43

Set up PySpark from github

6dc270b

Signed-off-by: Karen Feng <karen.feng@databricks.com>

clone

cb3b75d

Signed-off-by: Karen Feng <karen.feng@databricks.com>

Don't specify rev

51e4ba3

Signed-off-by: Karen Feng <karen.feng@databricks.com>

Docs fixup

88db6c9

Signed-off-by: Karen Feng <karen.feng@databricks.com>

Move setup

0f95ac0

Signed-off-by: Karen Feng <karen.feng@databricks.com>

ordering of struct

45a9cee

Signed-off-by: Karen Feng <karen.feng@databricks.com>

enable legacy conf PYSPARK_ROW_FIELD_SORTING_ENABLED

c167cc2

Signed-off-by: Karen Feng <karen.feng@databricks.com>

henrydavidge approved these changes Jun 6, 2020

karenfeng merged commit 22f1a90 into projectglow:master Jun 8, 2020

karenfeng deleted the document-multiple-phenos branch June 8, 2020 21:23

karenfeng merged 19 commits into projectglow:master from karenfeng:document-multiple-phenos
