
Conversation

Collaborator

@karenfeng karenfeng commented Oct 7, 2019

What changes are proposed in this pull request?

  • Make sure tests pass with Java (not Kryo) serialization; see the sketch after this list
  • Move the README from md to rst to play nicely with pip descriptions
  • Prune (mostly GATK) dependencies to reduce package resolution time
  • Set up a pipeline to release Maven artifacts to Bintray
  • Set up Python wheel creation (install_requires is temporarily disabled, as Databricks does not resolve from different repos and typeguard exists in PyPI but not TestPyPI)
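
As a rough illustration of the serialization bullet above, here is a minimal sketch (not the project's actual test harness; the app name is a placeholder) of a local test SparkSession configured to use Spark's default Java serializer rather than Kryo:

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch: a local session relying on Java serialization. Spark's
// default is already JavaSerializer; setting it explicitly documents that
// these tests do not run with Kryo.
val spark = SparkSession
  .builder()
  .master("local[2]")
  .appName("glow-java-serialization-tests")
  .config("spark.serializer", "org.apache.spark.serializer.JavaSerializer")
  .getOrCreate()
```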

Instructions

  • To create a Maven artifact and publish to Bintray
    • Prereq: get Bintray creds (via LastPass)
    • Run sbt release from glow
  • To create a wheel and publish to TestPyPI
    • Prereq: update your conda environment so that you can use setuptools and twine
    • Run python setup.py bdist_wheel from glow/python
    • Run twine upload --repository-url https://test.pypi.org/legacy/ dist/*

How is this patch tested?

  • Unit tests
  • Integration tests
  • Manual tests

A DBR cluster was set up with the Maven artifacts and the wheel, and basic functionality was tested.

Don't use Kryo serializer

Don't parallelize un-serializable Hadoop FileStatus
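
A hedged sketch of the idea behind this commit (directory and variable names are illustrative, not the actual code): with Java serialization, shipping Hadoop FileStatus objects to executors can fail, so collect plain path strings on the driver and parallelize those instead.

```scala
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[2]").getOrCreate()
val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)

// Keep only serializable strings before handing the listing to Spark.
val vcfPaths: Seq[String] = fs
  .listStatus(new Path("/illustrative/vcf/dir"))
  .map(_.getPath.toString)
  .toSeq

val pathRdd = spark.sparkContext.parallelize(vcfPaths)
```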

Change descrip

WIP

Whoops

bintray

Not local

Quiet logs

Remove tmp file

Actually rename bintray

Setting version to 0.1.0

WIP

WIP

License fixup

Resolver

WIP

Change version

Setting version to 0.1.1

WIP

Setting version to 0.1.2

Setting version to 0.1.3-SNAPSHOT

WIP

Setting version to 0.1.2

Setting version to 0.1.3-SNAPSHOT

Exclude many GATK deps

Setting version to 0.1.3

Setting version to 0.1.4-SNAPSHOT

Setting version to 0.1.4

Setting version to 0.1.5-SNAPSHOT

Whoops

Setting version to 0.1.3

Setting version to 0.1.4-SNAPSHOT

Setting version to 0.1.4

Setting version to 0.1.5-SNAPSHOT

Setting version to 0.1.6

Setting version to 0.1.7-SNAPSHOT

Yay deps

Setting version to 0.1.7

Setting version to 0.1.8-SNAPSHOT

Setting version to 0.1.8

Setting version to 0.1.9-SNAPSHOT

Setting version to 0.1.1

Setting version to 0.1.2-SNAPSHOT

Setting version to 0.1.10

Setting version to 0.1.11-SNAPSHOT

Setting version to 0.1.15

Setting version to 0.1.16-SNAPSHOT

Setting version to 0.1.9

Setting version to 0.1.10-SNAPSHOT

WIP

Setting version to 0.1.7

Setting version to 0.1.8-SNAPSHOT

Setting version to 0.1.8

Setting version to 0.1.9-SNAPSHOT

Setting version to 0.1.7

Setting version to 0.1.8-SNAPSHOT

Add tests back

Setting version to 0.1.8

Setting version to 0.1.9-SNAPSHOT

Setting version to 0.1.7

Setting version to 0.1.8-SNAPSHOT

Setting version to 0.1.13

Setting version to 0.1.14-SNAPSHOT

WIP

Setting version to 0.1.7

Setting version to 0.1.8-SNAPSHOT

WIP

Setting version to 0.1.7

Setting version to 0.1.8-SNAPSHOT

WIP

Setting version to 0.1.8

Setting version to 0.1.9-SNAPSHOT

Setting version to 0.1.11

Setting version to 0.1.12-SNAPSHOT

Setting version to 0.1.7

Setting version to 0.1.8-SNAPSHOT

Exclude findbugs

Setting version to 0.1.8

Setting version to 0.1.9-SNAPSHOT

WIP

Cleanup
@karenfeng karenfeng requested a review from henrydavidge October 8, 2019 09:15
Contributor

@henrydavidge henrydavidge left a comment

Nice work! I had a bunch of questions.

@@ -0,0 +1,10 @@
==================================
Contributor

Can you migrate over the rest of the readme contents?

Contributor

And delete the old one?

Collaborator Author

To clean everything up for publishing (PyPI uses readme.rst), I put everything in the Building and Testing page in the wiki.

build.sbt Outdated
.exclude("biz.k11i", "xgboost-predictor")
.exclude("com.esotericsoftware", "kryo")
.exclude("com.esotericsoftware", "reflectasm")
.exclude("com.github.fommil.netlib", "netlib-native_ref-osx-x86_64")
Contributor

Are you sure we should exclude these? These are the blas bindings I believe. cc @fnothaft
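
For orientation, a hedged sketch of how exclusion rules like the ones quoted above typically attach to a dependency in build.sbt; the GATK coordinates and version here are illustrative, not necessarily what this PR uses:

```scala
// build.sbt (sketch): drop transitive artifacts that clash with Spark's own
// Kryo/reflectasm and with the netlib native bindings.
libraryDependencies += ("org.broadinstitute" % "gatk" % "4.1.4.1")
  .exclude("biz.k11i", "xgboost-predictor")
  .exclude("com.esotericsoftware", "kryo")
  .exclude("com.esotericsoftware", "reflectasm")
  .exclude("com.github.fommil.netlib", "netlib-native_ref-osx-x86_64")
```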


// Uncomment the following for publishing to Sonatype.
// See https://www.scala-sbt.org/1.x/docs/Using-Sonatype.html for more detail.
// Publish to Bintray
Contributor

I get that bintray is helpful for testing, but why would we publish to bintray vs maven central? I think it's easier for users to download from maven central because they don't need to add a repo to their pom (please correct me if that's wrong).

Collaborator Author

As discussed offline, this is what Delta does with their OSS release - stage in bintray before pushing to Maven Central.
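
For reference, a hedged sketch of pointing sbt's publish step at a Bintray-hosted Maven repo using only core sbt settings; the organization, repo path, realm string, and environment variable names are placeholders rather than the actual release configuration:

```scala
// build.sbt (sketch): publish Maven-style artifacts to a Bintray Maven repo.
publishMavenStyle := true
publishTo := Some(
  "bintray" at "https://api.bintray.com/maven/my-org/maven/glow/;publish=1")
credentials += Credentials(
  "Bintray API Realm",
  "api.bintray.com",
  sys.env.getOrElse("BINTRAY_USER", ""),
  sys.env.getOrElse("BINTRAY_API_KEY", ""))
```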

from setuptools import setup

setup(
name='glowpy',
Contributor

can we just call it glow?

Collaborator Author

As discussed offline, both glow and pyglow have been taken on PyPI.

@karenfeng karenfeng changed the title from "Release prep" to "[HLS-351] Release prep" Oct 10, 2019
@karenfeng karenfeng closed this Oct 15, 2019
henrydavidge referenced this pull request in henrydavidge/glow Jun 22, 2020
* feat: ridge models for wgr added
Signed-off-by: Leland Barnard (leland.barnard@gmail.com)

Signed-off-by: Leland Barnard <leland.barnard@regeneron.com>

* Doc strings added for levels/functions.py
Some typos fixed in ridge_model.py
Signed-off-by: Leland Barnard (leland.barnard@gmail.com)

Signed-off-by: Leland Barnard <leland.barnard@regeneron.com>

* ridge_model and RidgeReducer unit tests added
Signed-off-by: Leland Barnard (leland.barnard@gmail.com)

Signed-off-by: Leland Barnard <leland.barnard@regeneron.com>

* RidgeRegression unit tests added
test data README added
ridge_udfs.py docstrings added
Signed-off-by: Leland Barnard (leland.barnard@gmail.com)

Signed-off-by: Leland Barnard <leland.barnard@regeneron.com>

* Changes made to accessing the sample ID map and more docstrings

The map_normal_eqn and score_models functions previously expected the
sample IDs for a given sample block to be found in the Pandas DataFrame,
which meant we had to join them on before the .groupBy().apply().  These
functions now expect the sample block to sample IDs mapping to be
provided separately as a dict, so that the join is no longer required.
RidgeReducer and RidgeRegression APIs remain unchanged.

docstrings have been added for RidgeReducer and RidgeRegression classes.

Signed-off-by: Leland Barnard (leland.barnard@gmail.com)
Signed-off-by: Leland Barnard <leland.barnard@regeneron.com>

* Refactored object names and comments to reflect new terminology

Where 'block' was previously used to refer to the set of columns in a
block, we now use 'header_block'
Where 'group' was previously used to refer to the set of samples in a
block, we now use 'sample_block'

Signed-off-by: Leland Barnard (leland.barnard@gmail.com)
Signed-off-by: Leland Barnard <leland.barnard@regeneron.com>
henrydavidge referenced this pull request in henrydavidge/glow Jun 22, 2020
henrydavidge added a commit that referenced this pull request Jun 22, 2020
* Add Leland's demo notebook

* block_variants_and_samples Transformer to create genotype DataFrame for WGR (#2)

* blocks

Signed-off-by: kianfar77 <kiavash.kianfar@databricks.com>

* test vcf

Signed-off-by: kianfar77 <kiavash.kianfar@databricks.com>

* transformer

Signed-off-by: kianfar77 <kiavash.kianfar@databricks.com>

* remove extra

Signed-off-by: kianfar77 <kiavash.kianfar@databricks.com>

* refactor and conform with ridge namings

Signed-off-by: kianfar77 <kiavash.kianfar@databricks.com>

* test

Signed-off-by: kianfar77 <kiavash.kianfar@databricks.com>

* test files

Signed-off-by: kianfar77 <kiavash.kianfar@databricks.com>

* remove extra file

Signed-off-by: kianfar77 <kiavash.kianfar@databricks.com>

* sort_key

Signed-off-by: kianfar77 <kiavash.kianfar@databricks.com>

* feat: ridge models for wgr added (#1)

* feat: ridge models for wgr added
Signed-off-by: Leland Barnard (leland.barnard@gmail.com)

Signed-off-by: Leland Barnard <leland.barnard@regeneron.com>

* Doc strings added for levels/functions.py
Some typos fixed in ridge_model.py
Signed-off-by: Leland Barnard (leland.barnard@gmail.com)

Signed-off-by: Leland Barnard <leland.barnard@regeneron.com>

* ridge_model and RidgeReducer unit tests added
Signed-off-by: Leland Barnard (leland.barnard@gmail.com)

Signed-off-by: Leland Barnard <leland.barnard@regeneron.com>

* RidgeRegression unit tests added
test data README added
ridge_udfs.py docstrings added
Signed-off-by: Leland Barnard (leland.barnard@gmail.com)

Signed-off-by: Leland Barnard <leland.barnard@regeneron.com>

* Changes made to accessing the sample ID map and more docstrings

The map_normal_eqn and score_models functions previously expected the
sample IDs for a given sample block to be found in the Pandas DataFrame,
which meant we had to join them on before the .groupBy().apply().  These
functions now expect the sample block to sample IDs mapping to be
provided separately as a dict, so that the join is no longer required.
RidgeReducer and RidgeRegression APIs remain unchanged.

docstrings have been added for RidgeReducer and RidgeRegression classes.

Signed-off-by: Leland Barnard (leland.barnard@gmail.com)
Signed-off-by: Leland Barnard <leland.barnard@regeneron.com>

* Refactored object names and comments to reflect new terminology

Where 'block' was previously used to refer to the set of columns in a
block, we now use 'header_block'
Where 'group' was previously used to refer to the set of samples in a
block, we now use 'sample_block'

Signed-off-by: Leland Barnard (leland.barnard@gmail.com)
Signed-off-by: Leland Barnard <leland.barnard@regeneron.com>

* [HLS-539] Fix compatibility between blocked GT transformer and WGR (#6)

* WIP

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* existing tests pass

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* rename file

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Add compat test

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* scalafmt

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* collect minimal columns

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* address comments

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Test fixup

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Spark 3 needs more recent PyArrow, reduce mem consumption by removing unnecessary caching

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* PyArrow 0.15.1 only with PySpark 3

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Don't use toPandas()

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Upgrade pyarrow

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Only register once

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Minimize memory usage

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Select before head

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* set up/tear down

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Try limiting pyspark memory

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* No teardown

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Extend timeout

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Simplify ordering logic in levels code (#7)

* WIP

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* existing tests pass

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* rename file

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Add compat test

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* scalafmt

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* collect minimal columns

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* start changing for readability

* use input label ordering

* rename create_row_indexer

* undo column sort

* change reduce

Signed-off-by: Henry D <henrydavidge@gmail.com>

* further simplify reduce

* sorted alpha names

* remove ordering

* comments

Signed-off-by: Henry D <henrydavidge@gmail.com>

* Set arrow env var in build

Signed-off-by: Henry D <henrydavidge@gmail.com>

* faster sort

* add test file

* undo test data change

* >=

* formatting

* empty

Co-authored-by: Karen Feng <karen.feng@databricks.com>

* Limit Spark memory conf in tests (#9)

* yapf

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* yapf transform

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Set driver memory

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Try changing spark mem

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* match java tests

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* whoops

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* remove driver memory flag

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Improve partitioning in block_variants_and_samples transformer (#11)

Signed-off-by: kianfar77 <kiavash.kianfar@databricks.com>

* Remove unnecessary header_block grouping (#10)

* cleanup

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* whoops

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* cleanup

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Create sample ID blocking helper functions (#12)

* WIP

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* WIP

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* WIP

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* WIP

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* WIP

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* whoops

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* tests

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* simplify tests

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* WIP

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* yapf

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* index map compat

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Add docs

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Add more tests

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* pass args as ints

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Don't roll our own splitter

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* rename sample_index to sample_blocks

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Add type-checking to WGR APIs (#14)

* Add type-checking to APIs

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Check valid alphas

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* check 0 sig

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Add to install_requires list

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* cleanup comments

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Add covariate support (#13)

* Added necessary modifications to accommodate covariates in model fitting.

The initial formulation of the WGR model assumed a form y ~ Xb; however, in general we would like to use a model of the form y ~ Ca + Xb, where C is some matrix of covariates that are separate from the genomic features X. This PR makes numerous changes to accommodate covariate matrix C.

Adding covariates required the following breaking changes to the APIs:
 * indexdf is now a required argument for RidgeReducer.transform() and RidgeRegression.transform():
   * RidgeReducer.transform(blockdf, labeldf, modeldf) -> RidgeReducer.transform(blockdf, labeldf, indexdf, modeldf)
   * RidgeRegression.transform(blockdf, labeldf, model, cvdf) -> RidgeRegression.transform(blockdf, labeldf, indexdf, model, cvdf)

Additionally, the function signatures for the fit and transform methods of RidgeReducer and RidgeRegression have all been updated to accommodate an optional covariate DataFrame as the final argument.

Two new tests have been added to test_ridge_regression.py to test run modes with covariates:
 * test_ridge_reducer_transform_with_cov
 * test_two_level_regression_with_cov

Signed-off-by: Leland Barnard (leland.barnard@gmail.com)
Signed-off-by: Leland Barnard <leland.barnard@regeneron.com>

* Cleaned up one unnecessary Pandas import
Signed-off-by: Leland Barnard (leland.barnard@gmail.com)

Signed-off-by: Leland Barnard <leland.barnard@regeneron.com>

* Small changes for clarity and consistency with the rest of the code.
Signed-off-by: Leland Barnard (leland.barnard@gmail.com)

Signed-off-by: Leland Barnard <leland.barnard@regeneron.com>

* Forgot one usage of coalesce
Signed-off-by: Leland Barnard (leland.barnard@gmail.com)

Signed-off-by: Leland Barnard <leland.barnard@regeneron.com>

* Added a couple of comments to explain logic and replaced usages of .values with .array
Signed-off-by: Leland Barnard (leland.barnard@gmail.com)

Signed-off-by: Leland Barnard <leland.barnard@regeneron.com>

* Fixed one instance of the change .values -> .array where it was made in error.
Signed-off-by: Leland Barnard (leland.barnard@gmail.com)

Signed-off-by: Leland Barnard <leland.barnard@regeneron.com>

* Typo in test_ridge_regression.py.
Signed-off-by: Leland Barnard (leland.barnard@gmail.com)

Signed-off-by: Leland Barnard <leland.barnard@regeneron.com>

* Style auto-updates with yapfAll
Signed-off-by: Leland Barnard (leland.barnard@gmail.com)

Signed-off-by: Leland Barnard <leland.barnard@regeneron.com>

Co-authored-by: Leland Barnard <leland.barnard@regeneron.com>
Co-authored-by: Karen Feng <karen.feng@databricks.com>

* Flatten estimated phenotypes (#15)

* WIP

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Clean up tests

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* WIP

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Order to match labeldf

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Check we tie-break

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* cleanup

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* tests

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* test var name

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* clean up tests

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Clean up docs

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Add fit_transform function to models (#17)

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Rename levels (#20)

* Rename levels to wgr

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* rename test files

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Add license headers (#21)

* headers

* executable

* fix template rendering

* yapf

* add header to template

* add header to template

Signed-off-by: Henry D <henrydavidge@gmail.com>

Co-authored-by: Kiavash Kianfar <kiavash.kianfar@databricks.com>
Co-authored-by: Karen Feng <karen.feng@databricks.com>
Co-authored-by: Leland <leland.barnard@gmail.com>
Co-authored-by: Leland Barnard <leland.barnard@regeneron.com>
henrydavidge added a commit that referenced this pull request Jun 22, 2020
* Add Leland's demo notebook

* block_variants_and_samples Transformer to create genotype DataFrame for WGR (#2)

* blocks

Signed-off-by: kianfar77 <kiavash.kianfar@databricks.com>

* test vcf

Signed-off-by: kianfar77 <kiavash.kianfar@databricks.com>

* transformer

Signed-off-by: kianfar77 <kiavash.kianfar@databricks.com>

* remove extra

Signed-off-by: kianfar77 <kiavash.kianfar@databricks.com>

* refactor and conform with ridge namings

Signed-off-by: kianfar77 <kiavash.kianfar@databricks.com>

* test

Signed-off-by: kianfar77 <kiavash.kianfar@databricks.com>

* test files

Signed-off-by: kianfar77 <kiavash.kianfar@databricks.com>

* remove extra file

Signed-off-by: kianfar77 <kiavash.kianfar@databricks.com>

* sort_key

Signed-off-by: kianfar77 <kiavash.kianfar@databricks.com>

* feat: ridge models for wgr added (#1)

* feat: ridge models for wgr added
Signed-off-by: Leland Barnard (leland.barnard@gmail.com)

Signed-off-by: Leland Barnard <leland.barnard@regeneron.com>

* Doc strings added for levels/functions.py
Some typos fixed in ridge_model.py
Signed-off-by: Leland Barnard (leland.barnard@gmail.com)

Signed-off-by: Leland Barnard <leland.barnard@regeneron.com>

* ridge_model and RidgeReducer unit tests added
Signed-off-by: Leland Barnard (leland.barnard@gmail.com)

Signed-off-by: Leland Barnard <leland.barnard@regeneron.com>

* RidgeRegression unit tests added
test data README added
ridge_udfs.py docstrings added
Signed-off-by: Leland Barnard (leland.barnard@gmail.com)

Signed-off-by: Leland Barnard <leland.barnard@regeneron.com>

* Changes made to accessing the sample ID map and more docstrings

The map_normal_eqn and score_models functions previously expected the
sample IDs for a given sample block to be found in the Pandas DataFrame,
which meant we had to join them on before the .groupBy().apply().  These
functions now expect the sample block to sample IDs mapping to be
provided separately as a dict, so that the join is no longer required.
RidgeReducer and RidgeRegression APIs remain unchanged.

docstrings have been added for RidgeReducer and RidgeRegression classes.

Signed-off-by: Leland Barnard (leland.barnard@gmail.com)
Signed-off-by: Leland Barnard <leland.barnard@regeneron.com>

* Refactored object names and comments to reflect new terminology

Where 'block' was previously used to refer to the set of columns in a
block, we now use 'header_block'
Where 'group' was previously used to refer to the set of samples in a
block, we now use 'sample_block'

Signed-off-by: Leland Barnard (leland.barnard@gmail.com)
Signed-off-by: Leland Barnard <leland.barnard@regeneron.com>

* [HLS-539] Fix compatibility between blocked GT transformer and WGR (#6)

* WIP

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* existing tests pass

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* rename file

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Add compat test

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* scalafmt

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* collect minimal columns

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* address comments

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Test fixup

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Spark 3 needs more recent PyArrow, reduce mem consumption by removing unnecessary caching

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* PyArrow 0.15.1 only with PySpark 3

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Don't use toPandas()

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Upgrade pyarrow

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Only register once

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Minimize memory usage

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Select before head

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* set up/tear down

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Try limiting pyspark memory

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* No teardown

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Extend timeout

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Simplify ordering logic in levels code (#7)

* WIP

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* existing tests pass

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* rename file

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Add compat test

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* scalafmt

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* collect minimal columns

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* start changing for readability

* use input label ordering

* rename create_row_indexer

* undo column sort

* change reduce

Signed-off-by: Henry D <henrydavidge@gmail.com>

* further simplify reduce

* sorted alpha names

* remove ordering

* comments

Signed-off-by: Henry D <henrydavidge@gmail.com>

* Set arrow env var in build

Signed-off-by: Henry D <henrydavidge@gmail.com>

* faster sort

* add test file

* undo test data change

* >=

* formatting

* empty

Co-authored-by: Karen Feng <karen.feng@databricks.com>

* Limit Spark memory conf in tests (#9)

* yapf

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* yapf transform

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Set driver memory

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Try changing spark mem

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* match java tests

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* whoops

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* remove driver memory flag

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Improve partitioning in block_variants_and_samples transformer (#11)

Signed-off-by: kianfar77 <kiavash.kianfar@databricks.com>

* Remove unnecessary header_block grouping (#10)

* cleanup

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* whoops

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* cleanup

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Create sample ID blocking helper functions (#12)

* WIP

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* WIP

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* WIP

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* WIP

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* WIP

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* whoops

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* tests

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* simplify tests

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* WIP

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* yapf

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* index map compat

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Add docs

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Add more tests

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* pass args as ints

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Don't roll our own splitter

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* rename sample_index to sample_blocks

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Add type-checking to WGR APIs (#14)

* Add type-checking to APIs

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Check valid alphas

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* check 0 sig

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Add to install_requires list

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* cleanup comments

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Add covariate support (#13)

* Added necessary modifications to accommodate covariates in model fitting.

The initial formulation of the WGR model assumed a form y ~ Xb; however, in general we would like to use a model of the form y ~ Ca + Xb, where C is some matrix of covariates that are separate from the genomic features X. This PR makes numerous changes to accommodate covariate matrix C.

Adding covariates required the following breaking changes to the APIs:
 * indexdf is now a required argument for RidgeReducer.transform() and RidgeRegression.transform():
   * RidgeReducer.transform(blockdf, labeldf, modeldf) -> RidgeReducer.transform(blockdf, labeldf, indexdf, modeldf)
   * RidgeRegression.transform(blockdf, labeldf, model, cvdf) -> RidgeRegression.transform(blockdf, labeldf, indexdf, model, cvdf)

Additionally, the function signatures for the fit and transform methods of RidgeReducer and RidgeRegression have all been updated to accommodate an optional covariate DataFrame as the final argument.

Two new tests have been added to test_ridge_regression.py to test run modes with covariates:
 * test_ridge_reducer_transform_with_cov
 * test_two_level_regression_with_cov

Signed-off-by: Leland Barnard (leland.barnard@gmail.com)
Signed-off-by: Leland Barnard <leland.barnard@regeneron.com>

* Cleaned up one unnecessary Pandas import
Signed-off-by: Leland Barnard (leland.barnard@gmail.com)

Signed-off-by: Leland Barnard <leland.barnard@regeneron.com>

* Small changes for clarity and consistency with the rest of the code.
Signed-off-by: Leland Barnard (leland.barnard@gmail.com)

Signed-off-by: Leland Barnard <leland.barnard@regeneron.com>

* Forgot one usage of coalesce
Signed-off-by: Leland Barnard (leland.barnard@gmail.com)

Signed-off-by: Leland Barnard <leland.barnard@regeneron.com>

* Added a couple of comments to explain logic and replaced usages of .values with .array
Signed-off-by: Leland Barnard (leland.barnard@gmail.com)

Signed-off-by: Leland Barnard <leland.barnard@regeneron.com>

* Fixed one instance of the change .values -> .array where it was made in error.
Signed-off-by: Leland Barnard (leland.barnard@gmail.com)

Signed-off-by: Leland Barnard <leland.barnard@regeneron.com>

* Typo in test_ridge_regression.py.
Signed-off-by: Leland Barnard (leland.barnard@gmail.com)

Signed-off-by: Leland Barnard <leland.barnard@regeneron.com>

* Style auto-updates with yapfAll
Signed-off-by: Leland Barnard (leland.barnard@gmail.com)

Signed-off-by: Leland Barnard <leland.barnard@regeneron.com>

Co-authored-by: Leland Barnard <leland.barnard@regeneron.com>
Co-authored-by: Karen Feng <karen.feng@databricks.com>

* Flatten estimated phenotypes (#15)

* WIP

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Clean up tests

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* WIP

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Order to match labeldf

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Check we tie-break

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* cleanup

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* tests

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* test var name

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* clean up tests

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Clean up docs

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Add fit_transform function to models (#17)

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* support alpha inference

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* test fixup

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* more test fixup

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* test fixups

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* sub-sample

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* test fixup

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* address comments - only infer alphas during fit

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* exception varies

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Rename levels (#20)

* Rename levels to wgr

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* rename test files

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Errors vary by Spark version

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Add license headers (#21)

* headers

* executable

* fix template rendering

* yapf

Co-authored-by: Kiavash Kianfar <kiavash.kianfar@databricks.com>
Co-authored-by: Karen Feng <karen.feng@databricks.com>
Co-authored-by: Leland <leland.barnard@gmail.com>
Co-authored-by: Leland Barnard <leland.barnard@regeneron.com>
karenfeng added a commit that referenced this pull request Jun 23, 2020
* Add Leland's demo notebook

* block_variants_and_samples Transformer to create genotype DataFrame for WGR (#2)

* blocks

Signed-off-by: kianfar77 <kiavash.kianfar@databricks.com>

* test vcf

Signed-off-by: kianfar77 <kiavash.kianfar@databricks.com>

* transformer

Signed-off-by: kianfar77 <kiavash.kianfar@databricks.com>

* remove extra

Signed-off-by: kianfar77 <kiavash.kianfar@databricks.com>

* refactor and conform with ridge namings

Signed-off-by: kianfar77 <kiavash.kianfar@databricks.com>

* test

Signed-off-by: kianfar77 <kiavash.kianfar@databricks.com>

* test files

Signed-off-by: kianfar77 <kiavash.kianfar@databricks.com>

* remove extra file

Signed-off-by: kianfar77 <kiavash.kianfar@databricks.com>

* sort_key

Signed-off-by: kianfar77 <kiavash.kianfar@databricks.com>

* feat: ridge models for wgr added (#1)

* feat: ridge models for wgr added
Signed-off-by: Leland Barnard (leland.barnard@gmail.com)

Signed-off-by: Leland Barnard <leland.barnard@regeneron.com>

* Doc strings added for levels/functions.py
Some typos fixed in ridge_model.py
Signed-off-by: Leland Barnard (leland.barnard@gmail.com)

Signed-off-by: Leland Barnard <leland.barnard@regeneron.com>

* ridge_model and RidgeReducer unit tests added
Signed-off-by: Leland Barnard (leland.barnard@gmail.com)

Signed-off-by: Leland Barnard <leland.barnard@regeneron.com>

* RidgeRegression unit tests added
test data README added
ridge_udfs.py docstrings added
Signed-off-by: Leland Barnard (leland.barnard@gmail.com)

Signed-off-by: Leland Barnard <leland.barnard@regeneron.com>

* Changes made to accessing the sample ID map and more docstrings

The map_normal_eqn and score_models functions previously expected the
sample IDs for a given sample block to be found in the Pandas DataFrame,
which meant we had to join them on before the .groupBy().apply().  These
functions now expect the sample block to sample IDs mapping to be
provided separately as a dict, so that the join is no longer required.
RidgeReducer and RidgeRegression APIs remain unchanged.

docstrings have been added for RidgeReducer and RidgeRegression classes.

Signed-off-by: Leland Barnard (leland.barnard@gmail.com)
Signed-off-by: Leland Barnard <leland.barnard@regeneron.com>

* Refactored object names and comments to reflect new terminology

Where 'block' was previously used to refer to the set of columns in a
block, we now use 'header_block'
Where 'group' was previously used to refer to the set of samples in a
block, we now use 'sample_block'

Signed-off-by: Leland Barnard (leland.barnard@gmail.com)
Signed-off-by: Leland Barnard <leland.barnard@regeneron.com>

* [HLS-539] Fix compatibility between blocked GT transformer and WGR (#6)

* WIP

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* existing tests pass

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* rename file

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Add compat test

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* scalafmt

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* collect minimal columns

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* address comments

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Test fixup

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Spark 3 needs more recent PyArrow, reduce mem consumption by removing unnecessary caching

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* PyArrow 0.15.1 only with PySpark 3

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Don't use toPandas()

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Upgrade pyarrow

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Only register once

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Minimize memory usage

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Select before head

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* set up/tear down

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Try limiting pyspark memory

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* No teardown

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Extend timeout

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Simplify ordering logic in levels code (#7)

* WIP

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* existing tests pass

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* rename file

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Add compat test

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* scalafmt

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* collect minimal columns

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* start changing for readability

* use input label ordering

* rename create_row_indexer

* undo column sort

* change reduce

Signed-off-by: Henry D <henrydavidge@gmail.com>

* further simplify reduce

* sorted alpha names

* remove ordering

* comments

Signed-off-by: Henry D <henrydavidge@gmail.com>

* Set arrow env var in build

Signed-off-by: Henry D <henrydavidge@gmail.com>

* faster sort

* add test file

* undo test data change

* >=

* formatting

* empty

Co-authored-by: Karen Feng <karen.feng@databricks.com>

* Limit Spark memory conf in tests (#9)

* yapf

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* yapf transform

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Set driver memory

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Try changing spark mem

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* match java tests

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* whoops

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* remove driver memory flag

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Improve partitioning in block_variants_and_samples transformer (#11)

Signed-off-by: kianfar77 <kiavash.kianfar@databricks.com>

* Remove unnecessary header_block grouping (#10)

* cleanup

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* whoops

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* cleanup

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Create sample ID blocking helper functions (#12)

* WIP

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* WIP

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* WIP

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* WIP

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* WIP

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* whoops

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* tests

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* simplify tests

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* WIP

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* yapf

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* index map compat

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Add docs

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Add more tests

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* pass args as ints

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Don't roll our own splitter

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* rename sample_index to sample_blocks

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Add type-checking to WGR APIs (#14)

* Add type-checking to APIs

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Check valid alphas

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* check 0 sig

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Add to install_requires list

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* cleanup comments

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Add covariate support (#13)

* Added necessary modifications to accommodate covariates in model fitting.

The initial formulation of the WGR model assumed a form y ~ Xb; however, in general we would like to use a model of the form y ~ Ca + Xb, where C is some matrix of covariates that are separate from the genomic features X. This PR makes numerous changes to accommodate covariate matrix C.

Adding covariates required the following breaking changes to the APIs:
 * indexdf is now a required argument for RidgeReducer.transform() and RidgeRegression.transform():
   * RidgeReducer.transform(blockdf, labeldf, modeldf) -> RidgeReducer.transform(blockdf, labeldf, indexdf, modeldf)
   * RidgeRegression.transform(blockdf, labeldf, model, cvdf) -> RidgeRegression.transform(blockdf, labeldf, indexdf, model, cvdf)

Additionally, the function signatures for the fit and transform methods of RidgeReducer and RidgeRegression have all been updated to accommodate an optional covariate DataFrame as the final argument.

Two new tests have been added to test_ridge_regression.py to test run modes with covariates:
 * test_ridge_reducer_transform_with_cov
 * test_two_level_regression_with_cov

Signed-off-by: Leland Barnard (leland.barnard@gmail.com)
Signed-off-by: Leland Barnard <leland.barnard@regeneron.com>

* Cleaned up one unnecessary Pandas import
Signed-off-by: Leland Barnard (leland.barnard@gmail.com)

Signed-off-by: Leland Barnard <leland.barnard@regeneron.com>

* Small changes for clarity and consistency with the rest of the code.
Signed-off-by: Leland Barnard (leland.barnard@gmail.com)

Signed-off-by: Leland Barnard <leland.barnard@regeneron.com>

* Forgot one usage of coalesce
Signed-off-by: Leland Barnard (leland.barnard@gmail.com)

Signed-off-by: Leland Barnard <leland.barnard@regeneron.com>

* Added a couple of comments to explain logic and replaced usages of .values with .array
Signed-off-by: Leland Barnard (leland.barnard@gmail.com)

Signed-off-by: Leland Barnard <leland.barnard@regeneron.com>

* Fixed one instance of the change .values -> .array where it was made in error.
Signed-off-by: Leland Barnard (leland.barnard@gmail.com)

Signed-off-by: Leland Barnard <leland.barnard@regeneron.com>

* Typo in test_ridge_regression.py.
Signed-off-by: Leland Barnard (leland.barnard@gmail.com)

Signed-off-by: Leland Barnard <leland.barnard@regeneron.com>

* Style auto-updates with yapfAll
Signed-off-by: Leland Barnard (leland.barnard@gmail.com)

Signed-off-by: Leland Barnard <leland.barnard@regeneron.com>

Co-authored-by: Leland Barnard <leland.barnard@regeneron.com>
Co-authored-by: Karen Feng <karen.feng@databricks.com>

* Flatten estimated phenotypes (#15)

* WIP

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Clean up tests

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* WIP

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Order to match labeldf

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Check we tie-break

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* cleanup

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* tests

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* test var name

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* clean up tests

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Clean up docs

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* WIP

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* tests

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* remove accidental files

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Add fit_transform function to models (#17)

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* WIP

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* WIP

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* WIP

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Rename levels (#20)

* Rename levels to wgr

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* rename test files

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Add license headers (#21)

* headers

* executable

* fix template rendering

* yapf

* WIP

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* WIP

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* More work

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* More cleanup

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* Fix docs tests

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* address comments

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* fix regression fit description

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* fix capitalization

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* address some comments

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* more cleanup

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* More cleanup

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* add notebook

Signed-off-by: Karen Feng <karen.feng@databricks.com>

* update notebook

Signed-off-by: Karen Feng <karen.feng@databricks.com>

Co-authored-by: Henry D <henrydavidge@gmail.com>
Co-authored-by: Kiavash Kianfar <kiavash.kianfar@databricks.com>
Co-authored-by: Leland <leland.barnard@gmail.com>
Co-authored-by: Leland Barnard <leland.barnard@regeneron.com>