-
Notifications
You must be signed in to change notification settings - Fork 359
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Can hub paper #2379
base: main
Are you sure you want to change the base?
Can hub paper #2379
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #2379 +/- ##
==========================================
- Coverage 89.42% 88.94% -0.48%
==========================================
Files 153 153
Lines 12566 12566
==========================================
- Hits 11237 11177 -60
- Misses 1329 1389 +60 ☔ View full report in Codecov by Sentry. |
If possible, could you separate this out into 4 PRs:
Thanks |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, LGTM overall! Just one major comment about switching the new minified mode to a new argument and some minor comments about performance/storage.
Since there's a lot of conflicts w/ the main branch, I can take care of transferring this to another PR and tagging you as co-author.
def minify_adata( | ||
self, | ||
minified_data_type: MinifiedDataType = ADATA_MINIFY_TYPE.LATENT_POSTERIOR, | ||
use_latent_qzm_key: str = "X_latent_qzm", | ||
use_latent_qzv_key: str = "X_latent_qzv", | ||
): | ||
"""Minifies the model's adata. | ||
|
||
Minifies the adata, and registers new anndata fields: latent qzm, latent qzv, adata uns | ||
containing minified-adata type, and library size. | ||
This also sets the appropriate property on the module to indicate that the adata is minified. | ||
|
||
Parameters | ||
---------- | ||
minified_data_type | ||
How to minify the data. Currently only supports `latent_posterior_parameters` and `add_posterior_parameters`,. | ||
If minified_data_type == `latent_posterior_parameters`: | ||
|
||
* the original count data is removed (`adata.X`, adata.raw, and any layers) | ||
* the parameters of the latent representation of the original data is stored | ||
* everything else is left untouched | ||
If minified_data_type == `add_posterior_parameters`: | ||
|
||
* the original count data is kept (`adata.X`, adata.raw, and any layers) | ||
* the parameters of the latent representation of the original data is stored | ||
* everything else is left untouched | ||
use_latent_qzm_key | ||
Key to use in `adata.obsm` where the latent qzm params are stored | ||
use_latent_qzv_key | ||
Key to use in `adata.obsm` where the latent qzv params are stored | ||
|
||
Notes | ||
----- | ||
The modification is not done inplace -- instead the model is assigned a new (minified) | ||
version of the adata. | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From my understanding, the only difference between "latent_posterior_parameters"
and "add_posterior_parameters"
is that the count data is preserved in the latter.
I'm thinking it might be less confusing for the end user if, instead of specifying a minified_data_type
, we can directly let them pass in keep_count_data
to distinguish between these two modes. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good. Will be cleaner. Inititally the idea was to have more modes. However, I can't imagine a different mode.
version of the adata. | ||
""" | ||
# TODO(adamgayoso): Add support for a scenario where we want to cache the latent posterior | ||
if not ADATA_MINIFY_TYPE.__contains__(minified_data_type): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the reason this part uses __contains__
instead of a minified_data_type in ADATA_MINIFY_TYPE
? I think the latter is more readable and should work with the NamedTuple
.
@@ -481,6 +490,103 @@ def _check_if_trained( | |||
else: | |||
raise RuntimeError(message) | |||
|
|||
def minify_adata( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the plan is to put this method under BaseModelClass
, we should also move over the other methods under BaseMinifiedModeModelClass
and then delete it. What do you think?
all_zeros = csr_matrix(adata.X.shape) | ||
layers = {layer: all_zeros.copy() for layer in adata.layers} | ||
else: | ||
all_zeros = adata.X |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thoughts on checking whether adata.X
is a sparse matrix and if not, sparsify it (in the spirit of minification)?
layers = {layer: all_zeros.copy() for layer in adata.layers} | ||
else: | ||
all_zeros = adata.X | ||
layers = adata.layers |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It may also make sense to only copy the count layer registered with the manager so we don't have to carry around the unused layers as part of the minified data.
Adds minified mode to CondSCVI. Simplifies minification for new models. Adds add_latent_posterior mode which keeps count data. Fixes glitches in DE function.
docs/release_notes/index.md
file if fixing a bug or adding a new feature