-
Notifications
You must be signed in to change notification settings - Fork 594
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Downsample total counts #474
Conversation
3efb194
to
fc84096
Compare
This changes the API to `sc.pp.dowsample_counts` to allow downsample on both a per-cell and whole dataset level. Downsampling the whole count matrix is done with the argument `"total_counts"` while downsampling cells is done with `"counts_per_cell"`.
* Internal `_downsample*` functions take the count matrix instead of the AnnData object * Added caching to `_downsample_array` (speeds tests up a lot) * Doc updates * Made it so original sparse matrix type is returned for downsampling total counts * Eliminate zeros after downsampling * Minor miscellaneous cleanup/ formatting
Example usage: ```python >>> @deprecated_arg_names({"b": "c"}) def foo(a, c=2): return a + c >>> foo(a, b=1) __main__:1: DeprecationWarning: Keyword argument 'b' has been deprecated in favour of 'c'. 'b' will be removed in a future version. 2 ```
Made it so `sc.pp.downsample_counts` will still run if keyword argument "downsample_counts" is used, but will throw a deprecation warning.
91601d2
to
f318741
Compare
I've added a decorator for renaming arguments, Decorator example usage: @deprecated_arg_names({"b": "c"})
def foo(a, c=2):
return a + c
foo(a, b=1)
# __main__:1: DeprecationWarning: Keyword argument 'b' has been deprecated in favour of 'c'. 'b' will be removed in a future version.
# 2 |
Hi! I’ve wanted to introduce https://github.com/flying-sheep/legacy-api-wrap for some time. do you think that would be sufficient or should we take the deprecation of kwargs into account? There’s also laurent-laporte-pro/deprecated#8 |
Right, I'd also suspect that a non-scanpy-custom solution for deprecated arguments is the way to go. For now, this is fine; we can replace it if we transition there. If @flying-sheep would go about replacing this with |
@flying-sheep I think changing the name of a keyword argument is specific enough to warrant it's own functionality
As far as I can tell, @falexwolf If there was a commonly used, high quality library which provided tools for handling deprecations like this I wouldn't mind switching over to that. After some light googling, I didn't find one. Ultimately, it's a pretty trivial decorator, and I don't think it's adding much complexity here. |
I completely get your point, @ivirshup. |
OK, so now the question is: should this become part of legacy-api-wrap? I’d rather have the API fixed once than using multiple decorators. I think It’s clearer to see what the new API is like if you don’t have to think about the order of multiple decorators being applied. Also, I think @renamed_args(new="old") feels more natural than @deprecated_arg_names({"old": "new"}) |
On I'm not too worried about the api for this decorator if it's just in scanpy. Since it'd be only meant for internal use there aren't any promises about the api that should be kept. It also means the issue of multiple decorators can be dealt with when it occurs, and an example case could help guide the decision. I think that I prefer passing a dict to using |
I agree with all you said 😄 |
Great! Is this all good to merge? |
@ivirshup, you mentioned it's a backwards compat breaking change; that's why I put it on the list of those changes. But I agree that the break is so little that we can also merge it now. It might take a while until the rest gets done. So, I'm happy to go ahead and merge. |
Right, there were two changes, the bigger one I've fixed with that decorator. Since the smaller breaking change is just a change of default bool, I could split that out, but I'm not sure how you want to handle that. If there was a |
|
||
Cells with fewer counts than `target_counts` are unaffected by this. This | ||
has been implemented by M. D. Luecken. | ||
If `counts_per_cell` in specified, each cell will downsampled. If |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe a typo in: If counts_per_cell
is specified?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, should definitely be is
. I've gotta start proofreading my docs better.
Thanks for clarifying this again, @ivirshup! We should have changed the default value already for 1.4. I'll add a note to the release notes and it's fine... ;) |
PS: @ivirshup, I tried writing you an email recently. Did you get it? |
Downsample total counts
Update to
downsample_counts
to allow downsampling total counts, similar to normalization bycellranger aggr
(I'm pretty sure on this, there's a lot going on in their code). Additionally, enabled caching for thenumba
'd function, which cuts down on test time.As adding this feature meant renaming
target_counts
tocounts_per_cell
, this becomes a breaking change. Since it's breaking, I've also gone ahead and setreplace=False
by default as mentioned before (#340).Definitely willing to make changes. I've implemented this since I'm doing some integration work and figured it'd be nice to be able to try the basic
cellranger
strategy.Edit: The failing PAGA test occurs locally on master as well, but I don't think I broke that.