Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Downsample total counts #474

Merged
merged 6 commits into from
Mar 21, 2019
Merged

Conversation

ivirshup
Copy link
Member

@ivirshup ivirshup commented Feb 11, 2019

Update to downsample_counts to allow downsampling total counts, similar to normalization by cellranger aggr (I'm pretty sure on this, there's a lot going on in their code). Additionally, enabled caching for the numba'd function, which cuts down on test time.

As adding this feature meant renaming target_counts to counts_per_cell, this becomes a breaking change. Since it's breaking, I've also gone ahead and set replace=False by default as mentioned before (#340).

Definitely willing to make changes. I've implemented this since I'm doing some integration work and figured it'd be nice to be able to try the basic cellranger strategy.

Edit: The failing PAGA test occurs locally on master as well, but I don't think I broke that.

This changes the API to `sc.pp.dowsample_counts` to allow downsample on both a per-cell and whole dataset level. Downsampling the whole count matrix is done with the argument `"total_counts"` while downsampling cells is done with `"counts_per_cell"`.
* Internal `_downsample*` functions take the count matrix instead of the AnnData object
* Added caching to `_downsample_array` (speeds tests up a lot)
* Doc updates
* Made it so original sparse matrix type is returned for downsampling total counts
* Eliminate zeros after downsampling
* Minor miscellaneous cleanup/ formatting
Example usage:

```python
>>> @deprecated_arg_names({"b": "c"})
    def foo(a, c=2):
        return a + c

>>> foo(a, b=1)
 __main__:1: DeprecationWarning: Keyword argument 'b' has been deprecated in favour of 'c'. 'b' will be removed in a future version.
 2
```
Made it so `sc.pp.downsample_counts` will still run if keyword argument "downsample_counts" is used, but will throw a deprecation warning.
@ivirshup
Copy link
Member Author

I've added a decorator for renaming arguments, @deprecated_arg_names. Should this have gone in AnnData.utils?

Decorator example usage:

@deprecated_arg_names({"b": "c"})
def foo(a, c=2):
    return a + c

foo(a, b=1)
# __main__:1: DeprecationWarning: Keyword argument 'b' has been deprecated in favour of 'c'. 'b' will be removed in a future version.
# 2

@flying-sheep
Copy link
Member

Hi! I’ve wanted to introduce https://github.com/flying-sheep/legacy-api-wrap for some time.

do you think that would be sufficient or should we take the deprecation of kwargs into account?

There’s also laurent-laporte-pro/deprecated#8

@falexwolf
Copy link
Member

Right, I'd also suspect that a non-scanpy-custom solution for deprecated arguments is the way to go. For now, this is fine; we can replace it if we transition there.

If @flying-sheep would go about replacing this with legaciy_api_wrap (after or before merging), this looks like a very good solution to me?

@ivirshup
Copy link
Member Author

@flying-sheep I think changing the name of a keyword argument is specific enough to warrant it's own functionality

  • This allows giving a very specific warning about what should change in the user's code
  • This can fully abstract away the functional change (checking if the old argument was used, and renaming), so the implementation doesn't have any code dealing with the deprecation

As far as I can tell, legacy_api_wrap doesn't quite cover this use case, as it deals with positional arguments. Does that sound right to you?

@falexwolf If there was a commonly used, high quality library which provided tools for handling deprecations like this I wouldn't mind switching over to that. After some light googling, I didn't find one. Ultimately, it's a pretty trivial decorator, and I don't think it's adding much complexity here.

@falexwolf
Copy link
Member

I completely get your point, @ivirshup.

@flying-sheep
Copy link
Member

OK, so now the question is: should this become part of legacy-api-wrap?

I’d rather have the API fixed once than using multiple decorators. I think It’s clearer to see what the new API is like if you don’t have to think about the order of multiple decorators being applied.

Also, I think

@renamed_args(new="old")

feels more natural than

@deprecated_arg_names({"old": "new"})

@ivirshup
Copy link
Member Author

ivirshup commented Mar 16, 2019

On legacy_api_wrap, I don't think I have enough experience maintaining stable APIs to make a call.

I'm not too worried about the api for this decorator if it's just in scanpy. Since it'd be only meant for internal use there aren't any promises about the api that should be kept. It also means the issue of multiple decorators can be dealt with when it occurs, and an example case could help guide the decision.

I think that I prefer passing a dict to using kwargs because it might make sense to give this decorator keyword arguments of its own. For example, if you can specify the version it'll be removed. If keyword arguments we used I agree new="old" would make sense to me, but with a dict I see "old" maps to "new" as more intuitive.

@flying-sheep
Copy link
Member

I agree with all you said 😄

@ivirshup
Copy link
Member Author

Great!

Is this all good to merge?

@falexwolf
Copy link
Member

@ivirshup, you mentioned it's a backwards compat breaking change; that's why I put it on the list of those changes. But I agree that the break is so little that we can also merge it now. It might take a while until the rest gets done. So, I'm happy to go ahead and merge.

@ivirshup
Copy link
Member Author

Right, there were two changes, the bigger one I've fixed with that decorator. Since the smaller breaking change is just a change of default bool, I could split that out, but I'm not sure how you want to handle that. If there was a v1.5 branch, it could go there?


Cells with fewer counts than `target_counts` are unaffected by this. This
has been implemented by M. D. Luecken.
If `counts_per_cell` in specified, each cell will downsampled. If
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe a typo in: If counts_per_cell is specified?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, should definitely be is. I've gotta start proofreading my docs better.

@falexwolf
Copy link
Member

Thanks for clarifying this again, @ivirshup! We should have changed the default value already for 1.4.

I'll add a note to the release notes and it's fine... ;)

@falexwolf
Copy link
Member

PS: @ivirshup, I tried writing you an email recently. Did you get it?

@falexwolf falexwolf merged commit a9b0175 into scverse:master Mar 21, 2019
awnimo pushed a commit to dpeerlab/scanpy that referenced this pull request Dec 17, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants