-
Notifications
You must be signed in to change notification settings - Fork 19
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Batch integration - Create separate solution file #218
Batch integration - Create separate solution file #218
Conversation
…set_in_process_dataset
…set_in_process_dataset
…ate_separate_solution_file
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mumichae I added some comments for your review
…ate_separate_solution_file
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good overall. Just a general comment on how we use the solution adata. It's probably a question of preference, but for metrics that don't compare integrated and unintegrated objects, is it necessary to transfer the integrated representations to the solution object? I would treat the solution more as the unintegrated object from which I can pull additional metadata if needed (which for metrics we have here is not really the case, but there are other metrics (existing and potentially future metrics) that would need that information. Personally, I find it more confusing to transfer the integrated representations to the solution object instead of doing it the other way arround (selectively extracting metadata from the solution for the integrated object)
description: The organism of the sample in the dataset. | ||
required: false | ||
- type: object | ||
name: knn |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this the same information as in .uns['neighbors']
that scanpy.neighbors
generates?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yepyep! Should I rename it to knn_neighbors
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
'neighbors' is the default and what most methods expect out of the box, so I'd go with that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but the obsp is called knn_distances
and knn_connectivities
, so then we should rename that to distances
and connectivities
as well. WDYT?
input_solution = ad.read_h5ad(par['input_solution']) | ||
input_integrated = ad.read_h5ad(par['input_integrated']) | ||
|
||
input_solution.obsp["connectivities"] = input_integrated.obsp["connectivities"] | ||
input_solution.obsp["distances"] = input_integrated.obsp["distances"] | ||
|
||
# TODO: if we don't copy neighbors over, the metric doesn't work | ||
input_solution.uns["neighbors"] = input_integrated.uns["neighbors"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it necessary to transfer the kNN graph to the solution? Wouldn't be sufficient to run on the integrated input only?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We want to compare the integrated object's KNN graph to the solution's label, no? Because if we only compute the ARI and NMI on the data that is in the integrated object, then we don't know for certain that this anndata still has the same dimensionality as the solution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would add the metadata to the integrated object instead. That way we can check that the dimensions match, but are essentially evaluating the integrated object. I think it's a matter of taste and would find it more intuitive to base things around the integrated object
input_solution = ad.read_h5ad(par['input_solution']) | ||
input_integrated = ad.read_h5ad(par['input_integrated']) | ||
|
||
input_solution.obsp["connectivities"] = input_integrated.obsp["connectivities"] | ||
input_solution.obsp["distances"] = input_integrated.obsp["distances"] | ||
|
||
# TODO: if we don't copy neighbors over, the metric doesn't work | ||
input_solution.uns["neighbors"] = input_integrated.uns["neighbors"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it necessary to transfer the kNN graph to the solution? Wouldn't be sufficient to run on the integrated input only?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd rather compare the integrated graph to the solution's label, to make sure that the integrated object has the same dimensionality.
Of course, we could also just put an assertion here.
I'm transferring the integrated representations to the solution object to ensure that the method output didn't make any changes to the labels and such. Alternatively we could also do a lot of assertions instead of copying the data over, although it seems easier to me to just copy the method output over to the solution object instead.
👍
I guess it's more or less the same thing, as long as we're sure that there were no changes to the method output which could affect the metric scores in some way. |
…ate_separate_solution_file
Describe your changes
Issue ticket number and link
Closes #xxxx (Replace xxxx with the GitHub issue number)
Checklist before requesting a review
I have performed a self-review of my code
Check the correct box. Does this PR contain:
Proposed changes are described in the CHANGELOG.md
CI Tests succeed and look good!