Batch integration - Create separate solution file #218

rcannood · 2023-08-23T12:07:09Z

Describe your changes

Batch integration: Split the unintegrated file into a solution and a dataset file

Issue ticket number and link

Closes #xxxx (Replace xxxx with the GitHub issue number)

Checklist before requesting a review

I have performed a self-review of my code
Check the correct box. Does this PR contain:
- Breaking changes
- New functionality
- Major changes
- Minor changes
- Bug fixes
Proposed changes are described in the CHANGELOG.md
CI Tests succeed and look good!


rcannood

@mumichae I added some comments for your review

src/tasks/batch_integration/api/file_integrated_graph.yaml

src/tasks/batch_integration/metrics/cell_cycle_conservation/script.py

src/tasks/batch_integration/metrics/hvg_overlap/script.py

src/tasks/batch_integration/metrics/isolated_label_asw/script.py

src/tasks/batch_integration/metrics/isolated_label_f1/script.py

src/tasks/batch_integration/metrics/pcr/script.py


mumichae

Looks good overall. Just a general comment on how we use the solution adata. It's probably a question of preference, but for metrics that don't compare integrated and unintegrated objects, is it necessary to transfer the integrated representations to the solution object? I would treat the solution more as the unintegrated object from which I can pull additional metadata if needed (which is not really the case for the metrics we have here, but there are other metrics, existing and potentially future, that would need that information). Personally, I find it more confusing to transfer the integrated representations to the solution object than to do it the other way around (selectively extracting metadata from the solution for the integrated object).

mumichae · 2023-08-23T15:33:54Z

src/tasks/batch_integration/api/file_solution.yaml

+        description: The organism of the sample in the dataset.
+        required: false
+      - type: object
+        name: knn


Is this the same information as in .uns['neighbors'] that scanpy.pp.neighbors generates?

Yepyep! Should I rename it to knn_neighbors?

'neighbors' is the default and what most methods expect out of the box, so I'd go with that

Yes, but the obsp entries are called knn_distances and knn_connectivities, so we should then rename those to distances and connectivities as well. WDYT?
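For reference, a minimal sketch of the key convention being discussed, assuming scanpy's behaviour: `scanpy.pp.neighbors` writes `.uns['neighbors']`, `.obsp['distances']` and `.obsp['connectivities']` by default, and `.uns[key]`, `.obsp[f'{key}_distances']` and `.obsp[f'{key}_connectivities']` when called with `key_added=key`. Recent scanpy versions also record the `.obsp` key names inside the `.uns` entry (`distances_key`, `connectivities_key`), so a rename has to update those too. `neighbors_keys` and `rename_neighbors` are hypothetical helpers, not part of the repository; they only assume `.uns` and `.obsp` are dict-like, as on an AnnData object.

```python
from types import SimpleNamespace


def neighbors_keys(key_added=None):
    """Return the .uns/.obsp keys used by scanpy.pp.neighbors for a given key_added."""
    if key_added is None:
        return {"uns": "neighbors", "distances": "distances", "connectivities": "connectivities"}
    return {
        "uns": key_added,
        "distances": f"{key_added}_distances",
        "connectivities": f"{key_added}_connectivities",
    }


def rename_neighbors(adata, old_key="knn", new_key=None):
    """Hypothetical helper: move a kNN graph from the `old_key` prefix to `new_key`.

    Only assumes `.uns` and `.obsp` are dict-like (as on an AnnData object).
    `new_key=None` means scanpy's defaults (neighbors / distances / connectivities).
    """
    old, new = neighbors_keys(old_key), neighbors_keys(new_key)
    for slot in ("distances", "connectivities"):
        adata.obsp[new[slot]] = adata.obsp.pop(old[slot])
    info = dict(adata.uns.pop(old["uns"]))
    # Keep the key pointers stored in .uns consistent with the renamed .obsp entries
    info["distances_key"] = new["distances"]
    info["connectivities_key"] = new["connectivities"]
    adata.uns[new["uns"]] = info
    return adata


# Usage with a stand-in object instead of a real AnnData
adata = SimpleNamespace(
    uns={"knn": {"params": {"n_neighbors": 15}}},
    obsp={"knn_distances": "D", "knn_connectivities": "C"},
)
rename_neighbors(adata)
print(sorted(adata.obsp))  # ['connectivities', 'distances']
print(adata.uns["neighbors"]["connectivities_key"])  # connectivities
```

Renaming to the defaults means downstream code that calls scanpy or scib without extra key arguments should find the graph where it expects it.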

mumichae · 2023-08-23T15:38:40Z

src/tasks/batch_integration/metrics/clustering_overlap/script.py

+input_solution = ad.read_h5ad(par['input_solution'])
+input_integrated = ad.read_h5ad(par['input_integrated'])
+
+input_solution.obsp["connectivities"] = input_integrated.obsp["connectivities"]
+input_solution.obsp["distances"] = input_integrated.obsp["distances"]
+
+# TODO: if we don't copy neighbors over, the metric doesn't work
+input_solution.uns["neighbors"] = input_integrated.uns["neighbors"]


Is it necessary to transfer the kNN graph to the solution? Wouldn't it be sufficient to run on the integrated input only?

We want to compare the integrated object's kNN graph to the solution's labels, no? If we only compute the ARI and NMI on the data in the integrated object, we don't know for certain that this AnnData object still has the same dimensionality as the solution.

I would add the metadata to the integrated object instead. That way we can still check that the dimensions match, but we are essentially evaluating the integrated object. I think it's a matter of taste, but I would find it more intuitive to base things around the integrated object.
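A minimal sketch of the alternative suggested here, assuming AnnData-like inputs: check that the method kept the same cells in the same order, then copy the needed label columns from the solution onto the integrated object, so the metric runs on the integrated kNN graph as-is. `add_solution_metadata` is a hypothetical helper, and the `SimpleNamespace` objects only stand in for AnnData in the usage lines.

```python
from types import SimpleNamespace


def add_solution_metadata(integrated, solution, obs_keys):
    """Hypothetical helper: copy label columns from the solution onto the method output.

    Assumes AnnData-like inputs (`.obs_names` and a dict-like `.obs`). The metric is
    then computed on `integrated`, whose kNN graph / embedding is left untouched.
    """
    # The method must not have dropped, added or reordered cells
    if list(integrated.obs_names) != list(solution.obs_names):
        raise ValueError("integrated and solution objects do not contain the same cells in the same order")
    for key in obs_keys:
        integrated.obs[key] = solution.obs[key]
    return integrated


# Usage with stand-in objects; with AnnData, `.obs` would be a pandas DataFrame
solution = SimpleNamespace(obs_names=["c1", "c2"], obs={"label": ["T", "B"], "batch": ["b1", "b2"]})
integrated = SimpleNamespace(obs_names=["c1", "c2"], obs={})
add_solution_metadata(integrated, solution, ["label", "batch"])
print(integrated.obs["label"])  # ['T', 'B']
```

The dimension check still happens, but the object being scored is the method output rather than a copy of the solution.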

mumichae · 2023-08-23T15:39:15Z

src/tasks/batch_integration/metrics/graph_connectivity/script.py

+input_solution = ad.read_h5ad(par['input_solution'])
+input_integrated = ad.read_h5ad(par['input_integrated'])
+
+input_solution.obsp["connectivities"] = input_integrated.obsp["connectivities"]
+input_solution.obsp["distances"] = input_integrated.obsp["distances"]
+
+# TODO: if we don't copy neighbors over, the metric doesn't work
+input_solution.uns["neighbors"] = input_integrated.uns["neighbors"]


Is it necessary to transfer the kNN graph to the solution? Wouldn't it be sufficient to run on the integrated input only?

I'd rather compare the integrated graph to the solution's labels, to make sure that the integrated object has the same dimensionality.

Of course, we could also just put an assertion here.
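The assertion-only variant could look roughly like this. It is a hypothetical sketch (`check_matches_solution` is not an existing function in the repository) that checks the cell identifiers and, optionally, that any label columns the method kept are unchanged, without copying anything between objects. Note that plain `assert` statements are skipped when Python runs with `-O`.

```python
from types import SimpleNamespace


def check_matches_solution(integrated, solution, obs_keys=()):
    """Hypothetical check that the method output still matches the solution.

    Assumes AnnData-like inputs (`.obs_names` and a dict-like `.obs`).
    """
    assert list(integrated.obs_names) == list(solution.obs_names), \
        "method output does not contain the same cells (in the same order) as the solution"
    for key in obs_keys:
        # Only compare label columns the method kept in its output
        if key in integrated.obs:
            assert list(integrated.obs[key]) == list(solution.obs[key]), \
                f"obs['{key}'] differs between the method output and the solution"


# Usage with stand-in objects instead of real AnnData
solution = SimpleNamespace(obs_names=["c1", "c2"], obs={"label": ["T", "B"]})
integrated = SimpleNamespace(obs_names=["c1", "c2"], obs={"label": ["T", "B"]})
check_matches_solution(integrated, solution, ["label"])  # passes silently
```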

src/tasks/batch_integration/metrics/isolated_label_f1/script.py

rcannood · 2023-08-24T06:42:14Z

> Looks good overall. Just a general comment on how we use the solution adata. It's probably a question of preference, but for metrics that don't compare integrated and unintegrated objects, is it necessary to transfer the integrated representations to the solution object?

I'm transferring the integrated representations to the solution object to ensure that the method output didn't make any changes to the labels and such. Alternatively, we could do a lot of assertions instead of copying the data over, although it seems easier to me to just copy the method output over to the solution object.

> I would treat the solution more as the unintegrated object from which I can pull additional metadata if needed (which is not really the case for the metrics we have here, but there are other metrics, existing and potentially future, that would need that information).

👍

> Personally, I find it more confusing to transfer the integrated representations to the solution object than to do it the other way around (selectively extracting metadata from the solution for the integrated object).

I guess it's more or less the same thing, as long as we're sure that there were no changes to the method output which could affect the metric scores in some way.


rcannood added 6 commits August 16, 2023 14:01

Remove --hvg from methods

673dcca

commit scanvi changes

1263fe3

Merge remote-tracking branch 'origin/main' into batch_integration/sub…
…set_in_process_dataset

077a269

remove hvg related code

22c042c

Merge remote-tracking branch 'origin/main' into batch_integration/sub…
…set_in_process_dataset

50e52d8

wip commit

266e9d7

rcannood changed the title ~~wip commit~~ Batch integration - Create separate solution file Aug 23, 2023

rcannood added 2 commits August 23, 2023 14:08

Merge remote-tracking branch 'origin/main' into batch_integration/cre…
…ate_separate_solution_file

a7037e6

refactor the metrics

0652cb9

rcannood commented Aug 23, 2023


always flush

23ba138

rcannood requested a review from mumichae August 23, 2023 12:59

rcannood added 3 commits August 23, 2023 16:07

Merge remote-tracking branch 'origin/main' into batch_integration/cre…
…ate_separate_solution_file

2b92bc8

don't copy X when not needed

52eaf68

update readme

78eaf94

mumichae reviewed Aug 23, 2023


Merge remote-tracking branch 'origin/main' into batch_integration/cre…
…ate_separate_solution_file

e105ade

rcannood merged commit 0ec105d into main Aug 30, 2023
5 checks passed

rcannood deleted the batch_integration/create_separate_solution_file branch August 30, 2023 12:22


mumichae Aug 23, 2023

rcannood Aug 24, 2023

mumichae Aug 30, 2023

rcannood Aug 30, 2023

mumichae Aug 23, 2023

rcannood Aug 24, 2023

mumichae Aug 30, 2023

mumichae Aug 23, 2023

rcannood Aug 24, 2023
