Bag-level predictions do not correspond to unique samples? #70

Open
patricks-lab opened this issue Aug 1, 2024 · 1 comment
Labels
bug Something isn't working

Comments

@patricks-lab

Thanks for the great work!

I'm trying to print out bag-level predictions (i.e. for each donor). I'm following the classification with MIL tutorial (https://multimil.readthedocs.io/en/latest/notebooks/mil_classification.html) and after finishing training and calling mil.get_model_output() this is what adata looks like:

AnnData object with n_obs × n_vars = 359595 × 30
    obs: "3'_or_5'", 'BMI', 'age_or_mean_of_age_range', 'age_range', 'anatomical_region_ccf_score', 'ancestry', 'assay', 'cause_of_death', 'cell_type', 'core_or_extension', 'dataset', 'development_stage', 'disease', 'donor_id', 'fresh_or_frozen', 'log10_total_counts', 'lung_condition', 'mixed_ancestry', 'sample', 'scanvi_label', 'sequencing_platform', 'sex', 'smoking_status', 'study', 'subject_type', 'suspension_type', 'tissue', 'tissue_coarse_unharmonized', 'tissue_detailed_unharmonized', 'tissue_dissociation_protocol', 'tissue_level_2', 'tissue_level_3', 'tissue_sampling_method', 'total_counts', 'ann_level_1_label_final', 'ann_level_2_label_final', 'ann_level_3_label_final', 'ann_level_4_label_final', 'ann_level_5_label_final', 'ref', '_scvi_batch', 'cell_attn', 'bags', 'predicted_disease'
    uns: '_scvi_uuid', '_scvi_manager_uuid', 'bag_true_disease', 'bag_full_predictions_disease'
    obsm: 'X_umap', '_scvi_extra_categorical_covs', 'full_predictions_disease'

I'm interested in getting a single prediction/label for each unique sample (namely for each unique value of adata.obs['sample']).

In the tutorial dataset there are 108 unique samples when I print len(np.unique(adata.obs['sample'])).

However, len(adata.uns['bag_full_predictions_disease']) is 2816, which matches the number of unique bags (len(np.unique(adata.obs['bags'])) is also 2816). I expected one bag per sample, and therefore only 108 bags and 108 predictions.

Is this the right way to get sample-level predictions (i.e. one prediction for each of the 108 unique samples)?

Thanks in advance!

Version information

No response

patricks-lab added the bug label on Aug 1, 2024
@alitinet
Collaborator

Hi @patricks-lab! Thanks for your interest in our work, and sorry for the late reply! Exactly: each sample is split into multiple bags so that the model can be trained in mini-batches. I would take the average over all bags corresponding to a sample to get a single sample-level representation.
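The averaging suggested above could look roughly like the sketch below. It is not taken from the multimil code base. It assumes that row i of adata.uns['bag_full_predictions_disease'] belongs to the bag with id i in adata.obs['bags'], which this thread does not confirm, so please check that against your own object first. The helper name sample_level_predictions and the toy data are made up for illustration.

```python
import numpy as np
import pandas as pd


def sample_level_predictions(obs, bag_preds, sample_key="sample", bag_key="bags"):
    """Average bag-level prediction vectors within each sample.

    ASSUMPTION (not confirmed in this thread): row i of `bag_preds`
    belongs to the bag whose id in obs[bag_key] is i.
    """
    # Collapse the per-cell table to one row per bag: bag id -> sample.
    bag_to_sample = (
        obs[[bag_key, sample_key]]
        .drop_duplicates()
        .set_index(bag_key)[sample_key]
    )
    if not bag_to_sample.index.is_unique:
        raise ValueError("at least one bag maps to more than one sample")

    # One row per bag, one column per class.
    preds = pd.DataFrame(np.asarray(bag_preds))
    # Attach the sample of each bag, then average the bags of each sample.
    preds[sample_key] = bag_to_sample.reindex(preds.index).to_numpy()
    return preds.groupby(sample_key).mean()


# Toy stand-in for adata.obs and adata.uns['bag_full_predictions_disease']:
# 5 cells, 3 bags, 2 samples, 2 classes.
obs = pd.DataFrame({
    "sample": ["s1", "s1", "s1", "s2", "s2"],
    "bags": [0, 0, 1, 2, 2],
})
bag_preds = np.array([[0.8, 0.2], [0.6, 0.4], [0.1, 0.9]])

out = sample_level_predictions(obs, bag_preds)
# One row per sample; the column with the highest mean is the predicted class.
predicted_class = out.to_numpy().argmax(axis=1)
```

With a real object the call would be something like sample_level_predictions(adata.obs, adata.uns['bag_full_predictions_disease']), provided the row-order assumption holds. Note that this gives every bag equal weight regardless of how many cells it contains.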
