Generate Bins on ExpressionProfile after Finding Gene Intersection #34

Open
noamteyssier opened this issue Jun 27, 2022 · 1 comment
Labels: Algorithmic (Algorithmic Improvements), bug (Something isn't working)

Comments

@noamteyssier
Collaborator

This is to address the conversation about @artemy-bakulin's original commit fe68880.
Artemy brings up the point about class imbalance here: https://github.com/noamteyssier/pypage/issues/33#issuecomment-1167958248

I agree with the new order of operations, but we need to address the following bug in this commit's current form.

The Bug

The current form will return a bin_array of size n_genes regardless of the size of the gene subset provided.

It currently fails the following test:

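import numpy as np
from pypage import ExpressionProfile  # assumes the package-level export

N_GENES = 1000  # genes per simulated expression profile
T = 100  # number of random trials

def get_expression() -> tuple[np.ndarray, np.ndarray]:
    # Simulate gene names and normally distributed expression scores
    genes = np.array([f"g.{g}" for g in np.arange(N_GENES)])
    scores = np.random.normal(size=N_GENES)
    return genes, scores

def test_subsetting():
    for _ in np.arange(T):
        genes, expression = get_expression()

        exp = ExpressionProfile(genes, expression)
        # Keep each gene with probability 0.5
        subset = genes[np.random.random(genes.size) < 0.5]

        # The returned bins should cover only the requested genes
        bin_sub = exp.get_gene_subset(subset)
        assert bin_sub.size == subset.size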

Solution

This could be fixed in _build_bool_array or _build_bin_array by dropping the entries whose indices were never set, e.g. by initializing bool_array with a -1 sentinel (np.full(..., -1)) instead of np.zeros and then filtering out the -1 entries.
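
A rough sketch of the sentinel idea, independent of the actual pypage internals: `bin_subset` and its equal-frequency binning are hypothetical stand-ins for illustration, not the real `_build_bool_array` / `_build_bin_array`. It assumes bin labels are non-negative integers, so -1 can safely mark genes outside the subset.

```python
import numpy as np

def bin_subset(genes, expression, subset, n_bins=10):
    """Hypothetical helper: bin only the genes found in `subset`."""
    # Start from a -1 sentinel instead of zeros, so unset positions
    # cannot be confused with a real bin label (bin 0).
    bin_array = np.full(genes.size, -1, dtype=int)

    # Find the gene intersection first ...
    mask = np.isin(genes, subset)
    if not mask.any():
        return np.empty(0, dtype=int)

    # ... then compute equal-frequency bins over the selected genes only.
    sub_expr = expression[mask]
    ranks = np.argsort(np.argsort(sub_expr))  # rank of each selected gene
    bin_array[mask] = ranks * n_bins // sub_expr.size

    # Drop the unset positions so the output matches the subset size.
    return bin_array[bin_array != -1]
```

With this shape, the size check from the test above would hold for any subset drawn from `genes`, because positions left at -1 never reach the caller.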

Leaving this open for now; I will circle back once the rest of the merge is complete.

@noamteyssier noamteyssier added bug Something isn't working Algorithmic Algorithmic Improvements labels Jun 27, 2022
@noamteyssier
Collaborator Author

Added the test explicitly to test_expression.py (7149ffa).
