Statistics utils

This repo contains the statistical analysis tools used to compute uncertainty estimates for Remoscope data. The calculations are used both on the Remoscope instrument and in the Remoscope preprint. See Supplementary Note 2 in the Remoscope preprint for analytical expressions of the calculations.

There are also additional tools, such as matrix deskewing analysis, which we explored but have not yet implemented in the main project.

Local package installation

python3 -m pip install .

Usage

Use CountCompensator to correct parasitemia estimates and compute 95% confidence bounds according to a linear fit y = mx + b. Example instantiation (see documentation in compensator.py for input argument descriptions):

# Instantiate compensator using y = mx + b fit with corresponding error
# Fit is based on 0.90 confidence thresholded frightful-wendigo model classification vs clinical PCR
corrector = CountCompensator("frightful-wendigo-1931", 0.90)

# Skip compensation: compute parasitemia and error from raw data only
# The model and confidence threshold arguments are ignored when skip=True
corrector = CountCompensator("frightful-wendigo-1931", 0.90, skip=True)
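
To make the y = mx + b correction concrete, here is a minimal sketch that does not use CountCompensator itself. It assumes the fit maps a raw parasitemia estimate x to a corrected estimate y, and that negative results are clipped to zero. The function name, the fit values, and the clipping are all illustrative assumptions, not the package's implementation.

```python
import numpy as np


def compensate_linear(raw_parasitemia, m, b):
    """Apply a linear fit y = m*x + b to raw parasitemia estimates.

    ASSUMPTION: the fit maps raw estimates (x) to corrected estimates (y),
    and negative corrected values are clipped to zero.
    """
    raw = np.asarray(raw_parasitemia, dtype=float)
    corrected = m * raw + b
    return np.clip(corrected, 0.0, None)


# Hypothetical fit values, chosen only for illustration
M_HYPOTHETICAL = 1.25
B_HYPOTHETICAL = -0.01

corrected = compensate_linear([0.0, 0.1, 1.0], M_HYPOTHETICAL, B_HYPOTHETICAL)
```

In the package, the actual fit values and their errors come from the data files for the chosen model and confidence threshold.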

Use CountDeskewer to correct for skew in class counts according to the confusion matrix. Example instantiation:

# Instantiate deskewer based on frightful-wendigo model confusion matrix
corrector = CountDeskewer("frightful-wendigo-1931")
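
The idea behind deskewing can be sketched independently of CountDeskewer. The example below assumes a row-normalized confusion matrix (rows are true classes, columns are predicted classes), so the expected observed counts are the transpose of the matrix applied to the true counts. The 3-class matrix and the function name are hypothetical, and the package may handle normalization, class ordering, and error propagation differently.

```python
import numpy as np

# Hypothetical 3-class confusion matrix, row-normalized:
# cmatrix[i, j] = P(predicted class j | true class i)
cmatrix = np.array([
    [0.98, 0.01, 0.01],
    [0.05, 0.90, 0.05],
    [0.02, 0.08, 0.90],
])


def deskew_counts(observed_counts, cmatrix):
    """Estimate true class counts from observed (skewed) class counts.

    Under the assumption above, observed = cmatrix.T @ true,
    so the estimate solves that linear system for `true`.
    """
    observed = np.asarray(observed_counts, dtype=float)
    return np.linalg.solve(cmatrix.T, observed)


# Simulate skewed counts from known true counts, then recover them
true_counts = np.array([1000.0, 50.0, 20.0])
observed = cmatrix.T @ true_counts
recovered = deskew_counts(observed, cmatrix)
```

With real data the observed counts are noisy, so the recovered counts can be non-integer or slightly negative and need their own uncertainty estimate.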

Based on the input arguments, the fit values and confusion matrices are loaded from the appropriate files (.csv or .npy) in data_files. See "Data files" for more details.

To compute parasitemia and corresponding 95% confidence bounds:

import numpy as np

# Example YOGO output
class_counts = np.array([
    100000, # healthy
    60, # ring
    40, # troph
    20, # schizont
    10, # gametocyte
    150, # WBC
    200, # misc
])

# Compute parasitemia without correction
parasitemia = corrector.calc_parasitemia(class_counts)

# Get corrected parasitemia and 95% confidence bounds as parasites/uL
parasitemia, conf_bounds = corrector.get_res_from_counts(class_counts, units_ul_out=True)

# Get corrected parasitemia and 95% confidence bounds as percentage
parasitemia, conf_bounds = corrector.get_res_from_counts(class_counts, units_ul_out=False)
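
For intuition about the two output units, the following sketch computes an uncorrected parasitemia directly from the class counts. It assumes that ring, troph, and schizont cells count as parasitized, that the denominator is healthy plus parasitized cells, and that a nominal density of 5e6 red blood cells per uL is used for conversion. These definitions, constants, and names are assumptions for illustration; the package's own definitions may differ.

```python
import numpy as np

# Class order follows the example YOGO output above
CLASS_NAMES = ["healthy", "ring", "troph", "schizont", "gametocyte", "WBC", "misc"]

# ASSUMPTION: these classes count as parasitized red blood cells
PARASITE_CLASSES = ["ring", "troph", "schizont"]

# ASSUMPTION: nominal red blood cell density, used only for unit conversion
RBCS_PER_UL = 5e6


def raw_parasitemia(class_counts):
    """Return parasitized cells as a fraction of counted red blood cells."""
    counts = dict(zip(CLASS_NAMES, np.asarray(class_counts, dtype=float)))
    parasites = sum(counts[name] for name in PARASITE_CLASSES)
    rbcs = counts["healthy"] + parasites
    return parasites / rbcs if rbcs > 0 else 0.0


class_counts = np.array([100000, 60, 40, 20, 10, 150, 200])
fraction = raw_parasitemia(class_counts)

as_percent = 100.0 * fraction                  # parasitemia as a percentage
as_parasites_per_ul = RBCS_PER_UL * fraction   # parasitemia as parasites/uL
```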

Data files

remo-stats-utils requires the data files to be organized in a particular schema for dynamic loading. Dynamic loading is used to match the data with the YOGO model being run in ulc-malaria-scope.

Let the model ID consist of the model name and number, separated by dashes (for example, frightful-wendigo-1931). Then:

- Group files by model ID in the subfolder data_files/<model ID>
- Name data files <model ID><suffix>, where the suffixes are defined in stats_utils/constants.py
  - The suffix describes whether the data was generated from clinical or cultured data and whether heatmap nuking was used in the data processing

For example, for the model frightful-wendigo-1931, one may have the following file structure:

data_files/
├── frightful-wendigo-1931/
│   ├── frightful-wendigo-1931-cmatrix-mean.npy
│   ├── frightful-wendigo-1931-inv-cmatrix-std.npy
│   ├── frightful-wendigo-1931-cultured-compensation-no-heatmaps.csv
│   ├── frightful-wendigo-1931-cultured-compensation-with-heatmaps.csv
│   ├── frightful-wendigo-1931-clinical-compensation-no-heatmaps.csv
│   ├── frightful-wendigo-1931-clinical-compensation-with-heatmaps.csv
├── other-model-0000/
│   ...
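
The naming schema above can be sketched as a small path helper. The function name and the root argument are hypothetical, and the suffix strings are copied from the example tree rather than from stats_utils/constants.py, so treat them as placeholders.

```python
from pathlib import Path


def data_file_path(model_id, suffix, root="data_files"):
    """Build <root>/<model ID>/<model ID><suffix> following the schema above."""
    return Path(root) / model_id / f"{model_id}{suffix}"


model_id = "frightful-wendigo-1931"

# Split the model ID into name and number at the last dash
model_name, model_number = model_id.rsplit("-", 1)

# Suffixes taken from the example tree (placeholders for the real constants)
cmatrix_path = data_file_path(model_id, "-cmatrix-mean.npy")
fit_path = data_file_path(model_id, "-clinical-compensation-no-heatmaps.csv")
```

Keeping the model ID in both the folder name and the file name lets a loader locate every file for a model from the ID alone.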
