Update/integrate last v1 changes (#214)
* update migration scripts

* update batch_int

* update bat_int clus_overlap

* update denoising

* add cp10k norm

* update schema

* update label_projection

* [WIP] spectral_features new control_method

* add diffusion map method

* update dim_red spectral_feature and diffu_map

* update dim_red

* generalise cp normalization

* CPM -> CP10k

* fix failing test

* fix typo

* update and rename rmse to distance correlation

* set CP normalization from cpm to cp10k

* fix typo cpm to cp

* updated changelog

---------

Co-authored-by: Robrecht Cannoodt <[email protected]>
KaiWaldrant and rcannood authored Aug 25, 2023
1 parent 19ee4d8 commit 12f54cf
Showing 86 changed files with 455 additions and 322 deletions.
8 changes: 7 additions & 1 deletion CHANGELOG.md
@@ -3,6 +3,10 @@

## general

### NEW FUNCTIONALITY

* Updated all current tasks in v2 to latest changes in OP v1 (PR #214)

### MAJOR CHANGES

* Relocate task directories to new `src/tasks/` location (PR #142).
@@ -11,6 +15,8 @@
and `ghcr.io/openproblems-bio/base-r` (PR #168).

* Update batch integration docker images to OpenProblems base images (PR #171).

* Changed default normalization CPM to CP10k (PR #214)

### MINOR CHANGES

@@ -274,7 +280,7 @@

* `methods/neuralee`: Migrated from v1.

* `metrics/rmse`: Migrated from v1, but will likely be removed.
* `metrics/distance_correlation`: Migrated from v1, but will likely be removed.

* `metrics/trustworthiness`: Migrated from v1, but will likely be removed.

2 changes: 1 addition & 1 deletion src/common/comp_tests/check_method_config.py
@@ -98,7 +98,7 @@ def search_ref_bib(reference):
assert arg_id in arg_names, f"Argument '{arg_id}' in `.functionality.info.variants['{paramset_id}']` is not an argument in `.functionality.arguments`."

assert "preferred_normalization" in info, "preferred_normalization not an info field"
norm_methods = ["log_cpm", "counts", "log_scran_pooling", "sqrt_cpm", "l1_sqrt"]
norm_methods = ["log_cpm", "log_cp10k", "counts", "log_scran_pooling", "sqrt_cpm", "sqrt_cp10k", "l1_sqrt"]
assert info["preferred_normalization"] in norm_methods, "info['preferred_normalization'] not one of '" + "', '".join(norm_methods) + "'."
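The check in this hunk can be reproduced in isolation. The following is a minimal sketch, assuming `info` is the dictionary parsed from `.functionality.info`; the function name `check_preferred_normalization` is hypothetical and not part of the repository.

```python
# Allowed values after this commit (copied from the updated `norm_methods` list).
NORM_METHODS = [
    "log_cpm", "log_cp10k", "counts", "log_scran_pooling",
    "sqrt_cpm", "sqrt_cp10k", "l1_sqrt",
]


def check_preferred_normalization(info: dict) -> str:
    """Hypothetical helper: return the preferred normalization or raise AssertionError."""
    assert "preferred_normalization" in info, "preferred_normalization not an info field"
    value = info["preferred_normalization"]
    assert value in NORM_METHODS, (
        "info['preferred_normalization'] not one of '" + "', '".join(NORM_METHODS) + "'."
    )
    return value
```

A value outside the list, such as `log_cp10k_scaled`, fails the second assertion when passed as the top-level `preferred_normalization`.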


2 changes: 1 addition & 1 deletion src/common/create_component/script.py
@@ -80,7 +80,7 @@ def generate_info(par, component_type, pretty_name) -> str:
| description: |
| FILL IN: A (multi-line) description of how this method works.
| # Which normalisation method this component prefers to use (required).
| preferred_normalization: log_cpm
| preferred_normalization: log_cp10k
|''')
if component_type == "method":
str += strip_margin(f'''\
2 changes: 1 addition & 1 deletion src/common/schemas/defs_common.yaml
@@ -59,7 +59,7 @@ definitions:
required: [ type ]
additionalProperties: false
PreferredNormalization:
enum: [l1_sqrt, log_cpm, log_scran_pooling, sqrt_cpm, counts]
enum: [l1_sqrt, log_cpm, log_cp10k, log_scran_pooling, sqrt_cpm, sqrt_cp10k, counts]
description: |
Which normalization method a component prefers.
22 changes: 22 additions & 0 deletions src/datasets/normalization/log_cp/config.vsh.yaml
@@ -0,0 +1,22 @@
__merge__: ../../api/comp_normalization.yaml
functionality:
name: "log_cp"
description: "Normalize data using Log CP"
resources:
- type: python_script
path: script.py
arguments:
- name: "--n_cp"
type: integer
default: 1e4
description: "Number of counts per cell"
- name: "--norm_id"
type: string
default: log_cp10k
description: "normalization ID to use e.g. 1e6 -> log_cpm, 1e4 -> log_cp10k"
platforms:
- type: docker
image: ghcr.io/openproblems-bio/base_python:1.0.1
- type: nextflow
directives:
label: [ lowmem, lowcpu ]
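`--n_cp` and `--norm_id` are separate arguments in this config, so a caller has to keep the two in sync. A minimal sketch of that pairing, assuming only the two pairs named in the argument description (1e4 and 1e6); `norm_id_for` is a hypothetical helper, not something the component provides.

```python
def norm_id_for(n_cp: float, prefix: str = "log") -> str:
    """Hypothetical helper: derive a normalization ID from the target counts per cell."""
    # Only the pairs named in the config description are covered here.
    suffixes = {10_000: "cp10k", 1_000_000: "cpm"}
    key = int(n_cp)
    if key not in suffixes:
        raise ValueError(f"no normalization ID defined for n_cp={n_cp}")
    return f"{prefix}_{suffixes[key]}"
```

For example, `norm_id_for(1e4)` gives `log_cp10k`, and `norm_id_for(1e6, prefix="sqrt")` gives `sqrt_cpm`.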
@@ -4,11 +4,12 @@
par = {
'input': "resources_test/common/pancreas/dataset.h5ad",
'output': "output.h5ad",
'layer_output': "log_cpm",
'obs_size_factors': "log_cpm_size_factors"
'layer_output': "log_cp10k",
'obs_size_factors': "log_cp10k_size_factors",
'n_cp': 1e6,
}
meta = {
"functionality_name": "normalize_log_cpm"
"functionality_name": "normalize_log_cp10k"
}
## VIASH END

@@ -18,7 +19,7 @@
print(">> Normalize data", flush=True)
norm = sc.pp.normalize_total(
adata,
target_sum=1e6,
target_sum=par["n_cp"],
layer="counts",
inplace=False
)
@@ -27,7 +28,7 @@
print(">> Store output in adata", flush=True)
adata.layers[par["layer_output"]] = lognorm
adata.obs[par["obs_size_factors"]] = norm["norm_factor"]
adata.uns["normalization_id"] = meta["functionality_name"]
adata.uns["normalization_id"] = par["norm_id"]

print(">> Write data", flush=True)
adata.write_h5ad(par['output'], compression="gzip")
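`sc.pp.normalize_total` rescales each cell so that its counts sum to `target_sum`, which this commit makes configurable through `n_cp`. The arithmetic can be sketched in pure Python, assuming a dense list-of-lists count matrix and a `log1p` transform after scaling (the transform line itself lies outside the hunks shown). The name `log_cp` and the handling of all-zero cells are assumptions for illustration, not the component's code.

```python
import math


def log_cp(counts, n_cp=1e4):
    """Scale each cell (row) to n_cp total counts, then apply log(1 + x)."""
    out = []
    for row in counts:
        total = sum(row)
        # Assumption: leave all-zero cells at zero to avoid dividing by zero.
        scale = n_cp / total if total > 0 else 0.0
        out.append([math.log1p(value * scale) for value in row])
    return out


# Two cells, two genes; each row is rescaled to 10,000 counts before the log.
normalized = log_cp([[1, 3], [0, 5]], n_cp=1e4)
```

Passing `n_cp=1e6` instead reproduces the previous counts-per-million scaling.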
13 changes: 0 additions & 13 deletions src/datasets/normalization/log_cpm/config.vsh.yaml

This file was deleted.

@@ -1,10 +1,19 @@
__merge__: ../../api/comp_normalization.yaml
functionality:
name: "sqrt_cpm"
name: "sqrt_cp"
description: "Normalize data using Log Sqrt"
resources:
- type: python_script
path: script.py
arguments:
- name: "--n_cp"
type: integer
default: 1e4
description: "Number of counts per cell"
- name: "--norm_id"
type: string
default: sqrt_cp10k
description: "normalization id to use e.g. 1e4 -> sqrt_cp10k, 1e6 -> sqrt_cpm"
platforms:
- type: docker
image: ghcr.io/openproblems-bio/base_python:1.0.1
@@ -6,7 +6,8 @@
'input': "resources_test/common/pancreas/dataset.h5ad",
'output': "output.h5ad",
'layer_output': "sqrt_cpm",
'obs_size_factors': "size_factors_sqrt_cpm"
'obs_size_factors': "size_factors_sqrt_cpm",
'n_cp': 1e6,
}
meta = {
"functionality_name": "normalize_sqrt_cpm"
@@ -19,16 +20,16 @@
print(">> Normalize data", flush=True)
norm = sc.pp.normalize_total(
adata,
target_sum=1e6,
target_sum=par['n_cp'],
layer="counts",
inplace=False
)
lognorm = np.sqrt(norm["X"])
lognorm = np.sqrt(norm['X'])

print(">> Store output in adata", flush=True)
adata.layers[par["layer_output"]] = lognorm
adata.obs[par["obs_size_factors"]] = norm["norm_factor"]
adata.uns["normalization_id"] = meta["functionality_name"]
adata.uns["normalization_id"] = par["norm_id"]

print(">> Write data", flush=True)
adata.write_h5ad(par['output'], compression="gzip")
2 changes: 1 addition & 1 deletion src/datasets/processors/pca/script.py
@@ -4,7 +4,7 @@
### VIASH START
par = {
'input': 'resources_test/common/pancreas/dataset.h5ad',
'layer_input': 'log_cpm',
'layer_input': 'log_cp10k',
'output': 'dataset.h5ad',
'obsm_embedding': 'X_pca',
'varm_loadings': 'pca_loadings',
4 changes: 2 additions & 2 deletions src/datasets/resource_test_scripts/multimodal.sh
@@ -43,12 +43,12 @@ viash run src/datasets/processors/subsample/config.vsh.yaml -- \


# run sqrt cpm normalisation on mod 1 file
viash run src/datasets/normalization/log_cpm/config.vsh.yaml -- \
viash run src/datasets/normalization/sqrt_cp/config.vsh.yaml -- \
--input $DATASET_DIR/raw_mod1.h5ad \
--output $DATASET_DIR/normalized_mod1.h5ad

# run log cpm normalisation on mod 2 file
viash run src/datasets/normalization/log_cpm/config.vsh.yaml -- \
viash run src/datasets/normalization/log_cp/config.vsh.yaml -- \
--input $DATASET_DIR/raw_mod2.h5ad \
--output $DATASET_DIR/normalized_mod2.h5ad

4 changes: 2 additions & 2 deletions src/datasets/resource_test_scripts/pancreas.sh
@@ -42,8 +42,8 @@ viash run src/datasets/processors/subsample/config.vsh.yaml -- \
--output $DATASET_DIR/raw.h5ad \
--seed 123

# run log cpm normalisation
viash run src/datasets/normalization/log_cpm/config.vsh.yaml -- \
# run log cp10k normalisation
viash run src/datasets/normalization/log_cp/config.vsh.yaml -- \
--input $DATASET_DIR/raw.h5ad \
--output $DATASET_DIR/normalized.h5ad

8 changes: 4 additions & 4 deletions src/datasets/workflows/process_openproblems_v1/main.nf
@@ -7,9 +7,9 @@ targetDir = params.rootDir + "/target/nextflow"
include { openproblems_v1 } from "$targetDir/datasets/loaders/openproblems_v1/main.nf"

// normalization methods
include { log_cpm } from "$targetDir/datasets/normalization/log_cpm/main.nf"
include { log_cpm } from "$targetDir/datasets/normalization/log_cp/main.nf"
include { log_scran_pooling } from "$targetDir/datasets/normalization/log_scran_pooling/main.nf"
include { sqrt_cpm } from "$targetDir/datasets/normalization/sqrt_cpm/main.nf"
include { sqrt_cpm } from "$targetDir/datasets/normalization/sqrt_cp/main.nf"
include { l1_sqrt } from "$targetDir/datasets/normalization/l1_sqrt/main.nf"

// dataset processors
@@ -27,8 +27,8 @@ config = readConfig("$projectDir/config.vsh.yaml")
// add custom tracer to nextflow to capture exit codes, memory usage, cpu usage, etc.
traces = initialize_tracer()

// normalization_methods = [log_cpm, log_scran_pooling, sqrt_cpm, l1_sqrt
normalization_methods = [log_cpm, sqrt_cpm, l1_sqrt]
// normalization_methods = [log_cp, log_scran_pooling, sqrt_cp, l1_sqrt
normalization_methods = [log_cp, sqrt_cp, l1_sqrt]

workflow {
helpMessage(config)
14 changes: 14 additions & 0 deletions src/migration/check_migration.sh
@@ -0,0 +1,14 @@
#!/bin/bash

# viash run src/common/get_git_sha/config.vsh.yaml -p native -- --input /home/kai/Documents/openroblems/openproblems --output output/op_git_sha.json

TASK_IDS=`ls src/tasks`

for task_id in $TASK_IDS; do
echo ">> Processing $task_id"
viash run src/common/get_method_info/config.vsh.yaml -- --input . --task_id $task_id --output output/${task_id}_method.json
viash run src/migration/check_migration_status/config.vsh.yaml -p native -- --git_sha resources_test/input_git_sha.json --comp_info output/${task_id}_method.json --output output/${task_id}_method_status.json
viash run src/common/get_metric_info/config.vsh.yaml -- --input . --task_id $task_id --output output/${task_id}_metric.json
viash run src/migration/check_migration_status/config.vsh.yaml -p native -- --git_sha resources_test/input_git_sha.json --comp_info output/${task_id}_metric.json --output output/${task_id}_metric_status.json

done
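The loop issues four `viash run` calls per task: one info extraction and one status check each for methods and metrics. A dry-run sketch of the same sequence, assuming it is executed from the repository root; `migration_commands` is a hypothetical name, and the script only prints the commands rather than running them.

```python
from pathlib import Path


def migration_commands(task_id: str, git_sha: str = "resources_test/input_git_sha.json"):
    """Return the four `viash run` argument lists that the loop issues for one task."""
    commands = []
    for kind in ("method", "metric"):
        info = f"output/{task_id}_{kind}.json"
        # Step 1: collect component info for this task.
        commands.append([
            "viash", "run", f"src/common/get_{kind}_info/config.vsh.yaml", "--",
            "--input", ".", "--task_id", task_id, "--output", info,
        ])
        # Step 2: compare the recorded v1 commits against the git SHA listing.
        commands.append([
            "viash", "run", "src/migration/check_migration_status/config.vsh.yaml",
            "-p", "native", "--",
            "--git_sha", git_sha, "--comp_info", info,
            "--output", f"output/{task_id}_{kind}_status.json",
        ])
    return commands


if __name__ == "__main__":
    # Assumption: `src/tasks` exists relative to the current directory.
    tasks_dir = Path("src/tasks")
    entries = tasks_dir.iterdir() if tasks_dir.is_dir() else []
    for task_id in sorted(p.name for p in entries if not p.name.startswith(".")):
        for argv in migration_commands(task_id):
            print(" ".join(argv))
```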
16 changes: 12 additions & 4 deletions src/migration/check_migration_status/script.py
@@ -3,9 +3,9 @@

## VIASH START
par = {
'git_sha': 'temp/openproblems-v1.json',
'comp_info': 'temp/denoising_metrics.json',
'output': 'temp/migration_status.json'
'git_sha': 'resources_test/input_git_sha.json',
'comp_info': 'output/denoising_metric.json',
'output': 'output/denoising_metric_status.json'
}
## VIASH END

@@ -16,10 +16,18 @@ def check_status(comp_item: List[Dict[str, str]], git_objects: List[Dict[str, st
git_object["sha"]."""

v1_path = comp_item.get("v1", {}).get("path")

if "metric_id" in comp_item:
v1_path = comp_item.get("v1.path")

if not v1_path:
return "v1.path missing"

v1_commit = comp_item.get("v1", {}).get("commit")

if "metric_id" in comp_item:
v1_commit = comp_item.get("v1.commit")

if not v1_commit:
return "v1.commit missing"

@@ -28,7 +36,7 @@ def check_status(comp_item: List[Dict[str, str]], git_objects: List[Dict[str, st
return "v1.path does not exist in git repo"

git_sha = git_object[0]["sha"]
if git_sha == comp_item["v1_commit"]:
if git_sha == v1_commit:
return "up to date"
else:
return f"out of date (sha: {git_sha})"
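The lookup order in `check_status` can be sketched end to end. This is a minimal sketch, assuming `git_objects` is a list of dictionaries with `path` and `sha` keys and that the line hidden between the two hunks filters that list by the v1 path; the helper `_v1_field` is hypothetical and only folds the two key layouts (nested for methods, flattened for metrics) into one place.

```python
from typing import Dict, List


def _v1_field(comp_item: dict, field: str):
    """Hypothetical helper: read `v1.<field>` from either layout used in the diff."""
    if "metric_id" in comp_item:
        # Metric info uses flattened keys such as "v1.path".
        return comp_item.get(f"v1.{field}")
    # Method info nests the values under a "v1" dictionary.
    return (comp_item.get("v1") or {}).get(field)


def check_status(comp_item: dict, git_objects: List[Dict[str, str]]) -> str:
    v1_path = _v1_field(comp_item, "path")
    if not v1_path:
        return "v1.path missing"

    v1_commit = _v1_field(comp_item, "commit")
    if not v1_commit:
        return "v1.commit missing"

    # Assumption: each git object exposes its file path under the "path" key.
    git_object = [obj for obj in git_objects if obj["path"] == v1_path]
    if not git_object:
        return "v1.path does not exist in git repo"

    git_sha = git_object[0]["sha"]
    if git_sha == v1_commit:
        return "up to date"
    return f"out of date (sha: {git_sha})"
```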
@@ -9,7 +9,7 @@ functionality:
v1:
path: openproblems/tasks/_batch_integration/batch_integration_embed/methods/baseline.py
commit: b3456fd73c04c28516f6df34c57e6e3e8b0dab32
preferred_normalization: log_cpm
preferred_normalization: log_cp10k
resources:
- type: python_script
path: script.py
@@ -9,7 +9,7 @@ functionality:
v1:
path: openproblems/tasks/_batch_integration/batch_integration_embed/methods/baseline.py
commit: b3456fd73c04c28516f6df34c57e6e3e8b0dab32
preferred_normalization: log_cpm
preferred_normalization: log_cp10k
resources:
- type: python_script
path: script.py
@@ -9,7 +9,7 @@ functionality:
v1:
path: openproblems/tasks/_batch_integration/batch_integration_embed/methods/baseline.py
commit: b3456fd73c04c28516f6df34c57e6e3e8b0dab32
preferred_normalization: log_cpm
preferred_normalization: log_cp10k
arguments:
- name: "--jitter"
type: double
@@ -9,7 +9,7 @@ functionality:
v1:
path: openproblems/tasks/_batch_integration/batch_integration_embed/methods/baseline.py
commit: b3456fd73c04c28516f6df34c57e6e3e8b0dab32
preferred_normalization: log_cpm
preferred_normalization: log_cp10k
resources:
- type: python_script
path: script.py
6 changes: 3 additions & 3 deletions src/tasks/batch_integration/methods/bbknn/config.vsh.yaml
@@ -15,12 +15,12 @@ functionality:
documentation_url: "https://github.com/Teichlab/bbknn#readme"
v1:
path: openproblems/tasks/_batch_integration/batch_integration_graph/methods/bbknn.py
commit: 29803b95c88b4ec5921df2eec7111fd5d1a95daf
preferred_normalization: log_cpm
commit: b3456fd73c04c28516f6df34c57e6e3e8b0dab32
preferred_normalization: log_cp10k
variants:
bbknn_full_unscaled:
bbknn_full_scaled:
preferred_normalization: log_cpm_scaled
preferred_normalization: log_cp10k_scaled
resources:
- type: python_script
path: script.py
6 changes: 3 additions & 3 deletions src/tasks/batch_integration/methods/combat/config.vsh.yaml
@@ -18,12 +18,12 @@ functionality:
documentation_url: "https://scanpy.readthedocs.io/en/stable/api/scanpy.pp.combat.html"
v1:
path: openproblems/tasks/_batch_integration/batch_integration_graph/methods/combat.py
commit: 29803b95c88b4ec5921df2eec7111fd5d1a95daf
preferred_normalization: log_cpm
commit: b3456fd73c04c28516f6df34c57e6e3e8b0dab32
preferred_normalization: log_cp10k
variants:
combat_full_unscaled:
combat_full_scaled:
preferred_normalization: log_cpm_scaled
preferred_normalization: log_cp10k_scaled
resources:
- type: python_script
path: script.py
@@ -17,7 +17,7 @@ functionality:
reference: "haghverdi2018batch"
repository_url: "https://code.bioconductor.org/browse/batchelor/"
documentation_url: "https://bioconductor.org/packages/batchelor/"
preferred_normalization: log_cpm
preferred_normalization: log_cp10k
resources:
- type: r_script
path: script.R
@@ -11,7 +11,7 @@ functionality:
reference: "haghverdi2018batch"
repository_url: "https://code.bioconductor.org/browse/batchelor/"
documentation_url: "https://bioconductor.org/packages/batchelor/"
preferred_normalization: log_cpm
preferred_normalization: log_cp10k
resources:
- type: r_script
path: script.R
4 changes: 2 additions & 2 deletions src/tasks/batch_integration/methods/mnnpy/config.vsh.yaml
@@ -17,11 +17,11 @@ functionality:
v1:
path: openproblems/tasks/_batch_integration/batch_integration_graph/methods/mnn.py
commit: 29803b95c88b4ec5921df2eec7111fd5d1a95daf
preferred_normalization: log_cpm
preferred_normalization: log_cp10k
variants:
mnn_full_unscaled:
mnn_full_scaled:
preferred_normalization: log_cpm_scaled
preferred_normalization: log_cp10k_scaled
resources:
- type: python_script
path: script.py