Schema: Conditional input file validation #2453

LouisLeNezet · 2023-09-28T12:01:13Z

Description of feature

Sometimes the validation schema for an input file depends on a parameter.
For example, a pipeline might require VCF files and chromosome regions when params.step = "panel_prep", but BAM files with a FASTA reference when params.step = "map".

The aim is to validate the columns of the input file conditionally, depending on a selected parameter or on a specific column in the input file.

Possible approaches include:

Not defining the schema of the input file in nextflow_schema.json and instead checking the input conditionally against a given schema in the pipeline (see "Add possibility for direct schema", nextflow-io/nf-validation#94)
- Pros: Easy to do, direct link with params.step
- Cons: Hard for users to read; they need to look into the pipeline code to see which schema is used when

if (params.step == "panel_prep") {
    ch_input = Channel.fromSamplesheet("input", schema : "assets/schema_input_panel_prep.json")
} else if (params.step == "map") {
    ch_input = Channel.fromSamplesheet("input", schema : "assets/schema_input_map.json")
}
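
For illustration, one of the step-specific schema files referenced above might look like the sketch below. The file contents are an assumption: the column names and patterns are borrowed from the combined-schema example later in this issue, and the title and description are placeholders.

```json
{
    "$schema": "http://json-schema.org/draft-07/schema",
    "title": "Hypothetical assets/schema_input_panel_prep.json",
    "description": "Sketch of a samplesheet schema used only when params.step is panel_prep",
    "type": "array",
    "items": {
        "type": "object",
        "properties": {
            "vcf": {
                "type": "string",
                "pattern": "^\\S+\\.vcf$"
            },
            "region": {
                "type": "string",
                "pattern": "^(chr)\\d+:\\d+-\\d+$"
            }
        },
        "required": ["vcf", "region"]
    }
}
```

A matching file for the map step would follow the same shape with bam and fasta columns.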

Add if/else statements to the input schema JSON and keep everything in the same file
- Pros: All in one place
- Cons: Can become really large with multiple steps. No link to params.step

{
    "$schema": "http://json-schema.org/draft-07/schema",
    "$id": "https://raw.githubusercontent.com/nf-core/phaseimpute/master/assets/schema_input.json",
    "title": "nf-core/phaseimpute pipeline - params.input",
    "description": "Schema for the file provided with params.input",
    "type": "array",
    "items": {
        "type": "object",
        "properties": {
            "step": {
                "type": "string",
                "pattern": "^(panel_prep|map)$"
            }
        },
        "allOf": [
            {
                "if": {
                    "properties": {
                        "step": { "const": "panel_prep" }
                    }
                },
                "then": {
                    "properties": {
                        "vcf": {
                            "type": "string",
                            "pattern": "^\\S+\\.vcf$"
                        },
                        "region": {
                            "type": "string",
                            "pattern": "^(chr)\\d+:\\d+-\\d+$"
                        }
                    },
                    "required": ["vcf", "region"]
                }
            },
            {
                "if": {
                    "properties": {
                        "step": { "const": "map" }
                    }
                },
                "then": {
                    "properties": {
                        "bam": {
                            "type": "string",
                            "pattern": "^\\S+\\.bam$"
                        },
                        "fasta": {
                            "type": "string",
                            "pattern": "^\\S+\\.fa$"
                        }
                    },
                    "required": ["bam", "fasta"]
                }
            }
        ],
        "required": ["step"]
    }
}
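
To make the intent concrete, here is a sketch of two samplesheet rows (shown as JSON, as they would look after conversion from CSV) that the combined schema is meant to accept. The file names and the region are made up for illustration.

```json
[
    { "step": "panel_prep", "vcf": "panel.vcf", "region": "chr21:16000000-17000000" },
    { "step": "map", "bam": "sample1.bam", "fasta": "genome.fa" }
]
```

A row with step set to map but no bam column is the kind of input the conditional branches are intended to reject.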

Add if/else statements to the input schema JSON, with each branch linking to a separate schema JSON file
- Pros: Everything is easily available and readable; smaller files.
- Cons: May not be easy to implement. No link to params.step.

{
    "$schema": "http://json-schema.org/draft-07/schema",
    "$id": "https://raw.githubusercontent.com/nf-core/phaseimpute/master/assets/schema_input.json",
    "title": "nf-core/phaseimpute pipeline - params.input",
    "description": "Schema for the file provided with params.input",
    "type": "array",
    "items": {
        "type": "object",
        "properties": {
            "step": {
                "type": "string",
                "pattern": "^(panel_prep|map)$"
            }
        },
        "allOf": [
            {
                "if": {
                    "properties": {
                        "step": { "const": "panel_prep" }
                    }
                },
                "then": {
                    "schema": "assets/schema_input_panel_prep.json"
                }
            },
            {
                "if": {
                    "properties": {
                        "step": { "const": "map" }
                    }
                },
                "then": {
                    "schema": "assets/schema_input_map.json"
                }
            }
        ],
        "required": ["step"]
    }
}
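
For comparison, standard JSON Schema (draft-07) already has a keyword for pointing at another schema: $ref. The fragment below is a sketch of the two branches rewritten with it, not something known to work in nf-validation; whether the validator resolves references to external files is an assumption that would need testing. It also assumes each referenced file describes a single row (an object) rather than the whole array.

```json
{
    "allOf": [
        {
            "if": { "properties": { "step": { "const": "panel_prep" } } },
            "then": { "$ref": "schema_input_panel_prep.json" }
        },
        {
            "if": { "properties": { "step": { "const": "map" } } },
            "then": { "$ref": "schema_input_map.json" }
        }
    ]
}
```

Relative references are resolved against the schema's $id, so these would point at files sitting next to schema_input.json in assets/.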

Finally, and maybe the best solution: add if/else statements to nextflow_schema.json, linked to params.step, with separate JSON schema files
- Pros: Readability, small
- Cons: Complicated to implement?

{
    "$schema": "http://json-schema.org/draft-07/schema",
    "$id": "https://raw.githubusercontent.com/nf-core/phaseimpute/master/nextflow_schema.json",
    "title": "nf-core/phaseimpute pipeline parameters",
    "description": "MyDescription",
    "type": "object",
    "definitions": {
        "input_output_options": {
            "title": "Input/output options",
            "type": "object",
            "fa_icon": "fas fa-terminal",
            "description": "Define where the pipeline should find input data and save output data.",
            "required": ["step", "input"],
            "properties": {
                "step": {
                    "type": "string",
                    "description": "Step to run.",
                    "fa_icon": "fas fa-step-forward",
                    "enum": ["simulate", "panelprep", "impute", "validate"]
                },
                "input": {
                    "type": "string",
                    "fa_icon": "fas fa-file-csv",
                    "pattern": "^\\S+\\.(csv|tsv|yaml)$",
                    "format": "file-path",
                    "mimetype": "text/csv"
                }
            },
            "allOf": [
                {
                    "if": {
                        "properties": { "step": { "const": "panel_prep" } }
                    },
                    "then": {
                        "properties": { "input": { "schema": "assets/schema_input_panel_prep.json" } }
                    }
                },
                {
                    "if": {
                        "properties": { "step": { "const": "map" } }
                    },
                    "then": {
                        "properties": { "input": { "schema": "assets/schema_input_map.json" } }
                    }
                }
            ]
        }
    }
}

ewels · 2023-09-28T22:04:51Z

I think this is a duplicate of #2428

But you've done quite a bit of work here with the extensive writeup, so let's leave both in place 😄

ewels · 2023-11-02T10:40:16Z

What I'd like in any solution here is a decoupling between configuration resolution and sample sheet schema requirements.

So for example, if creating a data set in Seqera Platform, I'd like to be able to select only:

- Pipeline
- Parameter (e.g. --input)

As soon as the samplesheet schema depends on a secondary parameter, you effectively have to fill in an entire launch template and complete the config / parameter resolution before you know how to apply the samplesheet schema. That adds a tonne of complexity into any UI and makes it a lot less feasible.

This is the reason that I prefer simply having multiple different parameters for the different schemas. Having a dropdown and selecting the sample sheet type from input1 / input2 etc is quite simple in terms of a UI.
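
A minimal sketch of what this could look like in nextflow_schema.json, assuming one parameter per samplesheet type. The parameter names input_panel_prep and input_map are hypothetical, and the "schema" key is used the way the examples earlier in this issue use it, to point at the samplesheet schema for that parameter.

```json
{
    "properties": {
        "input_panel_prep": {
            "type": "string",
            "format": "file-path",
            "pattern": "^\\S+\\.(csv|tsv|yaml)$",
            "schema": "assets/schema_input_panel_prep.json",
            "description": "Samplesheet for the panel_prep step (hypothetical parameter)"
        },
        "input_map": {
            "type": "string",
            "format": "file-path",
            "pattern": "^\\S+\\.(csv|tsv|yaml)$",
            "schema": "assets/schema_input_map.json",
            "description": "Samplesheet for the map step (hypothetical parameter)"
        }
    }
}
```

Each parameter then has exactly one schema, so a UI can pick the samplesheet schema from the parameter name alone, without resolving any other configuration first.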

LouisLeNezet added schema enhancement labels Sep 28, 2023

LouisLeNezet added this to nf-core Hackathon March 2023 Sep 28, 2023

github-project-automation bot moved this to Todo in nf-core Hackathon March 2023 Sep 28, 2023

LouisLeNezet mentioned this issue Sep 28, 2023

[Project] Schema extensions #2429

Open

ewels changed the title ~~Conditional input file validation schema~~ Schema: Conditional input file validation Sep 28, 2023

ewels removed the enhancement label Oct 5, 2023
