Schema: Conditional input file validation #2453

Open
Tracked by #2429
LouisLeNezet opened this issue Sep 28, 2023 · 2 comments
@LouisLeNezet
Contributor

Description of feature

Sometimes the validation schema for an input file depends on a parameter.
For example, a pipeline might require VCF files and a chromosome region when params.step = "panel_prep", but BAM files with a FASTA reference when params.step = "map".

The aim is to validate the columns of the input file conditionally, depending on a selected parameter or on a specific column in the input file.

Possible approaches include:

  • Do not define the input file schema in nextflow_schema.json; instead, conditionally check the input against a given schema inside the pipeline (see "Add possibility for direct schema", nextflow-io/nf-validation#94)
    • Pros: Easy to do, direct link with params.step
    • Cons: Hard for users to read; they have to look into the pipeline code to see which schema is used when
if (params.step == "panel_prep") {
    ch_input = Channel.fromSamplesheet("input", schema : "assets/schema_input_panel_prep.json")
} else if (params.step == "map") {
    ch_input = Channel.fromSamplesheet("input", schema : "assets/schema_input_map.json")
}
  • Add if/else statements to the input schema JSON, keeping everything in the same file
    • Pros: All in one place
    • Cons: Can become very large with multiple steps. No link to params.step
{
    "$schema": "http://json-schema.org/draft-07/schema",
    "$id": "https://raw.githubusercontent.com/nf-core/phaseimpute/master/assets/schema_input.json",
    "title": "nf-core/phaseimpute pipeline - params.input",
    "description": "Schema for the file provided with params.input",
    "type": "array",
    "items": {
        "type": "object",
        "properties": {
            "step": {
                "type": "string",
                "pattern": "^(panel_prep|map)$"
            }
        },
        "allOf": [
            {
                "if": {
                    "properties": {
                        "step": { "const": "panel_prep" }
                    }
                },
                "then": {
                    "properties": {
                        "vcf": {
                            "type": "string",
                            "pattern": "^\\S+\\.vcf$"
                        },
                        "region": {
                            "type": "string",
                            "pattern": "^(chr)\\d+:\\d+-\\d+$"
                        }
                    },
                    "required": ["vcf", "region"]
                }
            },
            {
                "if": {
                    "properties": {
                        "step": { "const": "map" }
                    }
                },
                "then": {
                    "properties": {
                        "bam": {
                            "type": "string",
                            "pattern": "^\\S+\\.bam$"
                        },
                        "fasta": {
                            "type": "string",
                            "pattern": "^\\S+\\.fa$"
                        }
                    },
                    "required": ["bam", "fasta"]
                }
            }
        ],
        "required": ["step"]
    }
}
  • Add if/else statements to the input schema JSON, but have each branch link to a separate schema JSON file
    • Pros: All easily available and readable, smaller size.
    • Cons: May not be easy to implement. No link to params.step.
{
    "$schema": "http://json-schema.org/draft-07/schema",
    "$id": "https://raw.githubusercontent.com/nf-core/phaseimpute/master/assets/schema_input.json",
    "title": "nf-core/phaseimpute pipeline - params.input",
    "description": "Schema for the file provided with params.input",
    "type": "array",
    "items": {
        "type": "object",
        "properties": {
            "step": {
                "type": "string",
                "pattern": "^(panel_prep|map)$"
            }
        },
        "allOf": [
            {
                "if": {
                    "properties": {
                        "step": { "const": "panel_prep" }
                    }
                },
                "then": {
                    "schema": "assets/schema_input_panel_prep.json"
                }
            },
            {
                "if": {
                    "properties": {
                        "step": { "const": "map" }
                    }
                },
                "then": {
                    "schema": "assets/schema_input_map.json"
                }
            }
        ],
        "required": ["step"]
    }
}
  • Finally, and maybe the best solution: add if/else statements to nextflow_schema.json, linked to params.step, with separate JSON schema files
    • Pros: Readability, small
    • Cons: Complicated to implement?
{
    "$schema": "http://json-schema.org/draft-07/schema",
    "$id": "https://raw.githubusercontent.com/nf-core/phaseimpute/master/nextflow_schema.json",
    "title": "nf-core/phaseimpute pipeline parameters",
    "description": "MyDescription",
    "type": "object",
    "definitions": {
        "input_output_options": {
            "title": "Input/output options",
            "type": "object",
            "fa_icon": "fas fa-terminal",
            "description": "Define where the pipeline should find input data and save output data.",
            "required": ["step", "input"],
            "properties": {
                "step": {
                    "type": "string",
                    "description": "Step to run.",
                    "fa_icon": "fas fa-step-forward",
                    "enum": ["simulate", "panelprep", "impute", "validate"]
                },
                "input": {
                    "type": "string",
                    "fa_icon": "fas fa-file-csv",
                    "pattern": "^\\S+\\.(csv|tsv|yaml)$",
                    "format": "file-path",
                    "mimetype": "text/csv"
                }
            },
            "allOf": [
                {
                    "if": {
                        "properties": { "step": { "const": "panel_prep" } }
                    },
                    "then": {
                        "properties": {
                            "input": { "schema": "assets/schema_input_panel_prep.json" }
                        }
                    }
                },
                {
                    "if": {
                        "properties": { "step": { "const": "map" } }
                    },
                    "then": {
                        "properties": {
                            "input": { "schema": "assets/schema_input_map.json" }
                        }
                    }
                }
            ]
        }
    }
}
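
To make the intended behaviour concrete, here is a minimal Python sketch of the check that all of the options above are meant to express: the columns required in each samplesheet row depend on the step. This is only an illustration and not nf-validation code. The step names and patterns are copied from the examples in this issue; the names STEP_RULES and validate_row are hypothetical.

```python
import re

# Column rules per step, copied from the patterns proposed in this issue.
# STEP_RULES and validate_row are hypothetical names used for illustration.
STEP_RULES = {
    "panel_prep": {
        "vcf": r"^\S+\.vcf$",
        "region": r"^(chr)\d+:\d+-\d+$",
    },
    "map": {
        "bam": r"^\S+\.bam$",
        "fasta": r"^\S+\.fa$",
    },
}


def validate_row(row):
    """Return a list of error messages for one samplesheet row (a dict)."""
    step = row.get("step")
    if step not in STEP_RULES:
        return [f"step: expected one of {sorted(STEP_RULES)}, got {step!r}"]
    errors = []
    # Only the columns belonging to the selected step are checked.
    for column, pattern in STEP_RULES[step].items():
        value = row.get(column)
        if value is None:
            errors.append(f"{column}: required when step is {step!r}")
        elif not re.search(pattern, value):
            errors.append(f"{column}: {value!r} does not match {pattern}")
    return errors
```

A row such as {"step": "map", "bam": "sample.bam"} would be reported as missing its fasta column, while the same row would never be checked for vcf or region.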
@ewels
Member

ewels commented Sep 28, 2023

I think this is a duplicate of #2428

But you've done quite a bit of work here with the extensive writeup, so let's leave both in place 😄

@ewels ewels changed the title Conditional input file validation schema Schema: Conditional input file validation Sep 28, 2023
@ewels ewels removed the enhancement label Oct 5, 2023
@ewels
Member

ewels commented Nov 2, 2023

What I'd like in any solution here is a decoupling between configuration resolution and sample sheet schema requirements.

So for example, if creating a data set in Seqera Platform, I'd like to be able to select only:

  • Pipeline
  • Parameter (eg. --input)

As soon as the samplesheet schema depends on a secondary parameter, you effectively have to fill in an entire launch template and complete the config / parameter resolution before you know how to apply the samplesheet schema. That adds a tonne of complexity into any UI and makes it a lot less feasible.

This is the reason that I prefer simply having multiple different parameters for the different schemas. Having a dropdown and selecting the sample sheet type from input1 / input2 etc is quite simple in terms of a UI.
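
The difference between the two approaches can be sketched roughly as follows, in Python for illustration only. The parameter names input_panel_prep and input_map are hypothetical, as are the function names; the schema paths are the ones used in the examples earlier in this issue.

```python
# Hypothetical parameter names; schema paths are taken from the examples above.
SCHEMA_BY_PARAM = {
    "input_panel_prep": "assets/schema_input_panel_prep.json",
    "input_map": "assets/schema_input_map.json",
}

SCHEMA_BY_STEP = {
    "panel_prep": "assets/schema_input_panel_prep.json",
    "map": "assets/schema_input_map.json",
}


def schema_from_param(param_name):
    """One parameter per schema: the parameter name alone is enough."""
    return SCHEMA_BY_PARAM[param_name]


def schema_from_step(resolved_params):
    """Conditional schema: every parameter must be resolved first,
    because the answer depends on the final value of 'step'."""
    return SCHEMA_BY_STEP[resolved_params["step"]]
```

With the first function, a UI only needs the pipeline and the chosen parameter. With the second, it needs the fully resolved parameter set before it can pick a schema.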
