Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: added wrapper for Paraphase #3071

Open
wants to merge 42 commits into
base: master
Choose a base branch
from

Conversation

MagdalenaZZ
Copy link

@MagdalenaZZ MagdalenaZZ commented Jul 15, 2024

Paraphase by PacBio, takes HiFi aligned BAMs as input and phases haplotypes for genes of the same family, determines copy numbers and makes phased variant calls.

QC

  • I confirm that:

For all wrappers added by this PR,

  • there is a test case which covers any introduced changes,
  • input: and output: file paths in the resulting rule can be changed arbitrarily,
  • either the wrapper can only use a single core, or the example rule contains a threads: x statement with x being a reasonable default,
  • rule names in the test case are in snake_case and somehow tell what the rule is about or match the tools purpose or name (e.g., map_reads for a step that maps reads),
  • all environment.yaml specifications follow the respective best practices,
  • the environment.yaml pinning has been updated by running snakedeploy pin-conda-envs environment.yaml on a linux machine,
  • wherever possible, command line arguments are inferred and set automatically (e.g. based on file extensions in input: or output:),
  • all fields of the example rules in the Snakefiles and their entries are explained via comments (input:/output:/params: etc.),
  • stderr and/or stdout are logged correctly (log:), depending on the wrapped tool,
  • temporary files are either written to a unique hidden folder in the working directory, or (better) stored where the Python function tempfile.gettempdir() points to (see here; this also means that using any Python tempfile default behavior works),
  • the meta.yaml contains a link to the documentation of the respective tool or command,
  • Snakefiles pass the linting (snakemake --lint),
  • Snakefiles are formatted with snakefmt,
  • Python wrapper scripts are formatted with black.
  • Conda environments use a minimal amount of channels, in recommended ordering. E.g. for bioconda, use (conda-forge, bioconda, nodefaults, as conda-forge should have highest priority and defaults channels are usually not needed because most packages are in conda-forge nowadays).

Summary by CodeRabbit

  • New Features

    • Introduced a new YAML configuration for package management in the bio/paraphase project.
    • Added a configuration file for the paraphase tool to process genomic data and manage input/output requirements.
    • Implemented a Snakemake workflow for improved genomic data processing.
    • Added a new genomic data file to enhance dataset compatibility and usability.
    • Created a wrapper script to automate genomic data processing and output management.
  • Bug Fixes

    • Improved logging and error handling in the processing script to enhance stability.

…d BAMs as input and phases haplotypes for genes of the same family, determines copy numbers and makes phased variant calls.
@MagdalenaZZ
Copy link
Author

Hello snakemake-wrapper maintainers. Thank you for the great job you are doing! This wrapper is for the Paraphase software https://github.com/PacificBiosciences/paraphase. As test set I picked the smallest possible slice of a chromosome which contained two gene copies close to the chromosome start. After discussing with Paraphase developers this seemed the most efficient way to make a small data slice, without including a whole reference genome. The software only works for specific versions of the human genome. This is the first wrapper I've created, so I'm happy to accept and implement suggestions for improvements.

@johanneskoester
Copy link
Contributor

thank you so much for the contribution!

bio/paraphase/meta.yaml Outdated Show resolved Hide resolved
bio/paraphase/meta.yaml Outdated Show resolved Hide resolved
bio/paraphase/meta.yaml Outdated Show resolved Hide resolved
@MagdalenaZZ MagdalenaZZ changed the title feat: added wrapper for Paraphrase by PacBio feat: added wrapper for Paraphase Jul 22, 2024
@MagdalenaZZ
Copy link
Author

Thank you @johanneskoester for the review and suggestions, all incorporated.

@MagdalenaZZ
Copy link
Author

I think I've made all the changes requested?

bio/paraphase/meta.yaml Outdated Show resolved Hide resolved
@fgvieira
Copy link
Collaborator

fgvieira commented Jul 29, 2024

Thanks for your contribution. Some questions:

  • Is the file used_wrappers.yaml needed?
  • why do you need to re-header the VCF file?

bio/paraphase/wrapper.py Outdated Show resolved Hide resolved
bio/paraphase/wrapper.py Outdated Show resolved Hide resolved
@fgvieira
Copy link
Collaborator

fgvieira commented Aug 6, 2024

And remember to add the tests to tests.py.

MagdalenaZZ and others added 2 commits August 6, 2024 11:46
Removed exception
@MagdalenaZZ
Copy link
Author

I now pushed a change that will resolve the Code quality / formatting (pull_request) linting error.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE

Commits

Files that changed from the base of the PR and between b63cf7f and c5ed609.

Files selected for processing (1)
  • bio/paraphase/wrapper.py (1 hunks)
Additional context used
Path-based instructions (1)
bio/paraphase/wrapper.py (1)

Pattern **/*.py: Do not suggest to add trailing commas for improving formatting.
Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not explicitly return anything.

Ruff
bio/paraphase/wrapper.py

1-1: os imported but unused

Remove unused import: os

(F401)


6-6: shutil imported but unused

Remove unused import: shutil

(F401)


8-8: Undefined name snakemake

(F821)


9-9: Undefined name snakemake

(F821)


17-17: Undefined name snakemake

(F821)


18-18: Undefined name snakemake

(F821)


20-20: Undefined name snakemake

(F821)


28-28: Trailing comma missing

Add trailing comma

(COM812)


34-34: Undefined name snakemake

(F821)


35-35: Undefined name snakemake

(F821)


38-38: Unnecessary open mode parameters

Remove open mode parameters

(UP015)


64-64: Undefined name snakemake

(F821)


65-65: Undefined name snakemake

(F821)


65-65: Trailing comma missing

Add trailing comma

(COM812)


68-68: Undefined name snakemake

(F821)


68-68: Trailing comma missing

Add trailing comma

(COM812)


77-77: Undefined name snakemake

(F821)


78-78: Undefined name snakemake

(F821)


79-79: Trailing comma missing

Add trailing comma

(COM812)


83-83: Trailing comma missing

Add trailing comma

(COM812)


85-85: Undefined name snakemake

(F821)


87-87: Do not catch blind exception: Exception

(BLE001)

Additional comments not posted (4)
bio/paraphase/wrapper.py (4)

1-6: Remove unused imports.

The imports os and shutil are not used in the code and should be removed to clean up the file.

Tools
Ruff

1-1: os imported but unused

Remove unused import: os

(F401)


6-6: shutil imported but unused

Remove unused import: shutil

(F401)


11-24: LGTM! Use of TemporaryDirectory and shell execution is appropriate.

The use of TemporaryDirectory ensures that temporary files are managed correctly, and the shell command is constructed in a clear manner.

Tools
Ruff

17-17: Undefined name snakemake

(F821)


18-18: Undefined name snakemake

(F821)


20-20: Undefined name snakemake

(F821)


51-80: Comprehensive VCF and BAM file processing.

The logic for compressing, indexing, and processing VCF files, as well as copying BAM files, is well-structured. Ensure robust error handling for shell commands.

Verification successful

Robust Error Handling for Shell Commands in bio/paraphase/wrapper.py.

The shell commands in bio/paraphase/wrapper.py are wrapped in a try-except block, providing necessary error handling by catching exceptions and printing error messages. This ensures robustness in the script's execution. No further action is required.

Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify that shell commands handle errors correctly.

# Test: Search for error handling in shell command execution.
rg --type python 'shell\(' -A 3

Length of output: 64


Script:

#!/bin/bash
# Description: Verify that shell commands handle errors correctly.

# Corrected Test: Search for error handling in shell command execution in Python files.
rg 'shell\(' -A 3 --glob '*.py'

Length of output: 206621

Tools
Ruff

64-64: Undefined name snakemake

(F821)


65-65: Undefined name snakemake

(F821)


65-65: Trailing comma missing

Add trailing comma

(COM812)


68-68: Undefined name snakemake

(F821)


68-68: Trailing comma missing

Add trailing comma

(COM812)


77-77: Undefined name snakemake

(F821)


78-78: Undefined name snakemake

(F821)


79-79: Trailing comma missing

Add trailing comma

(COM812)


8-8: Ensure snakemake is defined.

The snakemake object is used but not defined in this file. Ensure it is passed to the script or imported correctly.

Verification successful

snakemake is correctly imported in bio/paraphase/wrapper.py.

The snakemake object is imported from snakemake.shell, which is a common pattern for Snakemake wrappers. This ensures that snakemake is defined and available for use in the script.

  • The import statement from snakemake.shell import shell is present in bio/paraphase/wrapper.py.

This confirms that snakemake is properly set up for use in this file. No further action is needed regarding the import or definition of snakemake.

Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify if `snakemake` is defined or imported in the context where this script is used.

# Test: Search for the definition or import of `snakemake` in related files.
rg --type python 'snakemake'

Length of output: 59


Script:

#!/bin/bash
# Description: Correctly verify if `snakemake` is defined or imported in the context where this script is used.

# Test: Search for the definition or import of `snakemake` in Python files.
rg 'snakemake' --glob '*.py'

Length of output: 339444


Script:

#!/bin/bash
# Description: Check for any import or mention of `snakemake` in `bio/paraphase/wrapper.py`.

# Search for `snakemake` in `bio/paraphase/wrapper.py`.
rg 'snakemake' bio/paraphase/wrapper.py

Length of output: 895

Tools
Ruff

8-8: Undefined name snakemake

(F821)

bio/paraphase/wrapper.py Outdated Show resolved Hide resolved
bio/paraphase/wrapper.py Outdated Show resolved Hide resolved
@MagdalenaZZ
Copy link
Author

Can someone please address the change requested?

fgvieira and others added 3 commits August 20, 2024 21:19
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Outside diff range, codebase verification and nitpick comments (3)
bio/paraphase/meta.yaml (1)

13-14: Remove trailing spaces.

Trailing spaces are present on these lines. Remove them to adhere to YAML formatting standards.

Fix the trailing spaces as follows:

-  bai: bai index file for the realigned reads 
-  merged_vcf: VCF file containing all newly discovered variants, phased by gene copy and haplotype  
+  bai: bai index file for the realigned reads
+  merged_vcf: VCF file containing all newly discovered variants, phased by gene copy and haplotype
bio/paraphase/wrapper.py (2)

1-1: Remove unused imports.

The imports os and shutil are not used in the code and should be removed to clean up the file.

Remove the unused imports as follows:

- import os
- import shutil

46-46: Remove unnecessary close call.

The fai_file.close() call is unnecessary because the with statement automatically handles file closing.

Remove the unnecessary close call as follows:

- output.close()
Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between c5ed609 and e41cb61.

Files selected for processing (2)
  • bio/paraphase/meta.yaml (1 hunks)
  • bio/paraphase/wrapper.py (1 hunks)
Additional context used
Path-based instructions (1)
bio/paraphase/wrapper.py (1)

Pattern **/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.

Learnings (1)
bio/paraphase/wrapper.py (1)
Learnt from: fgvieira
PR: snakemake/snakemake-wrappers#3071
File: bio/paraphase/wrapper.py:84-86
Timestamp: 2024-08-20T19:23:34.259Z
Learning: In Snakemake wrappers, a try/except block is unnecessary because Snakemake will handle exceptions if the wrapper fails.
yamllint
bio/paraphase/meta.yaml

[error] 2-2: duplication of key "name" in mapping

(key-duplicates)

Ruff
bio/paraphase/wrapper.py

6-6: Undefined name snakemake

(F821)


7-7: Undefined name snakemake

(F821)


15-15: Undefined name snakemake

(F821)


16-16: Undefined name snakemake

(F821)


18-18: Undefined name snakemake

(F821)


32-32: Undefined name snakemake

(F821)


33-33: Undefined name snakemake

(F821)


61-61: Undefined name snakemake

(F821)


62-62: Undefined name snakemake

(F821)


65-65: Undefined name snakemake

(F821)


74-74: Undefined name snakemake

(F821)


75-75: Undefined name snakemake

(F821)


82-82: Undefined name snakemake

(F821)

bio/paraphase/meta.yaml Outdated Show resolved Hide resolved
bio/paraphase/wrapper.py Outdated Show resolved Hide resolved
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Outside diff range, codebase verification and nitpick comments (1)
bio/paraphase/wrapper.py (1)

48-82: Consider replacing print statements with logging.

Replacing print statements with logging would be more consistent with Snakemake's logging practices.

Use this diff to replace the print statements:

-  print(f"Compressed and indexed: {vcf}.gz")
+  log.info(f"Compressed and indexed: {vcf}.gz")

-  print(f"Merged, reheadered, and sorted VCF file created: {snakemake.output.merged_vcf}")
+  log.info(f"Merged, reheadered, and sorted VCF file created: {snakemake.output.merged_vcf}")

-  print("No output VCF or BAM files were produced by paraphase, I hope this is what you were expecting, human?")
+  log.warning("No output VCF or BAM files were produced by paraphase, I hope this is what you were expecting, human?")
Tools
Ruff

52-52: SyntaxError: unindent does not match any outer indentation level


53-53: SyntaxError: Unexpected indentation


58-58: SyntaxError: Unexpected indentation


78-78: SyntaxError: Expected a statement


78-78: SyntaxError: Expected a statement


78-79: SyntaxError: Expected a statement


79-79: SyntaxError: Unexpected indentation

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between e41cb61 and c9dc011.

Files selected for processing (1)
  • bio/paraphase/wrapper.py (1 hunks)
Additional context used
Path-based instructions (1)
bio/paraphase/wrapper.py (1)

Pattern **/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.

Learnings (1)
bio/paraphase/wrapper.py (2)
Learnt from: MagdalenaZZ
PR: snakemake/snakemake-wrappers#3071
File: bio/paraphase/wrapper.py:0-0
Timestamp: 2024-08-21T08:27:20.017Z
Learning: In Snakemake wrappers, a try/except block is unnecessary because Snakemake will handle exceptions if the wrapper fails.
Learnt from: fgvieira
PR: snakemake/snakemake-wrappers#3071
File: bio/paraphase/wrapper.py:84-86
Timestamp: 2024-08-20T19:23:34.259Z
Learning: In Snakemake wrappers, a try/except block is unnecessary because Snakemake will handle exceptions if the wrapper fails.
Ruff
bio/paraphase/wrapper.py

42-42: SyntaxError: unindent does not match any outer indentation level


45-45: SyntaxError: Unexpected indentation


52-52: SyntaxError: unindent does not match any outer indentation level


53-53: SyntaxError: Unexpected indentation


58-58: SyntaxError: Unexpected indentation


78-78: SyntaxError: Expected a statement


78-78: SyntaxError: Expected a statement


78-79: SyntaxError: Expected a statement


79-79: SyntaxError: Unexpected indentation


89-89: SyntaxError: unindent does not match any outer indentation level

Additional comments not posted (3)
bio/paraphase/wrapper.py (3)

1-8: Imports and initial setup look good.

The imports and initial setup are appropriate for the functionality provided by the wrapper.


10-22: TemporaryDirectory usage and paraphase execution look good.

The use of TemporaryDirectory for managing temporary files and the construction of the paraphase command using Snakemake inputs and parameters are appropriate.


89-162: Remove duplicate code block.

The code block from line 89 onwards is a duplicate of the previous code and should be removed to avoid redundancy.

Use this diff to remove the duplicate code block:

- with TemporaryDirectory() as tmpdirname:
-    ...

Likely invalid or redundant comment.

Tools
Ruff

89-89: SyntaxError: unindent does not match any outer indentation level

bio/paraphase/wrapper.py Outdated Show resolved Hide resolved
MagdalenaZZ and others added 4 commits August 21, 2024 10:34
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
@MagdalenaZZ
Copy link
Author

In the file I sorted some formatting issues, hopefully the checks will work now.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between c9dc011 and ef1ba6e.

Files selected for processing (2)
  • bio/paraphase/meta.yaml (1 hunks)
  • bio/paraphase/wrapper.py (1 hunks)
Files skipped from review due to trivial changes (1)
  • bio/paraphase/meta.yaml
Additional context used
Path-based instructions (1)
bio/paraphase/wrapper.py (2)

Pattern **/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.


Pattern **/wrapper.py: Do not complain about use of undefined variable called snakemake.

Learnings (1)
bio/paraphase/wrapper.py (2)
Learnt from: MagdalenaZZ
PR: snakemake/snakemake-wrappers#3071
File: bio/paraphase/wrapper.py:0-0
Timestamp: 2024-08-21T08:27:20.017Z
Learning: In Snakemake wrappers, a try/except block is unnecessary because Snakemake will handle exceptions if the wrapper fails.
Learnt from: fgvieira
PR: snakemake/snakemake-wrappers#3071
File: bio/paraphase/wrapper.py:84-86
Timestamp: 2024-08-20T19:23:34.259Z
Learning: In Snakemake wrappers, a try/except block is unnecessary because Snakemake will handle exceptions if the wrapper fails.
Ruff
bio/paraphase/wrapper.py

2-2: sys imported but unused

Remove unused import: sys

(F401)


6-6: Undefined name snakemake

(F821)


7-7: Undefined name snakemake

(F821)


15-15: Undefined name snakemake

(F821)


16-16: Undefined name snakemake

(F821)


18-18: Undefined name snakemake

(F821)


32-32: Undefined name snakemake

(F821)


33-33: Undefined name snakemake

(F821)


58-58: Undefined name snakemake

(F821)


59-59: Undefined name snakemake

(F821)


61-61: Undefined name snakemake

(F821)


69-69: Undefined name snakemake

(F821)


70-70: Undefined name snakemake

(F821)


75-75: Undefined name snakemake

(F821)


83-83: Undefined name snakemake

(F821)


84-84: Undefined name snakemake

(F821)


86-86: Undefined name snakemake

(F821)


100-100: Undefined name snakemake

(F821)


101-101: Undefined name snakemake

(F821)


127-127: Undefined name snakemake

(F821)


128-128: Undefined name snakemake

(F821)


130-130: Undefined name snakemake

(F821)


138-138: Undefined name snakemake

(F821)


139-139: Undefined name snakemake

(F821)


144-144: Undefined name snakemake

(F821)

Additional comments not posted (4)
bio/paraphase/wrapper.py (4)

10-22: LGTM!

The code correctly uses a temporary directory and runs the paraphase command with appropriate parameters and logging.

The code changes are approved.

Tools
Ruff

15-15: Undefined name snakemake

(F821)


16-16: Undefined name snakemake

(F821)


18-18: Undefined name snakemake

(F821)


23-27: LGTM!

The touch command is correctly used to create the output file.

The code changes are approved.


29-44: LGTM!

The code correctly reads the .fai index file and writes formatted header lines to the output VCF header file.

The code changes are approved.

Tools
Ruff

32-32: Undefined name snakemake

(F821)


33-33: Undefined name snakemake

(F821)


45-75: LGTM!

The code correctly processes VCF files and copies BAM and BAI files. The print statements provide useful information about the processing steps.

The code changes are approved.

Tools
Ruff

58-58: Undefined name snakemake

(F821)


59-59: Undefined name snakemake

(F821)


61-61: Undefined name snakemake

(F821)


69-69: Undefined name snakemake

(F821)


70-70: Undefined name snakemake

(F821)


75-75: Undefined name snakemake

(F821)

bio/paraphase/wrapper.py Outdated Show resolved Hide resolved
bio/paraphase/wrapper.py Outdated Show resolved Hide resolved
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between ef1ba6e and 34bf6b8.

Files selected for processing (1)
  • bio/paraphase/wrapper.py (1 hunks)
Additional context used
Path-based instructions (1)
bio/paraphase/wrapper.py (2)

Pattern **/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.


Pattern **/wrapper.py: Do not complain about use of undefined variable called snakemake.

Ruff
bio/paraphase/wrapper.py

5-5: Undefined name snakemake

(F821)


6-6: Undefined name snakemake

(F821)


14-14: Undefined name snakemake

(F821)


15-15: Undefined name snakemake

(F821)


17-17: Undefined name snakemake

(F821)


31-31: Undefined name snakemake

(F821)


32-32: Undefined name snakemake

(F821)


57-57: Undefined name snakemake

(F821)


58-58: Undefined name snakemake

(F821)


60-60: Undefined name snakemake

(F821)


68-68: Undefined name snakemake

(F821)


69-69: Undefined name snakemake

(F821)


74-74: Undefined name snakemake

(F821)


82-82: Undefined name snakemake

(F821)


83-83: Undefined name snakemake

(F821)


85-85: Undefined name snakemake

(F821)


99-99: Undefined name snakemake

(F821)


100-100: Undefined name snakemake

(F821)


126-126: Undefined name snakemake

(F821)


127-127: Undefined name snakemake

(F821)


129-129: Undefined name snakemake

(F821)


137-137: Undefined name snakemake

(F821)


138-138: Undefined name snakemake

(F821)


143-143: Undefined name snakemake

(F821)

Additional comments not posted (6)
bio/paraphase/wrapper.py (6)

1-7: LGTM!

The imports and initial setup are appropriate for the functionality described.

The code changes are approved.

Tools
Ruff

5-5: Undefined name snakemake

(F821)


6-6: Undefined name snakemake

(F821)


9-21: LGTM!

The temporary directory creation and paraphase execution are correctly implemented.

The code changes are approved.

Tools
Ruff

14-14: Undefined name snakemake

(F821)


15-15: Undefined name snakemake

(F821)


17-17: Undefined name snakemake

(F821)


22-26: LGTM!

The use of the touch command is appropriate for ensuring the output file exists.

The code changes are approved.


28-43: LGTM!

The VCF header creation is correctly implemented.

The code changes are approved.

Tools
Ruff

31-31: Undefined name snakemake

(F821)


32-32: Undefined name snakemake

(F821)


44-75: LGTM!

The VCF file handling and BAM/BAI file copying are correctly implemented.

The code changes are approved.

Tools
Ruff

57-57: Undefined name snakemake

(F821)


58-58: Undefined name snakemake

(F821)


60-60: Undefined name snakemake

(F821)


68-68: Undefined name snakemake

(F821)


69-69: Undefined name snakemake

(F821)


74-74: Undefined name snakemake

(F821)


113-143: Remove duplicate code.

This segment is a duplicate of the previous code for handling VCF files and copying BAM/BAI files. Consider removing it to avoid redundancy.

Apply this diff to remove the duplicate code:

-### Concatenating, reheadering, and sorting the zipped and indexed VCF files, and copy the remapped reads
-vcf_res = glob.glob(f"{tmpdirname}/*_vcfs/*vcf")
-if vcf_res:
-    for vcf in vcf_res:
-        bgzip_cmd = f"bgzip -c {vcf} > {vcf}.gz"
-        shell(bgzip_cmd)
-        index_cmd = f"bcftools index {vcf}.gz"
-        shell(index_cmd)
-        print(f"Compressed and indexed: {vcf}.gz")
-    params_variant_files = " ".join([f"{vcf}.gz" for vcf in vcf_res])
-    shell(
-        f"bcftools concat -a -Oz {params_variant_files} | "
-        f"bcftools annotate --header-lines {snakemake.output.vcf_header} | "
-        f"bcftools sort -Oz -o {snakemake.output.merged_vcf}"
-    )
-    print(f"Merged, reheadered, and sorted VCF file created: {snakemake.output.merged_vcf}")
-    # Copy out bam and bai files
-    bam_res = glob.glob(f"{tmpdirname}/*.bam")
-    bai_res = glob.glob(f"{tmpdirname}/*.bai")
-    # print("BAM RES: ", bam_res, bai_res)
-    shell(
-        f"""
-        cp -pr {' '.join(bam_res)} {snakemake.output.bam};
-        cp -pr {' '.join(bai_res)} {snakemake.output.bai}
-        """
-    )
-else:
-    print("No output VCF or BAM files were produced by paraphase, I hope this is what you were expecting, human?")
-    shell(f"touch {snakemake.output.merged_vcf}")

Likely invalid or redundant comment.

Tools
Ruff

126-126: Undefined name snakemake

(F821)


127-127: Undefined name snakemake

(F821)


129-129: Undefined name snakemake

(F821)


137-137: Undefined name snakemake

(F821)


138-138: Undefined name snakemake

(F821)


143-143: Undefined name snakemake

(F821)

bio/paraphase/wrapper.py Show resolved Hide resolved
bio/paraphase/wrapper.py Outdated Show resolved Hide resolved
@MagdalenaZZ
Copy link
Author

Formatting issues should be resolved now

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 34bf6b8 and 8631711.

Files selected for processing (1)
  • bio/paraphase/wrapper.py (1 hunks)
Additional context used
Path-based instructions (1)
bio/paraphase/wrapper.py (2)

Pattern **/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.


Pattern **/wrapper.py: Do not complain about use of undefined variable called snakemake.

Ruff
bio/paraphase/wrapper.py

5-5: Undefined name snakemake

(F821)


6-6: Undefined name snakemake

(F821)


14-14: Undefined name snakemake

(F821)


15-15: Undefined name snakemake

(F821)


17-17: Undefined name snakemake

(F821)


31-31: Undefined name snakemake

(F821)


32-32: Undefined name snakemake

(F821)


59-59: Undefined name snakemake

(F821)


60-60: Undefined name snakemake

(F821)


63-63: Undefined name snakemake

(F821)


72-72: Undefined name snakemake

(F821)


73-73: Undefined name snakemake

(F821)


80-80: Undefined name snakemake

(F821)


88-88: Undefined name snakemake

(F821)


89-89: Undefined name snakemake

(F821)


91-91: Undefined name snakemake

(F821)


105-105: Undefined name snakemake

(F821)


106-106: Undefined name snakemake

(F821)


134-134: Undefined name snakemake

(F821)


135-135: Undefined name snakemake

(F821)


138-138: Undefined name snakemake

(F821)


147-147: Undefined name snakemake

(F821)


148-148: Undefined name snakemake

(F821)


155-155: Undefined name snakemake

(F821)

Additional comments not posted (5)
bio/paraphase/wrapper.py (5)

1-3: LGTM!

The imports are necessary and correctly used.


5-6: LGTM!

The logging setup and parameter retrieval are appropriate.

Tools
Ruff

5-5: Undefined name snakemake

(F821)


6-6: Undefined name snakemake

(F821)


9-21: LGTM!

The temporary directory usage is appropriate for handling intermediate files.

Tools
Ruff

14-14: Undefined name snakemake

(F821)


15-15: Undefined name snakemake

(F821)


17-17: Undefined name snakemake

(F821)


11-26: Refactor to avoid code duplication.

The code for running paraphase and touching the output is duplicated. Consider refactoring to avoid redundancy.

Apply this diff to remove the duplicate code:

- with TemporaryDirectory() as tmpdirname:
-    ### Run paraphase
-    shell(
-        f"""
-        (paraphase --bam {snakemake.input.bam} \
-        --reference {snakemake.input.fasta} \
-        --out {tmpdirname} \
-        {snakemake.params.genome} \
-        {extra}) {log}
-        """,
-        tmpdirname=tmpdirname,  # Pass tmpdirname to the shell environment
-    )
-    shell(
-        """
-        touch {snakemake.output}
-        """
-    )

Likely invalid or redundant comment.

Tools
Ruff

14-14: Undefined name snakemake

(F821)


15-15: Undefined name snakemake

(F821)


17-17: Undefined name snakemake

(F821)


28-45: Refactor to avoid code duplication.

The code for creating a new VCF header is duplicated. Consider refactoring to avoid redundancy.

Apply this diff to remove the duplicate code:

- ### Create a new VCF header
- # Get the paths from Snakemake input and output objects
- input_faidx = snakemake.input.faidx
- output_vcf_header = snakemake.output.vcf_header
- # Open the .fai index file and read lines
- with open(input_faidx, "r") as fai_file:
-    lines = fai_file.readlines()
- # Open the output file and write formatted header lines
- with open(output_vcf_header, "w") as output:
-    for line in lines:
-        contig_id, length = line.split()[:2]  # Assuming the first two elements are ID and length
-        output.write(f"##contig=<ID={contig_id},length={length}>\n")

Likely invalid or redundant comment.

Tools
Ruff

31-31: Undefined name snakemake

(F821)


32-32: Undefined name snakemake

(F821)

bio/paraphase/wrapper.py Show resolved Hide resolved
bio/paraphase/wrapper.py Show resolved Hide resolved
@fgvieira
Copy link
Collaborator

@MagdalenaZZ remember to add the test!

@MagdalenaZZ
Copy link
Author

@MagdalenaZZ remember to add the test!

Hi, it should be allright now? I'll be away for a while, so I hope this us all of the changes needed.

@fgvieira
Copy link
Collaborator

fgvieira commented Aug 30, 2024

Hi @MagdalenaZZ I do'nt think you've added the test. The tests need to be added to the tests.py file. For example, for the adapterremoval_se test:

snakemake-wrappers/test.py

Lines 909 to 922 in bb4acf9

@skip_if_not_modified
def test_adapterremoval_se():
run(
"bio/adapterremoval",
[
"snakemake",
"--cores",
"1",
"--use-conda",
"trimmed/se/a.fastq.gz",
"trimmed/se/a.discarded.fastq.gz",
"stats/se/a.settings",
],
)

And you have duplicated your code in wrapper.py.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants