Merge pull request #1150 from nf-core/flexible_tx2gene
Be more flexible on attribute values in GTFs
drpatelh authored Jan 3, 2024
2 parents 221bdca + 87f603c commit b4c6c69
Showing 2 changed files with 17 additions and 11 deletions.
17 changes: 9 additions & 8 deletions CHANGELOG.md
@@ -17,6 +17,15 @@ Special thanks to the following for their contributions to the release:
 - [Phil Ewels](https://github.com/ewels)
 - [Vlad Savelyev](https://github.com/vladsavelyev)

+### Enhancements & fixes
+
+- [PR #1135](https://github.com/nf-core/rnaseq/pull/1135) - Update [action-tower-launch](https://github.com/marketplace/actions/action-tower-launch) to v2 which supports more variable handling
+- [PR #1141](https://github.com/nf-core/rnaseq/pull/1141) - Important! Template update for nf-core/tools v2.11
+- [PR #1143](https://github.com/nf-core/rnaseq/pull/1143) - Move fasta check back to Groovy ([#1142](https://github.com/nf-core/rnaseq/issues/1142))
+- [PR #1144](https://github.com/nf-core/rnaseq/pull/1144) - Interface to kmer size for pseudoaligners
+- [PR #1149](https://github.com/nf-core/rnaseq/pull/1149) - Fix and patch version commands for Fastp, FastQC and UMI-tools modules ([#1103](https://github.com/nf-core/rnaseq/issues/1103))
+- [PR #1150](https://github.com/nf-core/rnaseq/pull/1150) - Be more flexible on attribute values in GTFs ([#1132](https://github.com/nf-core/rnaseq/issues/1132))
+
 ### Parameters

 | Old parameter | New parameter |
@@ -27,14 +36,6 @@ Special thanks to the following for their contributions to the release:
 > **NB:** Parameter has been **added** if just the new parameter information is present.
 > **NB:** Parameter has been **removed** if new parameter information isn't present.

-### Enhancements & fixes
-
-- [PR #1135](https://github.com/nf-core/rnaseq/pull/1135) - Update [action-tower-launch](https://github.com/marketplace/actions/action-tower-launch) to v2 which supports more variable handling
-- [PR #1141](https://github.com/nf-core/rnaseq/pull/1141) - Important! Template update for nf-core/tools v2.11
-- [PR #1149](https://github.com/nf-core/rnaseq/pull/1149) - Fix and patch version commands for Fastp, FastQC and UMI-tools modules ([#1103](https://github.com/nf-core/rnaseq/issues/1103))
-- [PR #1144](https://github.com/nf-core/rnaseq/pull/1144) - Interface to kmer size for pseudoaligners
-- [PR #1143](https://github.com/nf-core/rnaseq/pull/1143) - Move fasta check back to Groovy ([#1142](https://github.com/nf-core/rnaseq/issues/1142))
-
 ## [[3.13.2](https://github.com/nf-core/rnaseq/releases/tag/3.13.2)] - 2023-11-21

 ### Credits
11 changes: 8 additions & 3 deletions bin/tx2gene.py
@@ -6,6 +6,7 @@
 import argparse
 import glob
 import os
+import re
 from collections import Counter, defaultdict, OrderedDict
 from collections.abc import Set
 from typing import Dict
@@ -50,14 +51,18 @@ def discover_transcript_attribute(gtf_file: str, transcripts: Set[str]) -> str:
     Returns:
         str: The attribute name that corresponds to transcripts in the GTF file.
     """
+
     votes = Counter()
     with open(gtf_file) as inh:
         # Read GTF file, skipping header lines
         for line in filter(lambda x: not x.startswith("#"), inh):
             cols = line.split("\t")
-            # Parse attribute column and update votes for each attribute found
-            attributes = dict(item.strip().split(" ", 1) for item in cols[8].split(";") if item.strip())
-            votes.update(key for key, value in attributes.items() if value.strip('"') in transcripts)
+
+            # Use regular expression to correctly split the attributes string
+            attributes_str = cols[8]
+            attributes = dict(re.findall(r'(\S+) "(.*?)(?<!\\)";', attributes_str))
+
+            # Vote only for attributes whose (already unquoted) value is a known transcript
+            votes.update(key for key, value in attributes.items() if value in transcripts)

     if not votes:
         # Log a warning if no matching attribute is found
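
A minimal sketch of why the regular expression is more forgiving than splitting on `;`, assuming an attribute column whose quoted value itself contains a semicolon. The helper names (`parse_attributes_split`, `parse_attributes_regex`, `vote_for_transcript_attribute`) and the example attribute string are hypothetical and are not part of `bin/tx2gene.py`. Only the pattern is taken from the hunk above.

```python
import re
from collections import Counter

# Same pattern as in the hunk: key, space, double-quoted value, semicolon.
# The negative lookbehind stops an escaped quote (\") from ending the value early.
ATTRIBUTE_PATTERN = re.compile(r'(\S+) "(.*?)(?<!\\)";')


def parse_attributes_split(attributes_str):
    """Previous approach: split on ';', then split each item on the first space."""
    return dict(
        item.strip().split(" ", 1)
        for item in attributes_str.split(";")
        if item.strip()
    )


def parse_attributes_regex(attributes_str):
    """Approach from the hunk: match each `key "value";` pair directly."""
    return dict(ATTRIBUTE_PATTERN.findall(attributes_str))


def vote_for_transcript_attribute(attribute_columns, transcripts):
    """Count, per attribute name, how often its value is a known transcript ID."""
    votes = Counter()
    for attributes_str in attribute_columns:
        attributes = parse_attributes_regex(attributes_str)
        votes.update(key for key, value in attributes.items() if value in transcripts)
    return votes


# Hypothetical attribute column: the gene_name value contains "; ".
example = 'gene_id "G1"; transcript_id "T1"; gene_name "abc; def";'

print(parse_attributes_regex(example))
print(vote_for_transcript_attribute([example], {"T1"}))

try:
    parse_attributes_split(example)
except ValueError as err:
    # The fragment 'def"' has no space, so it cannot form a key/value pair.
    print(f"split-based parsing failed: {err}")
```

On this input the split-based parser raises `ValueError`, because the text after the embedded semicolon does not form a key/value pair, while the regex keeps `abc; def` as a single value. The regex also returns values without their surrounding quotes, so the vote can compare `value` to the transcript set directly.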