Output needs to be reformatted #15

alexyfyf · 2023-12-04T10:38:20Z

Hi team,

I found that the isONform output FASTA file is not in the standard format, which uses a '>' line as the header. There are also lots of empty files in the isONform output folder, such as:

(base) [yan.a@vc7-shared isoforms]$ ll cluster26150*
-rw-r--r-- 1 yan.a allstaff 0 Dec  2 07:59 cluster26150_mapping_low_abundance.txt
-rw-r--r-- 1 yan.a allstaff 0 Dec  2 07:59 cluster26150_mapping.txt
-rw-r--r-- 1 yan.a allstaff 0 Dec  2 07:59 cluster26150_merged.fa
-rw-r--r-- 1 yan.a allstaff 0 Dec  2 07:59 cluster26150_merged_low_abundance.fa

Also, can you explain what the numbers in the header line mean? For example, this one:

@0_105_891
ACUUCGACCAAGAAGAGAUACGGUGCUCUCGCCGGUAACGUCGGUGACGAAGGUGGUGUUGCUCCAAACAUUCAAACCGCUGAAGAAGCUUUGGACUUGAUUGUUGACGCUAUCAAGGCUGCUGGUCACGACGGUAAGGUCAAGAUCGGUUUGGACUGUGCUUCCUCUGAAUUUUCAAGGACGGUAAGUACGACUUGGACUUCAAGAACCCAGAAUCUGACAAAUCCAAGUGGUUGACUGGUGUCGAAUUGGCUGACAUGUACCACUCCUUGAUGAAGAGAUACCCAAUUGUCUCCAUCGAAGAUCCAUUUGCUGAAGAUGACUGGGAAGCUUGGUCUUCACUUCAAGACCGCUGGUAUCCAAAUUGUUGCUGAUGAUUUGACUGUCACCAACCCAGCUAGAAUUGCUACCGCCAUCGAAAAGAAGGCUGCUGACGCUUUGUUGUUGAAGGUUAACCAAAUCGGUACCUUGUCUGAAUCCAUCAAGGCUGCUCAAGACUUUCCUGCCAACUGGGUGUCAUGGUUUCCCACAGAUCUGGUGAAACUGAAGACACUUCAUUGCUGACUUGGUUGUCGGUUUGAGAACUGGUCAAAUCAAGACUGGUGCUCCAGCUAGAUCCGAAAGAUUGGCUAAGUUGAACCAAUUGUUGAGAAUCGAAGAAGAAUUGGGUGACAAGGCUGUCUACGCCGGUGAAAACUUCCACCACGGUGACAAGUUGUAUCGUCGUGAGUAGUGAACCGUAAGCAAAAAAAUUCCCUCAACCAUCUUAUAUCCAUUCAACCUACCAUUCCUCAAUCA

Thank you so much.

Alex
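
For files written before the format fix, the record shown above can be turned into standard FASTA by swapping the leading '@' for '>'. The following is only a minimal sketch: the function name `at_records_to_fasta` is hypothetical (it is not part of isONform), and it assumes every record is exactly one '@' header line followed by one sequence line, as in the example.

```python
def at_records_to_fasta(text: str) -> str:
    """Rewrite '@header' + sequence line pairs as '>header' + sequence.

    Assumption (not confirmed by isONform): each record is exactly two
    non-empty lines, a header starting with '@' and a single sequence line.
    """
    # Drop blank lines so trailing newlines do not break the pairing.
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if len(lines) % 2 != 0:
        raise ValueError("expected header/sequence pairs, got an odd line count")

    out = []
    for header, seq in zip(lines[0::2], lines[1::2]):
        if not header.startswith("@"):
            raise ValueError(f"header does not start with '@': {header!r}")
        out.append(">" + header[1:])  # replace only the leading '@'
        out.append(seq)
    return "".join(line + "\n" for line in out)
```

The empty files (such as cluster26150_merged.fa) pass through as empty output, since they contain no records.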

aljpetri · 2023-12-04T11:17:46Z

Hi, thank you for reporting this error.
I have now pushed a new release that should fix the FASTA output format.
The idea behind the header for each isoform is as follows: the first number (in your case '0') denotes the cluster the isoform was generated from. The second number (in your case '105') gives the batch number within the cluster (we divide each cluster into batches of 1000 reads each), while the third number is an individual ID, so that no two isoforms share the same identifier.
I will address the problem with the empty intermediate files in the next few days.
Best,
Alex
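
For anyone parsing these identifiers downstream, the three fields described above can be split as in the sketch below. The names `IsoformId` and `parse_isoform_header` are hypothetical and not part of isONform; the sketch only assumes the header is three integers joined by underscores, optionally preceded by '@' or '>'.

```python
from typing import NamedTuple


class IsoformId(NamedTuple):
    cluster: int  # cluster the isoform was generated from
    batch: int    # batch number within that cluster
    isoform: int  # individual ID that keeps the identifier unique


def parse_isoform_header(header: str) -> IsoformId:
    """Split a header such as '@0_105_891' or '>0_105_891' into its fields."""
    name = header.strip().lstrip("@>")  # accept old '@' and FASTA '>' prefixes
    parts = name.split("_")
    if len(parts) != 3:
        raise ValueError(f"expected three '_'-separated fields, got {header!r}")
    cluster, batch, isoform = (int(p) for p in parts)
    return IsoformId(cluster, batch, isoform)
```

For the header from the report, `parse_isoform_header("@0_105_891")` gives cluster 0, batch 105 and individual ID 891.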

alexyfyf · 2023-12-04T23:40:16Z

Hi Alex,

Thank you for your reply. So my understanding is that your transcript identifiers are derived from the gene clusters produced by isONclust, so the cluster ID, i.e. the first number, could be used as a gene ID surrogate. Am I correct?

Thank you.

Alex

aljpetri · 2023-12-05T11:42:03Z

Hi Alex,
The clusters generated by isONclust represent gene families rather than individual genes, so it would be risky to use them as gene surrogates.
Best,
Alex
