Output needs to be reformatted #15

Open
alexyfyf opened this issue Dec 4, 2023 · 3 comments

Comments

@alexyfyf

alexyfyf commented Dec 4, 2023

Hi team,

I found that the isONform output FASTA file is not in the standard format, which uses a line starting with > as the header. There are also many empty files in the isONform folder, such as:

(base) [yan.a@vc7-shared isoforms]$ ll cluster26150*
-rw-r--r-- 1 yan.a allstaff 0 Dec  2 07:59 cluster26150_mapping_low_abundance.txt
-rw-r--r-- 1 yan.a allstaff 0 Dec  2 07:59 cluster26150_mapping.txt
-rw-r--r-- 1 yan.a allstaff 0 Dec  2 07:59 cluster26150_merged.fa
-rw-r--r-- 1 yan.a allstaff 0 Dec  2 07:59 cluster26150_merged_low_abundance.fa

Also, can you explain what the numbers in the header line mean? For example, this one:

@0_105_891
ACUUCGACCAAGAAGAGAUACGGUGCUCUCGCCGGUAACGUCGGUGACGAAGGUGGUGUUGCUCCAAACAUUCAAACCGCUGAAGAAGCUUUGGACUUGAUUGUUGACGCUAUCAAGGCUGCUGGUCACGACGGUAAGGUCAAGAUCGGUUUGGACUGUGCUUCCUCUGAAUUUUCAAGGACGGUAAGUACGACUUGGACUUCAAGAACCCAGAAUCUGACAAAUCCAAGUGGUUGACUGGUGUCGAAUUGGCUGACAUGUACCACUCCUUGAUGAAGAGAUACCCAAUUGUCUCCAUCGAAGAUCCAUUUGCUGAAGAUGACUGGGAAGCUUGGUCUUCACUUCAAGACCGCUGGUAUCCAAAUUGUUGCUGAUGAUUUGACUGUCACCAACCCAGCUAGAAUUGCUACCGCCAUCGAAAAGAAGGCUGCUGACGCUUUGUUGUUGAAGGUUAACCAAAUCGGUACCUUGUCUGAAUCCAUCAAGGCUGCUCAAGACUUUCCUGCCAACUGGGUGUCAUGGUUUCCCACAGAUCUGGUGAAACUGAAGACACUUCAUUGCUGACUUGGUUGUCGGUUUGAGAACUGGUCAAAUCAAGACUGGUGCUCCAGCUAGAUCCGAAAGAUUGGCUAAGUUGAACCAAUUGUUGAGAAUCGAAGAAGAAUUGGGUGACAAGGCUGUCUACGCCGGUGAAAACUUCCACCACGGUGACAAGUUGUAUCGUCGUGAGUAGUGAACCGUAAGCAAAAAAAUUCCCUCAACCAUCUUAUAUCCAUUCAACCUACCAUUCCUCAAUCA

Thank you so much.

Alex
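
For anyone who wants to check their own run for the zero-byte files listed above, here is a minimal Python sketch. The function name `find_empty_files` and the default `cluster*` pattern are assumptions chosen to match the file names in the listing; this is not part of isONform.

```python
from pathlib import Path


def find_empty_files(folder: str, pattern: str = "cluster*") -> list[Path]:
    """Return files in `folder` matching `pattern` whose size is zero bytes.

    `pattern` is a hypothetical default based on the cluster26150_* names
    shown in the report; adjust it to your own output layout.
    """
    return sorted(
        p for p in Path(folder).glob(pattern)
        if p.is_file() and p.stat().st_size == 0
    )
```

Calling `find_empty_files("isoforms")` on the output folder from the report would return the empty mapping and merged files as `Path` objects, which can then be inspected or removed by hand.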

@aljpetri
Owner

aljpetri commented Dec 4, 2023

Hi, thank you for reporting this error.
I have now pushed a new release that should fix the FASTA output format.
The idea behind the header for each isoform is as follows: the first number (in your case '0') denotes the cluster the isoform was generated from. The second number (in your case '105') gives the batch number within the cluster (we divide each cluster into batches of 1000 reads each), while the third number is an individual id, so that no two isoforms share the same identifier.
I will address the problem with the empty intermediate files in the next few days.
Best,
Alex
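
The header scheme described in this comment (cluster, batch, individual id, joined by underscores) can be split apart with a few lines of Python. This is only a sketch: the names `IsoformId` and `parse_isoform_header` are hypothetical, and it assumes every header has exactly three integer fields, optionally prefixed with `@` or `>`.

```python
from typing import NamedTuple


class IsoformId(NamedTuple):
    cluster: int     # cluster the isoform was generated from
    batch: int       # batch number within that cluster
    isoform_id: int  # individual id that keeps headers unique


def parse_isoform_header(header: str) -> IsoformId:
    """Split a header such as '@0_105_891' or '>0_105_891' into its fields."""
    # Drop surrounding whitespace and any leading '@' or '>' marker.
    name = header.strip().lstrip("@>")
    parts = name.split("_")
    if len(parts) != 3:
        raise ValueError(f"expected three '_'-separated fields, got {header!r}")
    cluster, batch, isoform_id = (int(p) for p in parts)
    return IsoformId(cluster, batch, isoform_id)


print(parse_isoform_header("@0_105_891"))
# IsoformId(cluster=0, batch=105, isoform_id=891)
```

A named tuple keeps the three fields readable when grouping isoforms by cluster or batch downstream.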

@alexyfyf
Author

alexyfyf commented Dec 4, 2023 via email

@aljpetri
Owner

aljpetri commented Dec 5, 2023

Hi Alex,
The clusters generated by isONclust represent gene families rather than individual genes, so it would be dangerous to use them as gene surrogates.
Best,
Alex
