Fixing gene and transcript missing features from tsebra.gtf for AGAT #288
Comments
These lines are supposed to replace the badly formatted ones:
Which are supposed to be dropped. I don't understand why they are still present... What version of AGAT are you using? |
Hi Jacques, The version of AGAT that I am using is v0.8.0. I installed it with Singularity. The error report is the following:
I would be grateful to know your suggestions. Thank you. Best regards, |
OK, I think this issue is fixed in the most recent AGAT version. Please use the latest version. |
I used the latest version of AGAT v0.9.2 and the problem remains in exactly the same way. It looks like the tsebra.gtf input files only have the IDs for gene and transcript features in column 9, but they are missing the gene_id and transcript_id tags. Do you think this is the problem? All the other features have the proper tags in column 9. Same issue with the augustus.hint.gtf and braker.gtf files that are the default outputs of Braker. I am wondering how to fix this formatting issue, so I can use these files with AGAT. Thank you. Rom |
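The diagnosis above can be checked mechanically. What follows is a minimal Python sketch, not part of AGAT. It assumes tab-separated, 9-column GTF lines in which the affected gene and transcript lines carry only a bare ID in column 9. The function name `untagged_parents` and the sample lines are hypothetical.

```python
def untagged_parents(gtf_lines):
    """Return (line_number, feature_type, column_9) for every gene or
    transcript line whose column 9 has no gene_id/transcript_id tag."""
    hits = []
    for number, line in enumerate(gtf_lines, start=1):
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a standard 9-column record
        feature, attrs = cols[2], cols[8]
        is_parent = feature in ("gene", "transcript")
        has_tags = "gene_id" in attrs or "transcript_id" in attrs
        if is_parent and not has_tags:
            hits.append((number, feature, attrs))
    return hits


# Hypothetical sample lines, for illustration only.
sample = [
    "chr1\tAUGUSTUS\tgene\t1\t100\t.\t+\t.\tg1\n",
    "chr1\tAUGUSTUS\ttranscript\t1\t100\t.\t+\t.\tg1.t1\n",
    'chr1\tAUGUSTUS\texon\t1\t100\t.\t+\t.\ttranscript_id "g1.t1"; gene_id "g1";\n',
]
for hit in untagged_parents(sample):
    print(hit)
```

If the function reports only gene and transcript lines, the file matches the pattern described in this comment.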
Good catch, indeed AGAT has trouble reading TSEBRA files. For myself:
But not with
or
|
For now, to quickly fix such a TSEBRA output file for AGAT, first run this awk command:
|
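The awk command itself is not reproduced here. As a rough illustration of the kind of fix discussed in this thread (adding the missing gene_id and transcript_id tags to gene and transcript lines), here is a minimal Python sketch. It is not the command from this comment and not AGAT code. It assumes that column 9 of an affected line holds only a bare ID, and that a transcript ID such as `g1.t1` yields its gene ID by dropping the final dot-separated suffix; both assumptions should be checked against the actual file. The function name `add_missing_tags` and the sample line are hypothetical.

```python
def add_missing_tags(line):
    """Return the GTF line with gene_id/transcript_id tags added when a
    gene or transcript line has only a bare ID in column 9."""
    stripped = line.rstrip("\n")
    cols = stripped.split("\t")
    if stripped.startswith("#") or len(cols) != 9:
        return stripped  # comments and non-standard lines pass through
    feature, attrs = cols[2], cols[8].strip()
    if "gene_id" in attrs or "transcript_id" in attrs:
        return stripped  # already tagged, leave untouched
    if feature == "gene":
        cols[8] = f'gene_id "{attrs}";'
    elif feature == "transcript":
        # Assumption: "g1.t1" belongs to gene "g1".
        gene = attrs.rsplit(".", 1)[0]
        cols[8] = f'gene_id "{gene}"; transcript_id "{attrs}";'
    return "\t".join(cols)


# Hypothetical sample line, for illustration only.
print(add_missing_tags("chr1\tAUGUSTUS\ttranscript\t1\t100\t.\t+\t.\tg1.t1\n"))
```

Lines that already carry tags are returned unchanged, so the function can be applied to every line of the file before passing the result to AGAT.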
Hi Jacques, That's great! I used the awk command you provided and it fixed the problem! Now the output gff3 file from AGAT looks clean. After running, the report file still throws the warning below for every gene, but the output file looks clean to me, so AGAT seems to have handled it well.
I also tested the fix with agat_sp_statistics.pl and it works very well. Before the fix, agat_sp_statistics.pl was producing double counts. By the way, I am changing the title of this issue to reflect more accurately what is being fixed here. Thanks a lot for your help! Best regards, |
I made some modifications in AGAT version 1. Tsebra files should be handled correctly in that version. |
* Use the AppEaser module from https://github.com/polettix/App-Easer to create a multi-layer help
* Add an "agat" script => can be used to modify/expose config; to expose levels.yaml; to list the tools; get agat version, etc.
* Design a configuration file (config.yaml) to apply config on all scripts
* Add a config module (with functions, e.g. to check the configuration file)
* Merge the feature_levels json files into one single yaml file
* Make gtf output possible for every sp script via config.yaml and create a dedicated module (OmniscientToGTF) based on code from the gff2gtf script.
* Fix the script compare_annotations by rewriting it from scratch. + add tests
* Create a BioperlGFF module based on the Bioperl code to correctly parse GFF/GTF files when they contain a mix of GFF1 and GFF2/GFF3, as seen in Augustus and Tsebra output files (fix #288).
* Rename the module Omniscient to AGAT
Dear author @Juke34,
I have used agat_sp_flag_short_introns.pl on my .gtf file, and I observed that the output .gff3 file generated by the script by default has additional gene and mRNA lines that are not present in the original gtf file. I am wondering whether this is normal and whether adding the mRNA info is expected. I would appreciate your comments. Thank you.
The command I used:
The input gtf file:
The resulting output gff3 file:
Lines present in the output gff3 but missing in original gtf:
Best,
Rom