-
Notifications
You must be signed in to change notification settings - Fork 1
/
Copy pathChangelog
165 lines (138 loc) · 8.45 KB
/
Changelog
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
CHANGELOG v1.2
[miRProf] (TODO)
+(16/09/2016) Create empty file
If there aren't any files create an empty cons file.
[Problem for no results found with mirProf (Some species still don't have any)]
Solved these errors:
grep: .../sRNA/data/mirprof/lib01_filt-18_26_5_[gneome]_mirbase_srnas.fa: No such file or directory
cp: impossível analisar '.../data/mirprof/lib01_filt-18_26_5_[genome]_mirbase_srnas.fa': No such file or directory
+(12/06/2017) Deal with mismatches
Non redundant annotations by re running miRProf
n order to perform the miRProf search without losing reads when allowing mismatches during searches, the workflow is designed using an interactive approach. Thus the first search is preformed using 0 mismatches and the annotated sequences are removed from the initial dataset. This dataset is used to perform a subsequent interaction by running with an increased number of mismatches and this process is repeated until the number of mismatches required is reached. The annotated sequences resulting from the individual runs are then grouped. The counting of the MiR families with sRNA sequences annotated in more than one miR family, are collapsed reporting the abundance of the multiply annotated sRNA sequence only once and avoiding redundant counting.
[Install] (Done)
+(16/09/2016)Changed path adding.
Now the install script check which shell is being used and added to the proper startup file.
+(16/09/2016)Add miRBase file
Was only adding the directory to config file.
-(TODO) Allow use of vars in prompt (Check if possible and inform in text)
For bash this seams great. But what problems will this create in the long-term?
+(19/01/2017) fastqc
Check if exists and install fastqc. In fastq or fasta depending on the input. If not installed it's not ran.
+(19/01/2017) Adding java to path if not installed
If java isn't on the system the downloaded version will be added to the path.
+(12/10/2017) Store current commit hash
Grab the current commit hash if git isn't available
-(TODO) Check if R exists
Must install it if doesn't exist. Any version should do. At least for building the graphs. "Not feasible to install it. Either bring it along or "
-(TODO) Report build in LaTeX texlive-core and possibly bin have to be installed 400mb and 40mb respectively. Must se how to do this seams big to generate a pdf. But some stripping might be possible. Another possiblility would be setting up a push server to produce PDFs. But might be over the top. Besides people wouldn't like it that much.
[Check before start] (Done)
+(16/09/2016)mirbase file
Check the var in mirbase var is a file that exists (Only checking if var exists use -e or something man test)
+(20/09/2016) project dir empty
Check if project dir is empty if not prompt if u want to continue
+(17/11/2016) Check libs exist
Check whether the libs given by the numbers exist in the inserts dir. This avoids running with no libs causing a bunch of errors.
https://github.com/forestbiotech-lab/miRPursuit/projects/1#card-843979
+(12/10/2017) show first file to be processed
shows the first file that will be processed without the path. Long path just clutter the display. The directory path can be seen in the inserts dir.
-(TODO) Check for updates
Compare commit hashes and show main differences between commits
[Filter] (Done)
+(20/09/2016)If filtering leaves you with no sRNA should not continue for this library. No there's nothing to process.
Will continue because other libraries might have interesting results. Only issue a warning.
+(17/11/2016) Filter suffix chose from varous definitions
Copies params from the filter with the filter suffix in filters dir, into the wbench_filter_in_use.cfg. This config file is the one that will be used throughout the workflow.
[Genome] (DONE)
+(20/09/2016)Set up in a way that check weather there are any results from genome search
Checks weather the output of genome filtering produces an empty file. If so a warning is issued to main window announcing that, that library produced no reads and that analysis of that library is over, but program will continue execution.
+(20/09/2016)Add parameters for genome filtering.
Added patman_genome.cfg file to configs with all the different possible parameters that can be changed when filtering genome.
+(20/09/2016) Genome folder in project
Changed the FILTER-Genome folder to something more uniform. filter_genome
[Logging] (TODO)
+(20/09/2016)This is terrible so many log files.
Changing to master log file and folder with each subroutine (They are only copied to folder if run terminates with success)
+(20/09/2016)Saving logs with PPID
Logs were being copied to the log folder by OK. in the file name. Now they are grouped by PPID. (This may still cause problems because PPIDs can still collide but it's very rare that this happens. Would only haven on multiple failed projects with restarts.)
+(20/09/2016) Added parameters of run in log.
Parameters for every run are being stored in it's log file.
-(TODO) Logging for report.sh
is still lacking should be dealt with once unique report file output starts to be generated.
-(TODO) Add run parameter in logs
Added the run parameters used in the beginning of the file. [TODO for the rest of the logs]
[Parallel processing] (TODO)
-Update to dynamic parallel processing
Currently the stack must finish running all subprocesses to start a new batch.
With dyanamic parallel processing script could fork always the top number of processes as soon as on finishes.
[Phread score] (TODO)
-(TODO) Hardcoded phread score
Must set a configuration file to ajust phread score.
[Document filtering](TODO)
-Point to how you can change filtering t/rRNA lists.
https://github.com/forestbiotech-lab/miRPursuit/projects/1#card-843974
[Config] (TODO)
+(17/11/2016) Config folder defauts
Improve config files folder: Must come with defaults in folder
-(TODO) Not sure what is the issue
Improve the guide explaining that this is not intended for single files but can be used that way. (Well)
[Send bug]
-(TODO) If bug send logs
prompt and send if program doesn't finish properly
[README]
-(TODO) README to all folders
Add readme to log folder explaining structure and others if they exist.
[Reporting]
-(TODO) Report file
A general report file with a compilation of all the data generated. (Statistics and path to were results are.)
-(TODO) grep error no reads
Check which grep has problems with no reads. Looks like counts_merge.sh because of 90%. (Occurs when running dataset with trim option)
-(TODO) merge tables
Run r script that merges all tables in counts
[email notification] (TODO)
-(TODO) Send email automatically
When run is finished send email to responsible person.
[Screen printing] (TODO)
-(TODO) Ensure minimum printing to screen
mircat is printing stuff to screen in server but now here why? (Version problem?)
Not sure this is common problem
[Where to save] (TODO)
-(TODO)Project dir
Instead of changing manually the workdir. Add it as a optional arg at start.
Flag the workdir with color
[Trimming] (DONE)
+(18/11/2016) Set flag --trim to signal reads must be trimmed
This is valid for all modes. Adaptor is in workdirs.cfg
https://github.com/forestbiotech-lab/miRPursuit/projects/1#card-843951
[Headless] (DONE)
+(20/01/2017) Add dummy X session.
This allows miRPursuit to run headless. If Xvfb is installed.
[General] (TODO)
-(TODO) Skip run e create empty files
Don't run processes if input is empty. Just a wast of time. Simply check if file is empty
......
[Origin? - Solved with empty files (Test this)]
ls: impossível aceder a '.../data/*_cons.fa': No such file or directory
[Problem trying to get all cons sequences]
[Solution upstream - Create empty file. If no results found]
[Report structure MD / Latex]
After each step runs a script will we called to append the Latex. So the report isn't only build at the end.
Conversion to PDF happens at the end.
Tables only have lib# in columns and are limited to 10 columns or what ever fits.
Fastq
- Fastqc (For now just this one)
Per base sequence quality
*Per sequence GC content
*Adapter Content
Trimming
- Detailed Log (Table?)
Fasta
- Size profiles
One graph per lib.
Possible pair-wise comparison depending on lib number. (I think two at most)
Filtering
- Detailed report of filtering (Grab that log and build table)
Genome
- (How low details not sure has spot)
End
- Stats per lib (for each )
- Counts per lib. (Issues with matrix size. Needs to be sized)