Commit d6dd82a

Merge pull request #5 from MattOates/feature/go_terms
Feature/go terms
2 parents 51be4f8 + ddeaed0 commit d6dd82a

11 files changed, +117 -42 lines changed

.gitignore

+1
@@ -29,3 +29,4 @@ nytprof.out
 pm_to_blib
 pod2htm*.tmp
 fatlib/
+.idea/

.travis.yml

+1 -2
@@ -1,7 +1,6 @@
 language: perl
 perl:
-  - "5.24"
-  - "5.22"
+  - "5.36"
   - "5.20"

 sudo: false

CHANGES.md

+1 -1
@@ -1,6 +1,6 @@
 ###2019-05-17
 * Updated timeout so that downloads from ELM.eu.org are more likely to succeed.
-* Cut a new maintenance release 1.4.2
+* Cut a new maintenance release 1.4.3

 ###2016-07-05
 * Majority of refactor complete

README.md

+13 -4
@@ -1,7 +1,7 @@
-mELM [![GitHub version](https://badge.fury.io/gh/MattOates%2Fmelm.svg)](https://badge.fury.io/gh/MattOates%2Fmelm)[![Build Status](https://travis-ci.org/MattOates/melm.svg?branch=master)](https://travis-ci.org/MattOates/melm)[![Coverage Status](https://coveralls.io/repos/github/MattOates/melm/badge.svg?branch=master)](https://coveralls.io/github/MattOates/melm?branch=master)
+mELM [![GitHub version](https://badge.fury.io/gh/MattOates%2Fmelm.svg)](https://badge.fury.io/gh/MattOates%2Fmelm)[![Build Status](https://travis-ci.com/MattOates/melm.svg?branch=master)](https://travis-ci.com/MattOates/melm)[![Coverage Status](https://coveralls.io/repos/github/MattOates/melm/badge.svg?branch=master)](https://coveralls.io/github/MattOates/melm?branch=master)
 ==================

-[⬇️ Download latest release v1.4.2](https://github.com/MattOates/melm/releases/download/v1.4.2/melm)
+[⬇️ Download latest release v1.4.3](https://github.com/MattOates/melm/releases/download/v1.4.3/melm)

 mELM is a tool for masking or assigning Eukaryotic Linear Motifs to protein sequences. Both TSV/GFF3 output or FASTA is possible.
 Essentially the tool is a CLI to the ELM.eu.org online resource with additional tools for dealing with short motif assignment within disordered regions.
@@ -32,7 +32,7 @@ If you have used mELM with ANCHOR predictions please cite the following:
 Usage
 =====

-    melm [h,v,u,U,c,i,a,G,E,X,m,n,C,P,l,t,M,D,d] <SEQ FILES...>
+    melm [h,v,u,U,c,i,a,G,E,X,m,n,C,P,l,t,M,D,d,g,o] <SEQ FILES...>
     -h, --help
         this message
     -v, --verbose
@@ -71,6 +71,10 @@ Usage
         if ANCHOR is installed use that to filter ELM output based on ANCHOR's IUPred disorder prediction, only include ELMs that fall within disordered regions
     -d <>, --anchor-datapath=<>
         provide the location of the anchor data path, mELM will otherwise assume it's in the same directory as your anchor binary
+    -g <>, --go-filter=<>
+        turn on GO filtering, only show results for ELM classes that have been associated with the GO ID specified
+    -o <>, --organism-filter=<>
+        turn on organism filtering, only show results for ELM instances that have been observed in a given organism

 Example Use Cases
 =================
@@ -91,6 +95,11 @@ Get a GFF3 file for a whole genomes worth of protein annotations

     melm --assign --GFF3 human_proteins.fa > human_motifs.gff3

+Get another GFF3 file, but this time restrict assignment to motifs active in a natively disordered state in the nucleus,
+using GO filtering for GO ID 0005634 "nucleus" and organism filtering for "sapiens"
+
+    melm --assign --GFF3 --logic-filter --disorder-filter --go-filter=0005634 --organism-filter=sapiens human_proteins.fa > disordered_hiqual_human_nucleus_motifs.gff3
+
 Get the latest ELM classes library for use in another script or by yourself

     melm --update --list-classes
@@ -103,7 +112,7 @@ License
 =======

 melm - Mask and assign ELM motifs in protein sequence libraries
-(C) 2014-2019 Dr Matt E. Oates
+(C) 2014-2022 Dr Matt E. Oates

 This program is free software: you can redistribute it and/or modify
 it under the terms of the GNU Affero General Public License as

bin/melm

+31 -8
@@ -1,4 +1,5 @@
 #!/usr/bin/env perl
+use utf8;
 =head1 NAME

 I<melm> - mELM masking of assigned ELM motifs
@@ -20,7 +21,7 @@ I<melm> - mELM masking of assigned ELM motifs
         get TSV output of all the ELM instances held in the melm cache
     -a, --assign
         do not mask sequences, instead output a TSV format of all the ELM assignments made per sequence
-    -G, -GFF3
+    -G, --GFF3
         produce GFF3 output when using --assign, useful if you wish to add this assignment to a genome browser or similar
     -E <>, --max-class-expect=<>
         filter out ELM classes based on their annotated expectation, bigger E means allow for more common motifs
@@ -44,6 +45,10 @@ I<melm> - mELM masking of assigned ELM motifs
         if ANCHOR is installed use that to filter ELM output based on ANCHOR's IUPred disorder prediction, only include ELMs that fall within disordered regions
     -d <>, --anchor-datapath=<>
         provide the location of the anchor data path, mELM will otherwise assume it's in the same directory as your anchor binary
+    -g <>, --go-filter=<>
+        turn on GO filtering, only show results for ELM classes that have been associated with the GO ID specified
+    -o <>, --organism-filter=<>
+        turn on organism filtering, only show results for ELM instances that have been observed in a given organism

 =head1 DESCRIPTION

@@ -88,9 +93,9 @@ Get a GFF3 file for a whole genome's worth of protein annotations

     melm --assign --GFF3 human_proteins.fa > human_motifs.gff3

-Get another GFF3 file but this time be strict on assignment to those active in native disordered state
+Get another GFF3 file, but this time restrict assignment to motifs active in a natively disordered state in the nucleus

-    melm --assign --GFF3 --logic-filter --disorder-filter human_proteins.fa > disordered_hiqual_human_motifs.gff3
+    melm --assign --GFF3 --logic-filter --disorder-filter --go-filter=0005634 --organism-filter=sapiens human_proteins.fa > disordered_hiqual_human_nucleus_motifs.gff3

 Get the latest ELM classes library for use in another script or by yourself

@@ -107,7 +112,7 @@ B<Matt Oates> - I<[email protected]>
 =head1 LICENSE

 melm - Mask and assign ELM motifs in protein sequence libraries
-(C) 2014-2019 Dr Matt E. Oates
+(C) 2014-2022 Dr Matt E. Oates

 This program is free software: you can redistribute it and/or modify
 it under the terms of the GNU Affero General Public License as
@@ -124,9 +129,12 @@ B<Matt Oates> - I<[email protected]>

 =head1 EDIT HISTORY

+2022-06-10
+* Added GO data support and bumped the version for release
+
 2019-05-17
 * Updated timeout so that downloads from ELM.eu.org are more likely to succeed.
-* Cut a new maintenance release 1.4.2
+* Cut a new maintenance release 1.4.3

 2016-07-05 - Matt Oates
 * Majority of refactor complete
@@ -181,7 +189,7 @@ use ELM;
 use ELM::Utils 'get_www';

 #Current version of the script
-our $VERSION = "v1.4.2";
+our $VERSION = "v1.4.3";

 #User options
 my $help;
@@ -205,8 +213,10 @@ my $disorder_filter;
 my $type;
 my $gff;
 my $anchor_datapath;
+my $go_filter;
+my $organism_filter;

-#Flags used h,v,u,U,c,i,a,G,E,X,m,n,C,P,l,t,M,D,d
+#Flags used h,v,u,U,c,i,a,G,E,X,m,n,C,P,l,t,M,D,d,g,o
 GetOptions(
     "help|h!" => \$help,
     "verbose|v!" => \$verbose,
@@ -227,14 +237,16 @@ GetOptions(
     "morf-filter|M!" => \$morf_filter,
     "disorder-filter|D!" => \$disorder_filter,
     "anchor-datapath|d=s" => \$anchor_datapath,
+    "go-filter|g=s" => \$go_filter,
+    "organism-filter|o=s" => \$organism_filter,
 ) or die "Fatal Error: Problem parsing command-line ".$!;

 my @fasta_files = @ARGV;

 #Print out some help if it was asked for or if no arguments were given.
 pod2usage(-exitstatus => 0, -verbose => 2) if $help;

-pod2usage(-exitstatus => 0, -verbose => 1, -msg => "mELM version $VERSION by Matt Oates (C) 2014-2019. Please provide some sequence files to mask or assign ELM motifs to.")
+pod2usage(-exitstatus => 0, -verbose => 1, -msg => "mELM version $VERSION by Matt Oates (C) 2014-2022. Please provide some sequence files to mask or assign ELM motifs to.")
     unless $update or $upgrade or $list_classes or $list_instances or scalar @fasta_files >= 1;

 my $elm = ELM->new(
@@ -245,6 +257,8 @@ my $elm = ELM->new(
     morf_filter => $morf_filter,
     disorder_filter => $disorder_filter,
     logic_filter => $logic_filter,
+    go_filter => $go_filter,
+    organism_filter => $organism_filter,
     num_elms_threshold => $num_elms,
     anchor => ELM::Anchor->new(anchor_datapath => $anchor_datapath)
 );
@@ -274,6 +288,15 @@ if ($upgrade) {
     exit;
 }

+if ($elm->go_filter and not $elm->library->go_terms_version) {
+    say STDERR "You're trying to use a GO filter with no cached GO data, trying to update to fetch GO data";
+    $elm->library->update();
+    if (not $elm->library->go_terms_version) {
+        say STDERR "Something appears to have failed with fetching GO data. Cannot proceed.";
+        exit 1;
+    }
+}
+
 #Update the ELM classes file and populate elms
 if ($update or not $elm->library->exists) {
     $elm->library->update();
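
Note on the option parsing in this file: the README usage documents `-g` for `--go-filter` and `-o` for `--organism-filter`, and Getopt::Long can only map a short alias to one option. The following is a minimal standalone sketch of declaring the two string-valued options with distinct aliases. The helper name `parse_filters` is hypothetical and is not part of melm; `GetOptionsFromArray` is used only so the example does not touch `@ARGV`.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Hypothetical helper (not part of melm): parse only the two filter options.
sub parse_filters {
    my @args = @_;
    my ($go_filter, $organism_filter);
    GetOptionsFromArray(
        \@args,
        "go-filter|g=s"       => \$go_filter,         # -g <GO ID>
        "organism-filter|o=s" => \$organism_filter,   # -o <organism text>
    ) or die "Problem parsing command-line\n";
    # Whatever is left in @args is treated as sequence files.
    return ($go_filter, $organism_filter, @args);
}

my ($go, $org, @files) = parse_filters(qw(-g 0005634 -o sapiens human_proteins.fa));
print "go=$go organism=$org files=@files\n";
```

Each short alias resolves to exactly one variable here, so `-g` and `-o` can be combined on one command line.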

lib/ELM.pm

+23 -4
@@ -1,12 +1,12 @@
-package ELM v1.4.2;
+package ELM v1.4.3;
 =encoding UTF-8
 =head1 NAME

 ELM - Class to do analysis with the ELM regex library

 =head1 VERSION

-Version v1.4.2
+Version v1.4.3

 =cut

@@ -29,7 +29,9 @@ use Class::Tiny qw(
     max_elm_probability
     morf_filter
     disorder_filter
-    logic_filter), {
+    logic_filter
+    go_filter
+    organism_filter), {
     num_elms_threshold => sub { 1 },
     library => sub { ELM::Library->new() },
     anchor => sub { ELM::Anchor->new() },
@@ -116,6 +118,12 @@ sub assign($self, $elm_name, $regex, $string, $morf_regions, $dis_regions) {
         if ($self->disorder_filter) {
             next unless any_overlap($start,$end,$dis_regions);
         }
+        if ($self->go_filter and $self->library->go_terms_version) {
+            next unless $self->_go_filter_ok($elm_name);
+        }
+        if ($self->organism_filter) {
+            next unless $self->_organism_filter_ok($elm_name);
+        }
         push @ret, [$elm_name, $start, $end, $seq, $prob, $entropy, $entrorate];
     }
     if (@ret < 1) {
@@ -172,6 +180,17 @@ sub _logic_filter_ok($self, $elm_name, $seq, %opt) {
     return (not exists $filters{$seq})?1:0;
 }

+sub _go_filter_ok($self, $elm_name) {
+    my %elms = %{ $self->library->elms };
+    return 0 != scalar grep {$self->go_filter eq $_->{go_id}} @{$elms{$elm_name}{go_terms}};
+}
+
+sub _organism_filter_ok($self, $elm_name) {
+    # At least one true positive (TP) instance whose organism text contains the filter string (case-insensitive substring match) must exist
+    my %elms = %{ $self->library->elms };
+    return 0 != scalar grep {$_->{logic} eq 'TP' and index(fc $_->{organism}, fc $self->organism_filter ) != -1} @{$elms{$elm_name}{instances}};
+}
+
 =head1 AUTHOR

 Matt Oates, C<< <mattoates at gmail.com> >>
@@ -219,7 +238,7 @@ If you have used mELM with ANCHOR predictions please cite the following:

 =head1 LICENSE AND COPYRIGHT

-Copyright 2019 Matt Oates.
+Copyright 2022 Matt Oates.

 This program is free software: you can redistribute it and/or modify
 it under the terms of the GNU Affero General Public License as
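
Note on the two new filters: the GO filter keeps a class only if one of its GO terms matches the requested ID exactly, and the organism filter keeps a class only if at least one true positive instance has organism text containing the filter string, ignoring case. The following is a standalone sketch of that logic, written as plain functions rather than methods. The class names and data in `%elms` are made up for illustration; only the keys `go_terms`, `go_id`, `instances`, `logic` and `organism` come from the diff, and the rest of the cached library layout is an assumption.

```perl
use strict;
use warnings;
use feature qw(fc say);

# Hypothetical library data; class names and values are illustrative only.
my %elms = (
    EXAMPLE_CLASS_1 => {
        go_terms  => [ { go_id => '0005634' } ],
        instances => [ { logic => 'TP', organism => 'Homo sapiens' } ],
    },
    EXAMPLE_CLASS_2 => {
        go_terms  => [ { go_id => '0005737' } ],
        instances => [ { logic => 'FP', organism => 'Homo sapiens' } ],
    },
);

# True (1) if the class has a GO term whose ID equals $go_id exactly.
sub go_filter_ok {
    my ($elms, $elm_name, $go_id) = @_;
    my @hits = grep { $_->{go_id} eq $go_id } @{ $elms->{$elm_name}{go_terms} // [] };
    return @hits ? 1 : 0;
}

# True (1) if at least one TP instance has organism text containing
# $organism as a case-insensitive substring.
sub organism_filter_ok {
    my ($elms, $elm_name, $organism) = @_;
    my @hits = grep {
        $_->{logic} eq 'TP' and index(fc($_->{organism}), fc($organism)) != -1
    } @{ $elms->{$elm_name}{instances} // [] };
    return @hits ? 1 : 0;
}

for my $name (sort keys %elms) {
    say join "\t", $name,
        go_filter_ok(\%elms, $name, '0005634'),
        organism_filter_ok(\%elms, $name, 'sapiens');
}
```

Because the GO comparison is an exact string match, the value passed to `--go-filter` has to be written in the same form as the cached `go_id` values.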

lib/ELM/AminoAcids.pm

+3 -3
@@ -1,4 +1,4 @@
-package ELM::AminoAcids v1.4.2;
+package ELM::AminoAcids v1.4.3;
 require Exporter;
 =encoding UTF-8
 =head1 NAME
@@ -7,7 +7,7 @@ ELM::AminoAcids - Functions for dealing with amino acid specific calculations

 =head1 VERSION

-Version v1.4.2
+Version v1.4.3

 =cut

@@ -142,7 +142,7 @@ If you have used mELM with ANCHOR predictions please cite the following:

 =head1 LICENSE AND COPYRIGHT

-Copyright 2019 Matt Oates.
+Copyright 2022 Matt Oates.

 This program is free software: you can redistribute it and/or modify
 it under the terms of the GNU Affero General Public License as

lib/ELM/Anchor.pm

+5 -7
@@ -1,12 +1,12 @@
-package ELM::Anchor v1.4.2;
+package ELM::Anchor v1.4.3;
 =encoding UTF-8
 =head1 NAME

 ELM::Anchor - Class to wrap ANCHOR and get assignments

 =head1 VERSION

-Version v1.4.2
+Version v1.4.3

 =cut

@@ -44,11 +44,9 @@ To create an ELM::Anchor explicitly.
 Get/Set the ANCHOR datapath, default to install directory

 =cut
-sub anchor_datapath($self) {
+sub anchor_datapath($self, $path = undef) {
     my $defaults = Class::Tiny->get_all_attribute_defaults_for( ref $self );
-    if (@_) {
-        my $path = shift;
-        $path = $defaults->{anchor_datapath}->() unless $path;
+    if ($path) {
         return $self->{anchor_datapath} = $path;
     }
     elsif ( exists $self->{anchor_datapath} ) {
@@ -194,7 +192,7 @@ If you have used mELM with ANCHOR predictions please cite the following:

 =head1 LICENSE AND COPYRIGHT

-Copyright 2019 Matt Oates.
+Copyright 2022 Matt Oates.

 This program is free software: you can redistribute it and/or modify
 it under the terms of the GNU Affero General Public License as
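
Note on the accessor signature: with Perl subroutine signatures, a positional parameter without a default is mandatory, so a combined getter/setter needs an optional `$path` or a bare getter call dies with a "Too few arguments" error. The following is a minimal sketch of that pattern. `Example::Anchor` and its default path are hypothetical stand-ins, not the real `ELM::Anchor`, and Class::Tiny is left out to keep the example self-contained.

```perl
package Example::Anchor;   # hypothetical class, not ELM::Anchor
use v5.20;
use warnings;
use feature 'signatures';
no warnings 'experimental::signatures';

sub new($class, %args) { return bless {%args}, $class }

# Combined getter/setter: $path defaults to undef, so calling the
# accessor with no argument is allowed and acts as a getter.
sub anchor_datapath($self, $path = undef) {
    return $self->{anchor_datapath} = $path if defined $path and length $path;
    # Placeholder default; the real default location is not shown in the diff.
    return $self->{anchor_datapath} // '/hypothetical/default/path';
}

package main;

my $anchor = Example::Anchor->new();
say $anchor->anchor_datapath;              # getter: falls back to the placeholder default
$anchor->anchor_datapath('/opt/anchor');   # setter
say $anchor->anchor_datapath;              # getter: returns the stored path
```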

lib/ELM/Calc.pm

+3 -3
@@ -1,4 +1,4 @@
-package ELM::Calc v1.4.2;
+package ELM::Calc v1.4.3;
 require Exporter;
 =encoding UTF-8
 =head1 NAME
@@ -7,7 +7,7 @@ ELM::Calc - Functions for calculating sequence assignment specific tasks

 =head1 VERSION

-Version v1.4.2
+Version v1.4.3

 =cut

@@ -138,7 +138,7 @@ If you have used mELM with ANCHOR predictions please cite the following:

 =head1 LICENSE AND COPYRIGHT

-Copyright 2019 Matt Oates.
+Copyright 2022 Matt Oates.

 This program is free software: you can redistribute it and/or modify
 it under the terms of the GNU Affero General Public License as
