OWL NETS 2.0
Original Repository: OWL-NETS
Example Application: OWLNETS_Example_Application.ipynb
Purpose: OWL-NETS (NEtwork Transformation for Statistical learning) is a computational method that reversibly abstracts Web Ontology Language (OWL)-encoded biomedical knowledge into a more biologically meaningful network representation. OWL-NETS generates semantically rich knowledge graphs that contain heterogeneous nodes and edges and can be used for tasks that do not require OWL semantics.
Publication for V1.0:
Callahan TJ, Baumgartner WA, Bada M, Stefanski AL, Tripodi I, White EK, Hunter LE. OWL-NETS: Transforming OWL Representations for Improved Network Inference. Pac Symp Biocomput. 2018;23:133-144. PMID:29218876; PMCID:PMC5737627
OWL-NETS 2.0: This wiki discusses an alternative and arguably more generalizable adaptation of the original project. This new version was developed as a fundamental component of the PheKnowLator project to decode OWL-encoded classes.
An ontology or knowledge graph built using OWL contains two types of entities that we'd like to decode when transforming it into an OWL-NETS representation: (1) owl:Class and (2) owl:Axiom. While each of the components shown below is needed to build a semantically rich knowledge graph, the majority of the information used to construct each object is not biologically or clinically meaningful. Thus, the goal of the current algorithm is to decode all OWL-encoded classes and axioms (like those shown below) into something more clinically or biologically meaningful.
<!-- http://purl.obolibrary.org/obo/CL_0000995 -->
<owl:Class rdf:about="http://purl.obolibrary.org/obo/CL_0000995">
<owl:equivalentClass>
<owl:Class>
<owl:unionOf rdf:parseType="Collection">
<rdf:Description rdf:about="http://purl.obolibrary.org/obo/CL_0001021"/>
<rdf:Description rdf:about="http://purl.obolibrary.org/obo/CL_0001026"/>
</owl:unionOf>
</owl:Class>
</owl:equivalentClass>
<rdfs:subClassOf rdf:resource="http://purl.obolibrary.org/obo/CL_0001060"/>
</owl:Class>
http://purl.obolibrary.org/obo/CL_0000995
The OWL class CL_0000995 (i.e. CD34-positive, CD38-positive common myeloid progenitor OR CD34-positive, CD38-positive common lymphoid progenitor) was built by taking the union of:
- CL_0001021 (i.e. CD34-positive, CD38-positive common lymphoid progenitor)
- CL_0001026 (i.e. CD34-positive, CD38-positive common myeloid progenitor)
OWL-NETS would decode this class into:
CL_0001021, rdfs:subClassOf, CL_0000995
CL_0001026, rdfs:subClassOf, CL_0000995
CL_0000995, rdfs:subClassOf, CL_0001060
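The union decoding above can be sketched in a few lines of Python. This is a minimal illustration rather than the pkt_kg implementation: the function name `decode_union_class` and the plain-tuple representation of triples are assumptions made for the example.

```python
def decode_union_class(class_id, union_members, superclasses):
    """Decode a class defined as a union into plain subClassOf triples.

    Hypothetical helper: each union member becomes a subclass of the
    defined class, and the asserted superclasses are kept unchanged.
    """
    triples = [(member, "rdfs:subClassOf", class_id) for member in union_members]
    triples += [(class_id, "rdfs:subClassOf", parent) for parent in superclasses]
    return triples


decoded = decode_union_class(
    "CL_0000995", ["CL_0001021", "CL_0001026"], ["CL_0001060"]
)
for triple in decoded:
    print(", ".join(triple))
```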
<!-- http://purl.obolibrary.org/obo/HP_0000340 -->
<owl:Class rdf:about="http://purl.obolibrary.org/obo/HP_0000340">
<owl:equivalentClass>
<owl:Restriction>
<owl:onProperty rdf:resource="http://purl.obolibrary.org/obo/BFO_0000051"/>
<owl:someValuesFrom>
<owl:Class>
<owl:intersectionOf rdf:parseType="Collection">
<rdf:Description rdf:about="http://purl.obolibrary.org/obo/PATO_0001481"/>
<owl:Restriction>
<owl:onProperty rdf:resource="http://purl.obolibrary.org/obo/RO_0000052"/>
<owl:someValuesFrom rdf:resource="http://purl.obolibrary.org/obo/UBERON_0008200"/>
</owl:Restriction>
<owl:Restriction>
<owl:onProperty rdf:resource="http://purl.obolibrary.org/obo/RO_0002573"/>
<owl:someValuesFrom rdf:resource="http://purl.obolibrary.org/obo/PATO_0000460"/>
</owl:Restriction>
</owl:intersectionOf>
</owl:Class>
</owl:someValuesFrom>
</owl:Restriction>
</owl:equivalentClass>
<rdfs:subClassOf rdf:resource="http://purl.obolibrary.org/obo/HP_0000290"/>
</owl:Class>
http://purl.obolibrary.org/obo/HP_0000340
The OWL class HP_0000340 (i.e. sloping forehead) was built by taking the intersection of:
- PATO_0001481, RO_0000052, UBERON_0008200 (i.e. sloped, inheres in, forehead)
- PATO_0001481, RO_0002573, PATO_0000460 (i.e. sloped, has modifier, abnormal)
OWL-NETS would decode this class into:
HP_0000340, RO_0000086, PATO_0001481
HP_0000340, RO_0000052, UBERON_0008200
HP_0000340, RO_0002573, PATO_0000460
HP_0000340, rdfs:subClassOf, HP_0000290
<!-- http://purl.obolibrary.org/obo/GO_0000785 -->
<owl:Class rdf:about="http://purl.obolibrary.org/obo/GO_0000785">
<rdfs:subClassOf rdf:resource="http://purl.obolibrary.org/obo/GO_0110165"/>
<rdfs:subClassOf>
<owl:Restriction>
<owl:onProperty rdf:resource="http://purl.obolibrary.org/obo/BFO_0000050"/>
<owl:someValuesFrom rdf:resource="http://purl.obolibrary.org/obo/GO_0005694"/>
</owl:Restriction>
</rdfs:subClassOf>
</owl:Class>
http://purl.obolibrary.org/obo/GO_0000785
The OWL class GO_0000785 (i.e. chromatin) is restricted to BFO_0000050 (i.e. part of) GO_0005694 (i.e. chromosome).
OWL-NETS would decode this class into:
GO_0000785, BFO_0000050, GO_0005694
GO_0000785, rdfs:subClassOf, GO_0110165
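As a rough sketch of how the GO_0000785 class above could be decoded, the snippet below reads the RDF/XML with Python's standard `xml.etree.ElementTree`. It is not the pkt_kg implementation (which works on an RDFLib graph). The helper names `short` and `decode_class` are hypothetical, and the sketch assumes each restriction is a simple `owl:someValuesFrom` pointing at a named class.

```python
import xml.etree.ElementTree as ET

NS = {
    "owl": "http://www.w3.org/2002/07/owl#",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
}
RDF = "{%s}" % NS["rdf"]  # ElementTree prefix for rdf:* attribute names
OBO = "http://purl.obolibrary.org/obo/"

XML = """
<rdf:RDF xmlns:owl="http://www.w3.org/2002/07/owl#"
         xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
  <owl:Class rdf:about="http://purl.obolibrary.org/obo/GO_0000785">
    <rdfs:subClassOf rdf:resource="http://purl.obolibrary.org/obo/GO_0110165"/>
    <rdfs:subClassOf>
      <owl:Restriction>
        <owl:onProperty rdf:resource="http://purl.obolibrary.org/obo/BFO_0000050"/>
        <owl:someValuesFrom rdf:resource="http://purl.obolibrary.org/obo/GO_0005694"/>
      </owl:Restriction>
    </rdfs:subClassOf>
  </owl:Class>
</rdf:RDF>
"""


def short(iri):
    """Strip the OBO PURL prefix so identifiers read like the wiki examples."""
    return iri.replace(OBO, "")


def decode_class(cls):
    """Return (subject, predicate, object) triples for one owl:Class element."""
    subject = short(cls.get(RDF + "about"))
    triples = []
    for sub in cls.findall("rdfs:subClassOf", NS):
        named_parent = sub.get(RDF + "resource")
        if named_parent is not None:
            # Plain named superclass: keep the subClassOf edge.
            triples.append((subject, "rdfs:subClassOf", short(named_parent)))
            continue
        restriction = sub.find("owl:Restriction", NS)
        if restriction is None:
            continue
        prop = restriction.find("owl:onProperty", NS)
        filler = restriction.find("owl:someValuesFrom", NS)
        if prop is None or filler is None or filler.get(RDF + "resource") is None:
            continue  # nested or unsupported restriction: outside this sketch
        # Collapse the restriction into a direct subject-property-object edge.
        triples.append(
            (subject, short(prop.get(RDF + "resource")), short(filler.get(RDF + "resource")))
        )
    return triples


root = ET.fromstring(XML)
decoded = decode_class(root.find("owl:Class", NS))
for triple in decoded:
    print(", ".join(triple))
```

The anonymous `owl:Restriction` node disappears in the output; only the named classes and the object property remain.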
<owl:Axiom>
<owl:annotatedSource rdf:resource="http://purl.obolibrary.org/obo/CL_0002004"/>
<owl:annotatedProperty rdf:resource="http://www.w3.org/2000/01/rdf-schema#subClassOf"/>
<owl:annotatedTarget rdf:resource="http://purl.obolibrary.org/obo/CL_0000547"/>
<oboInOwl:is_inferred rdf:datatype="http://www.w3.org/2001/XMLSchema#string">true</oboInOwl:is_inferred>
</owl:Axiom>
http://purl.obolibrary.org/obo/CL_0002004
The OWL class CL_0002004 (i.e. CD34-negative, GlyA-negative proerythroblast) has the following logical statement:
- CL_0002004 SubClassOf CL_0000547 (CD34-negative, GlyA-negative proerythroblast subClassOf proerythroblast)
OWL-NETS would decode this axiom into:
CL_0002004, rdfs:subClassOf, CL_0000547
<owl:Axiom>
<owl:annotatedSource>
<owl:Class>
<owl:intersectionOf rdf:parseType="Collection">
<rdf:Description rdf:about="http://purl.obolibrary.org/obo/UBERON_0010757"/>
<owl:Restriction>
<owl:onProperty rdf:resource="http://purl.obolibrary.org/obo/BFO_0000050"/>
<owl:someValuesFrom rdf:resource="http://purl.obolibrary.org/obo/NCBITaxon_9606"/>
</owl:Restriction>
</owl:intersectionOf>
<rdfs:subClassOf rdf:resource="http://purl.obolibrary.org/obo/UBERON_0002238"/>
</owl:Class>
</owl:annotatedSource>
<owl:annotatedProperty rdf:resource="http://www.w3.org/2000/01/rdf-schema#subClassOf"/>
<owl:annotatedTarget rdf:resource="http://purl.obolibrary.org/obo/UBERON_0002238"/>
<oboInOwl:source rdf:datatype="http://www.w3.org/2001/XMLSchema#string">FMA</oboInOwl:source>
</owl:Axiom>
http://purl.obolibrary.org/obo/UBERON_0010757
The OWL class UBERON_0010757 (i.e. rib 8) has the following logical statements:
- UBERON_0010757 and BFO_0000050 some NCBITaxon_9606 (rib 8 part of Homo sapiens)
- UBERON_0010757 SubClassOf UBERON_0002238 (rib 8 subClassOf false rib)
OWL-NETS would decode this axiom into:
UBERON_0010757, BFO_0000050, NCBITaxon_9606
<owl:Axiom>
<owl:annotatedSource rdf:resource="http://purl.obolibrary.org/obo/UBERON_0002373"/>
<owl:annotatedProperty rdf:resource="http://www.w3.org/2000/01/rdf-schema#subClassOf"/>
<owl:annotatedTarget>
<owl:Restriction>
<owl:onProperty rdf:resource="http://purl.obolibrary.org/obo/RO_0002202"/>
<owl:someValuesFrom rdf:resource="http://purl.obolibrary.org/obo/UBERON_0010023"/>
</owl:Restriction>
</owl:annotatedTarget>
</owl:Axiom>
http://purl.obolibrary.org/obo/UBERON_0002373
The OWL class UBERON_0002373 (i.e. palatine tonsil) has the following logical statement:
- UBERON_0002373 RO_0002202 some UBERON_0010023 (palatine tonsil develops from dorsal pharyngeal pouch 2)
OWL-NETS would decode this axiom into:
UBERON_0002373, RO_0002202, UBERON_0010023
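For axioms whose annotatedSource is a named class, such as the CL_0002004 and UBERON_0002373 examples above, the decoding can be sketched as follows. The dict layout and the function name `decode_axiom` are hypothetical stand-ins for the owl:Axiom elements and are not part of pkt_kg. Axioms whose source is itself a class expression are not handled here.

```python
def decode_axiom(axiom):
    """Turn an axiom with a named annotatedSource into a single triple.

    `axiom` is a hypothetical dict that mirrors the owl:Axiom elements.
    """
    source = axiom["annotatedSource"]
    target = axiom["annotatedTarget"]
    if isinstance(target, dict):
        # Target is an owl:Restriction: use its property and filler directly.
        return (source, target["onProperty"], target["someValuesFrom"])
    # Target is a named class: keep the annotated property as the predicate.
    return (source, axiom["annotatedProperty"], target)


named_target = {
    "annotatedSource": "CL_0002004",
    "annotatedProperty": "rdfs:subClassOf",
    "annotatedTarget": "CL_0000547",
}
restriction_target = {
    "annotatedSource": "UBERON_0002373",
    "annotatedProperty": "rdfs:subClassOf",
    "annotatedTarget": {"onProperty": "RO_0002202", "someValuesFrom": "UBERON_0010023"},
}

print(decode_axiom(named_target))
print(decode_axiom(restriction_target))
```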
<owl:Axiom>
<owl:annotatedSource>
<owl:Class>
<owl:intersectionOf rdf:parseType="Collection">
<rdf:Description rdf:about="http://purl.obolibrary.org/obo/UBERON_0005562"/>
<owl:Restriction>
<owl:onProperty rdf:resource="http://purl.obolibrary.org/obo/BFO_0000050"/>
<owl:someValuesFrom rdf:resource="http://purl.obolibrary.org/obo/NCBITaxon_40674"/>
</owl:Restriction>
</owl:intersectionOf>
<rdfs:subClassOf>
<owl:Restriction>
<owl:onProperty rdf:resource="http://purl.obolibrary.org/obo/RO_0002254"/>
<owl:someValuesFrom rdf:resource="http://purl.obolibrary.org/obo/UBERON_0010028"/>
</owl:Restriction>
</rdfs:subClassOf>
</owl:Class>
</owl:annotatedSource>
<owl:annotatedProperty rdf:resource="http://www.w3.org/2000/01/rdf-schema#subClassOf"/>
<owl:annotatedTarget>
<owl:Restriction>
<owl:onProperty rdf:resource="http://purl.obolibrary.org/obo/RO_0002254"/>
<owl:someValuesFrom rdf:resource="http://purl.obolibrary.org/obo/UBERON_0010028"/>
</owl:Restriction>
</owl:annotatedTarget>
<oboInOwl:notes rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Mammals</oboInOwl:notes>
<oboInOwl:source rdf:datatype="http://www.w3.org/2001/XMLSchema#string">ISBN:0073040584-table13.1</oboInOwl:source>
</owl:Axiom>
http://purl.obolibrary.org/obo/UBERON_0005562
The OWL class UBERON_0005562 (i.e. thymus primordium) has the following logical statements:
- UBERON_0005562 and BFO_0000050 some NCBITaxon_40674 (thymus primordium part of mammalia)
- UBERON_0005562 SubClassOf RO_0002254 some UBERON_0010028 (thymus primordium has developmental contribution from ventral part of pharyngeal pouch 4)
OWL-NETS would decode this axiom into:
UBERON_0005562, RO_0002254, UBERON_0010028
UBERON_0005562, BFO_0000050, NCBITaxon_40674
The algorithm has four goals, each of which is further explained below:
- Decode all OWL-encoded classes
- Remove all triples that contain subjects, predicates, and/or objects that are needed to ensure OWL semantics, but are not biologically meaningful
- Ensure the decoded knowledge graph contains a single connected component
- Purify the decoded knowledge graph to match an input knowledge graph construction approach (i.e. subclass or instance)
A high-level overview of the algorithm is provided in the snippet of the pseudocode below.
- Map owl:Class instances back to the original owl:Class
- Remove all triples that contain a subject or object of type BNode or Literal
- Keep triples containing any owl:ObjectProperty occurring with subjects and objects that are owl:Class or owl:NamedIndividual
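The filtering step can be illustrated with a small sketch. It reads the steps above as: triples touching blank nodes or literals are dropped, and triples between classes or named individuals are kept. Keeping `rdfs:subClassOf` and `rdf:type` edges is an additional assumption, made because the decoded examples on this page contain them. The `node_types` lookup and the function name `filter_triples` are hypothetical. The real implementation works on an RDFLib graph.

```python
KEEP_TYPES = {"owl:Class", "owl:NamedIndividual"}
HIERARCHY = {"rdfs:subClassOf", "rdf:type"}  # assumed to be kept (see lead-in)


def filter_triples(triples, node_types, object_properties):
    """Split triples into (kept, removed) lists.

    node_types maps each node to a type label such as 'owl:Class',
    'BNode' or 'Literal' (a hypothetical stand-in for RDFLib type checks).
    """
    kept, removed = [], []
    for subj, pred, obj in triples:
        nodes_ok = node_types.get(subj) in KEEP_TYPES and node_types.get(obj) in KEEP_TYPES
        pred_ok = pred in object_properties or pred in HIERARCHY
        if nodes_ok and pred_ok:
            kept.append((subj, pred, obj))
        else:
            removed.append((subj, pred, obj))
    return kept, removed


node_types = {
    "HP_0000340": "owl:Class",
    "HP_0000290": "owl:Class",
    "UBERON_0008200": "owl:Class",
    "_:b0": "BNode",
    '"sloping forehead"': "Literal",
}
triples = [
    ("HP_0000340", "rdfs:subClassOf", "HP_0000290"),
    ("HP_0000340", "RO_0000052", "UBERON_0008200"),
    ("HP_0000340", "owl:equivalentClass", "_:b0"),
    ("HP_0000340", "rdfs:label", '"sloping forehead"'),
]
kept, removed = filter_triples(triples, node_types, object_properties={"RO_0000052"})
print(kept)
print(removed)
```

Returning the removed triples alongside the kept ones mirrors the idea of recording filtered triples in the output described later on this page.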
Depending on the source ontology that you apply OWL-NETS to, it's possible that the decoded knowledge graph may contain more than a single connected component. This step ensures that the decoded knowledge graph is connected.
- Derives a set of root nodes by searching for each node's highest ancestor concept (via rdfs:subClassOf).
  - If the node has no ancestors, all of the node's immediate neighbors are searched and the most frequently visited, highest common ancestor among the neighbors is selected. If none of the neighborhood concepts have any ancestors in common, a random ancestor concept is selected
  - If the node has more than 1 neighbor, the highest ancestor concept is selected
- Each root node is then added to the graph as rdfs:subClassOf a user-provided URI. BFO_0000001 is the default choice
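A deliberately simplified sketch of this step is shown below. It treats every node without an rdfs:subClassOf parent as a root and attaches it to the top-level URI. The neighbor search described above for nodes that lack ancestors is omitted, so this is an approximation of the idea and not the actual procedure. The function names `connect_roots` and `count_components` are hypothetical.

```python
def connect_roots(triples, top="BFO_0000001"):
    """Attach every node that has no rdfs:subClassOf parent to `top`."""
    nodes, has_parent = set(), set()
    for subj, pred, obj in triples:
        nodes.update((subj, obj))
        if pred == "rdfs:subClassOf":
            has_parent.add(subj)
    roots = sorted(node for node in nodes if node not in has_parent and node != top)
    return triples + [(root, "rdfs:subClassOf", top) for root in roots]


def count_components(triples):
    """Count connected components, ignoring edge direction."""
    neighbours = {}
    for subj, _, obj in triples:
        neighbours.setdefault(subj, set()).add(obj)
        neighbours.setdefault(obj, set()).add(subj)
    seen, components = set(), 0
    for start in neighbours:
        if start in seen:
            continue
        components += 1
        stack = [start]
        while stack:  # depth-first walk over one component
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            stack.extend(neighbours[node] - seen)
    return components


triples = [
    ("GO_0000785", "rdfs:subClassOf", "GO_0110165"),
    ("GO_0000785", "BFO_0000050", "GO_0005694"),
    ("HP_0000340", "rdfs:subClassOf", "HP_0000290"),
]
connected = connect_roots(triples)
print(count_components(triples), count_components(connected))
```

In this toy input the GO and HP fragments form two separate components. Linking their roots to BFO_0000001 merges them into one.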
Currently, the program is configured to output the results from OWL-NETS in two ways: (1) run the program as-is or (2) run the program as-is with an additional step to "purify" the output by ensuring that the resulting OWL-NETS graph is completely consistent with the specified knowledge graph construction approach (i.e. subclass or instance-based). The "purified" output will include _SUBCLASS_purified_ or _INSTANCE_purified_ in the file names.
The procedure utilized to "purify" the graph is as follows:
- Subclass Construction Approach:
  - Find all triples containing rdf:type (subj rdf:type obj)
    - Replace rdf:type with rdfs:subClassOf
    - Make subj rdfs:subClassOf all ancestors of obj
- Instance Construction Approach:
  - Find all triples containing rdfs:subClassOf (subj rdfs:subClassOf obj)
    - Replace rdfs:subClassOf with rdf:type
    - Make subj rdf:type all ancestors of obj
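The purification procedure can be sketched on plain tuples as follows. This is an illustration under stated assumptions, not the pkt_kg code: ancestors are taken to be everything reachable over rdfs:subClassOf in the input triples, and the names `ancestors`, `purify`, and the individual `ex:sample_1` are hypothetical.

```python
def ancestors(node, triples):
    """Collect every class reachable from `node` via rdfs:subClassOf."""
    found, stack = set(), [node]
    while stack:
        current = stack.pop()
        for subj, pred, obj in triples:
            if subj == current and pred == "rdfs:subClassOf" and obj not in found:
                found.add(obj)
                stack.append(obj)
    return found


def purify(triples, approach="subclass"):
    """Rewrite the graph so it only uses the relation of the chosen approach."""
    if approach == "subclass":
        old, new = "rdf:type", "rdfs:subClassOf"
    else:  # instance-based construction
        old, new = "rdfs:subClassOf", "rdf:type"
    purified = set()
    for subj, pred, obj in triples:
        if pred != old:
            purified.add((subj, pred, obj))
            continue
        # Replace the relation, then link subj to all ancestors of obj.
        purified.add((subj, new, obj))
        for ancestor in ancestors(obj, triples):
            purified.add((subj, new, ancestor))
    return purified


triples = [
    ("ex:sample_1", "rdf:type", "CL_0002004"),  # hypothetical individual
    ("CL_0002004", "rdfs:subClassOf", "CL_0000547"),
]
subclass_graph = purify(triples, approach="subclass")
for triple in sorted(subclass_graph):
    print(", ".join(triple))
```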
ASSUMPTIONS:
Don't Decode
- Classes built using the owl:complementOf constructor
- Triples containing annotations
- Triples that contain oneOf (e.g. IAO_0000225)
- Triples containing properties signifying negation ObjectProperty or owl:Class (e.g. lacks_part, disjointWith)
Decode
- The following property types: someValuesFrom, onClass, hasSelf, hasValue, allValuesFrom
- Triples containing cardinality constraints, but ignore cardinality
To determine owl:ObjectProperties in decoded owl:intersectionOf or owl:unionOf constructors:
- RO_0000086 (has quality): If subject is NOT a PATO term and object IS a PATO term
- Provided onProperty: If both subject and object ARE PATO terms AND there is an onProperty provided
- rdfs:subClassOf (subclass build) / rdf:type (instance build):
  - If both subject and object ARE PATO terms AND there is not an onProperty
  - If both subject and object ARE NOT PATO terms AND there is not an onProperty
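These property-selection rules can be expressed as a small function. The sketch below is an interpretation and not the pkt_kg code. The rules do not say what happens when an onProperty is provided and only one or neither term is a PATO term, so the sketch assumes the provided onProperty is used, which matches the decoded HP_0000340 example earlier on this page. The function name `choose_property` and the prefix test for PATO terms are assumptions.

```python
def choose_property(subject, obj, on_property=None, build="subclass"):
    """Pick the predicate linking subject and obj in a decoded constructor.

    Returns None for the one case the rules in the text do not cover
    (PATO subject, non-PATO object, no onProperty).
    """
    subject_is_pato = subject.startswith("PATO_")
    object_is_pato = obj.startswith("PATO_")
    if on_property is not None:
        # Assumption: a provided onProperty is always used as the predicate.
        return on_property
    if not subject_is_pato and object_is_pato:
        return "RO_0000086"  # has quality
    if subject_is_pato == object_is_pato:
        # Both PATO or both non-PATO: fall back to the build's hierarchy relation.
        return "rdfs:subClassOf" if build == "subclass" else "rdf:type"
    return None


print(choose_property("HP_0000340", "PATO_0001481"))
print(choose_property("HP_0000340", "UBERON_0008200", on_property="RO_0000052"))
print(choose_property("CL_0001021", "CL_0000995", build="instance"))
```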
Inputs and Outputs:
- Input Data:
  - A Networkx MultiDigraph
  - An RDFLib Graph
  - A filepath and filename to write output to
- Output Data:
  - A Networkx MultiDigraph
  - An RDF graph containing all of the OWL-encoded triples (Step 1) and triples containing OWL semantics (Step 2), serialized in nt format
  - A Hash Map Storing Transformation information:
    {'owl_nets': { 'decoded_classes': {}, 'complementOf': {}, 'cardinality': {}, 'negation': {}, 'misc': {}}, 'disjointWith': {}, 'filtered_triples': set(), '<<knowledge construction approach>>_approach_purified': set()}
Jupyter Notebook: OWLNETS_Example_Application.ipynb
To run OWL-NETS on a graph or ontology without running pkt_kg you need to: (1) fork or clone the PheKnowLator GitHub repository; (2) provide an RDFLib Graph() object or a file path to the object you want to transform; (3) provide a path to where the output should be written; and (4) provide a filename (i.e. owl_nets_output). From the PheKnowLator directory run the following code:
from rdflib import Graph
from pkt_kg.owlnets import OwlNets
# load ontology
hp_graph = Graph().parse('path/to/file/hp.owl')
# instantiate class
owl_nets = OwlNets(graph=hp_graph, write_location='resources/', filename='/hpo_test')
# run the method
owl_nets.run_owl_nets()