- Adding Opentargets DRUG-TARGET data to the test data
- For this example we are using the config files in test/config/, e.g. test/config/data_integration.yml and test/config/db_schema.yml. In practice you would create new versions of these and put them in config/.
- In production, changes should not be made to the test directory.
git checkout -b dev-$USER
Copy example.env to .env and edit it
- see Create .env file
- this can be done on a remote server; just add SERVER_NAME to .env
- the created data then needs to be copied to the specified DATA_DIR using the function copy_source_data()
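As a quick sanity check before running anything, the sketch below parses `.env`-style text and reports which expected variables are unset. The parser is a simplified stand-in written for illustration (it is not the pipeline's own loader), and the list of required names is an assumption based on the two variables mentioned above.

```python
def parse_env(text):
    """Parse simple KEY=VALUE lines, ignoring blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def missing_vars(env, required=("SERVER_NAME", "DATA_DIR")):
    """Return the required variable names that are absent or empty."""
    return [name for name in required if not env.get(name)]


# Hypothetical .env content, for illustration only.
example = """
# remote server holding the source data
SERVER_NAME=my-server
DATA_DIR=
"""
env = parse_env(example)
print(missing_vars(env))
```

Here `DATA_DIR` is reported because it is present but empty.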
Example:
python -m test.scripts.source.get_opentargets
For example, uncomment the following sections in test/config/data_integration.yml and check that the filenames under files: match the name of the source data (e.g. the filename may include today's date).
- Node
drug-ot:
  name: Drug
  files:
    drug-target: opentargets/open_targets_2020-10-19.csv
  script: nodes.drug.opentargets
  source: Opentargets-2020-10-19
- Relationship
ot-drug-target:
  name: OPENTARGETS_DRUG_TO_TARGET
  files:
    drug-target: opentargets/open_targets_2020-10-19.csv
  script: rels.opentargets_drug_target
  source: Opentargets-2020-08-24
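The filename check can be automated. This sketch assumes the configuration has already been loaded into a Python dictionary (the literal below mirrors the two entries shown, with `nodes` and `rels` as assumed top-level keys) and that file paths are relative to a data directory. `missing_files` is a hypothetical helper, not part of the pipeline.

```python
from pathlib import Path

# Mirrors the two config entries above; top-level keys are assumed.
config = {
    "nodes": {
        "drug-ot": {
            "name": "Drug",
            "files": {"drug-target": "opentargets/open_targets_2020-10-19.csv"},
        }
    },
    "rels": {
        "ot-drug-target": {
            "name": "OPENTARGETS_DRUG_TO_TARGET",
            "files": {"drug-target": "opentargets/open_targets_2020-10-19.csv"},
        }
    },
}


def missing_files(config, data_dir):
    """Return (meta_id, file_key, path) for every configured file not found."""
    missing = []
    for section in config.values():
        for meta_id, entry in section.items():
            for key, rel_path in entry.get("files", {}).items():
                if not (Path(data_dir) / rel_path).is_file():
                    missing.append((meta_id, key, rel_path))
    return missing


# Check against the current directory; substitute your own data directory.
for meta_id, key, rel_path in missing_files(config, "."):
    print(f"{meta_id}: '{key}' -> {rel_path} not found")
```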
- Node
Drug:
  properties:
    id:
      type: string
    label:
      type: string
    molecule_type:
      type: string
  required:
    - label
    - id
  index: label
  meta:
    _id: id
    _name: label
- Relationship
OPENTARGETS_DRUG_TO_TARGET:
  properties:
    source:
      type: Drug
    target:
      type: Gene
    action_type:
      type: string
    phase:
      type: string
  required:
    - source
    - target
    - phase
    - action_type
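To make the role of `required` concrete, here is a minimal sketch that checks a single record against a schema entry. The dictionary mirrors the relationship definition above. `missing_required` and the sample record (including its placeholder identifiers) are hypothetical and only illustrate the idea, not how the pipeline itself validates data.

```python
# Mirrors the OPENTARGETS_DRUG_TO_TARGET schema entry above.
schema = {
    "OPENTARGETS_DRUG_TO_TARGET": {
        "properties": {
            "source": {"type": "Drug"},
            "target": {"type": "Gene"},
            "action_type": {"type": "string"},
            "phase": {"type": "string"},
        },
        "required": ["source", "target", "phase", "action_type"],
    }
}


def missing_required(entry, record):
    """Return required property names that are absent or empty in record."""
    return [
        name
        for name in entry["required"]
        if record.get(name) in (None, "")
    ]


# Made-up record for illustration; the values are placeholders.
record = {"source": "drug-1", "target": "gene-1", "phase": "4"}
print(missing_required(schema["OPENTARGETS_DRUG_TO_TARGET"], record))
```

The sample record lacks `action_type`, so that is the only name reported.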
Note, if just testing out the demo data, this step can be skipped.
- if this is a new node type, make a new directory, e.g.
mkdir workflow/scripts/nodes/drug
- all property values for both nodes and relationships should have no spaces
To access source files specified in data_integration.yml, use the key/value pairs, e.g.
FILE = get_source(meta_id,1)
To process the final dataframe
create_import(df=df, meta_id=meta_id)
To add constraints, e.g. Neo4j property indexes
constraintCommands = ["CREATE index on :Drug(label);"]
create_constraints(constraintCommands, meta_id)
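The overall shape of a node-processing step can be sketched without the pipeline helpers. The example below uses only the standard library to reduce drug-target rows to one row per drug, with the `id`, `label` and `molecule_type` properties from the schema. The input column names (`chembl_id`, `drug_name`, `molecule_type`, `target_id`) and the sample rows are assumptions for illustration and may not match the real Open Targets export. In an actual script the result would be a dataframe passed to `create_import`.

```python
import csv
import io

# Made-up sample rows; column names are assumed, not taken from the real file.
raw = """chembl_id,drug_name,molecule_type,target_id
D1,Drug One,Small molecule,G1
D1,Drug One,Small molecule,G2
D2,Drug Two,Antibody,G3
"""


def drug_nodes(handle):
    """Collapse drug-target rows into unique Drug node rows."""
    seen = {}
    for row in csv.DictReader(handle):
        drug_id = row["chembl_id"].strip()
        # Keep the first occurrence of each drug and skip rows without an id.
        if drug_id and drug_id not in seen:
            seen[drug_id] = {
                "id": drug_id,
                "label": row["drug_name"].strip(),
                "molecule_type": row["molecule_type"].strip(),
            }
    return list(seen.values())


nodes = drug_nodes(io.StringIO(raw))
for node in nodes:
    print(node)
```

Deduplicating on `id` matters because the same drug appears once per target in the relationship data, while the node file should hold each drug only once.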
python -m test.scripts.processing.nodes.drug.opentargets -n drug-ot
python -m test.scripts.processing.rels.opentargets_drug_target -n ot-drug-target
- can also use local data, i.e. data not in DATA_DIR
python -m test.scripts.processing.nodes.drug.opentargets -n drug-ot -r /path/to/local/data
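For readers unfamiliar with this command-line pattern, the sketch below shows how a script might accept the two flags used above. It is a hypothetical stand-in built with `argparse`, not the pipeline's actual argument handling; the destination names `name` and `local_dir` are invented for illustration.

```python
import argparse


def build_parser():
    """Build a parser accepting -n (config key) and an optional -r (local data path)."""
    parser = argparse.ArgumentParser(description="Hypothetical processing-script CLI")
    parser.add_argument("-n", dest="name", required=True,
                        help="key in data_integration.yml, e.g. drug-ot")
    parser.add_argument("-r", dest="local_dir", default=None,
                        help="optional directory holding local source data")
    return parser


# Parse the same arguments as the command shown above.
args = build_parser().parse_args(["-n", "drug-ot", "-r", "/path/to/local/data"])
source = args.local_dir if args.local_dir else "DATA_DIR"
print(f"processing {args.name} using data from {source}")
```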
open test/results/graph_data/0.0.1/nodes/drug-ot/drug-ot.profile.html
snakemake -r check_new_data -j 10
https://github.com/elswob/neo4j-build-pipeline/actions