RFC79: Incremental Upload of Data Entries (#48)
* Add clinical_attribute_meta records to the seed mini

To make the dataset look like real data in the database

* Implement sample attribute rewriting flag

* Add --overwrite-existing for the rest of test cases

Apparently, the flag does not change anything.
But we add it anyway, as these are the tests for "incremental" data upload.

* Test that mutations stay after updating the sample attributes

* Add overwrite-existing support for mutations data

* Fix --overwrite-existing flag description for importer of profile data

* Add loader command to update case list with sample ids

Adding to the _all case list and to case lists specified with command arguments is supported

* Add option to remove sample ids from the remaining case lists

From case lists that are not the _all case list and are not specified with the --add-to-case-lists option

* Make removing sample ids from unmentioned case lists the default behaviour

* Make the update case list command read case list files

* Fix test clinical data headers

* Test incremental patient upload

* Add flag to reload patient clinical attributes

* Add TODO comment to remove MIXED_ATTRIBUTES data type

with a reference to the ticket

* WIP: adapt py script for incremental upload

* Fix java.sql.SQLException: Generated keys not requested

* Clean alteration_driver_annotation during mutations inc. upload

* Fix validator and importer py scripts for inc. upload

* Add test/demo data for incremental loading of study_es_0 study

* Rename and move incremental tests to the incrementalTest folder

* Update TODO comment on how to deal with multiple sample files

* Move study_es_0_inc to the new test data folder

* Fix removing patient attributes on samples inc. upload

* Change study_es_0_inc to contain more diverse data

We changed the data to work for the demo.
Mutation numbers did not change on the demo.

* Specify that data_directory is for incremental data

* Disambiguate clinical data constant names

Previously it was easy to confuse code related to sample vs. clinical_sample (attributes)
and patient vs. clinical_patient (attributes)

* Remove unnecessary TODO comments

* Remove MSK copyright that was mistakenly copy-pasted

* Fix comment of UpdateCaseListsSampleIds.run() method

* Make --overwrite-existing flag description more generic

This flag is for the command that uploads molecular profile data

* Add TODO comments for possible reuse of the code

* Update case lists for multiple clinical sample files

Potentially for different studies

* Extract and reuse common logic to read and validate case lists

* Fix TestIntegrationTest

- change location of the files
- make sure assertions work on the seed mini db
- get rid of absent cbioportal dependencies

* Revert RESOURCE_DEFINITION_DICTIONARY initialisation to empty set

* Minor improvements. Apply PR feedback

* Make tests fail the build. Propagate the exit status of tests correctly

* Write "Validation complete" only in case of successful validation

* Add python tests for incremental/full data import

* Add unit test for incremental data validation

* Test rough order of importer commands. Remove sorting in the script to guarantee that order

* Extract smaller functions from the big one in py script

Make process_data_directory(...) smaller

* Refactor tab delim. data importer

- Calculate number of lines in the file in the loader
- Remove unused imports and fields
- Reuse constructors
- Reuse common parsing logic in tab delimiter importer
- Show full stacktrace, which helps in finding where tests errored out

* Implement incremental upload of mRNA data

* Add RPPA test

* Add normal sample to test data to test skipping

* Add rows with more columns than in the header, to be skipped

* Skip rows that don't have enough sample columns

* Test for invalid entrez id

* Extract common code from inc. tab. delim. tests

* Implement incremental upload of CNA data via tab. delim. loader

* Blank out values for genes not mentioned in the file

* Remove unused code

* Throw unsupported operation exception for GENESET_SCORE incremental upload

* Add generic assay data incremental upload test

* Fix integration tests

* Make tab. delimiter data uploader transactional

* Check for illegal state in tab delim. data update

It's dangerous as we would further mess up the data in the row

* Wire incremental tab delim. data upload to cli commands

* Expand README with section on how to run incremental upload

* Address TODOs in tab delim. importer

* Add more data types to incremental data upload folder

* Remove obsolete TODO comment

* Reuse genetic_profile record if it exists in db already

Do it for all data types, not only MAF

* Test incremental upload of tab delim. data types from umbrella script

- Split big tab. delim test to multiple tests based on data type.

- Use ImportProfileData instead of ImportTabDelimData for testing.
  - We cover more logic with such tests.
  - This is more stable interface. ImportTabDelimData can be refactored.

* Move counting lines of the file inside the generic assay patient level data uploader

* Give error that generic assay patient level data is not supported

* Clean sample_cna_event regardless of whether it has alteration_driver_annotation rows or not

* Fix cbioportalImport script execution

args variable was not declared

* Remove unneeded spring context initialisation

that caused different errors to occur

* Make error message more informative when gene panel is not found

Do not throw NPE, but NSEE with an error message that mentions the panel id

* Add more genes to the mini seed to load study_es_0

* Make study_es_0_inc data pass validation

* Document in README how to load study_es_0 study

* Implement incremental upload for timeline data

* Implement incremental upload of CNA DISCRETE long data

* Add data type sanity check for uploaded TSVs

* Move storing/dedup logic of genetic alteration values to importer

* Move all inc. upload logic for tab delim. data types to GeneticAlterationImporter

* Add CNA DISCRETE LONG to study_es_0_inc test dataset

* Remove unused code

* Make validation pass for CNA long and study_es_0_inc data

* Implement incremental upload for gene panel matrix

The uploader was already working in an incremental manner.
I only had to add tests for it.
I did have to implement incremental upload of the gene panel matrix
from the different data (CNA, Mutations) uploaders though.

* Make validation of study_es_0_inc data pass

* Implement incremental upload of structural variants data

I removed DaoGeneticProfileSamples.addGeneticProfileSamples(geneticProfileId, orderedSampleList);
as it does not seem to be needed.
It does not make any sense to store samples in genetic_profile_samples if you don't use the genetic_alteration table at all.

* Implement incremental upload of CNA segmented data

* Make it explicit that the timeline uploader supports bulk mode only

* Fix number of columns in SV tsv data file

* Update paragraph on inc. upload in README

* Rename validation method to better describe its purpose

To really validate entrez id, we need to look it up

* Fix cleaning alteration_driver_annotation table for specific sample

* DRY tab separated value string parsing

* Reuse FileUtil.isInfoLine(String line) throughout the code

* Extract ensuring header and row match to tsv utility class

* Simplify delete sql. Rely on cascade delete instead.

* Generalise overwrite-existing flag description to make it more accurate

* Rename updateMode to isIncrementalUpdateMode flag

* Improve description of overwrite-existing flag for gene panel profile map

* Implement a more optimal way to update the sample profile

* Optimize code by always using batch upsert for sample profile

* Recognise that SEG importer always uses bulkLoad

* Organise bulk mode flushing for SEG importer

* Ignore case for the bulkLoad load mode option, as done everywhere else in the code

* Add comma to README

* Improve order comments for INCREMENTAL_UPLOAD_SUPPORTED_META_TYPES

* Add join by GENETIC_PROFILE_ID column for sample_cna_event and alteration_driver_annotation tables

* Check for inconsistency in sample ids and values while reading genetic alterations

* Make the name of the method that initialises a transaction clearer

* Remove TODOs that were done

* Rename isInfoLine util. method to isDataLine

I got feedback that "info line" sounds like the header metadata lines starting with #

* Simplify code by using inheritance instead of composition

* Optimize removing genetic alterations

by removing them for the whole genetic profile at once.

one SQL statement instead of N

* Access inherited variables with this. instead of super.

The confusion that triggered the change: "The use of super. indicates that the subclass also declares one with the same name, but you are trying to not set that somehow?"

* Remove unused code from DaoSampleList.addSampleList()

* Remove extra semicolons at the end of java statements

* Rename upsertSampleProfiles to upsertSampleToProfileMapping

method in DaoSampleProfile

* Use java 8 way to convert typed list to array in GeneticAlterationIncrementalImporter

* Improve doc comments for TsvUtil.isDataLine(String line)

* Rename method to updateCaseLists and document it better

* Remove DEFINED_CANCER_TYPES global variable

* Add docstring to sample attribute remove methods

Make it explicit that the function will delete any matching records "if they exist"

* Add docstring to method to update fraction genome altered clinical attribute

Specify that sampleIds is optional and can be set to null

* Make DAO constant that holds SQL private

Increase encapsulation

* Stop doing row math, it's just a status!

* Adopt C style of incrementing JDBC parameters

* Improve wording in error message

* Remove unused method of genetic alteration importer

* Extract db communicating methods out of the constructor

introduce initialise() method

* Improve time complexity from N^2 to N

* Use American English for method names

---------

Co-authored-by: pieterlukasse <[email protected]>
forus and pieterlukasse authored Jul 16, 2024
1 parent efcc1d2 commit e7cfb7b
Showing 181 changed files with 5,468 additions and 1,394 deletions.
4 changes: 2 additions & 2 deletions .github/workflows/validate-python.yml
@@ -14,7 +14,7 @@ jobs:
- name: 'Validate tests'
working-directory: ./cbioportal-core
run: |
- docker run -v ${PWD}:/cbioportal-core python:3.6 /bin/bash -c '
+ docker run -v ${PWD}:/cbioportal-core python:3.6 /bin/sh -c '
cd cbioportal-core &&
pip install -r requirements.txt &&
- source test_scripts.sh'
+ ./test_scripts.sh'
67 changes: 54 additions & 13 deletions README.md
@@ -9,6 +9,59 @@ This repo contains:
## Inclusion in main codebase
The `cbioportal-core` code is currently included in the final Docker image during the Docker build process: https://github.com/cBioPortal/cbioportal/blob/master/docker/web-and-data/Dockerfile#L48

## Running in docker

Build docker image with:
```bash
docker build -t cbioportal-core .
```

### Example of how to load `study_es_0` study

Import gene panels

```bash
docker run -it -v $(pwd)/tests/test_data/:/data/ -v $(pwd)/application.properties:/application.properties cbioportal-core \
perl importGenePanel.pl --data /data/study_es_0/data_gene_panel_testpanel1.txt
docker run -it -v $(pwd)/tests/test_data/:/data/ -v $(pwd)/application.properties:/application.properties cbioportal-core \
perl importGenePanel.pl --data /data/study_es_0/data_gene_panel_testpanel2.txt
```

Import gene sets and supplementary data

```bash
docker run -it -v $(pwd)/src/test/resources/:/data/ -v $(pwd)/application.properties:/application.properties cbioportal-core \
perl importGenesetData.pl --data /data/genesets/study_es_0_genesets.gmt --new-version msigdb_7.5.1 --supp /data/genesets/study_es_0_supp-genesets.txt
```

Import gene set hierarchy data

```bash
docker run -it -v $(pwd)/src/test/resources/:/data/ -v $(pwd)/application.properties:/application.properties cbioportal-core \
perl importGenesetHierarchy.pl --data /data/genesets/study_es_0_tree.yaml
```

Import study

```bash
docker run -it -v $(pwd)/tests/test_data/:/data/ -v $(pwd)/application.properties:/application.properties cbioportal-core \
python importer/metaImport.py -s /data/study_es_0 -p /data/api_json_system_tests -o
```
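
(In these commands `-s` points at the study directory, `-p` at the directory with the portal API JSON used for validation, and `-o` overrides validation warnings; run `metaImport.py --help` for the authoritative option list.)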

### Incremental upload of data

To add or update specific patient, sample, or molecular data in an already loaded study, you can perform an incremental upload. This process is quicker than reloading the entire study.

To execute an incremental upload, use the -d (or --data_directory) option instead of -s (or --study_directory). Here is an example command:
```bash
docker run -it -v $(pwd)/data/:/data/ -v $(pwd)/application.properties:/application.properties cbioportal-core python importer/metaImport.py -d /data/study_es_0_inc -p /data/api_json -o
```
**Note:**
While the directory should adhere to the standard cBioPortal file formats and study structure, incremental upload is not supported for all data types.
For instance, uploading study metadata, resources, or GSVA data incrementally is currently unsupported.

This method ensures efficient updates without the need for complete study reuploads, saving time and computational resources.
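
For orientation, a minimal incremental data directory might look like the sketch below (hypothetical file names; any mix of supported data types works, plus an optional `case_lists/` folder):

```
study_es_0_inc/
├── meta_clinical_samples.txt
├── data_clinical_samples.txt
├── meta_mutations.txt
├── data_mutations.txt
└── case_lists/
    └── cases_sequenced.txt
```

As with a full study load, each meta file points to its data file via `data_filename` and carries the `cancer_study_identifier` of the (already loaded) study it belongs to. When a clinical sample file is present, case lists are updated with the uploaded sample ids as part of the run.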

## How to run integration tests

This section guides you through the process of running integration tests by setting up a cBioPortal MySQL database environment using Docker. Please follow these steps carefully to ensure your testing environment is configured correctly.
@@ -78,7 +131,7 @@ After you are done with the setup, you can build and test the project.

1. Execute tests through the provided script:
```bash
- source test_scripts.sh
+ ./test_scripts.sh
```

2. Build the loader jar using Maven (includes testing):
@@ -119,15 +172,3 @@ The script will search for `core-*.jar` in the root of the project:
python scripts/importer/metaImport.py -s tests/test_data/study_es_0 -p tests/test_data/api_json_unit_tests -o
```

- ## Running in docker
- 
- Build docker image with:
- ```bash
- docker build -t cbioportal-core .
- ```
- 
- Example of how to start the loading:
- ```bash
- docker run -it -v $(pwd)/data/:/data/ -v $(pwd)/application.properties:/application.properties cbioportal-core python importer/metaImport.py -s /data/study_es_0 -p /data/api_json -o
- ```

3 changes: 3 additions & 0 deletions pom.xml
@@ -252,6 +252,9 @@
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-surefire-plugin</artifactId>
<version>2.21.0</version>
<configuration>
<trimStackTrace>false</trimStackTrace>
</configuration>
<executions>
<execution>
<id>default-test</id>
155 changes: 122 additions & 33 deletions scripts/importer/cbioportalImporter.py
@@ -12,6 +12,7 @@
import logging
import re
from pathlib import Path
from typing import Dict, Tuple

# configure relative imports if running as a script; see PEP 366
# it might be passed as an empty string by certain tooling to mark a top level module
Expand Down Expand Up @@ -39,6 +40,8 @@
from .cbioportal_common import ADD_CASE_LIST_CLASS
from .cbioportal_common import VERSION_UTIL_CLASS
from .cbioportal_common import run_java
from .cbioportal_common import UPDATE_CASE_LIST_CLASS
from .cbioportal_common import INCREMENTAL_UPLOAD_SUPPORTED_META_TYPES


# ------------------------------------------------------------------------------
Expand Down Expand Up @@ -101,8 +104,17 @@ def remove_study_id(jvm_args, study_id):
args.append("--noprogress") # don't report memory usage and % progress
run_java(*args)

def update_case_lists(jvm_args, meta_filename, case_lists_file_or_dir = None):
args = jvm_args.split(' ')
args.append(UPDATE_CASE_LIST_CLASS)
args.append("--meta")
args.append(meta_filename)
if case_lists_file_or_dir:
args.append("--case-lists")
args.append(case_lists_file_or_dir)
run_java(*args)

- def import_study_data(jvm_args, meta_filename, data_filename, update_generic_assay_entity = None, meta_file_dictionary = None):
+ def import_data(jvm_args, meta_filename, data_filename, update_generic_assay_entity = None, meta_file_dictionary = None, incremental = False):
args = jvm_args.split(' ')

# In case the meta file is already parsed in a previous function, it is not
Expand Down Expand Up @@ -133,6 +145,10 @@ def import_study_data(jvm_args, meta_filename, data_filename, update_generic_ass
importer = IMPORTER_CLASSNAME_BY_META_TYPE[meta_file_type]

args.append(importer)
if incremental:
if meta_file_type not in INCREMENTAL_UPLOAD_SUPPORTED_META_TYPES:
raise NotImplementedError("This type does not support incremental upload: {}".format(meta_file_type))
args.append("--overwrite-existing")
if IMPORTER_REQUIRES_METADATA[importer]:
args.append("--meta")
args.append(meta_filename)
Expand Down Expand Up @@ -212,11 +228,20 @@ def process_command(jvm_args, command, meta_filename, data_filename, study_ids,
else:
raise RuntimeError('Your command uses both -id and -meta. Please, use only one of the two parameters.')
elif command == IMPORT_STUDY_DATA:
- import_study_data(jvm_args, meta_filename, data_filename, update_generic_assay_entity)
+ import_data(jvm_args, meta_filename, data_filename, update_generic_assay_entity)
elif command == IMPORT_CASE_LIST:
import_case_list(jvm_args, meta_filename)

- def process_directory(jvm_args, study_directory, update_generic_assay_entity = None):
def get_meta_filenames(data_directory):
meta_filenames = [
os.path.join(data_directory, meta_filename) for
meta_filename in os.listdir(data_directory) if
re.search(r'(\b|_)meta(\b|[_0-9])', meta_filename,
flags=re.IGNORECASE) and
not (meta_filename.startswith('.') or meta_filename.endswith('~'))]
return meta_filenames

def process_study_directory(jvm_args, study_directory, update_generic_assay_entity = None):
"""
Import an entire study directory based on meta files found.
@@ -241,12 +266,7 @@
cna_long_filepair = None

# Determine meta filenames in study directory
- meta_filenames = (
- os.path.join(study_directory, meta_filename) for
- meta_filename in os.listdir(study_directory) if
- re.search(r'(\b|_)meta(\b|[_0-9])', meta_filename,
- flags=re.IGNORECASE) and
- not (meta_filename.startswith('.') or meta_filename.endswith('~')))
+ meta_filenames = get_meta_filenames(study_directory)

# Read all meta files (excluding case lists) to determine what to import
for meta_filename in meta_filenames:
Expand Down Expand Up @@ -353,53 +373,53 @@ def process_directory(jvm_args, study_directory, update_generic_assay_entity = N
raise RuntimeError('No sample attribute file found')
else:
meta_filename, data_filename = sample_attr_filepair
- import_study_data(jvm_args, meta_filename, data_filename, update_generic_assay_entity, study_meta_dictionary[meta_filename])
+ import_data(jvm_args, meta_filename, data_filename, update_generic_assay_entity, study_meta_dictionary[meta_filename])

# Next, we need to import resource definitions for resource data
if resource_definition_filepair is not None:
meta_filename, data_filename = resource_definition_filepair
- import_study_data(jvm_args, meta_filename, data_filename, update_generic_assay_entity, study_meta_dictionary[meta_filename])
+ import_data(jvm_args, meta_filename, data_filename, update_generic_assay_entity, study_meta_dictionary[meta_filename])

# Next, we need to import sample definitions for resource data
if sample_resource_filepair is not None:
meta_filename, data_filename = sample_resource_filepair
- import_study_data(jvm_args, meta_filename, data_filename, update_generic_assay_entity, study_meta_dictionary[meta_filename])
+ import_data(jvm_args, meta_filename, data_filename, update_generic_assay_entity, study_meta_dictionary[meta_filename])

# Next, import everything else except gene panel, structural variant data, GSVA and
# z-score expression. If in the future more types refer to each other, (like
# in a tree structure) this could be programmed in a recursive fashion.
for meta_filename, data_filename in regular_filepairs:
- import_study_data(jvm_args, meta_filename, data_filename, update_generic_assay_entity, study_meta_dictionary[meta_filename])
+ import_data(jvm_args, meta_filename, data_filename, update_generic_assay_entity, study_meta_dictionary[meta_filename])

# Import structural variant data
if structural_variant_filepair is not None:
meta_filename, data_filename = structural_variant_filepair
- import_study_data(jvm_args, meta_filename, data_filename, update_generic_assay_entity, study_meta_dictionary[meta_filename])
+ import_data(jvm_args, meta_filename, data_filename, update_generic_assay_entity, study_meta_dictionary[meta_filename])

# Import cna data
if cna_long_filepair is not None:
meta_filename, data_filename = cna_long_filepair
- import_study_data(jvm_args=jvm_args, meta_filename=meta_filename, data_filename=data_filename,
- meta_file_dictionary=study_meta_dictionary[meta_filename])
+ import_data(jvm_args=jvm_args, meta_filename=meta_filename, data_filename=data_filename,
+ meta_file_dictionary=study_meta_dictionary[meta_filename])

# Import expression z-score (after expression)
for meta_filename, data_filename in zscore_filepairs:
- import_study_data(jvm_args, meta_filename, data_filename, update_generic_assay_entity, study_meta_dictionary[meta_filename])
+ import_data(jvm_args, meta_filename, data_filename, update_generic_assay_entity, study_meta_dictionary[meta_filename])

# Import GSVA genetic profiles (after expression and z-scores)
if gsva_score_filepair is not None:

# First import the GSVA score data
meta_filename, data_filename = gsva_score_filepair
- import_study_data(jvm_args, meta_filename, data_filename, update_generic_assay_entity, study_meta_dictionary[meta_filename])
+ import_data(jvm_args, meta_filename, data_filename, update_generic_assay_entity, study_meta_dictionary[meta_filename])

# Second import the GSVA p-value data
meta_filename, data_filename = gsva_pvalue_filepair
- import_study_data(jvm_args, meta_filename, data_filename, update_generic_assay_entity, study_meta_dictionary[meta_filename])
+ import_data(jvm_args, meta_filename, data_filename, update_generic_assay_entity, study_meta_dictionary[meta_filename])

if gene_panel_matrix_filepair is not None:
meta_filename, data_filename = gene_panel_matrix_filepair
- import_study_data(jvm_args, meta_filename, data_filename, update_generic_assay_entity, study_meta_dictionary[meta_filename])
+ import_data(jvm_args, meta_filename, data_filename, update_generic_assay_entity, study_meta_dictionary[meta_filename])

# Import the case lists
case_list_dirname = os.path.join(study_directory, 'case_lists')
@@ -412,6 +432,72 @@ def process_directory(jvm_args, study_directory, update_generic_assay_entity = N
# enable study
update_study_status(jvm_args, study_id)

def get_meta_filenames_by_type(data_directory) -> Dict[str, Tuple[str, Dict]]:
"""
Read all meta files in the data directory and return meta information (filename, content) grouped by type.
"""
meta_file_type_to_meta_files = {}

# Determine meta filenames in study directory
meta_filenames = get_meta_filenames(data_directory)

# Read all meta files (excluding case lists) to determine what to import
for meta_filename in meta_filenames:

# Parse meta file
meta_dictionary = cbioportal_common.parse_metadata_file(
meta_filename, logger=LOGGER)

# Retrieve meta file type
meta_file_type = meta_dictionary['meta_file_type']
if meta_file_type is None:
# invalid meta file, let's die
raise RuntimeError('Invalid meta file: ' + meta_filename)
if meta_file_type not in meta_file_type_to_meta_files:
meta_file_type_to_meta_files[meta_file_type] = []

meta_file_type_to_meta_files[meta_file_type].append((meta_filename, meta_dictionary))
return meta_file_type_to_meta_files

def import_incremental_data(jvm_args, data_directory, update_generic_assay_entity, meta_file_type_to_meta_files):
"""
Load all data types that are available and support incremental upload
"""
for meta_file_type in INCREMENTAL_UPLOAD_SUPPORTED_META_TYPES:
if meta_file_type not in meta_file_type_to_meta_files:
continue
meta_pairs = meta_file_type_to_meta_files[meta_file_type]
for meta_pair in meta_pairs:
meta_filename, meta_dictionary = meta_pair
data_filename = os.path.join(data_directory, meta_dictionary['data_filename'])
import_data(jvm_args, meta_filename, data_filename, update_generic_assay_entity, meta_dictionary, incremental=True)

def update_case_lists_from_folder(jvm_args, data_directory, meta_file_type_to_meta_files):
"""
Updates case lists if a clinical sample file is provided.
The command takes the case_lists/ folder as an optional argument.
If the folder exists, case lists will be updated accordingly.
"""
if MetaFileTypes.SAMPLE_ATTRIBUTES in meta_file_type_to_meta_files:
case_list_dirname = os.path.join(data_directory, 'case_lists')
sample_attributes_metas = meta_file_type_to_meta_files[MetaFileTypes.SAMPLE_ATTRIBUTES]
for meta_pair in sample_attributes_metas:
meta_filename, meta_dictionary = meta_pair
LOGGER.info('Updating case lists with sample ids', extra={'filename_': meta_filename})
update_case_lists(jvm_args, meta_filename, case_lists_file_or_dir=case_list_dirname if os.path.isdir(case_list_dirname) else None)

def process_data_directory(jvm_args, data_directory, update_generic_assay_entity = None):
"""
Incremental import of data directory based on meta files found.
"""

meta_file_type_to_meta_files = get_meta_filenames_by_type(data_directory)

not_supported_meta_types = meta_file_type_to_meta_files.keys() - INCREMENTAL_UPLOAD_SUPPORTED_META_TYPES
if not_supported_meta_types:
raise NotImplementedError("These types do not support incremental upload: {}".format(", ".join(not_supported_meta_types)))
import_incremental_data(jvm_args, data_directory, update_generic_assay_entity, meta_file_type_to_meta_files)
update_case_lists_from_folder(jvm_args, data_directory, meta_file_type_to_meta_files)

def usage():
# TODO : replace this by usage string from interface()
@@ -435,26 +521,27 @@ def check_files(meta_filename, data_filename):
print('data-file cannot be found:' + data_filename, file=ERROR_FILE)
sys.exit(2)

- def check_dir(study_directory):
+ def check_dir(data_directory):
# check existence of directory
- if not os.path.exists(study_directory) and study_directory != '':
- print('Study cannot be found: ' + study_directory, file=ERROR_FILE)
+ if not os.path.exists(data_directory) and data_directory != '':
+ print('Directory cannot be found: ' + data_directory, file=ERROR_FILE)
sys.exit(2)

def add_parser_args(parser):
- parser.add_argument('-s', '--study_directory', type=str, required=False,
- help='Path to Study Directory')
+ data_source_group = parser.add_mutually_exclusive_group()
+ data_source_group.add_argument('-s', '--study_directory', type=str, help='Path to Study Directory')
+ data_source_group.add_argument('-d', '--data_directory', type=str, help='Path to Data Directory')
parser.add_argument('-jvo', '--java_opts', type=str, default=os.environ.get('JAVA_OPTS'),
help='Path to specify JAVA_OPTS for the importer. \
- (default: gets the JAVA_OPTS from the environment)')
+ (default: gets the JAVA_OPTS from the environment)')
parser.add_argument('-jar', '--jar_path', type=str, required=False,
- help='Path to scripts JAR file')
+ help='Path to scripts JAR file')
parser.add_argument('-meta', '--meta_filename', type=str, required=False,
help='Path to meta file')
parser.add_argument('-data', '--data_filename', type=str, required=False,
help='Path to Data file')

- def interface():
+ def interface(args=None):
parent_parser = argparse.ArgumentParser(description='cBioPortal meta Importer')
add_parser_args(parent_parser)
parser = argparse.ArgumentParser()
Expand Down Expand Up @@ -484,7 +571,7 @@ def interface():
# TODO - add same argument to metaimporter
# TODO - harmonize on - and _

- parser = parser.parse_args()
+ parser = parser.parse_args(args)
if parser.command is not None and parser.subcommand is not None:
print('Cannot call multiple commands')
sys.exit(2)
Expand Down Expand Up @@ -547,14 +634,16 @@ def main(args):

# process the options
jvm_args = "-Dspring.profiles.active=dbcp " + args.java_opts
- study_directory = args.study_directory

# check if DB version and application version are in sync
check_version(jvm_args)

- if study_directory != None:
- check_dir(study_directory)
- process_directory(jvm_args, study_directory, args.update_generic_assay_entity)
+ if args.data_directory is not None:
+ check_dir(args.data_directory)
+ process_data_directory(jvm_args, args.data_directory, args.update_generic_assay_entity)
+ elif args.study_directory is not None:
+ check_dir(args.study_directory)
+ process_study_directory(jvm_args, args.study_directory, args.update_generic_assay_entity)
else:
check_args(args.command)
check_files(args.meta_filename, args.data_filename)