Commit

Deprecate Google Cloud configuration JSONs (#711)
kjaisingh authored Sep 19, 2024
1 parent d66f760 commit a436703
Showing 8 changed files with 19 additions and 76 deletions.
5 changes: 1 addition & 4 deletions .github/workflows/testwdls.yaml
@@ -54,10 +54,7 @@ jobs:
# Setup for running womtool
pip install jinja2==3.1.2
wget -O womtool.jar https://github.com/broadinstitute/cromwell/releases/download/84/womtool-84.jar
- echo \
- '{ "google_project_id": "my-google-project-id", "terra_billing_project_id": "my-terra-billing-project" }' \
- > inputs/values/google_cloud.my_project.json
- scripts/inputs/build_default_inputs.sh -d . -c google_cloud.my_project
+ scripts/inputs/build_default_inputs.sh -d .
- name: Test with Miniwdl
run: |
17 changes: 3 additions & 14 deletions README.md
@@ -125,14 +125,6 @@ Example workflow inputs can be found in `/inputs`. Build using `scripts/inputs/b
generates input jsons in `/inputs/build`. Except the MELT docker image, all required resources are available in public
Google buckets.

- Some workflows require a Google Cloud Project ID to be defined in a cloud environment parameter group. Workspace builds
- require a Terra billing project ID as well. An example is provided at `/inputs/values/google_cloud.json` but should
- not be used, as modifying this file will cause tracked changes in the repository. Instead, create a copy in the same
- directory with the format `google_cloud.my_project.json` and modify as necessary.
-
- Note that these inputs are required only when certain data are located in requester pays buckets. If this does not
- apply, users may use placeholder values for the cloud configuration and simply delete the inputs manually.
-
#### MELT
**Important**: The example input files contain MELT inputs that are NOT public (see [Requirements](#requirements)). These include:

@@ -150,8 +142,7 @@ We recommend running the pipeline on a dedicated [Cromwell](https://github.com/b
> cp $GATK_SV_ROOT/wdl/*.wdl .
> zip dep.zip *.wdl
> cd ..
- > echo '{ "google_project_id": "my-google-project-id", "terra_billing_project_id": "my-terra-billing-project" }' > inputs/values/google_cloud.my_project.json
- > bash scripts/inputs/build_default_inputs.sh -d $GATK_SV_ROOT -c google_cloud.my_project
+ > bash scripts/inputs/build_default_inputs.sh -d $GATK_SV_ROOT
> cp $GATK_SV_ROOT/inputs/build/ref_panel_1kg/test/GATKSVPipelineBatch/GATKSVPipelineBatch.json GATKSVPipelineBatch.my_run.json
> cromshell submit wdl/GATKSVPipelineBatch.wdl GATKSVPipelineBatch.my_run.json cromwell_config.json wdl/dep.zip
```
@@ -231,14 +222,12 @@ Here is an example of how to generate workflow input jsons from `GATKSVPipelineB
--final-workflow-outputs-dir gs://my-outputs-bucket \
metadata.json \
> inputs/values/my_ref_panel.json
- > # Define your google project id (for Cromwell inputs) and Terra billing project (for workspace inputs)
- > echo '{ "google_project_id": "my-google-project-id", "terra_billing_project_id": "my-terra-billing-project" }' > inputs/values/google_cloud.my_project.json
- > # Build test files for batched workflows (google cloud project id required)
+ > # Build test files for batched workflows
> python scripts/inputs/build_inputs.py \
inputs/values \
inputs/templates/test \
inputs/build/my_ref_panel/test \
- -a '{ "test_batch" : "ref_panel_1kg", "cloud_env": "google_cloud.my_project" }'
+ -a '{ "test_batch" : "ref_panel_1kg" }'
> # Build test files for the single-sample workflow
> python scripts/inputs/build_inputs.py \
inputs/values \
3 changes: 0 additions & 3 deletions inputs/values/google_cloud.json

This file was deleted.

11 changes: 0 additions & 11 deletions scripts/cromwell/launch_wdl.sh
@@ -15,15 +15,6 @@ done
WDL_FILENAME=$(basename "$WDL")
WDL_NAME=${WDL_FILENAME%.*}

- CLOUD_ENV="$GATK_SV_ROOT/inputs/values/google_cloud.my_project.json"
- echo "CLOUD_ENV=$CLOUD_ENV"
- cat << EOF > "$CLOUD_ENV"
- {
- "google_project_id": "broad-dsde-methods",
- "terra_billing_project_id": "broad-dsde-methods"
- }
- EOF
-

RUN_DIR="$GATK_SV_ROOT/runs/$WDL_NAME"
DEPS_ZIP="$RUN_DIR/deps.zip"
@@ -34,10 +25,8 @@ zip "$DEPS_ZIP" *.wdl &> /dev/null
cd "$GATK_SV_ROOT"
"$GATK_SV_ROOT/scripts/inputs/build_default_inputs.sh" \
-d "$GATK_SV_ROOT" \
- -c google_cloud.my_project \
> /dev/null

- rm -f $CLOUD_ENV

echo "Available input jsons:"
printf "%d\t%s\n" 0 "none (skip cromwell submit)"
21 changes: 6 additions & 15 deletions scripts/inputs/build_default_inputs.sh
@@ -2,9 +2,8 @@

function usage() {
printf "Usage: \n \
- %s -d <REPO_BASE_DIR> -c <CLOUD_ENV> \n \
- <REPO_BASE_DIR> \t path to gatk-sv base directory \n \
- <CLOUD_ENV> \t name of cloud environment json (e.g. 'google_cloud.my' for inputs/values/google_cloud.my.json)" "$1"
+ %s -d <REPO_BASE_DIR> \n \
+ <REPO_BASE_DIR> \t path to gatk-sv base directory" "$1"
}

if [[ "$#" == 0 ]]; then
@@ -14,10 +13,9 @@ fi
#################################################
# Parsing arguments
#################################################
- while getopts "d:c:" option; do
+ while getopts "d:" option; do
case "$option" in
d) BASE_DIR="$OPTARG" ;;
- c) CLOUD_ENV="$OPTARG" ;;
*) usage "$0" && exit 1 ;;
esac
done
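
One consequence of the narrower option string is that callers still passing `-c` are now rejected by the catch-all case rather than accepted. As a rough Python analogue (the `parse_args` helper and the `build_default_inputs` program name are hypothetical and only mirror the `-d` option shown in this hunk), the same behaviour could be sketched as:

```python
import argparse


def parse_args(argv):
    """Hypothetical Python analogue of the script's `getopts "d:"` loop."""
    parser = argparse.ArgumentParser(prog="build_default_inputs")
    # Only -d remains after this commit; it is still mandatory.
    parser.add_argument("-d", dest="base_dir", required=True,
                        help="path to gatk-sv base directory")
    return parser.parse_args(argv)


# The new-style invocation parses cleanly.
args = parse_args(["-d", "."])

# An old-style invocation that still passes -c is rejected
# (argparse exits, much as the shell script prints usage and exits 1).
try:
    parse_args(["-d", ".", "-c", "google_cloud.my_project"])
    rejected = False
except SystemExit:
    rejected = True
```

Any wrapper scripts or CI steps that still pass `-c` would therefore need the same update that this commit applies to `launch_wdl.sh` and `testwdls.yaml`.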
@@ -28,12 +26,6 @@ if [ -z "$BASE_DIR" ] ; then
exit 1
fi

- if [ -z "$CLOUD_ENV" ] ; then
- echo "xy"
- usage "$0"
- exit 1
- fi
-
if [[ ! -d "$BASE_DIR" ]]; then
echo "Invalid directory: $BASE_DIR"
exit 1
@@ -45,17 +37,16 @@ bash scripts/inputs/clean_default_inputs.sh -d ${BASE_DIR}

echo "########## Building ref_panel_1kg test ##########"
scripts/inputs/build_inputs.py ${BASE_DIR}/inputs/values ${BASE_DIR}/inputs/templates/test ${BASE_DIR}/inputs/build/ref_panel_1kg/test \
- -a '{ "test_batch" : "ref_panel_1kg", "cloud_env" : "'$CLOUD_ENV'" }'
+ -a '{ "test_batch" : "ref_panel_1kg" }'

echo "########## Building ref_panel_1kg cohort Terra workspace ##########"
scripts/inputs/build_inputs.py ${BASE_DIR}/inputs/values ${BASE_DIR}/inputs/templates/terra_workspaces/cohort_mode ${BASE_DIR}/inputs/build/ref_panel_1kg/terra \
- -a '{ "test_batch" : "ref_panel_1kg", "cloud_env" : "'$CLOUD_ENV'" }'
+ -a '{ "test_batch" : "ref_panel_1kg" }'

echo "########## Building hgdp test ##########"
scripts/inputs/build_inputs.py ${BASE_DIR}/inputs/values ${BASE_DIR}/inputs/templates/test ${BASE_DIR}/inputs/build/hgdp/test \
- -a '{ "test_batch" : "hgdp", "cloud_env" : "'$CLOUD_ENV'" }'
+ -a '{ "test_batch" : "hgdp" }'

- # Note CLOUD_ENV is not currently required for the single-sample workflow
echo "########## Building NA19240 single-sample test ##########"
scripts/inputs/build_inputs.py ${BASE_DIR}/inputs/values ${BASE_DIR}/inputs/templates/test/GATKSVPipelineSingleSample ${BASE_DIR}/inputs/build/NA19240/test \
-a '{ "single_sample" : "test_single_sample_NA19240", "ref_panel" : "ref_panel_1kg" }'
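
To summarise the post-change invocations visible in this hunk, the sketch below assembles the same four `build_inputs.py` argument lists in Python. The `build_commands` helper is hypothetical and exists only for illustration; the paths and alias values are copied from the lines above, and the commands are built but not executed.

```python
import json
from pathlib import PurePosixPath


def build_commands(base_dir):
    """Return argument lists mirroring the four build_inputs.py calls above."""
    base = PurePosixPath(base_dir)
    values = base / "inputs" / "values"
    templates = base / "inputs" / "templates"
    build = base / "inputs" / "build"
    # (template dir, output dir, aliases) for each invocation in the hunk.
    jobs = [
        (templates / "test", build / "ref_panel_1kg" / "test",
         {"test_batch": "ref_panel_1kg"}),
        (templates / "terra_workspaces" / "cohort_mode",
         build / "ref_panel_1kg" / "terra",
         {"test_batch": "ref_panel_1kg"}),
        (templates / "test", build / "hgdp" / "test",
         {"test_batch": "hgdp"}),
        (templates / "test" / "GATKSVPipelineSingleSample",
         build / "NA19240" / "test",
         {"single_sample": "test_single_sample_NA19240",
          "ref_panel": "ref_panel_1kg"}),
    ]
    return [
        ["scripts/inputs/build_inputs.py", str(values), str(tmpl), str(out),
         "-a", json.dumps(aliases)]
        for tmpl, out, aliases in jobs
    ]


commands = build_commands(".")
```

Generating the `-a` payload with `json.dumps` avoids the quote splicing that the removed `"'$CLOUD_ENV'"` fragments needed, and none of the resulting payloads carries a `cloud_env` key.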
5 changes: 1 addition & 4 deletions scripts/inputs/build_inputs.py
@@ -117,15 +117,12 @@ def main():
raw_input_bundles['test_batch_empty']['name'] = 'test_batch'
raw_input_bundles['single_sample_none'] = {}
raw_input_bundles['single_sample_none']['name'] = 'single_sample'
- raw_input_bundles['cloud_env_none'] = {}
- raw_input_bundles['cloud_env_none']['name'] = 'cloud_env'

default_aliases = {'dockers': 'dockers',
'ref_panel': 'ref_panel_empty',
'reference_resources': 'resources_hg38',
'test_batch': 'test_batch_empty',
- 'single_sample': 'single_sample_none',
- 'cloud_env': 'cloud_env_none'}
+ 'single_sample': 'single_sample_none'}

# prepare the input_dict using default, document default, and user-specified aliases
input_dict = {}
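
As a rough illustration of what this hunk changes, the sketch below overlays user-specified aliases (the JSON passed via `-a`) on a default alias table like the one above. The `resolve_aliases` helper and the simple overlay order are assumptions made for illustration, not the actual logic of `build_inputs.py`; the point is only that `cloud_env` no longer appears among the defaults.

```python
import json

# Default aliases as they read after this commit (no 'cloud_env' entry).
DEFAULT_ALIASES = {
    'dockers': 'dockers',
    'ref_panel': 'ref_panel_empty',
    'reference_resources': 'resources_hg38',
    'test_batch': 'test_batch_empty',
    'single_sample': 'single_sample_none',
}


def resolve_aliases(user_aliases_json):
    """Hypothetical helper: user-specified aliases override the defaults."""
    user_aliases = json.loads(user_aliases_json)
    return {**DEFAULT_ALIASES, **user_aliases}


aliases = resolve_aliases('{ "test_batch" : "ref_panel_1kg" }')
```

Under this assumption, a template that still refers to a `cloud_env` bundle would have nothing to resolve against, which is presumably why the commit also drops `google_cloud.json` and the `-c` option.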
30 changes: 7 additions & 23 deletions website/docs/advanced/build_inputs.md
@@ -21,20 +21,10 @@ You may run the following commands to get these example inputs.
git clone https://github.com/broadinstitute/gatk-sv && cd gatk-sv
```

- 2. Create a JSON file containing the Terra billing project (for use on Terra)
- or the Google project ID (for use on Cromwell) that you will use to run
- the workflows with the test input. You may create this file by running
- the following command and replacing `"my-google-project-id"` and
- `"my-terra-billing-project"` with your project and billing IDs.
+ 2. Create test inputs.

```shell
- echo '{ "google_project_id": "my-google-project-id", "terra_billing_project_id": "my-terra-billing-project" }' > inputs/values/google_cloud.my_project.json
- ```
-
- 3. Create test inputs.
-
- ```shell
- bash scripts/inputs/build_default_inputs.sh -d . -c google_cloud.my_project
+ bash scripts/inputs/build_default_inputs.sh -d .
```

Running this command generates test inputs in `gatk-sv/inputs/build` with the following structure.
@@ -62,7 +52,7 @@ python scripts/inputs/build_inputs.py \
inputs/values \
inputs/templates/test/GATKSVPipelineSingleSample \
inputs/build/NA19240/test \
- -a '{ "test_batch" : "ref_panel_1kg", "cloud_env": "google_cloud.my_project" }'
+ -a '{ "test_batch" : "ref_panel_1kg" }'
```


@@ -98,24 +88,18 @@ Here is an example of how to generate workflow input jsons from `GATKSVPipelineB
metadata.json \
> inputs/values/my_ref_panel.json
```

- 3. Define your google project id (for Cromwell inputs) and Terra billing project (for workspace inputs).
-
- ```shell
- echo '{ "google_project_id": "my-google-project-id", "terra_billing_project_id": "my-terra-billing-project" }' > inputs/values/google_cloud.my_project.json
- ```
-
- 4. Build test files for batched workflows (google cloud project id required).
+ 3. Build test files for batched workflows (google cloud project id required).

```shell
python scripts/inputs/build_inputs.py \
inputs/values \
inputs/templates/test \
inputs/build/my_ref_panel/test \
- -a '{ "test_batch" : "ref_panel_1kg", "cloud_env": "google_cloud.my_project" }'
+ -a '{ "test_batch" : "ref_panel_1kg" }'
```

- 5. Build test files for the single-sample workflow
+ 4. Build test files for the single-sample workflow

```shell
python scripts/inputs/build_inputs.py \
@@ -125,7 +109,7 @@ Here is an example of how to generate workflow input jsons from `GATKSVPipelineB
-a '{ "single_sample" : "test_single_sample_NA19240", "ref_panel" : "my_ref_panel" }'
```

- 6. Build files for a Terra workspace.
+ 5. Build files for a Terra workspace.

```shell
python scripts/inputs/build_inputs.py \
3 changes: 1 addition & 2 deletions website/docs/gs/quick_start.md
@@ -44,8 +44,7 @@ The input values are provided only as an example and are not publicly accessible
> cp $GATK_SV_ROOT/wdl/*.wdl .
> zip dep.zip *.wdl
> cd ..
- > echo '{ "google_project_id": "my-google-project-id", "terra_billing_project_id": "my-terra-billing-project" }' > inputs/values/google_cloud.my_project.json
- > bash scripts/inputs/build_default_inputs.sh -d $GATK_SV_ROOT -c google_cloud.my_project
+ > bash scripts/inputs/build_default_inputs.sh -d $GATK_SV_ROOT
> cp $GATK_SV_ROOT/inputs/build/ref_panel_1kg/test/GATKSVPipelineBatch/GATKSVPipelineBatch.json GATKSVPipelineBatch.my_run.json
> cromshell submit wdl/GATKSVPipelineBatch.wdl GATKSVPipelineBatch.my_run.json cromwell_config.json wdl/dep.zip
```
