remove pipenv #52

Merged · 7 commits · Feb 14, 2020
3 changes: 2 additions & 1 deletion .gitignore
@@ -7,4 +7,5 @@ terraform.tfstate
 terraform.tfstate.backup
 .terraform.tfstate.lock.info
 .terraform
-.DS_Store
+.DS_Store
+.venv
31 changes: 0 additions & 31 deletions Pipfile

This file was deleted.

1 change: 1 addition & 0 deletions docs/assumed_knowledge.md
@@ -8,3 +8,4 @@ The workflows contained in this repository assume:
 * You are familiar with image annotations and how they are used in image segmentation. If you are unfamiliar with this, see [here]() for more information. TODO: add link/link content
 * You are familiar with how datasets are used in Machine Learning (for example, splitting your data into train, validation, and test). If you are unfamiliar with this, see [here]() for more information. TODO: add link/link content
 * You are familiar with how to use tmux on a remote machine and how we will use it to keep processes running even if the SSH window is closed or disconnected. If you are unfamiliar with this, see [here]() for more information. TODO: add link/link content
+* The codebase is meant to be run on a virtual machine, so it installs the Python packages user-wide. If you wish to run the code locally, we suggest using `virtualenv` (see [here](virtual_environment.md) for instructions).
2 changes: 1 addition & 1 deletion docs/data_ingestion.md
@@ -19,7 +19,7 @@ Infrastructure that will be used:
 1. When this completes, you should see your stack in `gs://<gcp_bucket_name>/raw-data/<zip_file>`.
 1. Use Terraform to start the appropriate GCP virtual machine (`terraform apply` or `terraform apply -lock=false`).
 1. Once Terraform finishes, you can check the GCP virtual machine console to ensure a virtual machine has been created named `<project_name>-<user_name>` where `<project_name>` is the name of your GCP project and `<user_name>` is your GCP user name.
-1. SSH into the GCP virtual machine, start tmux (`tmux`), `cd` into the code directory (`cd necstlab-damage-segmentation`), and process a single zip file by running the command: `pipenv run python3 ingest_raw_data.py --gcp-bucket gs://<gcp_bucket_name> --zipped-stack gs://<gcp_bucket_name>/raw-data/<zip_file>`. Alternatively, to process an entire folder of zipped stacks, use `pipenv run python3 ingest_raw_data.py --gcp-bucket gs://<gcp_bucket_name>` (excluding the `--zipped-stack` argument), which will process all of the files in `gs://<gcp_bucket_name>/raw-data` (`ingest_raw_data.py` knows to process only `<gcp_bucket_name>/raw-data`).
+1. SSH into the GCP virtual machine, start tmux (`tmux`), `cd` into the code directory (`cd necstlab-damage-segmentation`), and process a single zip file by running the command: `python3 ingest_raw_data.py --gcp-bucket gs://<gcp_bucket_name> --zipped-stack gs://<gcp_bucket_name>/raw-data/<zip_file>`. Alternatively, to process an entire folder of zipped stacks, use `python3 ingest_raw_data.py --gcp-bucket gs://<gcp_bucket_name>` (excluding the `--zipped-stack` argument), which will process all of the files in `gs://<gcp_bucket_name>/raw-data` (`ingest_raw_data.py` knows to process only `<gcp_bucket_name>/raw-data`).
 1. When this completes, you should see your stack in `gs://<gcp_bucket_name>/processed-data/<stack_ID>`.
 1. Use Terraform to terminate the appropriate GCP virtual machine (`terraform destroy`). Once Terraform finishes, you can check the GCP virtual machine console to ensure a virtual machine has been destroyed.

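With pipenv gone, the ingestion step reduces to a plain `python3` invocation. A rough sketch of the full session follows; `gcloud compute ssh` is one common way in, and every `<...>` token is a placeholder for your own project, bucket, and file names:

```bash
# SSH into the VM that Terraform created (name is a placeholder)
gcloud compute ssh <project_name>-<user_name>

# On the VM: run inside tmux so the job survives a dropped SSH session
tmux
cd necstlab-damage-segmentation

# Ingest a single zipped stack...
python3 ingest_raw_data.py \
  --gcp-bucket gs://<gcp_bucket_name> \
  --zipped-stack gs://<gcp_bucket_name>/raw-data/<zip_file>

# ...or omit --zipped-stack to process everything under raw-data/
python3 ingest_raw_data.py --gcp-bucket gs://<gcp_bucket_name>
```
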
2 changes: 1 addition & 1 deletion docs/dataset_preparation.md
@@ -15,6 +15,6 @@ Infrastructure that will be used:
 1. Either edit the configuration file `configs/data_preparation.yaml` or create your own configuration file and place it in the `configs` folder.
 1. Use Terraform to start the appropriate GCP virtual machine (`terraform apply`). This will copy the current code base from your local machine to the GCP machine, so make sure any changes to the configuration file are saved before this step is run.
 1. Once Terraform finishes, you can check the GCP virtual machine console to ensure a virtual machine has been created named `<project_name>-<user_name>` where `<project_name>` is the name of your GCP project and `<user_name>` is your GCP user name.
-1. To create a dataset, SSH into the virtual machine `<project_name>-<user_name>`, start tmux (`tmux`), `cd` into the code directory (`cd necstlab-damage-segmentation`), and run `pipenv run python3 prepare_dataset.py --gcp-bucket <gcp_bucket> --config-file configs/<config_filename>.yaml`.
+1. To create a dataset, SSH into the virtual machine `<project_name>-<user_name>`, start tmux (`tmux`), `cd` into the code directory (`cd necstlab-damage-segmentation`), and run `python3 prepare_dataset.py --gcp-bucket <gcp_bucket> --config-file configs/<config_filename>.yaml`.
 1. Once dataset preparation has finished, you should see the folder `<gcp_bucket>/datasets/<dataset_ID>` has been created and populated, where `<dataset_ID>` was defined in `configs/data_preparation.yaml`.
 1. Use Terraform to terminate the appropriate GCP virtual machine (`terraform destroy`). Once Terraform finishes, you can check the GCP virtual machine console to ensure a virtual machine has been destroyed.
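
Dataset preparation can run for a while, which is why the step above starts tmux first. A minimal sketch of the detach/reattach pattern (the session name `prep` is arbitrary):

```bash
tmux new -s prep                      # start a named session on the VM
python3 prepare_dataset.py --gcp-bucket <gcp_bucket> \
  --config-file configs/<config_filename>.yaml
# press Ctrl-b then d to detach; the job keeps running
tmux attach -t prep                   # reattach later to check progress
```
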
2 changes: 1 addition & 1 deletion docs/inference.md
@@ -14,7 +14,7 @@ Infrastructure that will be used:
 1. If the unsegmented stacks are not in a GCP bucket, see the previous workflow `Copying the raw data into the cloud for storage and usage`.
 1. Use Terraform to start the appropriate GCP virtual machine (`terraform apply`).
 1. Once Terraform finishes, you can check the GCP virtual machine console to ensure a virtual machine has been created named `<project_name>-<user_name>` where `<project_name>` is the name of your GCP project and `<user_name>` is your GCP user name.
-1. To infer (segment) the damage of the stacks, SSH into the virtual machine `<project_name>-<user_name>`, start tmux (`tmux`), `cd` into the code directory (`cd necstlab-damage-segmentation`), and run `pipenv run python3 infer_segmentation.py --gcp-bucket <gcp_bucket> --stack-id <stack_id> --model-id <model_id>`.
+1. To infer (segment) the damage of the stacks, SSH into the virtual machine `<project_name>-<user_name>`, start tmux (`tmux`), `cd` into the code directory (`cd necstlab-damage-segmentation`), and run `python3 infer_segmentation.py --gcp-bucket <gcp_bucket> --stack-id <stack_id> --model-id <model_id>`.
 1. Once inference has finished, you should see the folder `<gcp_bucket>/inferences/<inference_ID>` has been created and populated, where `<inference_ID>` is `<stack_id>_<model_id>`.
 1. Use Terraform to terminate the appropriate GCP virtual machine (`terraform destroy`). Once Terraform finishes, you can check the GCP virtual machine console to ensure a virtual machine has been destroyed.

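Since the results land in `<gcp_bucket>/inferences/<stack_id>_<model_id>`, a quick way to confirm a run succeeded is to list that prefix with `gsutil` (bucket and IDs below are placeholders):

```bash
# List the inference outputs for one stack/model pair
gsutil ls gs://<bucket_name>/inferences/<stack_id>_<model_id>/
```
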
2 changes: 1 addition & 1 deletion docs/testing.md
@@ -13,6 +13,6 @@ Infrastructure that will be used:
 1. If the stacks are not in a GCP bucket, see the previous workflow `Copying the raw data into the cloud for storage and usage`.
 1. Use Terraform to start the appropriate GCP virtual machine (`terraform apply`).
 1. Once Terraform finishes, you can check the GCP virtual machine console to ensure a virtual machine has been created named `<project_name>-<user_name>` where `<project_name>` is the name of your GCP project and `<user_name>` is your GCP user name.
-1. To create a dataset, SSH into the virtual machine `<project_name>-<user_name>`, start tmux (`tmux`), `cd` into the code directory (`cd necstlab-damage-segmentation`), and run `pipenv run python3 test_segmentation_model.py --gcp-bucket <gcp_bucket> --dataset-id <dataset_id> --model-id <model_id>`.
+1. To test the model, SSH into the virtual machine `<project_name>-<user_name>`, start tmux (`tmux`), `cd` into the code directory (`cd necstlab-damage-segmentation`), and run `python3 test_segmentation_model.py --gcp-bucket <gcp_bucket> --dataset-id <dataset_id> --model-id <model_id>`.
 1. Once testing has finished, you should see the folder `<gcp_bucket>/tests/<test_ID>` has been created and populated, where `<test_ID>` is `<dataset_id>_<model_id>`.
 1. Use Terraform to terminate the appropriate GCP virtual machine (`terraform destroy`). Once Terraform finishes, you can check the GCP virtual machine console to ensure a virtual machine has been destroyed.
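
If you want to inspect the test artifacts locally after destroying the VM, they can be copied out of the bucket; a sketch, with placeholder names:

```bash
# Recursively copy the test results down to the local machine
# (-m parallelizes the transfer)
gsutil -m cp -r gs://<bucket_name>/tests/<dataset_id>_<model_id> ./results/
```
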
2 changes: 1 addition & 1 deletion docs/training.md
@@ -14,7 +14,7 @@ Infrastructure that will be used:
 1. Either edit the configuration file `configs/train_config.yaml` or create your own configuration file and place it in the `configs` folder.
 1. Use Terraform to start the appropriate GCP virtual machine (`terraform apply`). This will copy the current code base from your local machine to the GCP machine, so make sure any changes to the configuration file are saved before this step is run.
 1. Once Terraform finishes, you can check the GCP virtual machine console to ensure a virtual machine has been created named `<project_name>-<user_name>` where `<project_name>` is the name of your GCP project and `<user_name>` is your GCP user name.
-1. To create a dataset, SSH into the virtual machine `<project_name>-<user_name>`, start tmux (`tmux`), `cd` into the code directory (`cd necstlab-damage-segmentation`), and run `pipenv run python3 train_segmentation_model.py --gcp-bucket <gcp_bucket> --config-file configs/<config_filename>.yaml`.
+1. To train the model, SSH into the virtual machine `<project_name>-<user_name>`, start tmux (`tmux`), `cd` into the code directory (`cd necstlab-damage-segmentation`), and run `python3 train_segmentation_model.py --gcp-bucket <gcp_bucket> --config-file configs/<config_filename>.yaml`.
 1. Once training has finished, you should see the folder `<gcp_bucket>/models/<model_ID>-<timestamp>` has been created and populated, where `<model_ID>` was defined in `configs/train_config.yaml`.
 1. Use Terraform to terminate the appropriate GCP virtual machine (`terraform destroy`). Once Terraform finishes, you can check the GCP virtual machine console to ensure a virtual machine has been destroyed.

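Putting the Terraform steps around the training run, the VM lifecycle for one experiment looks roughly like this (run from the repo root on your local machine, assuming the Terraform config in this repo):

```bash
terraform apply      # create the VM and copy the local code base up
# ...SSH in, start tmux, run train_segmentation_model.py as above...
terraform destroy    # tear the VM down once the model is in the bucket
```
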
10 changes: 10 additions & 0 deletions docs/virtual_environment.md
@@ -0,0 +1,10 @@
+To set up a virtual environment:
+- Install it: `pip install virtualenv`
+- Create the virtual environment: `virtualenv --always-copy --system-site-packages --python=python3 .venv`
+- Install the needed packages: `.venv/bin/pip install -q -r requirements.txt`
+
+To use the virtual environment, enter it: `source .venv/bin/activate`
+
+To exit the virtual environment, use: `deactivate`
+
+To delete the virtual environment, just delete the `.venv` folder: `rm -r .venv`
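
Taken together, a minimal copy-pasteable sequence from the repository root (the import check near the end is just an optional sanity test, not part of these docs):

```bash
pip install virtualenv
virtualenv --always-copy --system-site-packages --python=python3 .venv
.venv/bin/pip install -q -r requirements.txt
source .venv/bin/activate
python3 -c "import tensorflow"   # optional: verify the install worked
deactivate
```
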
5 changes: 4 additions & 1 deletion gcp.tf
@@ -71,7 +71,10 @@ resource "google_compute_instance" "vm" {
   }
 
   provisioner "remote-exec" {
-    script = "./scripts/resource-creation.sh"
+    inline = [
+      "echo 'Running resource creation script... (this may take 10+ minutes)'",
+      "bash ~/${var.repository_name}/scripts/resource-creation.sh > resource-creation.log"
+    ]
     connection {
       user = "${var.username}"
       type = "ssh"
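
Because the inline provisioner now redirects its output to `resource-creation.log` in the remote home directory, you can watch provisioning progress from a second terminal while `terraform apply` runs; a sketch, with the VM name as a placeholder:

```bash
gcloud compute ssh <project_name>-<user_name> \
  --command 'tail -f ~/resource-creation.log'
```
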
20 changes: 20 additions & 0 deletions requirements.txt
@@ -0,0 +1,20 @@
+numpy
+tensorflow-gpu
+opencv-python
+scikit-image
+scikit-learn
+progress
+Keras
+ipython
+segmentation-models
+pytz
+tensorboard
+pillow
+pandas
+google-cloud-storage
+pyyaml
+jupyter
+crcmod
+gitpython
+matplotlib
+ipykernel
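
Note these requirements are deliberately unpinned, so each provision pulls the latest releases. If you ever need to reproduce a known-good environment, one option (not part of this PR) is to snapshot the resolved versions:

```bash
pip3 freeze > requirements-lock.txt       # record exact versions that worked
pip3 install -r requirements-lock.txt     # recreate them elsewhere
```
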
12 changes: 6 additions & 6 deletions scripts/resource-creation.sh
@@ -21,7 +21,6 @@ sudo dpkg -i libcudnn7_7.6.5.32-1+cuda10.1_amd64.deb
 wget http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1604/x86_64/libcudnn7-dev_7.6.5.32-1+cuda10.1_amd64.deb
 sudo dpkg -i libcudnn7-dev_7.6.5.32-1+cuda10.1_amd64.deb
 
-
 # install needed packages
 sudo apt-get install -y cmake \
     git \
@@ -34,8 +33,9 @@ sudo apt-get install -y cmake \
     tree \
     p7zip-full
 
-sudo pip3 uninstall crcmod
-sudo pip3 install pipenv
-sudo pip3 install --no-cache-dir -U crcmod
-
-cd necstlab-damage-segmentation && pipenv install
+pip3 install --upgrade pip
+pip3 install --upgrade setuptools
+pip3 uninstall crcmod -y
+pip3 install --no-cache-dir crcmod
+pip3 install --upgrade pyasn1
+cd necstlab-damage-segmentation && pip3 install -r requirements.txt
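
The uninstall/reinstall dance for `crcmod` forces pip to build the compiled C extension, which `gsutil` relies on for fast integrity-checked transfers. You can confirm it took effect on the VM with:

```bash
# Should report "compiled crcmod: True" among the version details
gsutil version -l | grep crcmod
```
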
12 changes: 6 additions & 6 deletions scripts/run_all_large.sh
@@ -1,9 +1,9 @@
 #!/bin/bash
 
 
-pipenv run python3 ingest_raw_data.py --gcp-bucket gs://necstlab-sandbox
-pipenv run python3 prepare_dataset.py --gcp-bucket gs://necstlab-sandbox --config-file configs/dataset-large.yaml
-pipenv run python3 train_segmentation_model.py --gcp-bucket gs://necstlab-sandbox --config-file configs/train-large.yaml
-pipenv run python3 test_segmentation_model.py --gcp-bucket gs://necstlab-sandbox --dataset-id dataset-large --model-id segmentation-model-large_20190924T180419Z
-pipenv run python3 infer_segmentation.py --gcp-bucket gs://necstlab-sandbox --model-id segmentation-model-large_20190924T180419Z --stack-id THIN_REF_S2_P1_L3_2496_1563_2159
-pipenv run python3 infer_segmentation.py --gcp-bucket gs://necstlab-sandbox --model-id segmentation-model-large_20190924T180419Z --stack-id 8bit_AS4_S2_P1_L6_2560_1750_2160
+python3 ingest_raw_data.py --gcp-bucket gs://necstlab-sandbox
+python3 prepare_dataset.py --gcp-bucket gs://necstlab-sandbox --config-file configs/dataset-large.yaml
+python3 train_segmentation_model.py --gcp-bucket gs://necstlab-sandbox --config-file configs/train-large.yaml
+python3 test_segmentation_model.py --gcp-bucket gs://necstlab-sandbox --dataset-id dataset-large --model-id segmentation-model-large_20190924T180419Z
+python3 infer_segmentation.py --gcp-bucket gs://necstlab-sandbox --model-id segmentation-model-large_20190924T180419Z --stack-id THIN_REF_S2_P1_L3_2496_1563_2159
+python3 infer_segmentation.py --gcp-bucket gs://necstlab-sandbox --model-id segmentation-model-large_20190924T180419Z --stack-id 8bit_AS4_S2_P1_L6_2560_1750_2160
12 changes: 6 additions & 6 deletions scripts/run_all_small.sh
@@ -1,9 +1,9 @@
 #!/bin/bash
 
 
-pipenv run python3 ingest_raw_data.py --gcp-bucket gs://necstlab-sandbox
-pipenv run python3 prepare_dataset.py --gcp-bucket gs://necstlab-sandbox --config-file configs/dataset-small.yaml
-pipenv run python3 train_segmentation_model.py --gcp-bucket gs://necstlab-sandbox --config-file configs/train-small.yaml
-pipenv run python3 test_segmentation_model.py --gcp-bucket gs://necstlab-sandbox --dataset-id dataset-small --model-id segmentation-model-small_20190924T191717Z
-pipenv run python3 infer_segmentation.py --gcp-bucket gs://necstlab-sandbox --model-id segmentation-model-small_20190924T191717Z --stack-id THIN_REF_S2_P1_L3_2496_1563_2159
-pipenv run python3 infer_segmentation.py --gcp-bucket gs://necstlab-sandbox --model-id segmentation-model-small_20190924T191717Z --stack-id 8bit_AS4_S2_P1_L6_2560_1750_2160
+python3 ingest_raw_data.py --gcp-bucket gs://necstlab-sandbox
+python3 prepare_dataset.py --gcp-bucket gs://necstlab-sandbox --config-file configs/dataset-small.yaml
+python3 train_segmentation_model.py --gcp-bucket gs://necstlab-sandbox --config-file configs/train-small.yaml
+python3 test_segmentation_model.py --gcp-bucket gs://necstlab-sandbox --dataset-id dataset-small --model-id segmentation-model-small_20190924T191717Z
+python3 infer_segmentation.py --gcp-bucket gs://necstlab-sandbox --model-id segmentation-model-small_20190924T191717Z --stack-id THIN_REF_S2_P1_L3_2496_1563_2159
+python3 infer_segmentation.py --gcp-bucket gs://necstlab-sandbox --model-id segmentation-model-small_20190924T191717Z --stack-id 8bit_AS4_S2_P1_L6_2560_1750_2160
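
Both `run_all` scripts chain several long-running jobs, so it is worth keeping a log of a full pass; a sketch that applies equally to `run_all_large.sh`:

```bash
# Run the small end-to-end pipeline, mirroring output to a log file
bash scripts/run_all_small.sh 2>&1 | tee run_all_small.log
```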