[ETL-153] Changes necessary for exporting study data in prod #65
templates/ec2-bootstrap-trigger.yaml
Type: AWS::EC2::SecurityGroup
Properties:
  GroupDescription: !Sub "The security group for ${AWS::StackName}"
  SecurityGroupIngress:
Why this security group? Also, there probably needs to be tighter control on this instance. We should probably put it behind our VPN.
Adding VpcId - something like this: https://github.com/nlpsandbox/nlpsandbox-infra/blob/main/sceptre/nlpsandbox/templates/nlpsandbox.yaml#L188
Do we have a VPC set up? If there is no VPC set up, we have to do something like this: https://github.com/Sage-Bionetworks-Challenges/cnb-aws-infra/pull/1/files
That would be another JIRA ticket.
I made another commit to add the security group to the VPC.
@thomasyu888 Does the EC2 instance need to be in a VPC because of the ingress rules? I should be able to remove this rule so that it's not possible to connect to the instance. Everything is taken care of by the EC2 user data; the SSH access is just for debugging purposes.
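For illustration, here is a minimal sketch of a security group that allows SSH only from a VPN address range and drops the ingress rule entirely when no range is given. `VpnCidr` and the resource name `InstanceSecurityGroup` are hypothetical and not taken from this PR; `VPCId` mirrors the parameter name used in the stack config.

```yaml
Parameters:
  VPCId:
    Type: AWS::EC2::VPC::Id
  VpnCidr:
    # Hypothetical parameter: CIDR range of the VPN, e.g. 10.1.0.0/16.
    # Leave empty to create the group with no ingress rules at all.
    Type: String
    Default: ""

Conditions:
  AllowSsh: !Not [!Equals [!Ref VpnCidr, ""]]

Resources:
  InstanceSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: !Sub "The security group for ${AWS::StackName}"
      VpcId: !Ref VPCId
      # SSH is opened only when a VPN range is supplied; otherwise the
      # property is omitted and the instance accepts no inbound traffic.
      SecurityGroupIngress: !If
        - AllowSsh
        - - IpProtocol: tcp
            FromPort: 22
            ToPort: 22
            CidrIp: !Ref VpnCidr
        - !Ref AWS::NoValue
```

Since the instance is configured entirely through user data, leaving `VpnCidr` empty in prod would keep debugging access available in other environments without exposing port 22 there.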
Mostly looks good, but there is a security concern about the EC2 instance. We need to put it behind our VPN.
SsmParameterName: synapse-bridgedownstream-auth
PrivateKey: ec2-bootstrap-trigger
CrontabURI: s3://{{ stack_group_config.artifact_bucket_name }}/BridgeDownstream/{{ stack_group_config.latest_version }}/ec2/resources/crontab
VPCId: vpc-d6c2a9ab
What is this vpc-d6c2a9ab? Did you create it yourself?
You can view the VPC in the console. I didn't create it myself.
I took a look at this, and the VPC is just the default VPC, which doesn't meet our security standards. This resource needs to sit behind the Sage VPN: https://sagebionetworks.jira.com/browse/ETL-176
In addition to submitting files on Synapse to a workflow, this script can now diff those Synapse files with a parquet dataset on S3 before submitting to a workflow. The script has been renamed to reflect this additional functionality.
Add ec2 bootstrap trigger stack's security group to VPC
Add ec2 bootstrap trigger stack to prod
Remove branch reference in bootstrap trigger dockerfile
FROM python:3.9.10

RUN pip install boto3==1.16.33 pandas==1.1.5 synapseclient==2.5.1 pyarrow==4.0.0
RUN git clone -b https://github.com/Sage-Bionetworks/BridgeDownstream.git /root/BridgeDownstream
Make sure the docker build doesn't build from an existing cache; otherwise new changes to the repo won't get pulled in.
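Change summary:
microphone_v1 table. (Same structure as microphone_levels_v1; the developers decided to rename this file for some reason.)
src/ec2/resources/
JsonToParquetTriggerSchedule parameter to study-pipeline template. Omitting this parameter creates an on-demand json to parquet trigger. Passing a cron expression to this parameter creates a json to parquet trigger which runs on a schedule.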
parameter to study-pipeline template. Omitting this parameter creates an on-demand json to parquet trigger. Passing a cron expression to this parameter creates a json to parquet trigger which runs on a schedule. 6. t4tg. 6edrfgvb