Specification
It is now time for our second attempt at testnet deployment.

We had previously done a PK deployment on AWS using ECS, back when PK was 0.0.41. While that AWS deployment worked, we hit a lot of problems, which meant we had to go through an 8-month-long refactoring process over the entire codebase. Now that the codebase is finally refactored, we're ready for the second attempt.

The AWS architecture is basically the same as before, but our configuration should be a lot simpler. There are some changes though:
- Before, we had to deal with Node root certificates; now root certificates are no longer relevant to the testnet/mainnet deployment.
- We are now separating into 2 clusters of PK seed nodes: `mainnet.polykey.io` and `testnet.polykey.io`. The `mainnet` is intended for production use; we will first prototype our testnet deployment, and the testnet will be where new versions of PK are tested before being released to production.
- Both mainnet and testnet seed nodes will be trusted by default, but PK releases should default to using the mainnet, with a switch to use the testnet.
- We don't know yet whether we should be using an NLB; we may decide not to use an NLB at all. But there shouldn't be any sort of session state required for P2P functionality.
- NLBs cannot be used with PK clients that are debugging the testnet/mainnet nodes, because they would resolve to any possible node, and in this case there is in fact network session state. Instead, PK client debugging has to be done with the container IPs.
- We know that IPv6 isn't supported yet, so we will have IPv4 and DNS support.
- We should be using the well-known ports `1314` UDP and `1315` TCP for the ingress port and the client port respectively.
- The PK nodes are not stateless; they do require node state. However, this node state is not important for us to persist, so any EBS volume mounted into the ECS container should work. Basically we just need a mutable temporary directory. What kind of mutations are there? Well, the Kademlia node graph is persisted at the moment rather than kept in-memory. (See the `docker run` sketch after this list.)
Additional context
- https://gitlab.com/MatrixAI/Engineering/Polykey/js-polykey/-/issues/197 - old issue detailing how to configure the AWS infrastructure (we're doing it manually right now); refer to this when starting on this issue
- https://gitlab.com/MatrixAI/Engineering/Polykey/js-polykey/-/issues/237 - old issue regarding the old task definition environment variables
Tasks
1. [ ] Upload the image to ECR ("Elastic Container Registry"); see the push sketch after this list
   - check https://gitlab.com/MatrixAI/Engineering/Polykey/js-polykey/-/issues/197#note_496723119
   - make sure you have the right authentication details
2. [ ] Create an ECS ("Elastic Container Service") Task Definition for the new image uploaded to ECR; see the task definition sketch after this list
   - the Task Definition describes how to execute the container
   - just like how we executed with `docker run`, we will need the same parameters
   - the PK agent will need a writable directory for its node state; if we don't specify anything, this will just be a temporary scratch layer from ECS, so we should be using a volume mount of some sort; the data inside this PK agent is not important, therefore any AWS volume should be ok, however an NFS/EFS volume might help us in case we want to debug things
   - additional environment variables for unattended bootstrapping and port/host binding
3. [ ] Start the ECS service as a cluster of 1, and test that it is working by using the PK CLI to directly contact the ECS container IP address and the port given by `PK_PORT`; see the service sketch after this list
4. [ ] Integrate the firewall (security group), the NLB, and an elastic IP attached to the NLB, then attach the `testnet.polykey.io` domain to the NLB; see the provisioning sketch after this list
   - the NLB won't maintain a session between connections in order to point to the same agent; ideally we won't need a common agent for NAT-busting purposes, and this shouldn't be a problem for NAT-busting, for either hole-punching relay or actual relay, as far as I know
   - if it is a problem, an alternative to the NLB is domain-level load balancing, where multiple EIPs are presented by `testnet.polykey.io` and randomised; Cloudflare supports this: https://www.cloudflare.com/en-au/learning/performance/what-is-dns-load-balancing/ (resolve this in "Update testnet.polykey.io to point to the list of IPs running seed keynodes" #177)
5. [ ] Update the reference documentation with the testnet architecture including AWS, using a component diagram with relevant AWS resources
   - waiting on Pulumi... an intermediate diagram is available in "Testnet Node Deployment (testnet.polykey.io)" #194 (comment)
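A sketch of the ECR upload in task 1, assuming the repository already exists; the region, account ID, repository name, and tag are placeholders (see issue 197 for the real values):

```bash
# Authenticate Docker against ECR, then tag and push the PK image.
# Region, account ID, and repository name below are placeholders.
aws ecr get-login-password --region ap-southeast-2 \
  | docker login --username AWS --password-stdin \
      123456789012.dkr.ecr.ap-southeast-2.amazonaws.com

docker tag polykey/polykey:latest \
  123456789012.dkr.ecr.ap-southeast-2.amazonaws.com/polykey:latest
docker push 123456789012.dkr.ecr.ap-southeast-2.amazonaws.com/polykey:latest
```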
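For task 2, a sketch of a Task Definition carrying the same parameters we passed to `docker run`: the two port mappings, a scratch volume for the node state, and an environment variable for unattended bootstrapping. The `PK_PASSWORD` name, image URI, container path, and family name are assumptions, not confirmed values:

```bash
# Register a task definition sketch; values below are placeholders.
cat > task-definition.json <<'EOF'
{
  "family": "polykey-testnet",
  "containerDefinitions": [
    {
      "name": "polykey-agent",
      "image": "123456789012.dkr.ecr.ap-southeast-2.amazonaws.com/polykey:latest",
      "essential": true,
      "portMappings": [
        { "containerPort": 1314, "protocol": "udp" },
        { "containerPort": 1315, "protocol": "tcp" }
      ],
      "environment": [
        { "name": "PK_PASSWORD", "value": "bootstrap-password" }
      ],
      "mountPoints": [
        { "sourceVolume": "pk-node-state", "containerPath": "/srv/polykey" }
      ]
    }
  ],
  "volumes": [
    { "name": "pk-node-state" }
  ]
}
EOF

aws ecs register-task-definition --cli-input-json file://task-definition.json
```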
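For task 3, a sketch of starting a single-task service and locating the container IP to contact directly; the cluster and service names are placeholders, and the `pk` invocation at the end is a hypothetical form of the client call, not a confirmed CLI signature:

```bash
# Run a cluster of 1 and find the task's IP for direct debugging.
aws ecs create-service \
  --cluster polykey-testnet \
  --service-name polykey-seed \
  --task-definition polykey-testnet \
  --desired-count 1

TASK_ARN="$(aws ecs list-tasks --cluster polykey-testnet \
  --query 'taskArns[0]' --output text)"
aws ecs describe-tasks --cluster polykey-testnet --tasks "$TASK_ARN"

# Hypothetical PK CLI check against the IP reported above, on the
# client port bound via PK_PORT:
# pk agent status --client-host <container-ip> --client-port 1315
```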
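For task 4, a sketch of the security group rules, the elastic IP, and the NLB wiring; all IDs, names, and subnets are placeholders, and attaching `testnet.polykey.io` to the NLB's address is a separate DNS step:

```bash
# Open the well-known ports on the seed nodes' security group.
aws ec2 authorize-security-group-ingress \
  --group-id sg-0123456789abcdef0 --protocol udp --port 1314 --cidr 0.0.0.0/0
aws ec2 authorize-security-group-ingress \
  --group-id sg-0123456789abcdef0 --protocol tcp --port 1315 --cidr 0.0.0.0/0

# Allocate an elastic IP and bind it to a network load balancer.
ALLOC_ID="$(aws ec2 allocate-address --domain vpc \
  --query 'AllocationId' --output text)"
aws elbv2 create-load-balancer \
  --name polykey-testnet-nlb --type network \
  --subnet-mappings SubnetId=subnet-0123456789abcdef0,AllocationId="$ALLOC_ID"

# A UDP target group for the ingress port; a listener created with
# `aws elbv2 create-listener` would then forward 1314/udp to it.
aws elbv2 create-target-group \
  --name polykey-ingress --protocol UDP --port 1314 \
  --vpc-id vpc-0123456789abcdef0 --target-type ip
```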