Provide Persistent Volumes for brokered services in k8s #3127

Closed
1 task
adborden opened this issue Apr 22, 2021 · 14 comments

Comments

@adborden
Contributor

adborden commented Apr 22, 2021

User Story

In order to support durable storage for k8s workloads, the EKS brokerpak should configure persistent storage to satisfy Persistent Volume Claims.

Acceptance Criteria

[ACs should be clearly demoable/verifiable whenever possible. Try specifying them using BDD.]

  • GIVEN [a contextual precondition]
    [AND optionally another precondition]
    WHEN [a triggering event] happens
    THEN [a verifiable outcome]
    [AND optionally another verifiable outcome]

Background

We're currently using ephemeral storage for SolrCloud, which is limited to 20GB and isn't going to get us to production for Catalog.

$ sudo du -sh /data/solr5/data/{catalog,inventory}-next
23G     /data/solr5/data/catalog-next
86M     /data/solr5/data/inventory-next

Security Considerations (required)

[Any security concerns that might be implicated in the change. "None" is OK, just be explicit here!]

Sketch

[Notes or a checklist reflecting our understanding of the selected approach]
Summary of this article:

  • Create EFS filesystem
  • Create EFS access point
  • Create ingress security group
  • Create EFS mount targets in each subnet used by the Fargate profile
  • Create PVC (use kubectl to start)
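
The last step in the list above (creating the PVC with kubectl) might look roughly like this. It is a minimal sketch that assumes static provisioning with the AWS EFS CSI driver (`efs.csi.aws.com`). The file system ID, access point ID, resource names, namespace, size, and the `efs-sc` storage class name are all placeholders, not values from this issue, and a StorageClass of that name may also need to exist in the cluster.

```python
import json


def efs_static_manifests(fs_id, access_point_id, name="solr-data",
                         namespace="default", size="5Gi",
                         storage_class="efs-sc"):
    """Build a PersistentVolume and a matching PersistentVolumeClaim for an
    existing EFS file system and access point (static provisioning).
    All default values are hypothetical placeholders."""
    pv = {
        "apiVersion": "v1",
        "kind": "PersistentVolume",
        "metadata": {"name": f"{name}-pv"},
        "spec": {
            # Kubernetes requires a capacity value; EFS itself is elastic
            # and does not enforce it.
            "capacity": {"storage": size},
            "volumeMode": "Filesystem",
            "accessModes": ["ReadWriteMany"],
            "persistentVolumeReclaimPolicy": "Retain",
            "storageClassName": storage_class,
            "csi": {
                "driver": "efs.csi.aws.com",
                # "<file system id>::<access point id>"
                "volumeHandle": f"{fs_id}::{access_point_id}",
            },
        },
    }
    pvc = {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": f"{name}-pvc", "namespace": namespace},
        "spec": {
            "accessModes": ["ReadWriteMany"],
            "storageClassName": storage_class,
            # Bind explicitly to the pre-created volume above.
            "volumeName": pv["metadata"]["name"],
            "resources": {"requests": {"storage": size}},
        },
    }
    return pv, pvc


if __name__ == "__main__":
    # Placeholder IDs; substitute the real EFS and access point IDs.
    pv, pvc = efs_static_manifests("fs-EXAMPLE", "fsap-EXAMPLE")
    manifest = {"apiVersion": "v1", "kind": "List", "items": [pv, pvc]}
    print(json.dumps(manifest, indent=2))
```

The script only prints JSON; piping its output into `kubectl apply -f -` would create both objects.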
@mogul changed the title from "Use Persistent Volumes for brokered SolrCloud instances" to "Provide Persistent Volumes for brokered services in k8s" Apr 22, 2021
@mogul
Contributor

mogul commented Apr 22, 2021

I do want to get PVCs supported in the brokered EKS, but...

> We're currently using ephemeral storage for SolrCloud, which is limited to 20GB and isn't going to get us to production for Catalog.

I'm not sure that's the case. With ephemeral storage there's a 20GB limit per SolrCloud node, but by scaling horizontally we can increase the capacity... Not every shard has to be replicated to every node!
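
To make the horizontal-scaling argument concrete, here is a rough sizing sketch. It assumes evenly sized shards, uses the 20GB per-node limit and the 23G catalog index from the figures quoted above, and ignores the extra headroom Solr needs for segment merges and transaction logs. The function name and parameters are illustrative only.

```python
import math


def nodes_needed(index_gb, num_shards, replication_factor, node_limit_gb=20.0):
    """Estimate how many SolrCloud nodes are needed so that every shard
    replica fits within the per-node ephemeral storage limit.

    Assumes shards are evenly sized and that replicas of the same shard
    are placed on different nodes.
    """
    shard_gb = index_gb / num_shards
    # How many shard replicas fit on one node.
    per_node = math.floor(node_limit_gb / shard_gb)
    if per_node < 1:
        raise ValueError("a single shard exceeds the node limit; use more shards")
    total_replicas = num_shards * replication_factor
    # Need enough nodes for all replicas, and at least one node per replica
    # of any given shard.
    return max(math.ceil(total_replicas / per_node), replication_factor)


if __name__ == "__main__":
    # 23 GB index split into 2 unreplicated shards: one shard per node.
    print(nodes_needed(23, num_shards=2, replication_factor=1))  # 2
    # 23 GB index, 4 shards, 2 replicas each: three 5.75 GB replicas per node.
    print(nodes_needed(23, num_shards=4, replication_factor=2))  # 3
```

With a single shard the 23G index does not fit on any one node, which is the case the original concern describes; splitting it into shards is what makes the ephemeral-storage approach workable in this estimate.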

@mogul
Contributor

mogul commented Jun 2, 2021

Still keeping an eye on this... It looks like it's now possible to dynamically provision EFS volumes in response to PVCs without any manual intervention up front. There may still be some restrictions; the GitHub issue about this capability is not especially clear. The post says they're still working on EKS, but then the issue is unceremoniously closed, and the blog post doesn't mention the EKS restriction. 🤷

@mogul
Contributor

mogul commented Jun 2, 2021

Oh, here's where the confusion is: You can't (yet) run the EFS CSI controller on Fargate. The referenced capability is being discussed here.

@mogul
Contributor

mogul commented Jun 2, 2021

@mogul
Contributor

mogul commented Sep 7, 2021

@mogul
Contributor

mogul commented Dec 16, 2021

@mogul
Contributor

mogul commented Dec 17, 2021

@mogul
Contributor

mogul commented Dec 22, 2021

Next steps...

Enable PVCs in the eks-brokerpak

Enable use of PVCs in the solr-brokerpak

@nickumia-reisys
Contributor

nickumia-reisys commented Jan 8, 2022

A PR has been started to incorporate this capability. There were concerns about Fargate compatibility and IAM role permissions. We briefly saw it working with managed node groups provisioned and the following option selected:
[image: the selected option]

This work has been postponed because a sharding-based workaround is available for Solr.

The next steps for this issue,

@nickumia-reisys nickumia-reisys removed their assignment Jan 8, 2022
@nickumia-reisys
Contributor

It turns out that we need this because of apache/solr-operator#365

@nickumia-reisys
Contributor

Current Status:

  • EBS has easy integration with EKS through the EKS EBS addon

  • EBS only works with managed nodes, not Fargate.

  • EBS volumes attach to individual nodes and are not shared between nodes.

    • We can't use managed nodes for everything because that would defeat the purpose of being in Fargate.
  • EFS is a shared filesystem across all nodes in a cluster.

  • EFS does not have an addon; it must be installed via a Helm chart.

  • On Fargate, EFS supports only static provisioning, not dynamic provisioning.

  • The example creates an EFS in the main cluster, which means that the EFS would need to be provisioned in the EKS brokerpak.

  • There are two options:

    1. Follow the example: create a static EFS in the EKS brokerpak.
    • Design considerations:
      • Each EKS cluster would have storage shared across all nodes, so if two Solrs are brokered in the staging space, their collection storage would overlap if they were named the same thing.
      • The total amount of storage desired would need to be known before creating the EKS cluster.
    2. Fetch the EKS config from the Solr brokerpak and do the provisioning alongside the creation of Solr.
    • Design considerations:
      • I don't know if we can manipulate the EKS cluster from the Solr brokerpak.
      • I prefer this method.

Resources needed alongside the EFS volume:

  • Data sources:
    • VPC id (EKS-specific)
    • Cluster CIDR range (EKS-specific)
  • Resources:
    • Security groups and additional rules (EKS-specific)
    • EFS driver (EKS-specific)
    • EFS volume
    • Mount volume into nodes
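
As a rough illustration of how the networking pieces in this list fit together, the sketch below builds the AWS CLI calls for an NFS ingress rule and one EFS mount target per subnet. It only prints commands and does not run them. The brokerpak would presumably express these as Terraform resources instead; the function name and all IDs are hypothetical placeholders, and the security group is assumed to exist already.

```python
import shlex


def efs_network_commands(sg_id, cluster_cidr, fs_id, subnet_ids):
    """Return AWS CLI commands (as argv lists) that allow NFS traffic from
    the cluster CIDR and create one EFS mount target per subnet."""
    commands = [[
        "aws", "ec2", "authorize-security-group-ingress",
        "--group-id", sg_id,
        "--protocol", "tcp",
        "--port", "2049",  # NFS, the protocol EFS is mounted over
        "--cidr", cluster_cidr,
    ]]
    for subnet_id in subnet_ids:
        commands.append([
            "aws", "efs", "create-mount-target",
            "--file-system-id", fs_id,
            "--subnet-id", subnet_id,
            "--security-groups", sg_id,
        ])
    return commands


if __name__ == "__main__":
    # Placeholder values; real ones would come from the EKS cluster's
    # VPC, CIDR range, and subnets.
    for argv in efs_network_commands(
        sg_id="sg-EXAMPLE",
        cluster_cidr="10.0.0.0/16",
        fs_id="fs-EXAMPLE",
        subnet_ids=["subnet-EXAMPLE-a", "subnet-EXAMPLE-b"],
    ):
        print(shlex.join(argv))
```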

@nickumia-reisys
Contributor

List of Related Resources:

@nickumia-reisys
Contributor

Tentative Final Design

@mogul
Contributor

mogul commented Jan 27, 2022

Another key reference: The k8s documentation on persistent volumes.

@mogul mogul self-assigned this Jan 27, 2022
@mogul mogul removed their assignment Feb 3, 2022
@mogul mogul added this to the Sprint 20220203 milestone Feb 3, 2022
@mogul mogul closed this as completed Feb 3, 2022
Repository owner moved this from Icebox to Product Backlog in data.gov team board Feb 3, 2022
@hkdctol hkdctol removed the status in data.gov team board Aug 2, 2022