Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEATURES] Add the Proposal for Vineyard Runtime #3528

Closed
dashanji opened this issue Nov 9, 2023 · 3 comments · Fixed by #3649
Closed

[FEATURES] Add the Proposal for Vineyard Runtime #3528

dashanji opened this issue Nov 9, 2023 · 3 comments · Fixed by #3649
Labels
features features

Comments

@dashanji
Copy link
Contributor

dashanji commented Nov 9, 2023

Vineyard Runtime Design

Background

Why we need vineyard?

The following is a common big data workflow case.

image

Challenges: Using external storage (local disk or cloud S3, OSS service, etc.) to implement data exchange will bring a lot of I/O consumption, often becoming the entire Workflow bottlenecks

What is Vineyard?

Vineyard is a distributed in-memory data management system, it provides the following features:

  • Leverage the local shared memory: avoid unnecessary memory copies, serialization/deserialization, file I/O and network transmission, etc.
  • Distributed: support for distributed data between different systems.
  • Cloud Native: Deeply integrate with Kubernetes ecosystem, including operator, scheduler-plugin, csi-plugin, etc.
  • Simple and easy-to-use SDK: includes python, c++, rust, java and golang, etc.
How to use vineyard in kubernetes

Each object in the vineyard is composed of metadata + payload:

  • Vineyardd: shared memory manager, which stores payload information.
  • Etcd: Metadata management, storing metadata information.

The following is the layout of vineyard components in kubernetes:

image

Vineyard data access method

image

  • RPC: Slow, data needs to be migrated to task pods.
  • IPC: Fast, but require task pods and vineyardd pods to store data co-locate.
Important configuration in vineyard: shared memory
  • vineyardd itself supports a size limit that allows users to set it.
  • What if the shared memory is not enough?
    • Vineyard allows an optional spill configuration.
      • Use external storage (local disk/cloud oss, S3, etc.) for data storage.
      • When there is memory pressure, automatically spill objects that are not currently in use.
      • When spilled objects are reused, vineyard will automatically reload.
    • Transparent to users.
The observability of vineyard’s shared memory in kubernetes
  • Counted on vineyardd pod
    • The reserve memory option needs to be set as true when starting vineyardd.
  • Counted on task pod
    • The reserve memory option needs to be set as false when starting vineyardd.

Integrate with fluid

How to mount vineyard socket and perform service discovery.
  • Create a socket in the specified directory and expose the configuration file (configmap) of the vineyard service (IPC and RPC).

    • Vineyardd will create a socket in the directory where the hostpath is /runtime-mnt/vineyard/{{ .Release.Namespace }}/{{ template "vineyard.fullname" . }}.
    • In order to be compatible with the fluid-fuse detection mechanism, vineyard fuse mounts the configmap to /runtime-mnt/vineyard/{{ .Release.Namespace }}/{{ template "vineyard.fullname" . }}/vineyard-fuse/dummy-path.
    • Vineyard fuse will mount the /runtime-mnt/vineyard/{{ .Release.Namespace }}/{{ template "vineyard.fullname" . }} directory to /runtime-mnt/vineyard/{{ .Release.Namespace } }/{{ template "vineyard.fullname" . }}/vineyard-fuse.
  • How tasks perform service discovery?

    • Add fuse.vineyard.fluid.io/inject: "true" label for task Pod, and inject configmap env.
  • How task pod access data.

    • Via IPC: Task pod and vineyardd pod containing data are required to co-locate.
    • Via RPC: No need for task pods and vineyardd pods containing data to co-locate.
      • Currently, the rpc API is experimental, and it will be stably supported before the next major version of vineyard.
Vineyard Runtime API Design
spec:
  # Corresponds to the etcd component, configuration will launch an etcd statefulset
  master:
    # Meaning: Whether to deploy this component
    # Optional: Yes
    # Default: true, which means a new etcd cluster will be created as metadata management for vineyardd.
    # Example: Users can configure it as false if there is an existing external etcd cluster that vineyardd can connect to.
    enabled: true
    # Meaning: The number of replicas for etcd
    # Optional: Yes
    # Default: 1
    # Example: Users can set replicas to 3 or 5, etc.
    replicas: 1
    # Meaning: Official image of etcd
    # Optional: Yes
    # Default: bitnami/etcd
    # Description: By default, the official etcd image from docker.io is used.
    image: bitnami/etcd
    # Meaning: The strategy for pulling the image
    # Optional: Yes
    # Default: IfNotPresent
    # Description: The IfNotPresent policy is used to avoid redundant image pulls.
    imagePullPolicy: IfNotPresent
    # Meaning: Version number of the image
    # Optional: Yes
    # Default: 3.5.10
    # Description: Choose a version that can run stably.
    imageTag: 3.5.10
    # Meaning: Configure etcd's startup properties
    # Optional: Yes
    # Default: Empty
    # Description: Users can refer to https://etcd.io/docs/v3.6/op-guide/configuration/ for custom configurations.
    # Note that since etcd ultimately starts with command-line arguments, users cannot set some fixed command parameters like init_cluster that involve endpoint addresses.
    # Subsequently, based on the commands in the template, list some environment variables that are not supported for user settings.
    env: {}
    # Meaning: Configure etcd's resource requests and limits, etc.
    # Optional: Yes
    # Default: Empty
    # Description: Users can customize the resource size occupied by etcd. For example:
    #      resources:
    #        limits:
    #          memory: "2Gi"
    #          cpu: "2"
    #        requests:
    #          memory: "2Gi"
    #          cpu: "2"
    resources: {}
    # Meaning: Whether to deploy etcd on specific nodes
    # Optional: Yes
    # Default: Empty
    # Description: Use the labels on the node as selector. For example:
    #
    #    nodeSelector:
    #      kubernetes.io/hostname: kind-worker"
    #
    nodeSelector: {}
    # Configure etcd for persistent storage
    volumeMounts: {}
    # Meaning: Port number settings for etcd
    ports:
      # Meaning: The port number for clients to connect to etcd
      # Optional: Yes
      # Default: 2379
      client: 2379
      # Meaning: Communication port number for internal etcd processes
      # Optional: Yes
      # Default: 2380
      peer: 2380
  # Corresponds to the vineyardd component, configuration will launch a vineyardd statefulset
  # One reason for using statefulset is that each pod name is fixed and started in sequence
  # Therefore, the vineyardd's instance_id and endpoint can be distinguished by pod name.
  worker:
    # Meaning: Number of replicas for vineyardd
    # Optional: Yes
    # Default: 1
    # Description: It should be noted that each vineyardd pod should not be deployed on the same node.
    # Users need to ensure that the number of replicas does not exceed the number of all nodes in the k8s cluster.
    replicas: 1
    # Meaning: vineyardd image configuration
    # Optional: Yes
    # Default: Empty
    # Description: The official vineyardd image from docker.io. Users can also manually set the registry information.
    image: "vineyardcloudnative/vineyardd"
    # Meaning: The strategy for pulling the image
    # Optional: Yes
    # Default: IfNotPresent
    # Description: To avoid redundant pulls.
    imagePullPolicy: "IfNotPresent"
    # Meaning: Version number of the image
    # Optional: Yes
    # Default: 0.18.2
    # Description: Choose a stable supported version.
    imageTag: "0.18.2"
    # Meaning: Configure startup parameters for vineyardd
    # Optional: Yes
    # Default:
    #    properties:
    #      vineyard.reserve.memory: "true"
    # Description:
    # Description: Supports the following configurations:
    #        vineyardd.reserve.memory: Whether to reserve memory in vineyardd
    #        vineyardd.external.etcd.endpoint: Set external etcd cluster
    #
    properties:
      # Whether to reserve memory in vineyardd, the default is true.
      # It means the memory quota in k8s is taken into account on the vineyardd side.
      vineyardd.reserve.memory: "true"
      # If the user sets the following for an external etcd cluster, then vineyardd will connect to that cluster.
      # etcd endpoint (etcd-svc.etcd-namespace.svc.cluster.local:2379/10.244.3.4:2379)
      # vineyardd.external.etcd.endpoint: ""
    # Meaning: Whether to deploy vineyardd on specific nodes
    # Optional: Yes
    # Default: Empty
    # Description: Use the labels on the node as selector. For example:
    #   nodeSelector:
    #     "kubernetes.io/hostname: kind-worker3"
    nodeSelector: {}
    # Configure metrics storage for vineyard, etc.
    volumeMounts: {}
    # Meaning: Port numbers that need to be exposed in vineyardd
    # Optional: Yes
    # Default:
    #    rpc: 9600
    # Description: The rpc port exposed by vineyardd.
    ports:
      rpc: 9600
  # Corresponds to the fuse component, configuration will launch a fuse daemonset
  fuse:
    # Meaning: Fuse image
    # Optional: Yes
    # Default: ghcr.io/v6d-io/v6d/vineyard-mount-socket
    # Description: This image mainly performs the mount operation, mounting the directory containing the vineyard socket into the persistent volume path, so that the user's pod can ultimately access the vineyard socket.
    image: ghcr.io/v6d-io/v6d/vineyard-mount-socket
    imagePullPolicy: IfNotPresent
    imageTag: latest
  # Meaning: Define multi-tiered storage for vineyardd
  # Optional: Yes
  # Default:
  #  tieredstore:
  #    levels:
  #    - level: 0
  #      mediumtype: MEM
  #      quota: 4Gi
  #
  # Description: By default, only shared memory is used as the primary storage method. Users can also set up external storage as a secondary storage,
  # enabling the spill mechanism in vineyardd, such as using hostpath with the following configuration:
  #
  #      tieredstore:
  #        levels:
  #        - level: 0
  #          mediumtype: MEM
  #          quota: 4Gi
  #          high: "0.8"
  #          low: "0.3"
  #        - level: 1
  #          mediumtype: SSD
  #          quota: 10Gi
  #          volumeType: Hostpath
  #          path: /var/spill-path
  #
  #      Or using custom external storage:
  #
  #      tieredstore:
  #        levels:
  #        - level: 0
  #          mediumtype: MEM
  #          quota: 4Gi
  #          high: "0.8"
  #          low: "0.3"
  #        - level: 1
  #          mediumtype: SSD
  #          quota: 30Gi
  #          volumeType: volumeTemplate
  #          path: /var/spill-path
  #          volumeSource:
  #            persistentVolumeClaim:
  #              claimName: external-storage-pvc-name
  #
  tieredstore:
    levels:
    # Level 0 represents the size of shared memory for vineyardd
    - level: 0
      mediumtype: MEM
      # Shared memory size for vineyardd
      quota: 4Gi
      # Configure the high watermark for spilling
      #high: "0.8"
      # Configure the low watermark for spilling
      #low: "0.3"
    # Level 1 represents the spill configuration path for vineyardd
    #- level: 1
    #  mediumtype: SSD
    #  quota: 10Gi
    #  volumeType: Hostpath
    #  Path for mounting the spill
    #  path: /var/spill-path
  # Meaning: Whether to disable Prometheus, i.e., not expose metrics information related to vineyardd
  # Optional: Yes
  # Default: false
  # Description: By default, Prometheus is not disabled.
  disablePrometheus: false
  volumes: {}
Best Practice
  • How to set the vineyard pod's own required, limits, and vineyardd's size-limit.

    • The reserve memory option is enabled by default. The default size-limit of vineyardd is 256Mi, while the required and limits of vineyard pod are 256Mi + 100Mi(for vineyardd itself). If the user sets the size-limit to 8Gi, the required limits of the vineyard pod become 8Gi + 100Mi.
  • How to set the required and limits of task pod memory.

    • When the reserveMemory option is enabled.
      • Shared memory is counted in vineyardd and does not occupy the memory quota of task pod.
      • When the task pod migrates data to vineyard, there is no need to increase the required memory required and limits.
    • When the reserveMemory option is disabled.
      • Shared memory is counted in the task pod and occupies the memory quota of the task pod.
        • For reader tasks, there is no need to increase the required and limits of additional memory.
        • For writer tasks, the shared memory size needs to be added to required and limits.
  • Optimum performance.

poc

dashanji#1

@dashanji dashanji added the features features label Nov 9, 2023
@dashanji
Copy link
Contributor Author

dashanji commented Nov 9, 2023

Welcome any comments!

@cheyang
Copy link
Collaborator

cheyang commented Nov 9, 2023

@dashanji Thanks for your great contributions! I'm considering why the vineyardRuntime API cannot be consistent with alluxioRuntime. Also, what is the necessity of exposing the concept of etcd? Can it be more abstract? Here is a sample from my perspective. It's just a example. :)

spec:
  replicas: 1
  volumes:
    - name: pvc-1
      secret:
        secretName: pvc-1
  # statefulset
  tieredstore:
    levels:
    - level: 0
      mediumtype: MEM
      quota: 2GB
      high: "0.95"
      low: "0.7"
    - level: 1
      mediumtype: SSD
      volumeType: Hostpath
      path: /swap-path
  worker:
    image: vineyardcloudnative/vineyardd
    imagePullPolicy: IfNotPresent
    imageTag: latest
    env: {}
    nodeSelector: {}
    volumeMounts:
      - name: pvc-1
        readOnly: true
        mountPath: "/var/pvc-1"
    options:
      socketPath: /var/run
    ports:
      port: 9600
  # statefulset
  master:
    replicas: 1
    options:
      socketPath: /var/run
    ports:
      etcd: 1200
      rpc: 5000
    env:
    volumeMounts:
      - name: pvc-1
        readOnly: true
        mountPath: "/var/pvc-1"
  # daemonset
  fuse:
    image: ghcr.io/v6d-io/v6d/vineyard-mount-socket
    imagePullPolicy: IfNotPresent
    imageTag: latest
    cleanPolicy: OnRuntimeDeleted

@dashanji
Copy link
Contributor Author

I'm considering why the vineyardRuntime API cannot be consistent with alluxioRuntime. Also, what is the necessity of exposing the concept of etcd? Can it be more abstract?

Make sense to me. It seems that master and worker are more user-friendly than vineyardd and etcd. I will polish the VineyardRuntime API ASAP and we can discuss here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
features features
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants