Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Git-sync with kubernetesExecutors and using https #468

Open
PaulienVa opened this issue Jul 16, 2024 · 7 comments
Open

Git-sync with kubernetesExecutors and using https #468

PaulienVa opened this issue Jul 16, 2024 · 7 comments
Labels

Comments

@PaulienVa
Copy link
Contributor

Affected Stackable version

Nightly

Affected Apache Airflow version

2.8.1

Current and expected behavior

Using this AirflowCluster:

---
apiVersion: airflow.stackable.tech/v1alpha1
kind: AirflowCluster
metadata:
  name: airflow
spec:
  image:
    productVersion: 2.8.1
  clusterConfig:
    loadExamples: false
    exposeConfig: false
    listenerClass: external-unstable
    credentialsSecret: simple-airflow-credentials
    dagsGitSync: 
    - repo: https://custom.gitlab.com/repos/airflow-dags.git
      branch: "main" 
      gitFolder: "dags" 
      depth: 10 
      wait: 20 
      credentialsSecret: gitcredentials
      gitSyncConf: 
        --rev: HEAD 
        --git-config: http.sslCAInfo:/tmp/ca-cert/ca.crt
    volumeMounts:
    - name: "cert-ca"
      mountPath: "/tmp/ca-cert"  
    volumes:
    - name: "cert-ca"
      configMap:
        name: "cert-ca"
        items: 
        - key: "ca.crt"
          path: "ca.crt"
  webservers:
    roleGroups:
      default:
        replicas: 1
    podOverrides:
      spec:
        containers:
          - name: gitsync-1
            volumeMounts:
            - name: "cert-ca"
              mountPath: "/tmp/ca-cert"  
  kubernetesExecutors:
    config:
      resources:
        cpu:
          min: 400m
          max: 800m
        memory:
          limit: 2Gi
    podOverrides:
      spec:
        containers:
          - name: gitsync-1
            volumeMounts:
              - name: "cert-ca"
                mountPath: "/tmp/ca-cert"
  schedulers:
    roleGroups:
      default:
        replicas: 1
    podOverrides:
      spec:
        containers:
          - name: gitsync-1
            volumeMounts:
            - name: "cert-ca"
              mountPath: "/tmp/ca-cert"

The gitSyncConf does not seem to get propagated for the kubernetesExecutors. Also the git-sync container has a one-time=true flag set for the git-sync command

Follow up this MR: #456 (comment)

Expected behavior: the webserver, scheduler and kubernetesExecutors are configured exactly the same for the git-sync containers

Possible solution

No response

Additional context

No response

Environment

No response

Would you like to work on fixing this bug?

None

@razvan
Copy link
Member

razvan commented Jul 16, 2024

There is a ConfigMap named airflow-executor-pod-template which is used as template for the Kubernetes executors. The init container for git-sync in that template is called gitsync-0 not gitsync-1. So if you change the pod overrides for the executors to say:

  kubernetesExecutors:
...
    podOverrides:
      spec:
        containers:
          - name: gitsync-0 ### <---- this name
            volumeMounts:
              - name: "cert-ca"
                mountPath: "/tmp/ca-cert"

things should work as expected.

The naming difference seems to be an unfortunate accident as far as I can see. 😞

Hope this helps.

@PaulienVa
Copy link
Contributor Author

PaulienVa commented Jul 16, 2024

@razvan we do see an init container with the name gitsync-1 in the pods that are started off when kicking off a DAG, so I am afraid that that is not the solution :( Somehow the gitsync will end up in error for the kubernetesExecutors

@razvan
Copy link
Member

razvan commented Jul 16, 2024

Hm, can you share the contents of the airflow-executor-pod-template ConfigMap?

@PaulienVa
Copy link
Contributor Author

Yes:

  
metadata:
  labels:
   app.kubernetes.io/component: executor
   app.kubernetes.io/instance: airflow
   app.kubernetes.io/managed-by: airflow.stackable.tech_airflowcluster
   app.kubernetes.io/name: airflow
   app.kubernetes.io/role-group: executor-template
   app.kubernetes.io/version: 2.8.1-stackable24.3.0
   stackable.tech/vendor: Stackable

spec:
  affinity: {}
  containers:
  - env:
   - name: AIRFLOW__CORE__SQL_ALCHEMY_CONN
     valueFrom:
       secretKeyRef:
         key: connections.sqlalchemyDatabaseUri
         name: simple-airflow-credentials
   - name: AIRFLOW__CORE__EXECUTOR
     value: LocalExecutor
   - name: AIRFLOW__KUBERNETES_EXECUTOR__NAMESPACE
     value: <NAMESPACE>
   - name: AIRFLOW__CORE__DAGS_FOLDER
     value: /stackable/app/git/current/dags
   image: docker.stackable.tech/stackable/airflow:2.8.1-stackable24.3.0
   imagePullPolicy: Always
   name: base
   resources:
     limits:
       cpu: 800m
       memory: 2Gi
     requests:
       cpu: 400m
       memory: 2Gi
   volumeMounts:
   - mountPath: /tmp/ca-cert
     name: cert-ca
   - mountPath: /stackable/app/git
     name: content-from-git
   - mountPath: /stackable/app/config
     name: config
   - mountPath: /stackable/app/log_config
     name: log-config
   - mountPath: /stackable/log
     name: log
  enableServiceLinks: false
  initContainers:
  - args:
   - /stackable/git-sync --repo=https://custom.gitlab.com/repos/airflow-dags.git --ref=main --depth=10 --period=20s --link=current --root=/tmp/git --rev=HEAD --git-config='safe.directory:/tmp/git,http.sslCAInfo:/tmp/ca-cert/ca.crt' --one-time=true

   command:
   - /bin/bash
   - -x
   - -euo
   - pipefail
   - -c
   env:
   - name: GITSYNC_USERNAME
     valueFrom:
       secretKeyRef:
         key: user
         name: gitcredentials
   - name: GITSYNC_PASSWORD
     valueFrom:
       secretKeyRef:
         key: password
         name: gitcredentials
   image: docker.stackable.tech/stackable/airflow:2.8.1-stackable24.3.0
   imagePullPolicy: Always
   name: gitsync-1

   resources:
     limits:
       cpu: 200m
       memory: 64Mi
     requests:
       cpu: 100m
       memory: 64Mi

   volumeMounts:
   - mountPath: /tmp/git
     name: content-from-git
  restartPolicy: Never
  securityContext:
   fsGroup: 1000
   runAsGroup: 0
   runAsUser: 1000
  serviceAccountName: airflow-serviceaccount
  terminationGracePeriodSeconds: 300
  volumes:

  - configMap:
     items:
     - key: ca.crt
       path: ca.crt
     name: cert-ca
   name: cert-ca
  - configMap:
     name: airflow-executor-kubernetes
   name: config
  - emptyDir:
     sizeLimit: 30Mi
   name: log
  - configMap:
     name: airflow-executor-kubernetes
   name: log-config
  - emptyDir: {}
   name: content-from-git
 

@razvan
Copy link
Member

razvan commented Jul 16, 2024

It looks like you are using SDP 24.3.0 which doesn't contain the fix from #456

app.kubernetes.io/version: 2.8.1-stackable24.3.0

To take advantage of that fix, you need to use the development version of the Airflow operator as suggested or wait until 24.7 is out. This should happen in the next couple of weeks.

@razvan
Copy link
Member

razvan commented Jul 17, 2024

Thank you again for your report.

Having looked a bit longer at your use case I created a follow-up issue that should simplify things a lot in the future.

Regarding the upcoming operator version, the correct way to mount the CA in the kubernetes executor pods will be:

  kubernetesExecutors:
...
    podOverrides:
      spec:
        initContainers:      ###  not a side-car container in this case
          - name: gitsync-0  ###  init containers are called "gitsync-0". Side containers are called "gitsync-1"
            volumeMounts:
              - name: "cert-ca"
                mountPath: "/tmp/ca-cert"

For celery executor pods, both the init and the side-car containers must be patched like so:

 celeryExecutors:
...
   podOverrides:
     spec:
       initContainers:
         - name: gitsync-0
           volumeMounts:
             - name: "cert-ca"
               mountPath: "/tmp/ca-cert"
       containers: 
         - name: gitsync-1
           volumeMounts:
             - name: "cert-ca"
               mountPath: "/tmp/ca-cert"

Hope this helps and sorry for the trouble.

@PaulienVa
Copy link
Contributor Author

Okay I will test this with this config and let you know. Thank you for diving into it

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants