Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Platform Chart changes to Temporal Deployment causing TLS issues. #43328

Open
PurseChicken opened this issue Aug 6, 2024 · 25 comments · May be fixed by airbytehq/airbyte-platform#361
Open

Platform Chart changes to Temporal Deployment causing TLS issues. #43328

PurseChicken opened this issue Aug 6, 2024 · 25 comments · May be fixed by airbytehq/airbyte-platform#361
Labels
area/platform issues related to the platform community team/deployments type/bug Something isn't working

Comments

@PurseChicken
Copy link

PurseChicken commented Aug 6, 2024

Helm Chart Version

0.399.0

What step the error happened?

On deploy

Relevant information

The Temporal Deployment manifest was changed to assume that TLS\SSL is needed if using an external Database:

        {{- if eq .Values.global.database.type "external" }}
        # Assume an external database requires SSL.
          - name: POSTGRES_TLS_ENABLED
            value: "true"
          - name: POSTGRES_TLS_DISABLE_HOST_VERIFICATION
            value: "true"
          - name: SQL_TLS_ENABLED
            value: "true"
          - name: SQL_TLS_DISABLE_HOST_VERIFICATION
            value: "true"
        {{- end }}

This causes significant issues when using, for example, a sidecar Proxy which does not require TLS.

We should not be making an assumption here. In my opinion, there should be a key value pair in values which enables \ disables TLS\SSL if required in the deployment.

E.G.

temporal:
  enabled: true
  externalSSL: false

Relevant log output

[Fx] Error returned: received non-nil error from function "go.temporal.io/server/temporal".ServerOptionsProvider
	/home/builder/temporal/temporal/fx.go:180:
sql schema version compatibility check failed: pq: SSL is not enabled on the server
@PurseChicken PurseChicken added area/platform issues related to the platform needs-triage type/bug Something isn't working labels Aug 6, 2024
PurseChicken referenced this issue in airbytehq/airbyte-platform Aug 7, 2024
…al databases. (#13095)

Follow up to airbytehq/oncall#5843.

Here is a suggested approach to automatically enabling SSL if the database type is external.

The assumption here is all external database type databases have SSL turned on.
@chulkilee
Copy link

Hitting this issue as well. Please do not make assumptions without making it configurable.

@film42
Copy link

film42 commented Aug 15, 2024

@davinchia Can we get a fix for this? This broke the helm chart.

Edit: Why is it a bad assumption?

  1. Using a service mesh like isito where you intend for app-to-app traffic to "appear" unencrypted so the service mesh can handle all of the encryption for you.
  2. If you're running a local sidecar like google cloud sql auth proxy which handles ssl on your behalf.

Hope that helps.

@davinchia
Copy link
Contributor

Hi everyone, thanks for the feedback, and sorry for the inconvenience.

Curious, why are folks setting this field? This doesn't do anything except enable Temporal SSL on the backend, and is more of an experimental flag we are testing internally. Unsetting this should fix this for everyone.

@film42
Copy link

film42 commented Aug 19, 2024

I didn’t set any flag, I think. It failed when I upgraded the helm chart version.

@PurseChicken
Copy link
Author

PurseChicken commented Aug 19, 2024

Hi everyone, thanks for the feedback, and sorry for the inconvenience.

Curious, why are folks setting this field? This doesn't do anything except enable Temporal SSL on the backend, and is more of an experimental flag we are testing internally. Unsetting this should fix this for everyone.

What field are you referring to? .Values.global.database.type?

The issue here is that if .Values.global.database.type is set to external, then automatically the Temporal deployment gets specific TLS environment variables applied which is causing issues. I believe most of us who are running into issues are using external databases and setting .Values.global.database.type to external.

Setting that flag to internal when actually using an external database does not seem like the right way to configure the charts values file. If thats the instruction, thats not an acceptable fix in my opinion. More like a workaround with other possible downstream affects. If not now, then in the future as that field may be referenced in other areas of the chart(s).

As mentioned in the OP, these environment variables for Temporal TLS should be controlled in the temporal section of the charts values file rather than setting them based on an assumption.

@djpirra
Copy link

djpirra commented Sep 13, 2024

How was this solved? I am trying to use an external PostgreSQL database and having this notification of SSL.

sql schema version compatibility check failed: pq: SSL is not enabled on the server
Unable to create server. Error: could not build arguments for function "go.uber.org/fx".(*module).constructCustomLogger.func2 (/go/pkg/mod/go.uber.org/[email protected]/module.go:251): failed to build fxevent.Logger: could not build arguments for function "go.temporal.io/server/temporal".init.func8 (/home/builder/temporal/temporal/fx.go:1029): failed to build log.Logger: received non-nil error from function "go.temporal.io/server/temporal".ServerOptionsProvider (/home/builder/temporal/temporal/fx.go:180): sql schema version compatibility check failed: pq: SSL is not enabled on the server.

@film42
Copy link

film42 commented Sep 13, 2024

For the moment I deploy the helm chart and manually edit the airbyte-temporal so the following envs are "false":

        - name: POSTGRES_TLS_ENABLED
          value: "false"
        - name: SQL_TLS_ENABLED
          value: "false"

If you're using something like kustomize you might be able to apply a patch to do this but the stack I have using Airbyte isn't set up that way. Hoping to see this issue fixed soon.

@marcosmarxm
Copy link
Member

Hello all 👋 sorry the lack of updates. I asked the platform team to take a look into airbytehq/airbyte-platform#361 any return from them I'm going to update here.

@PurseChicken
Copy link
Author

Just checking in on this issue. This is still a problem in v1.1.1 of the helm chart. This is preventing us from updating.

Thank you

@PurseChicken
Copy link
Author

As a workaround to this issue, you can set ".Values.global.database.type" to internal, and then the TLS environment variables will not be added to the temporal deployment.

I looked pretty heavily through the charts manifests, and currently it seems only the temporal deployment references the ".Values.global.database.type" key in values. If that key is ever used elsewhere in the future this could be problematic though.. but for now as a workaround you can set it to the default of internal. I do feel strongly that this should still be tied to something more specific and not by an assumption.

@kev-datams
Copy link
Contributor

@PurseChicken the problem is that we want to use an external database without SSL, as the CloudSQL Auth Proxy (GCP) is already managing the TLS/SSL natively

@PurseChicken
Copy link
Author

PurseChicken commented Nov 10, 2024

@PurseChicken the problem is that we want to use an external database without SSL, as the CloudSQL Auth Proxy (GCP) is already managing the TLS/SSL natively

@kev-datams
You can still do so even with setting ".Values.global.database.type" to internal. Fortunately, as of now the chart does not use that value for anything other than setting the TLS environment variables on the temporal deployment.

By no means am I saying this is a fix. This should definitely not be the way. That said, since that key is only used for the temporal deployment, setting it to internal has no other affect. External databases can still be used (Or CloudSQL proxy etc). Its merely a workaround.

@kev-datams
Copy link
Contributor

@PurseChicken you mean we can set values.yaml like this:

postgresql:
  enabled: false

global:
  database:
    type: internal
    secretName: "airbyte-config-secrets"
    host: "x.x.x.x"
    port: 5432
    database: "airbyte"
    userSecretKey: "database-user"
    passwordSecretKey: "database-password"

and it will use the external database ? 🤔

Also, do you already managed to set up an external db using the Cloud SQL Proxy on a GCP VM ?

Thanks

@PurseChicken
Copy link
Author

PurseChicken commented Nov 13, 2024

@kev-datams, Sorry for the delay.

You are correct. Even if you set .Values.global.database.type to internal, it will still use an external database. The only thing that is currently set by that key is the TLS configuration on the temporal deployment.

This is how I have configured it for CloudSQL Proxy using IAM credentials (Workload Identity)
P.S. Not sure if externalDatabase section is required anymore either

global:
  database:
    type: "internal" 
    host: "cloud-sql-proxy"
    port: "5432"
    database: "airbyte"
    user: "[email protected]"
    password: "notreal"
postgresql:
  enabled: false
externalDatabase:
  host: "cloud-sql-proxy"
  user: "[email protected]"
  password: ""
  existingSecret: ""
  existingSecretPasswordKey: ""
  database: "airbyte"
  port: "5432"
  jdbcUrl: ""

Yes I have used Airbyte with cloudSQL proxy both as a sidecar to each pod as well as just a separate pod that was deployed along side the airbyte deployment. In the most recent versions of the Airbyte helm chart its been harder to use the sidecar approach, so moving to a separate pod deployment has been better \ easier.

@kev-datams
Copy link
Contributor

@PurseChicken thanks ! please could you share your YAML of the CloudSQLProxy ? 🙏

few complementary questions:

  • was it mandatory to bind the IAM SA with the K8 SA ?
  • did you use the --auto-iam-authn flag in the Proxy pod ?

Thanks :)

@PurseChicken
Copy link
Author

PurseChicken commented Nov 14, 2024

@kev-datams If you want to use Workload Identity for authentication into cloudSQLProxy, then yes the K8's SA needs to be bound to the IAM SA:

role: roles/iam.workloadIdentityUser
Members: serviceAccount:my-project.svc.id.goog[my-namespace/my-k8s-service-account]

Also the IAM SA needs to have CloudSQL Client and CloudSQL Instance User GCP roles.

Finally, yes, I use --auto-iam-authn on the proxy pod.

Thanks!

@kev-datams
Copy link
Contributor

kev-datams commented Nov 29, 2024

@PurseChicken, how did you manage the "first" deployment of Airbyte using abctl ?

abctl local install --values=values.yaml --secret=secrets.yaml --chart-version=1.1.1

we have the below error message:

ERROR   airbyte-bootloader: i.a.d.c.DatabaseAvailabilityCheck(lambda$isDatabaseConnected$1):78 - Failed to verify database connection.
ERROR   airbyte-bootloader: org.jooq.exception.DataAccessException: Error getting connection from data source HikariDataSource (HikariPool-1)
ERROR   airbyte-bootloader: Caused by: java.sql.SQLTransientConnectionException: HikariPool-1 - Connection is not available, request timed out after 30002ms (total=0, active=0, idle=0, waiting=0)
ERROR   airbyte-bootloader: Caused by: org.postgresql.util.PSQLException: The connection attempt failed.
ERROR   airbyte-bootloader: Caused by: java.net.UnknownHostException: cloud-sql-proxy-service

As the CloudSQLProxy pod needs to be deployed using the airbyte namespace and kubeconfig file...

kubectl apply -f cloud-sql-proxy.yaml -n airbyte-abctl --kubeconfig ~/.airbyte/abctl/abctl.kubeconfig

...but Airbyte deployments need to reach the external DB via the CloudSQLProxy pod which must exist 🤔

so we face the chicken or the egg problem 🤕

Thanks a lot !

@PurseChicken
Copy link
Author

@kev-datams I use Argo for deployment. So I either sync the pod and service first or add an annotation which sync's those resources before anything else. That's how I make sure the pod is running first. Then everything else works. In your case it shows your hostname not existing, So the name is either wrong, or the service is not deployed yet.

I never had to mess with abctl so I can't comment on that.

@stemarks
Copy link

stemarks commented Dec 4, 2024

@PurseChicken

looking at the helm chart code, the externalDatabase values dont appear to be used anywhere in the templates. I think its been replaced with the global.database values but not removed from the default file.

I changed globl.database.type to internal and that resolves the tls error but for some reason its looking for a database called temporal rather than my database specified in global.database.database

@PurseChicken
Copy link
Author

PurseChicken commented Dec 4, 2024

@stemarks I think that .Values.externalDatabase might now be left over from a previous way to configure things. I also am not finding reference to that key in any of the templates.

That said, deploying airbyte does require temporal databases if you have temporal enabled (which I believe it is by default). You can control what the database names are in .Values.temporal:

temporal:
  enabled: true

  ## Example:
  ##
  ## extraEnv:
  ## - name: SAMPLE_ENV_VAR
  ##   value: "key=sample-value"
  # -- Additional env vars for temporal pod(s).
  extraEnv:
    - name: DBNAME
      value: "airbyte-temporal"
    - name: VISIBILITY_DBNAME
      value: "airbyte-temporal-visibility"
    - name: SKIP_DB_CREATE
      value: "true"

I use "SKIP_DB_CREATE" with the value of true so that the temporal pod does not try to create these databases, They are created using IaC as part of the overall airbyte deployment (external to the airbyte chart).

@stemarks
Copy link

stemarks commented Dec 4, 2024

@PurseChicken yep that fixed that thanks. Now onto the rest of the failing pods 😂

@kev-datams
Copy link
Contributor

kev-datams commented Dec 4, 2024

Thanks for sharing
Would be better to fix the original issue and avoid setting global.database.type to internal if we use an external one 😢
Nobody from Airbyte team here to address it ? 🙏

@PurseChicken
Copy link
Author

Thanks for sharing Would be better to fix the original issue and avoid setting global.database.type to internal if we use an external one 😢 Nobody from Airbyte team here to address it ? 🙏

100 percent agree with you! That's the reason why this issue should still exist!

@marcosmarxm
Copy link
Member

Hello everyone I've asked the platform team to take a look into this issue and the pull request enables SSL/TLS for Temporal. Any update return to you here.

@kev-datams
Copy link
Contributor

Hi @PurseChicken,

We are finally trying to deploy to GKE (bye bye abctl) with external CloudSQL DB using CloudSQL Proxy with sidecar pattern as recommended by GCP for those reasons.

Do you have more elements on what you said:
In the most recent versions of the Airbyte helm chart its been harder to use the sidecar approach, so moving to a separate pod deployment has been better \ easier.

The issue looks related to airbyte-bootloader only, so let's crack it 🔥
If you're interested, we have an opened Slack discussion 🙏

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/platform issues related to the platform community team/deployments type/bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

9 participants