Octavia CLI has option to use existing IDs for sources, destinations, and connections #13203
Comments
cc @ChristopheDuong to comment on what bad things can happen if we allow users to set the ID of their connections.
Hey @evantahler,
I don't think this will solve the problem. We need to know the ID of the connection before we use the CLI. If I'm trying to bootstrap an environment, I need to know that the connection ID /will be/
In your use case do you also want to orchestrate with Airflow the creation of the Airbyte resources from octavia? I think we could add a
Ooh - I think the get commands are the missing piece of the puzzle! That would allow us to spin up the cluster, set up our connections, and then get the IDs to pass to whatever other tool we need. Awesome! I'll close this in favor of #13254 cc @marcosmarxm
The Problem
When using Octavia as a tool to back up an Airbyte project, or to promote the same project from CI -> Staging -> Production, we should be able to set the IDs of our sources, destinations, and connections deterministically. The IDs are important because that is how other tools (e.g. the API) interact with the connection.
This allows easier use of the Airbyte API, because the URL to start a sync relies on the connection ID, which will be well-known. Today, running a specific connection on an Airbyte server that was created by the Octavia CLI involves looping through all the connections and finding the ID of the one you want by name:
From the Airflow Airbyte Operator:
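As a rough illustration of the name-based lookup described above (a sketch, not the operator's actual code): assuming the list of connections has already been fetched and parsed into dictionaries with `connectionId` and `name` keys, finding the right ID is a linear scan. The helper name `find_connection_id` and the sample values are hypothetical.

```python
from typing import Iterable, Mapping, Optional


def find_connection_id(
    connections: Iterable[Mapping[str, str]], name: str
) -> Optional[str]:
    """Return the ID of the first connection whose name matches, else None."""
    for connection in connections:
        if connection.get("name") == name:
            return connection.get("connectionId")
    return None


# Made-up payload, shaped like a parsed "list connections" response (assumed keys).
connections = [
    {"connectionId": "11111111-1111-1111-1111-111111111111", "name": "pg-to-bq"},
    {"connectionId": "22222222-2222-2222-2222-222222222222", "name": "stripe-to-bq"},
]

print(find_connection_id(connections, "stripe-to-bq"))
```

The scan only works if connection names are unique on the server, which is one more reason a well-known ID would be simpler.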
A similar problem arises when creating a new connection from a source and destination. If the IDs of the source and destination cannot be known ahead of time, building the connection is not possible without first re-reading the YML files of the source and destination. Even though the source and destination YML files are checked into git, Octavia re-generates the IDs of those connectors on every run. This means that (1) the code in git does not match reality and (2) making the connection requires an extra programmatic step - see https://github.com/airbytehq/airflow-summit-airbyte-2022/blob/main/tools/change_resource_id.py
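To give a sense of what that extra step looks like, here is a minimal sketch (not the linked script) that rewrites an ID in a connection's YML text once the freshly generated ID is known. It assumes the connection file carries top-level `source_id` and `destination_id` keys; the helper name `set_resource_id` and the sample values are hypothetical.

```python
import re


def set_resource_id(yaml_text: str, key: str, new_id: str) -> str:
    """Replace the value of a top-level `key: value` line, leaving other lines untouched."""
    pattern = re.compile(rf"^({re.escape(key)}:[ \t]*).*$", flags=re.MULTILINE)
    # A function replacement avoids any backslash interpretation in new_id.
    updated, count = pattern.subn(lambda m: m.group(1) + new_id, yaml_text)
    if count == 0:
        raise KeyError(f"{key!r} not found in configuration")
    return updated


# Made-up connection file contents (assumed field names).
connection_yaml = (
    "resource_name: pg-to-bq\n"
    "source_id: old-source-id\n"
    "destination_id: old-destination-id\n"
)

connection_yaml = set_resource_id(connection_yaml, "source_id", "new-source-id")
print(connection_yaml)
```

Every run that regenerates IDs needs a pass like this before the connection can be applied, which is the step deterministic IDs would remove.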
Desired solution
A flag for the Octavia CLI to use the IDs already present in the YML files:
This requires the Airbyte API to accept user-provided IDs and not always auto-generate a new UUID. These UUIDs/IDs should be validated for uniqueness on creation.
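The server-side behavior being asked for could look roughly like the following. This is a toy in-memory sketch under stated assumptions, not Airbyte's implementation: `ResourceStore`, `DuplicateIdError`, and `create` are hypothetical names, and a real server would enforce uniqueness in its database.

```python
import uuid
from typing import Dict, Optional


class DuplicateIdError(ValueError):
    """Raised when a user-provided ID is already in use."""


class ResourceStore:
    """Toy store that accepts a caller-supplied ID or generates one."""

    def __init__(self) -> None:
        self._resources: Dict[uuid.UUID, dict] = {}

    def create(self, config: dict, resource_id: Optional[str] = None) -> uuid.UUID:
        if resource_id is None:
            new_id = uuid.uuid4()  # current behavior: always auto-generate
        else:
            new_id = uuid.UUID(resource_id)  # raises ValueError if malformed
        if new_id in self._resources:
            raise DuplicateIdError(f"ID {new_id} already exists")
        self._resources[new_id] = config
        return new_id


store = ResourceStore()
fixed_id = "11111111-1111-1111-1111-111111111111"
created = store.create({"name": "pg-to-bq"}, resource_id=fixed_id)
print(created)
```

Keeping the auto-generated path as the fallback means existing clients that never send an ID would be unaffected.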
I would argue that --use-id should be the default behavior, as the expectation is that the server's state should match the YML files exactly.
Possible Side Effects
Airbyte assumes that the connection ID is unique in our metrics. If the same connection ID appears in multiple servers, things might get weird.