
feat: create airbyte to bigquery lineage #16830

Open: wants to merge 2 commits into main

aballiet (Contributor) commented Jun 28, 2024

Describe your changes:

Fixes #16829

Type of change:

  • Bug fix
  • Improvement
  • New feature
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation

Checklist:

  • I have read the CONTRIBUTING document.
  • My PR title is Fixes <issue-number>: <short explanation>
  • I have commented on my code, particularly in hard-to-understand areas.
  • For JSON Schema changes: I updated the migration scripts or explained why it is not needed.

Contributor

Hi there 👋 Thanks for your contribution!

The OpenMetadata team will review the PR shortly! Once it has been labeled as safe to test, the CI workflows
will start executing and we'll be able to make sure everything is working as expected.

Let us know if you need any help!

ulixius9 (Member) commented Jul 3, 2024

@aballiet I have a couple of questions regarding this PR:

  1. I see that you are trying to find the project name and then find the BigQuery service in OpenMetadata related to that project. Why do we need this approach? For Airbyte, we expect the name of the service in OpenMetadata to be the same as the destination connection name in Airbyte. Can you not maintain this consistent naming?
  2. There is also a fallback applied when the source table involved in the lineage is not found. This might fit your use case, but other users might not expect such lineage. Is there any particular reason why the source table would not be found?

aballiet (Contributor, Author) commented Jul 3, 2024

  1. I think the best way to encourage adoption of the tool is to minimize the naming conventions it requires. When setting up OpenMetadata, you may already have many tools set up in your stack, so it is better to match on the source of truth from configuration than on user-entered names.

     In my opinion, it is simply a better design to use the Google BigQuery configuration to match the correct database service in OMD, rather than a name in Airbyte that can change at any time. Enforcing a new naming convention is a blocker for us: we have hundreds of Airbyte connections and will not rename them just because we adopt OMD.

  2. It's not a fallback; it's the default behavior, needed to support the other destinations that are matched by name (Snowflake or any other Airbyte destination). I only implemented the integration for BigQuery; ideally we would do the same for every major destination supported by Airbyte.

I hope that makes more sense. In any case, we cannot merge this yet, as we need to wait for OMD 1.5 to support lineage for pipelines.
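The two matching strategies discussed above (config-based matching for BigQuery, name-based matching by default) can be sketched as follows. This is a minimal, hypothetical illustration, not the code from this PR or the OpenMetadata ingestion API: it assumes the Airbyte BigQuery destination configuration carries a `project_id`, and that each OpenMetadata service can be reduced to a name, a service type, and a GCP project ID. `OMService`, `resolve_destination_service`, and the `destination_type` strings are invented for the example.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class OMService:
    """Hypothetical, simplified view of an OpenMetadata database service."""
    name: str
    service_type: str          # e.g. "BigQuery", "Snowflake" (assumed labels)
    project_id: Optional[str]  # GCP project; only meaningful for BigQuery


def resolve_destination_service(
    destination_name: str,
    destination_type: str,
    destination_config: dict,
    services: list[OMService],
) -> Optional[OMService]:
    """Pick the OpenMetadata service backing an Airbyte destination (sketch)."""
    if destination_type == "bigquery":
        # Config-based matching: use the GCP project from the destination
        # config, so renaming the Airbyte destination does not break lineage.
        project_id = destination_config.get("project_id")
        for svc in services:
            if svc.service_type == "BigQuery" and svc.project_id == project_id:
                return svc
        return None
    # Default for every other destination: the OpenMetadata service name
    # must equal the Airbyte destination name.
    for svc in services:
        if svc.name == destination_name:
            return svc
    return None


if __name__ == "__main__":
    services = [
        OMService("bq_prod", "BigQuery", "acme-analytics"),
        OMService("snowflake_dwh", "Snowflake", None),
    ]
    # The Airbyte name "BigQuery (prod)" differs from the OMD name, but the
    # project ID still matches.
    print(resolve_destination_service(
        "BigQuery (prod)", "bigquery", {"project_id": "acme-analytics"}, services
    ).name)  # prints "bq_prod"
    print(resolve_destination_service(
        "snowflake_dwh", "snowflake", {}, services
    ).name)  # prints "snowflake_dwh"
```

The design point is that the BigQuery branch never looks at the user-chosen destination name, while every other destination keeps the existing name-equality convention.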

Successfully merging this pull request may close: Airbyte lineage between airbyte pipeline (api source) and BigQuery tables