-
Notifications
You must be signed in to change notification settings - Fork 16.3k
Merge common code to DbApiHook #27763
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok so... what is the purpose of this method? what utility does it present?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just trying to keep query_ids from the Snowflake.
It seems useless for Airflow itself, but Hook users can use it for tracking their queries
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok so if we keep this, i'd prefer to make the interface on DBApi more generic, like _post_execute_hook. Or something like that. Maybe there's a better name out there. Then snowflake can implement it and use it to store the query ids.
One problem is that hooks do a variety of executing in a variety of contexts. So, perhaps we can come up with something that's a little more generic, i.e. compatible with if, say, we wanted to add a hook / callback in a different context within the functionality of the hook.
Meanwhile, I'm checking with @mobuchowski to see whether we still need query ids on snowflake hook for the original purpose (which was lineage) or whether we are now scraping that info some other way.
Thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just for the record - see the k
ind of problems that the common.sql MIGHT introduce if we remove some seeminglly unused methods.
I think we should be very, very, very careful when we remove anything from common.sql provider. Because we might simply not realize that it has been used.
This is the effect of common code in full swing if you do not have very well defined "public API" upfront AND you do not check (Python WTF) that even if you declared something as protected, some other entity might rely on it.
This is the learnings we should have that once we make some code common and used elsewhere, refactoring and removing stuff from it should be done extremely carefully.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hey @potiuk i thought your position was "let's just break things" ;)
@mobuchowski it looks like you rely on such an attr on the operator class not the hook
is that correct? To me, that makes a lot more sense. Although, it looks like in main this may be broken, because I do not see anything that touches the query_ids attr on this operator after init.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dstandish you're right, we're using it on operator - looks like it was broken on #25717.
It's not a critical for us - we can accept this breakage if we can agree on exposing this kind of information from hooks via operators in any other way. I think it would be useful for any other kinds of metadata.
For example, I dislike the fact that SQLExecuteQueryOperator gets hook in execute method, but does not set it as self instance variable.
There's a lot to be done on how to expose execution metadata.
c4be9e9 to
9953a2d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method still implemented here to keep Snowflake's split_statements: bool = True default parameter
9953a2d to
e0bd57c
Compare
|
I think we need to approach any such changes very, very carefully after #27854 and #27838 - this is really manifestation of the fact that when code is closely coupled, separating common functionality to separate "component" is HARD and have to be very, carefully reviewed. The scenarios where old providers use new common.sql have to be carefully considered - and this is what should happen before we merge this one. Possibly we should even wait until we add some automated tests to verify some of the faillure scenarios. |
e0bd57c to
284bdd4
Compare
|
I totally agree with you. |
Yeah. More testing and the #27946 might be helpful :). I think that was good learning for all of us |
284bdd4 to
59849fd
Compare
|
Needs conflict resolution/rebase now. |
|
hi @kazanzhy do we still need this PR? if so please rebase |
|
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions. |
Merge common code for SnowflakeHook, TrinoHook, and PrestoHook to