-
Notifications
You must be signed in to change notification settings - Fork 18
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Eng 1086 create syntax sugar for sql queries #103
Eng 1086 create syntax sugar for sql queries #103
Conversation
sql_artifact_today = db.sql(query="select * from hotel_reviews where review_date = {{today}}") | ||
assert sql_artifact_today.get().empty | ||
sql_artifact_not_today = db.sql( | ||
query="select * from hotel_reviews where review_date != {{today}}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since this is historical data, can we just do < today? Not sure if thats valid syntax
@@ -37,3 +37,13 @@ def test_invalid_destination_integration(client): | |||
output_artifact.save( | |||
config=db.config(table=generate_table_name(), update_mode=LoadUpdateMode.REPLACE) | |||
) | |||
|
|||
|
|||
def test_sugar_syntax_sql(client): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lets be more specific with the test name. test_sql_today_tag
@@ -24,7 +25,12 @@ def discover(self) -> List[str]: | |||
return inspect(self.engine).get_table_names() | |||
|
|||
def extract(self, params: extract.RelationalParams) -> pd.DataFrame: | |||
df = pd.read_sql(params.query, con=self.engine) | |||
query = params.query | |||
if "{{today}}" in query: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think variation like {{ today }} and {{today }} should also be valid. Let's look for whats in between {{ and }} and strip out the whitespace. Then we take all the arguments extracted in this fashion and try to match it against a predefined map from [arg name] to func() -> str which replaces that string.
This will let us extend this easily to other sql arguments besides today.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There might be better way of doing this, but you can just keep track of the indexes in the string that the argument starts and ends at, eg: [(5, 10), (15, 20), (36, 55)], then perform the string replacements in reverse.
The point is that we will want to support in the future 1) other types of sql arguments and 2) multiple sql arguments in a single statement
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I think that'll be ideal. I am trying to implement a version using RegEx so it extracts all the curly brace.
Make change as follow:
|
@@ -24,7 +26,15 @@ def discover(self) -> List[str]: | |||
return inspect(self.engine).get_table_names() | |||
|
|||
def extract(self, params: extract.RelationalParams) -> pd.DataFrame: | |||
df = pd.read_sql(params.query, con=self.engine) | |||
query = params.query | |||
matches = re.findall(r"{{[\s+]*\w+[\s+]*}}", query) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we make this a constant up top and document it with comment indicating what it does.
This PR is the first attempt to create syntax sugar for sql queries, the implementation is the following:
Aqueduct_executor
, for all database connectors that take in SQL syntax, we process the query when we want to extract a dataframe from the database. If there exist"{{today}}"
in the SQL syntax, we will replace the substring with today'date.sql_integration_test.py
. I search on the datasethotel_reviews
based onreview_date = {{today}}
. It should return an empty dataframe. However, this test is not conclusive since there are many input that can result in empty dataframe.Would appreciate some suggestions~!