Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Eng 1086 create syntax sugar for sql queries #103

Merged
merged 10 commits into from
Jun 16, 2022

Conversation

Fanjia-Yan
Copy link
Contributor

This PR is the first attempt to create syntax sugar for sql queries, the implementation is the following:

  1. In Aqueduct_executor, for all database connectors that take in SQL syntax, we process the query when we want to extract a dataframe from the database. If there exist "{{today}}" in the SQL syntax, we will replace the substring with today'date.
  2. I have created a test under sql_integration_test.py. I search on the dataset hotel_reviews based on review_date = {{today}}. It should return an empty dataframe. However, this test is not conclusive since there are many input that can result in empty dataframe.

Would appreciate some suggestions~!

@Fanjia-Yan Fanjia-Yan requested review from kenxu95 and cw75 June 14, 2022 22:51
sql_artifact_today = db.sql(query="select * from hotel_reviews where review_date = {{today}}")
assert sql_artifact_today.get().empty
sql_artifact_not_today = db.sql(
query="select * from hotel_reviews where review_date != {{today}}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since this is historical data, can we just do < today? Not sure if thats valid syntax

@@ -37,3 +37,13 @@ def test_invalid_destination_integration(client):
output_artifact.save(
config=db.config(table=generate_table_name(), update_mode=LoadUpdateMode.REPLACE)
)


def test_sugar_syntax_sql(client):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lets be more specific with the test name. test_sql_today_tag

@@ -24,7 +25,12 @@ def discover(self) -> List[str]:
return inspect(self.engine).get_table_names()

def extract(self, params: extract.RelationalParams) -> pd.DataFrame:
df = pd.read_sql(params.query, con=self.engine)
query = params.query
if "{{today}}" in query:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think variation like {{ today }} and {{today }} should also be valid. Let's look for whats in between {{ and }} and strip out the whitespace. Then we take all the arguments extracted in this fashion and try to match it against a predefined map from [arg name] to func() -> str which replaces that string.

This will let us extend this easily to other sql arguments besides today.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There might be better way of doing this, but you can just keep track of the indexes in the string that the argument starts and ends at, eg: [(5, 10), (15, 20), (36, 55)], then perform the string replacements in reverse.

The point is that we will want to support in the future 1) other types of sql arguments and 2) multiple sql arguments in a single statement

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think that'll be ideal. I am trying to implement a version using RegEx so it extracts all the curly brace.

@Fanjia-Yan
Copy link
Contributor Author

Make change as follow:

  1. For the integration test, we check "<" instead of "!=", this will make sure that today is a calendar date for comparison
  2. For the syntax sugar. I create a RegEx which will identify anything that matches {{[space*][tag][space*]}}, * means optional. This will help us extract all the tag, which will also be used when there are parameters. After that I strip the space and curly brace and compare to our tag. If there is a tag, we will process the tag and replace the sql query.

@Fanjia-Yan Fanjia-Yan requested a review from kenxu95 June 15, 2022 22:05
@@ -24,7 +26,15 @@ def discover(self) -> List[str]:
return inspect(self.engine).get_table_names()

def extract(self, params: extract.RelationalParams) -> pd.DataFrame:
df = pd.read_sql(params.query, con=self.engine)
query = params.query
matches = re.findall(r"{{[\s+]*\w+[\s+]*}}", query)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we make this a constant up top and document it with comment indicating what it does.

@Fanjia-Yan Fanjia-Yan merged commit 7f75de0 into main Jun 16, 2022
@vsreekanti vsreekanti deleted the eng-1086-create-syntax-sugar-for-sql-queries branch June 21, 2022 21:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants