Eng 1086 create syntax sugar for sql queries #103

Fanjia-Yan · 2022-06-14T22:51:04Z

This PR is a first attempt at syntax sugar for SQL queries. The implementation is as follows:

In aqueduct_executor, for every database connector that accepts SQL, we process the query when extracting a dataframe from the database. If the query contains "{{today}}", we replace that substring with today's date.
I added a test in sql_integration_test.py. It queries the hotel_reviews dataset with review_date = {{today}} and expects an empty dataframe. However, this test is not conclusive, since many inputs can produce an empty dataframe.
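One way to make the check conclusive is to test the substitution itself, with the date passed in rather than read from the clock. The sketch below is not the code in this PR: `expand_today` and its `today` parameter are hypothetical names, and it assumes the tag should expand to a quoted ISO date literal.

```python
from datetime import date
from typing import Optional

TODAY_TAG = "{{today}}"


def expand_today(query: str, today: Optional[date] = None) -> str:
    """Replace every literal {{today}} with a quoted ISO date (assumed format)."""
    if today is None:
        today = date.today()
    return query.replace(TODAY_TAG, f"'{today.isoformat()}'")


# Injecting a fixed date makes the expected query string deterministic.
fixed = date(2022, 6, 14)
expanded = expand_today(
    "select * from hotel_reviews where review_date = {{today}}", today=fixed
)
print(expanded)
```

With the date injected, a unit test can assert on the exact expanded query instead of relying on an empty result set.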

Would appreciate some suggestions~!

kenxu95 · 2022-06-15T19:01:17Z

integration_tests/sdk/sql_integration_test.py

+    sql_artifact_today = db.sql(query="select * from hotel_reviews where review_date = {{today}}")
+    assert sql_artifact_today.get().empty
+    sql_artifact_not_today = db.sql(
+        query="select * from hotel_reviews where review_date != {{today}}"


Since this is historical data, can we just do < today? Not sure if that's valid syntax.
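A quick way to check the `<` form is an in-memory SQLite table. This sketch assumes review_date is stored as ISO-8601 text (so string comparison orders dates correctly) and that {{today}} expands to a quoted date; the rows are made up for illustration and are not from the real hotel_reviews dataset.

```python
import sqlite3
from datetime import date

conn = sqlite3.connect(":memory:")
conn.execute("create table hotel_reviews (hotel text, review_date text)")
conn.executemany(
    "insert into hotel_reviews values (?, ?)",
    [("A", "2015-08-03"), ("B", "2017-01-20")],  # made-up historical rows
)

# Assumed expansion of the {{today}} tag: a quoted ISO date literal.
today_literal = f"'{date.today().isoformat()}'"

before_query = "select * from hotel_reviews where review_date < {{today}}"
same_day_query = "select * from hotel_reviews where review_date = {{today}}"

before = conn.execute(before_query.replace("{{today}}", today_literal)).fetchall()
same_day = conn.execute(same_day_query.replace("{{today}}", today_literal)).fetchall()
conn.close()

print(len(before), len(same_day))
```

Under these assumptions the `<` query returns every historical row, which is a stronger signal than an empty dataframe.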

kenxu95 · 2022-06-15T19:01:53Z

integration_tests/sdk/sql_integration_test.py

@@ -37,3 +37,13 @@ def test_invalid_destination_integration(client):
        output_artifact.save(
            config=db.config(table=generate_table_name(), update_mode=LoadUpdateMode.REPLACE)
        )
+
+
+def test_sugar_syntax_sql(client):


Let's be more specific with the test name: test_sql_today_tag.

kenxu95 · 2022-06-15T19:05:38Z

src/python/aqueduct_executor/operators/connectors/tabular/relational.py

@@ -24,7 +25,12 @@ def discover(self) -> List[str]:
        return inspect(self.engine).get_table_names()

    def extract(self, params: extract.RelationalParams) -> pd.DataFrame:
-        df = pd.read_sql(params.query, con=self.engine)
+        query = params.query
+        if "{{today}}" in query:


I think variations like {{ today }} and {{today }} should also be valid. Let's look at what's between {{ and }} and strip the whitespace. Then we take all the arguments extracted this way and match each against a predefined map from [arg name] to func() -> str, which produces the replacement string.

This will let us extend this easily to other SQL arguments besides today.

There might be a better way of doing this, but you can just keep track of the indexes in the string where each argument starts and ends, e.g. [(5, 10), (15, 20), (36, 55)], then perform the string replacements in reverse order so that the earlier indexes stay valid.

The point is that in the future we will want to support 1) other types of SQL arguments and 2) multiple SQL arguments in a single statement.

Yeah, I think that'll be ideal. I am trying to implement a version using a regex that extracts everything in curly braces.
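The approach described above (collect the spans, look each name up in a map, replace in reverse) could be sketched roughly as follows. `TAG_EXPANSIONS`, `find_tags`, and `expand_tags` are hypothetical names, and the quoted ISO date format is an assumption; unknown tags are left untouched here, which is one possible policy among several.

```python
import re
from datetime import date
from typing import Callable, Dict, List, Tuple

# Hypothetical registry: tag name -> function producing the replacement text.
TAG_EXPANSIONS: Dict[str, Callable[[], str]] = {
    "today": lambda: f"'{date.today().isoformat()}'",  # quoting is an assumption
}


def find_tags(query: str) -> List[Tuple[int, int, str]]:
    """Return (start, end, name) for each {{ name }} span, whitespace stripped."""
    return [
        (m.start(), m.end(), m.group(1).strip())
        for m in re.finditer(r"\{\{(.*?)\}\}", query)
    ]


def expand_tags(query: str) -> str:
    """Expand every known tag; tags missing from the registry are left as-is."""
    # Replace from the last span to the first so earlier indexes stay valid.
    for start, end, name in reversed(find_tags(query)):
        if name in TAG_EXPANSIONS:
            query = query[:start] + TAG_EXPANSIONS[name]() + query[end:]
    return query


print(expand_tags("select * from hotel_reviews where review_date < {{ today }}"))
```

Adding a new SQL argument then only requires a new entry in the map.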

Fanjia-Yan · 2022-06-15T22:03:38Z

Made changes as follows:

For the integration test, we check "<" instead of "!=". This makes sure that {{today}} expands to a calendar date that can be compared.
For the syntax sugar, I created a regex that matches anything of the form {{[space*][tag][space*]}}, where * means optional. This extracts all the tags, which will also be useful once there are parameters. After that, I strip the spaces and curly braces and compare the result to our known tag. If it matches, we process the tag and replace it in the SQL query.

kenxu95 · 2022-06-16T17:25:41Z

src/python/aqueduct_executor/operators/connectors/tabular/relational.py

@@ -24,7 +26,15 @@ def discover(self) -> List[str]:
        return inspect(self.engine).get_table_names()

    def extract(self, params: extract.RelationalParams) -> pd.DataFrame:
-        df = pd.read_sql(params.query, con=self.engine)
+        query = params.query
+        matches = re.findall(r"{{[\s+]*\w+[\s+]*}}", query)


Can we make this a constant up top and document it with a comment indicating what it does?
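A sketch of what a documented constant might look like. `TAG_PATTERN` is a hypothetical name, not necessarily the one used in the merged commit. The pattern from the diff is included for comparison because `[\s+]` is a character class, where `+` is a literal plus sign rather than a quantifier.

```python
import re

# Matches a tag such as {{today}}, {{ today }} or {{today }}: two opening
# braces, optional whitespace, a word (the tag name, captured), optional
# whitespace, two closing braces.
TAG_PATTERN = re.compile(r"\{\{\s*(\w+)\s*\}\}")

# The pattern from the diff, kept for comparison. Inside a character class
# "+" is a literal, so [\s+]* also accepts plus signs around the tag name.
DIFF_PATTERN = re.compile(r"{{[\s+]*\w+[\s+]*}}")

query = "select * from hotel_reviews where review_date < {{ today }}"
print(TAG_PATTERN.findall(query))   # captured tag names only
print(DIFF_PATTERN.findall(query))  # whole matched spans, braces included
```

Capturing the name in a group avoids the separate step of stripping braces and spaces from each match.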

Ubuntu added 3 commits June 14, 2022 21:34

first commit

7b0b7bb

integration test

a9d2585

test

2651e2a

Fanjia-Yan requested review from kenxu95 and cw75 June 14, 2022 22:51

style

7d05b42

kenxu95 requested changes Jun 15, 2022

Ubuntu added 2 commits June 15, 2022 21:27

update on test and enable regex filtering out tag

685fb46

stlye

0fbc3c2

change variable name to tag

b2e2f14

Fanjia-Yan requested a review from kenxu95 June 15, 2022 22:05

kenxu95 approved these changes Jun 16, 2022

Ubuntu added 3 commits June 16, 2022 18:39

make regex constant and add doc

fd6bebc

style fix

e855775

style fix

1c961c3

Fanjia-Yan merged commit 7f75de0 into main Jun 16, 2022

vsreekanti deleted the eng-1086-create-syntax-sugar-for-sql-queries branch June 21, 2022 21:43
