This repository has been archived by the owner on Aug 2, 2022. It is now read-only.
Support NULL and MISSING value in response #667
Merged
Description of changes
Problem Statements
Background
Before explaining the current issue, let’s first set up the context.
Sample Data
Let’s explain the problem with an example: the bank index, which has 2 fields.
Then we add some data to the index.
JDBC and JSON data format
Then, we define the response data format for the query “SELECT account_number FROM bank” as follows.
Issue 1. Represent NULL and MISSING in Response
With the sample data and response data format in mind, let’s go through more queries and examine their results.
Consider the query: SELECT age, account_number FROM bank.
The JDBC format doesn’t have a MISSING value. If a field exists in the schema but is missing from the document, it should be considered a NULL value.
The JSON format can represent the MISSING value properly.
Issue 2. ExprValue to JDBC format
In our current implementation, every SQL operator is translated to a chain of PhysicalOperators. Each PhysicalOperator provides an ExprValue as its return value. The protocol pulls the result from the PhysicalOperator and translates it to the expected format. Taking the above query as an example, the result of the PhysicalOperator is a list of ExprValues.
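The pull model described above can be sketched roughly as follows. This is a minimal illustration in Python, not the plugin's code: the `Scan` and `Project` classes, the `MISSING` sentinel, and the sample documents are all hypothetical stand-ins for the PhysicalOperator chain and its ExprValues.

```python
# Hypothetical sentinel for a field that is absent from a document.
MISSING = object()


class Scan:
    """Leaf operator: yields raw documents (illustrative only)."""

    def __init__(self, docs):
        self.docs = docs

    def __iter__(self):
        return iter(self.docs)


class Project:
    """Resolves the requested fields from each row of its child operator."""

    def __init__(self, child, fields):
        self.child = child
        self.fields = fields

    def __iter__(self):
        for doc in self.child:
            # A field that is not in the document resolves to MISSING.
            yield {f: doc.get(f, MISSING) for f in self.fields}


def pull_all(root):
    """What the protocol does: pull every row from the root operator."""
    return list(root)


# Made-up documents, not the sample data from this description.
docs = [{"account_number": 1, "age": 30}, {"account_number": 2}]
rows = pull_all(Project(Scan(docs), ["age", "account_number"]))
```

Here the second row carries `MISSING` for `age`, which is exactly the kind of value the protocol later has to format.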
The current solution is to extract the field name and field type from the data itself. This solution has two problems.
Issue 3. The Type info is missing
In the current design, the Protocol is a separate module that works independently of QueryEngine. The Protocol module receives the list of ExprTupleValue from QueryEngine, then formats the result based on the type of each ExprValue. The problem is that ExprNullValue and ExprMissingValue don’t have a type associated with them, so the Protocol module can’t derive the type info from the input ExprTupleValue directly.
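To make the problem concrete, here is a small Python sketch of value-based type inference. The classes only mirror the names ExprNullValue and ExprMissingValue used above; `ExprIntegerValue`, the `"integer"` type name, and the `infer_schema` helper are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExprIntegerValue:
    value: int


class ExprNullValue:
    """Carries no type information."""


class ExprMissingValue:
    """Carries no type information."""


def infer_type(value):
    # Only concrete values reveal their type.
    if isinstance(value, ExprIntegerValue):
        return "integer"  # hypothetical type name
    return None  # NULL and MISSING: type unknown


def infer_schema(rows):
    """Derive a name -> type mapping purely from the data."""
    schema = {}
    for row in rows:
        for name, value in row.items():
            inferred = infer_type(value)
            if inferred is not None:
                schema[name] = inferred
            else:
                schema.setdefault(name, None)
    return schema


rows = [
    {"age": ExprNullValue(), "account_number": ExprIntegerValue(1)},
    {"age": ExprMissingValue(), "account_number": ExprIntegerValue(2)},
]
schema = infer_schema(rows)
```

When every row holds NULL or MISSING for `age`, the inferred type stays `None`, so a schema cannot be built from the data alone.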
Issue 4. What * (all fields) means in SELECT
In the current design, the SELECT * clause is ignored in the AST builder logic, because it means selecting all the data from the input operator. The issue is similar to Issue 3: if the input operator produces a NULL or MISSING value, the Protocol has no way to derive type info from it.
Requirements
Solution
Include NULL and MISSING value in the QueryResult (Issue 1, 2)
The SELECT operator will be translated to a PhysicalOperator with a list of expressions that resolve ExprValues from the input data. With the above example, when handling NULL and MISSING values, the expected output data should be as follows.
An additional list of Schema is also required when the protocol is JDBC.
Then the protocol module can easily translate the result to the JDBC format or the JSON format.
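A rough Python sketch of that translation step is shown below. It assumes the query result arrives as a schema list plus rows that may contain a `MISSING` sentinel; the `schema`/`datarows` response layout, the type names, and the rows themselves are illustrative assumptions, not the format defined in this description.

```python
# Hypothetical sentinel for a MISSING value in a result row.
MISSING = object()


def format_jdbc(schema, rows):
    """JDBC has no MISSING, so both NULL and MISSING are emitted as null."""
    names = [name for name, _ in schema]
    return {
        "schema": [{"name": name, "type": typ} for name, typ in schema],
        "datarows": [
            [None if row[name] is MISSING else row[name] for name in names]
            for row in rows
        ],
    }


def format_json(rows):
    """JSON expresses MISSING by leaving the key out; NULL stays as null."""
    return [{k: v for k, v in row.items() if v is not MISSING} for row in rows]


# Illustrative schema and rows only.
schema = [("age", "integer"), ("account_number", "integer")]
rows = [
    {"age": None, "account_number": 1},     # age is NULL
    {"age": MISSING, "account_number": 2},  # age is MISSING
]

jdbc = format_jdbc(schema, rows)
as_json = format_json(rows)
```

Because the schema travels with the rows, the JDBC formatter never needs to inspect a value to learn its type.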
Expand SELECT * to SELECT ...fields (Issue 4)
In our current implementation, SELECT * is ignored in SQL, and PPL does not even have a fields * command. This works fine for the JSON format, which doesn’t require a schema, but it doesn’t work for the JDBC format.
The proposal here is:
Automatically add fields * to PPL query
Unlike SQL, the PPL grammar doesn’t require the Fields command to be the last command. Thus, the fields * command should be added automatically.
The rule is: if the last operator is not a Fields command, a Fields * command is appended.
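The rule can be sketched in a few lines of Python. The `Command` dataclass and the list-of-commands representation of a PPL query are hypothetical simplifications; the real plan representation is not shown in this description.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Command:
    """Hypothetical stand-in for one PPL command in a pipeline."""
    name: str
    args: List[str]


def ensure_trailing_fields(commands):
    """Append `fields *` unless the pipeline already ends with fields."""
    if commands and commands[-1].name == "fields":
        return list(commands)
    return list(commands) + [Command("fields", ["*"])]


without_fields = [
    Command("search", ["source=bank"]),
    Command("where", ["age > 30"]),
]
with_fields = without_fields + [Command("fields", ["age"])]

patched = ensure_trailing_fields(without_fields)
unchanged = ensure_trailing_fields(with_fields)
```

A pipeline that already ends with a fields command is returned as an unmodified copy, so an explicit projection is never overridden.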
Expand SELECT * to SELECT ...fields
In the Analyzer, we should expand the * to all fields in the current scope. There are two issues we need to address,
Retrieve Type Info from ProjectOperator and Expose to Protocol (Issue 3)
After expanding the * and automatically adding fields, the type info can be retrieved from ProjectOperator. Then the Protocol can get both the schema and the data from QueryEngine.
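As a minimal sketch of the idea, assume the analyzer's current scope is a mapping from field name to type. The `scope` contents, the type names, and both helper functions below are hypothetical; they only illustrate that once * is expanded, the schema comes from the project list rather than from the data.

```python
def expand_star(select_items, scope):
    """Replace each `*` with every field name in the current scope."""
    expanded = []
    for item in select_items:
        if item == "*":
            expanded.extend(scope.keys())
        else:
            expanded.append(item)
    return expanded


def project_schema(project_list, scope):
    """Type info comes from the scope, not from the row values."""
    return [(name, scope[name]) for name in project_list]


# Hypothetical scope for the bank index; the types are assumptions.
scope = {"account_number": "long", "age": "integer"}

project_list = expand_star(["*"], scope)
schema = project_schema(project_list, scope)
```

The resulting schema is available even when every value of a field is NULL or MISSING, which is what the JDBC format needs.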
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.