@chloe-zh chloe-zh commented Jun 9, 2021

Signed-off-by: Chloe Zhang [email protected]

Description

  • Grammar: enabled distinct count
    SQL: count(DISTINCT field)
    PPL: distinct_count(field) or dc(field) in the stats command
    Note: a distinct count over all fields, count(distinct *), is not supported. In MySQL, the distinct count across multiple fields is written by listing those fields inside COUNT(DISTINCT ...): https://dev.mysql.com/doc/refman/8.0/en/aggregate-functions.html#function_count-distinct Multi-field distinct count is not implemented in this PR, so that form is not supported here. A separate issue will be created for this case.

  • Core engine
    Added a distinct option to aggregators. The option is off by default; distinct count turns it on.

  • Push down
    Distinct count of a single field is pushed down to OpenSearch as a cardinality aggregation, for example:

select count(distinct Dest) from opensearch_dashboards_sample_data_flights

DSL:
{
  "from":0,
  "size":0,
  "timeout":"1m",
  "aggregations":{
    "count(distinct Dest)":{
      "cardinality":{
        "field":"Dest"
      }
    }
  }
}
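
As a rough illustration of the mapping above, the following sketch renders the same request body from an aggregation name and a field name. It is not the plugin's code: `CardinalityDslSketch` and its methods are hypothetical helpers written with plain string concatenation, and the fixed `from`, `size`, and `timeout` values are simply copied from the DSL shown above.

```java
// Hypothetical sketch, not the plugin's implementation: builds the search body
// for a single-field distinct count by hand, to show the shape of the DSL.
final class CardinalityDslSketch {

    /** Wraps a string in double quotes, escaping backslashes and quotes for JSON. */
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    /** Renders {"<name>":{"cardinality":{"field":"<field>"}}}. */
    static String cardinality(String name, String field) {
        return "{" + quote(name) + ":{\"cardinality\":{\"field\":" + quote(field) + "}}}";
    }

    /** Wraps an aggregations object in the fixed envelope used in the example above. */
    static String searchBody(String aggregations) {
        return "{\"from\":0,\"size\":0,\"timeout\":\"1m\",\"aggregations\":" + aggregations + "}";
    }

    public static void main(String[] args) {
        // The aggregation is named after the SQL expression, as in the DSL above.
        System.out.println(searchBody(cardinality("count(distinct Dest)", "Dest")));
    }
}
```

Running `main` prints the same body as the DSL above on a single line. The aggregation name doubles as the output column name, which is why the SQL expression text is used as the key.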

Full explain:
{
  "root": {
    "name": "ProjectOperator",
    "description": {
      "fields": "[count(distinct Dest)]"
    },
    "children": [
      {
        "name": "OpenSearchIndexScan",
        "description": {
          "request": """OpenSearchQueryRequest(indexName=opensearch_dashboards_sample_data_flights, sourceBuilder={"from":0,"size":0,"timeout":"1m","aggregations":{"count(distinct Dest)":{"cardinality":{"field":"Dest"}}}}, searchDone=false)"""
        },
        "children": []
      }
    ]
  }
}

Explain of distinct count with filter:

SELECT COUNT(distinct OriginWeather) filter(where AvgTicketPrice < 500) FROM opensearch_dashboards_sample_data_flights

{
  "root": {
    "name": "ProjectOperator",
    "description": {
      "fields": """[COUNT(distinct OriginWeather) filter(where AvgTicketPrice < 500)]"""
    },
    "children": [
      {
        "name": "OpenSearchIndexScan",
        "description": {
          "request": """OpenSearchQueryRequest(indexName=opensearch_dashboards_sample_data_flights, sourceBuilder={"from":0,"size":0,"timeout":"1m","aggregations":{"COUNT(distinct OriginWeather) filter(where AvgTicketPrice < 500)":{"filter":{"range":{"AvgTicketPrice":{"from":null,"to":500,"include_lower":true,"include_upper":false,"boost":1.0}}},"aggregations":{"COUNT(distinct OriginWeather) filter(where AvgTicketPrice < 500)":{"cardinality":{"field":"OriginWeather"}}}}}}, searchDone=false)"""
        },
        "children": []
      }
    ]
  }
}
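
The request in the explain output nests the cardinality aggregation inside a filter aggregation of the same name. A minimal sketch of that nesting, assuming aggregation and field names contain no characters that need JSON escaping; `FilteredCardinalitySketch`, `lessThan`, and `filteredCardinality` are hypothetical names, not part of the plugin:

```java
// Hypothetical sketch of the nesting seen in the explain output above:
// an outer filter aggregation with the cardinality aggregation as its sub-aggregation.
final class FilteredCardinalitySketch {

    /** Renders the range query that the explain output shows for "field < upper". */
    static String lessThan(String field, long upper) {
        return "{\"range\":{\"" + field + "\":{\"from\":null,\"to\":" + upper
            + ",\"include_lower\":true,\"include_upper\":false,\"boost\":1.0}}}";
    }

    /** Wraps a cardinality aggregation in a filter aggregation; both use the same name. */
    static String filteredCardinality(String name, String filterQuery, String field) {
        String inner = "{\"" + name + "\":{\"cardinality\":{\"field\":\"" + field + "\"}}}";
        return "{\"" + name + "\":{\"filter\":" + filterQuery + ",\"aggregations\":" + inner + "}}";
    }

    public static void main(String[] args) {
        String name = "COUNT(distinct OriginWeather) filter(where AvgTicketPrice < 500)";
        System.out.println(
            filteredCardinality(name, lessThan("AvgTicketPrice", 500), "OriginWeather"));
    }
}
```

The filter restricts which documents reach the inner aggregation, so the cardinality is computed only over rows where AvgTicketPrice is below 500.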

SQL example:

SELECT COUNT(distinct OriginWeather) filter(where AvgTicketPrice < 500) FROM opensearch_dashboards_sample_data_flights

{
  "schema": [
    {
      "name": """COUNT(distinct OriginWeather) filter(where AvgTicketPrice < 500)""",
      "type": "integer"
    }
  ],
  "datarows": [
    [
      8
    ]
  ],
  "total": 1,
  "size": 1,
  "status": 200
}

PPL example:

source=opensearch_dashboards_sample_data_flights | stats distinct_count(Dest) by Origin | head 3

{
  "schema": [
    {
      "name": "distinct_count(Dest)",
      "type": "integer"
    },
    {
      "name": "Origin",
      "type": "string"
    }
  ],
  "datarows": [
    [
      72,
      "Abu Dhabi International Airport"
    ],
    [
      78,
      "Adelaide International Airport"
    ],
    [
      72,
      "Adolfo Suarez Madrid-Barajas Airport"
    ]
  ],
  "total": 3,
  "size": 3
}
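
The distinct option described under Core engine can be pictured with a small in-memory aggregator. This is only a sketch under stated assumptions: `CountAggregatorSketch` is a hypothetical class, not the engine's aggregator, and it assumes the usual SQL rule that COUNT(field) ignores null values.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of a count aggregator with a distinct option.
// The option is off by default; a distinct count would construct it with distinct = true.
final class CountAggregatorSketch {
    private final boolean distinct;
    private final Set<Object> seen = new HashSet<>();
    private long count = 0;

    CountAggregatorSketch() {
        this(false); // distinct is off unless requested
    }

    CountAggregatorSketch(boolean distinct) {
        this.distinct = distinct;
    }

    /** Feeds one value into the aggregator; nulls are skipped (assumed SQL semantics). */
    void accept(Object value) {
        if (value == null) {
            return;
        }
        if (distinct) {
            seen.add(value); // duplicates collapse in the set
        } else {
            count++;
        }
    }

    long result() {
        return distinct ? seen.size() : count;
    }

    public static void main(String[] args) {
        List<String> weather = Arrays.asList("Sunny", "Rain", "Sunny", null);
        CountAggregatorSketch plain = new CountAggregatorSketch();
        CountAggregatorSketch dc = new CountAggregatorSketch(true);
        for (String w : weather) {
            plain.accept(w);
            dc.accept(w);
        }
        System.out.println("count = " + plain.result());          // count = 3
        System.out.println("distinct count = " + dc.result());    // distinct count = 2
    }
}
```

Keeping the flag on the aggregator, rather than adding a separate distinct-count class, matches the description's wording that distinct is an option that distinct count switches on.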

@chloe-zh chloe-zh self-assigned this Jun 9, 2021
@chloe-zh chloe-zh added the SQL, Priority-High, enhancement (New feature or request), and PPL (Piped processing language) labels Jun 9, 2021
chloe-zh added 5 commits June 9, 2021 12:22
Signed-off-by: chloe-zh <[email protected]>
…-project#100

Signed-off-by: chloe-zh <[email protected]>

# Conflicts:
#	opensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/aggregation/dsl/MetricAggregationBuilder.java
Signed-off-by: chloe-zh <[email protected]>
Signed-off-by: chloe-zh <[email protected]>
@chloe-zh chloe-zh marked this pull request as ready for review June 9, 2021 20:33
@chloe-zh chloe-zh requested review from dai-chen and penghuo June 9, 2021 20:33
chloe-zh added 10 commits June 10, 2021 21:09
Signed-off-by: chloe-zh <[email protected]>
…-project#100

Signed-off-by: chloe-zh <[email protected]>

# Conflicts:
#	core/src/main/java/org/opensearch/sql/expression/DSL.java
#	core/src/main/java/org/opensearch/sql/expression/aggregation/NamedAggregator.java
#	core/src/test/java/org/opensearch/sql/analysis/ExpressionAnalyzerTest.java
#	docs/user/dql/aggregations.rst
#	integ-test/src/test/resources/correctness/queries/aggregation.txt
#	opensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/aggregation/dsl/MetricAggregationBuilder.java
#	opensearch/src/test/java/org/opensearch/sql/opensearch/storage/script/aggregation/dsl/MetricAggregationBuilderTest.java
#	sql/src/test/java/org/opensearch/sql/sql/parser/AstExpressionBuilderTest.java
Signed-off-by: chloe-zh <[email protected]>
chloe-zh added 2 commits June 16, 2021 14:05
Signed-off-by: chloe-zh <[email protected]>
Signed-off-by: chloe-zh <[email protected]>
@chloe-zh chloe-zh requested review from dai-chen and penghuo June 16, 2021 22:39
Signed-off-by: chloe-zh <[email protected]>
@davidcui1225 davidcui1225 deleted the branch opensearch-project:develop June 26, 2021 00:47
Yury-Fridlyand added a commit that referenced this pull request Nov 1, 2022
…#116)

Co-authored-by: MaxKsyunz <[email protected]>
Co-authored-by: forestmvey <[email protected]>
Signed-off-by: Yury-Fridlyand <[email protected]>