Support distinct count aggregation #1

Closed
chloe-zh wants to merge 12 commits into chloe-zh:issue/#100 from opensearch-project:main

@chloe-zh
Owner

Signed-off-by: Chloe Zhang <chloezh1102@gmail.com>

Transferred from PR opensearch-project#116

Description

  • Grammar: enabled distinct count
    SQL: count(DISTINCT field)
    PPL: distinct_count(field) or dc(field) in the stats command
    Note: distinct count over all fields, count(distinct *), is not supported. In MySQL, the distinct count of multiple fields is written by listing all of those fields inside the distinct count: https://dev.mysql.com/doc/refman/8.0/en/aggregate-functions.html#function_count-distinct That form is not supported in this PR either, because multi-field distinct count is not implemented here. A separate issue will be created for that case.

  • Core engine
    Added a distinct option to aggregators. The option is off by default; distinct count turns it on.

  • Push down
    A distinct count on a single field is pushed down to OpenSearch as a cardinality aggregation, for example:

select count(distinct Dest) from opensearch_dashboards_sample_data_flights

DSL:
{
  "from":0,
  "size":0,
  "timeout":"1m",
  "aggregations":{
    "count(distinct Dest)":{
      "cardinality":{
        "field":"Dest"
      }
    }
  }
}
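
The mapping from a single-field distinct count to the request body above can be sketched as follows. This is a minimal Python illustration, not the plugin's Java implementation: the helper name `build_distinct_count_request` and its parameters are hypothetical, and the `from`, `size` and `timeout` values are simply copied from the DSL shown above.

```python
import json


def build_distinct_count_request(field, alias=None):
    """Build a search body that expresses COUNT(DISTINCT field)
    as a cardinality aggregation (hypothetical helper)."""
    # The aggregation is named after the SQL expression, as in the DSL above.
    name = alias or f"count(distinct {field})"
    return {
        "from": 0,
        "size": 0,  # no hits are needed, only the aggregation result
        "timeout": "1m",
        "aggregations": {name: {"cardinality": {"field": field}}},
    }


request = build_distinct_count_request("Dest")
print(json.dumps(request, indent=2))
```

Keep in mind that the cardinality aggregation is approximate (it is based on HyperLogLog++), so on high-cardinality fields the pushed-down result may differ slightly from an exact distinct count.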

Full explain:
{
  "root": {
    "name": "ProjectOperator",
    "description": {
      "fields": "[count(distinct Dest)]"
    },
    "children": [
      {
        "name": "OpenSearchIndexScan",
        "description": {
          "request": """OpenSearchQueryRequest(indexName=opensearch_dashboards_sample_data_flights, sourceBuilder={"from":0,"size":0,"timeout":"1m","aggregations":{"count(distinct Dest)":{"cardinality":{"field":"Dest"}}}}, searchDone=false)"""
        },
        "children": []
      }
    ]
  }
}

Explain of distinct count with filter:

SELECT COUNT(distinct OriginWeather) filter(where AvgTicketPrice < 500) FROM opensearch_dashboards_sample_data_flights

{
  "root": {
    "name": "ProjectOperator",
    "description": {
      "fields": """[COUNT(distinct OriginWeather) filter(where AvgTicketPrice < 500)]"""
    },
    "children": [
      {
        "name": "OpenSearchIndexScan",
        "description": {
          "request": """OpenSearchQueryRequest(indexName=opensearch_dashboards_sample_data_flights, sourceBuilder={"from":0,"size":0,"timeout":"1m","aggregations":{"COUNT(distinct OriginWeather) filter(where AvgTicketPrice < 500)":{"filter":{"range":{"AvgTicketPrice":{"from":null,"to":500,"include_lower":true,"include_upper":false,"boost":1.0}}},"aggregations":{"COUNT(distinct OriginWeather) filter(where AvgTicketPrice < 500)":{"cardinality":{"field":"OriginWeather"}}}}}}, searchDone=false)"""
        },
        "children": []
      }
    ]
  }
}
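
The nesting visible in the explain output above (a filter aggregation wrapping a cardinality aggregation of the same name) can be sketched like this. The function name `build_filtered_distinct_count` is hypothetical and the range query is copied from the request above; this only illustrates the shape of the request, not how the plugin builds it.

```python
def build_filtered_distinct_count(name, field, filter_query):
    """Wrap a cardinality aggregation in a filter aggregation
    (hypothetical helper mirroring the explain output)."""
    return {
        name: {
            "filter": filter_query,
            # The inner aggregation reuses the outer name, as in the explain output.
            "aggregations": {name: {"cardinality": {"field": field}}},
        }
    }


name = "COUNT(distinct OriginWeather) filter(where AvgTicketPrice < 500)"
price_below_500 = {
    "range": {
        "AvgTicketPrice": {
            "from": None,
            "to": 500,
            "include_lower": True,
            "include_upper": False,  # strict "<" in the WHERE clause
            "boost": 1.0,
        }
    }
}
aggs = build_filtered_distinct_count(name, "OriginWeather", price_below_500)
```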

SQL example:

SELECT COUNT(distinct OriginWeather) filter(where AvgTicketPrice < 500) FROM opensearch_dashboards_sample_data_flights

{
  "schema": [
    {
      "name": """COUNT(distinct OriginWeather) filter(where AvgTicketPrice < 500)""",
      "type": "integer"
    }
  ],
  "datarows": [
    [
      8
    ]
  ],
  "total": 1,
  "size": 1,
  "status": 200
}

PPL example:

source=opensearch_dashboards_sample_data_flights | stats distinct_count(Dest) by Origin | head 3

{
  "schema": [
    {
      "name": "distinct_count(Dest)",
      "type": "integer"
    },
    {
      "name": "Origin",
      "type": "string"
    }
  ],
  "datarows": [
    [
      72,
      "Abu Dhabi International Airport"
    ],
    [
      78,
      "Adelaide International Airport"
    ],
    [
      72,
      "Adolfo Suarez Madrid— Barajas Airport"
    ]
  ],
  "total": 3,
  "size": 3
}
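
For intuition about what the distinct option changes when an aggregation is evaluated in memory rather than pushed down, here is a small Python sketch of a grouped count with a `distinct` flag that is off by default. The function `count_by` and the sample rows are invented for illustration (they are not taken from the flights data set or from the plugin's code), and it assumes null values are skipped, as in standard SQL COUNT(field).

```python
from collections import defaultdict


def count_by(rows, field, group_field, distinct=False):
    """Count non-null values of `field` per group.
    With distinct=True each value is counted once per group."""
    buckets = defaultdict(set if distinct else list)
    for row in rows:
        value = row.get(field)
        if value is None:
            continue  # assumption: nulls are ignored, as in SQL COUNT(field)
        bucket = buckets[row[group_field]]
        if distinct:
            bucket.add(value)
        else:
            bucket.append(value)
    return {group: len(values) for group, values in buckets.items()}


# Invented sample rows, for illustration only.
rows = [
    {"Origin": "A", "Dest": "X"},
    {"Origin": "A", "Dest": "X"},
    {"Origin": "A", "Dest": "Y"},
    {"Origin": "B", "Dest": "X"},
    {"Origin": "B", "Dest": None},
]

plain = count_by(rows, "Dest", "Origin")
unique = count_by(rows, "Dest", "Origin", distinct=True)
```

With these rows, the plain count for group "A" is 3 while the distinct count is 2, because the destination "X" appears twice.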

davidcui1225 and others added 12 commits June 7, 2021 09:58
Signed-off-by: David Cui <davidcui@amazon.com>
Signed-off-by: Peter Zhu <zhujiaxi@amazon.com>
Bumps [glob-parent](https://github.com/gulpjs/glob-parent) from 5.1.1 to 5.1.2.
- [Release notes](https://github.com/gulpjs/glob-parent/releases)
- [Changelog](https://github.com/gulpjs/glob-parent/blob/main/CHANGELOG.md)
- [Commits](gulpjs/glob-parent@v5.1.1...v5.1.2)

---
updated-dependencies:
- dependency-name: glob-parent
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Signed-off-by: Peter Zhu <zhujiaxi@amazon.com>
* Support construct AggregationResponseParser during Aggregator build stage (#108)

* Support construct AggregationResponseParser during Aggregator build stage

* modify the doc

Signed-off-by: penghuo <penghuo@gmail.com>

* Impl stddev and variance function in SQL and PPL (#115)

* impl variance frontend and backend

* Support construct AggregationResponseParser during Aggregator build stage

* add var and varp for PPL

Signed-off-by: penghuo <penghuo@gmail.com>

* add UT

Signed-off-by: penghuo <penghuo@gmail.com>

* fix UT

Signed-off-by: penghuo <penghuo@gmail.com>

* fix doc format

Signed-off-by: penghuo <penghuo@gmail.com>

* fix doc format

Signed-off-by: penghuo <penghuo@gmail.com>

* fix the doc

Signed-off-by: penghuo <penghuo@gmail.com>

* add stddev_samp and stddev_pop

Signed-off-by: penghuo <penghuo@gmail.com>

* fix UT coverage

* address comments

Signed-off-by: penghuo <penghuo@gmail.com>

* Fix the aggregation filter missing in named aggregators (#123)

* Take the condition expression as property to the named aggregator when wrapping the delegated aggregator

Signed-off-by: chloe-zh <chloezh1102@gmail.com>

* update

Signed-off-by: chloe-zh <chloezh1102@gmail.com>

* Added test case where filtered agg is not pushed down

Signed-off-by: chloe-zh <chloezh1102@gmail.com>

* update

Signed-off-by: chloe-zh <chloezh1102@gmail.com>

* update

Signed-off-by: chloe-zh <chloezh1102@gmail.com>

* Rename gradle tasks in the manner of Opensearch (#133)

Signed-off-by: Zhongnan Su <szhongna@amazon.com>

* add testing-library/user-event to pass CI (#141)

Signed-off-by: David Cui <davidcui@amazon.com>

* Deprecated legacy settings instead of removing (#140)

* Deprecated cursor enabling and fetch size setting

Signed-off-by: penghuo <penghuo@gmail.com>

* update doc

Signed-off-by: penghuo <penghuo@gmail.com>

* add more deprecated settings

Signed-off-by: penghuo <penghuo@gmail.com>

* Add Codecov to SQL (#138)

* add codecov to CI workflow

Signed-off-by: David Cui <davidcui@amazon.com>

* add codecov token
Signed-off-by: David Cui <davidcui@amazon.com>

* Add Docs and link checker workflow (#139)

* save progress on doc renaming

* update rest of documentation links
Signed-off-by: David Cui <davidcui@amazon.com>

* add workbench developer guide and update readme
Signed-off-by: David Cui <davidcui@amazon.com>

* Bump glob-parent from 5.1.1 to 5.1.2 in /workbench

Bumps [glob-parent](https://github.com/gulpjs/glob-parent) from 5.1.1 to 5.1.2.
- [Release notes](https://github.com/gulpjs/glob-parent/releases)
- [Changelog](https://github.com/gulpjs/glob-parent/blob/main/CHANGELOG.md)
- [Commits](gulpjs/glob-parent@v5.1.1...v5.1.2)

---
updated-dependencies:
- dependency-name: glob-parent
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>

* add testing-library
Signed-off-by: David Cui <davidcui@amazon.com>

* rename workbench artifact name to kebab case
Signed-off-by: David Cui <davidcui@amazon.com>

* standardize maintainers list based on team roles, fix workbench name in dev guide
Signed-off-by: David Cui <davidcui@amazon.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* add links to exclude from link checker
Signed-off-by: David Cui <davidcui@amazon.com>

* fix localhost exclusion flag
Signed-off-by: David Cui <davidcui@amazon.com>

* add all exclusion links, revert localhost exclusion away from regex
Signed-off-by: David Cui <davidcui@amazon.com>

* fix family.zzz exclusion, remove stale link from odbc sign_installers
Signed-off-by: David Cui <davidcui@amazon.com>

* fix lychee workflow after local testing and replace old sign_installers url
Signed-off-by: David Cui <davidcui@amazon.com>

* remove accidental duplicate exclude flag
Signed-off-by: David Cui <davidcui@amazon.com>

* remove sign installers as it is out of date
Signed-off-by: David Cui <davidcui@amazon.com>

Co-authored-by: Peng Huo <penghuo@gmail.com>
Co-authored-by: Chloe <chloezh1102@gmail.com>
Co-authored-by: Zhongnan Su <szhongna@amazon.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Signed-off-by: Chen Dai <daichen@amazon.com>
…146)

* Bump OpenSearch version from rc1 to 1.0.0

Signed-off-by: Chen Dai <daichen@amazon.com>

* Rename JDBC artifact by removing -rc1

Signed-off-by: Chen Dai <daichen@amazon.com>

* Remove rc1 qualifier in build workflow

Signed-off-by: Chen Dai <daichen@amazon.com>

* Remove rc1 from build tools version

Signed-off-by: Chen Dai <daichen@amazon.com>

* Fix IT failure

Signed-off-by: Chen Dai <daichen@amazon.com>

* Rollback build tools to rc1 due to known issue

Signed-off-by: Chen Dai <daichen@amazon.com>

* Bump CLI version

Signed-off-by: Chen Dai <daichen@amazon.com>

* Bump query workbench version

Signed-off-by: Chen Dai <daichen@amazon.com>

* Build against 1.0.0

Signed-off-by: Chen Dai <daichen@amazon.com>

* Update release notes drafter

Signed-off-by: Chen Dai <daichen@amazon.com>

* Update nodejs to 10.24.1

Signed-off-by: Chen Dai <daichen@amazon.com>
Signed-off-by: Chen Dai <daichen@amazon.com>
* Add release notes

Signed-off-by: Chen Dai <daichen@amazon.com>

* Change release date

Signed-off-by: Chen Dai <daichen@amazon.com>

* Add bug fixes section

Signed-off-by: Chen Dai <daichen@amazon.com>
Signed-off-by: Peter Zhu <zhujiaxi@amazon.com>
…demo

 Add Integtest.sh for OpenSearch integtest setups (workbench)
@chloe-zh chloe-zh closed this Jul 23, 2021
4 participants