[Enhancement] support distinct in analytic functions #36878

Open
2 of 3 tasks
morrySnow opened this issue Jun 26, 2024 · 4 comments

@morrySnow
Contributor

Search before asking

  • I had searched in the issues and found no similar issues.

Description

create table t(id int, c1 int, c2 double, c3 string) properties('replication_num'='1');

support distinct in analytic functions: count, sum and avg

select c1, c2, avg( distinct c1) over(partition by c2) from t;
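
To make the expected result of the query above concrete, here is a minimal reference sketch in plain Java. It is not Doris code: the `Row` class and the `avgDistinctOverPartition` helper are hypothetical names introduced only for illustration, and the sketch assumes the usual SQL semantics, where NULL values of c1 are ignored and rows with the same c2 fall into one partition.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class DistinctWindowAvg {
    // Hypothetical row type holding only columns c1 and c2 of table t.
    public static final class Row {
        final Integer c1;
        final Double c2;

        public Row(Integer c1, Double c2) {
            this.c1 = c1;
            this.c2 = c2;
        }
    }

    // For every input row, returns avg(distinct c1) over(partition by c2).
    public static List<Double> avgDistinctOverPartition(List<Row> rows) {
        // Collect the distinct non-null c1 values of each partition.
        Map<Double, Set<Integer>> distinctPerPartition = new HashMap<>();
        for (Row r : rows) {
            Set<Integer> values = distinctPerPartition.computeIfAbsent(r.c2, k -> new HashSet<>());
            if (r.c1 != null) {
                values.add(r.c1);
            }
        }
        // Every row of a partition receives the same aggregate value.
        List<Double> result = new ArrayList<>();
        for (Row r : rows) {
            Set<Integer> values = distinctPerPartition.get(r.c2);
            if (values.isEmpty()) {
                result.add(null); // no non-null input in this partition
            } else {
                long sum = 0;
                for (int v : values) {
                    sum += v;
                }
                result.add((double) sum / values.size());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Row> rows = Arrays.asList(
                new Row(1, 1.0), new Row(1, 1.0), new Row(2, 1.0), new Row(5, 2.0));
        System.out.println(avgDistinctOverPartition(rows)); // prints [1.5, 1.5, 1.5, 5.0]
    }
}
```

In the partition c2 = 1.0 the duplicate value 1 is counted once, so the result is (1 + 2) / 2 = 1.5 rather than the non-distinct average 4 / 3.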

Solution

  1. remove the restriction on distinct in analytic functions
    1.1.
    if (ctx.windowSpec() != null) {
        if (isDistinct) {
            throw new ParseException("DISTINCT not allowed in analytic function: " + functionName, ctx);
        }
        return withWindowSpec(ctx.windowSpec(), new Count());
    }

    1.2.
    if (ctx.windowSpec() != null) {
        if (isDistinct) {
            throw new ParseException("DISTINCT not allowed in analytic function: " + functionName, ctx);
        }
        return withWindowSpec(ctx.windowSpec(), function);
    }
  2. convert the functions to their distinct variants in ExtractAndNormalizeWindowExpression
    2.1. count(distinct c1) over(partition by c2) to multi_distinct_count(c1) over(partition by c2)
    2.2. sum(distinct c1) over(partition by c2) to multi_distinct_sum(c1) over(partition by c2)
    2.3. avg(distinct c1) over(partition by c2) to cast(multi_distinct_sum(c1) over(partition by c2) as double) / cast(multi_distinct_count(c1) over(partition by c2) as double)
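
The mapping in step 2 can be sketched as a small standalone Java helper that renders the rewritten expression as SQL text. This is only an illustration of the rule, not the actual ExtractAndNormalizeWindowExpression code: the class `DistinctWindowRewrite` and its `rewrite` method are hypothetical, and it works on strings instead of the planner's expression tree. Keeping the error for functions other than count, sum and avg is also an assumption, based on the issue naming only these three.

```java
import java.util.Locale;

public class DistinctWindowRewrite {
    // Hypothetical helper: returns the SQL text of the rewritten window expression.
    public static String rewrite(String functionName, boolean isDistinct, String arg, String windowSpec) {
        String over = " over(" + windowSpec + ")";
        String fn = functionName.toLowerCase(Locale.ROOT);
        if (!isDistinct) {
            // Nothing to rewrite for non-distinct analytic functions.
            return fn + "(" + arg + ")" + over;
        }
        switch (fn) {
            case "count":
                // 2.1: count(distinct x) becomes multi_distinct_count(x)
                return "multi_distinct_count(" + arg + ")" + over;
            case "sum":
                // 2.2: sum(distinct x) becomes multi_distinct_sum(x)
                return "multi_distinct_sum(" + arg + ")" + over;
            case "avg":
                // 2.3: avg(distinct x) becomes distinct sum divided by distinct count
                return "cast(multi_distinct_sum(" + arg + ")" + over + " as double)"
                        + " / cast(multi_distinct_count(" + arg + ")" + over + " as double)";
            default:
                // Assumption: other functions keep the existing error message.
                throw new IllegalArgumentException(
                        "DISTINCT not allowed in analytic function: " + functionName);
        }
    }

    public static void main(String[] args) {
        System.out.println(rewrite("count", true, "c1", "partition by c2"));
        System.out.println(rewrite("sum", true, "c1", "partition by c2"));
        System.out.println(rewrite("avg", true, "c1", "partition by c2"));
    }
}
```

Casting both operands to double before dividing, as in 2.3, avoids integer division when c1 is an integer column.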

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

@cjj2010
Contributor

cjj2010 commented Jun 27, 2024

I want to try 2.1

@morrySnow
Contributor Author

@cjj2010 This issue cannot be split into smaller tasks. Can you handle the entire issue?

@cjj2010
Contributor

cjj2010 commented Jun 29, 2024

> @cjj2010 This issue cannot be split into smaller tasks for solving. Can you handle the entire issue?

Okay, I am willing to handle the entire issue. The business scenario I am currently working on needs this feature badly.

@morrySnow
Contributor Author

@cjj2010 Great! I will assign this issue to you.
