Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support OrderBy and Sort in Expr->String #10256

Closed
Tracked by #9726
yyy1000 opened this issue Apr 26, 2024 · 8 comments · Fixed by #10370
Closed
Tracked by #9726

Support OrderBy and Sort in Expr->String #10256

yyy1000 opened this issue Apr 26, 2024 · 8 comments · Fixed by #10370
Labels
enhancement New feature or request

Comments

@yyy1000
Copy link
Contributor

yyy1000 commented Apr 26, 2024

Is your feature request related to a problem or challenge?

The current implementation in #9936 for Sort is probably not correct.
It ignores asc and null_first field.
IMO, the better struct for Sort and order_by is OrderByExpr
https://github.com/sqlparser-rs/sqlparser-rs/blob/deaa6d8151c8f90a7e9cbb624876fe0e8b8a165d/src/ast/query.rs#L1444-L1450
However, OrderByExpr is not an enum item in Expr

Describe the solution you'd like

two steps:

  1. make OrderByExpr an enum in Expr (change in sql-parser)
  2. make Sort return a OrderByExpr during Expr->String

Describe alternatives you've considered

No response

Additional context

No response

@yyy1000 yyy1000 added the enhancement New feature or request label Apr 26, 2024
@yyy1000
Copy link
Contributor Author

yyy1000 commented Apr 26, 2024

@kevinmingtarja @alamb What do you think? 🤔

@alamb
Copy link
Contributor

alamb commented Apr 27, 2024

I don't think we should change SQL parser

In general, ORDER BY ... isn't really a general Expr (it can't appear in arbitrary locations in SQL, only specific ones)

For example, this isn't valid;

SELECT x ORDER BY y FROM foo;

ORDER BY can appear in certain places like

SELECT x FROM foo ORDER BY y
SELECT agg(x ORDER BY y) FROM foo;
SELECT agg(..) OVER (ORDER BY y) FROM foo

I think we should special case finding Expr::Sort exprs in those locations.

Maybe the code would be clearer if we made the conversion code return an Error if it encountered a SortExpr in an unexpected location 🤔

@kevinmingtarja
Copy link
Contributor

Yeah I was kinda wondering about this too last time (#9726 (comment)).

I was thinking we don't change the SQL parser too.

Instead, I think given that the "sources" of the conversion to Expr is not only ast::Expr objects, having its return type be only ast::Expr is perhaps too restrictive? As we've seen in this case. If I understand correctly, I think another case is Expr::AggregateFunction, where we can't really convert it back to ast::Function as it's also not an enum item in ast::Expr.

So what if we extend the return type of expr_to_sql to more than ast::Expr? I think it makes sense to extend it to all the "sources" of conversions from ast::* to Expr.

For example in here, we know that we are converting a ast::OrderByExpr to a Expr::Sort, so I think it makes sense that ast::OrderByExpr should also be a valid return type for expr_to_sql, as that is really the "source" data model of Expr::Sort.

impl<'a, S: ContextProvider> SqlToRel<'a, S> {
/// Convert sql [OrderByExpr] to `Vec<Expr>`.
///
/// If `literal_to_column` is true, treat any numeric literals (e.g. `2`) as a 1 based index
/// into the SELECT list (e.g. `SELECT a, b FROM table ORDER BY 2`).
/// If false, interpret numeric literals as constant values.
pub(crate) fn order_by_to_sort_expr(
&self,
exprs: &[OrderByExpr],
schema: &DFSchema,
planner_context: &mut PlannerContext,
literal_to_column: bool,
) -> Result<Vec<Expr>> {

@yyy1000
Copy link
Contributor Author

yyy1000 commented Apr 27, 2024

Thank you all for your reply!

when turning the expression back into an entire query

Yeah, I was just a little curious why for Expr::Sort we ignore the ORDER BY information. It seems that for the entire query, it could address the Sort.

LogicalPlan::Sort(sort) => {
query.order_by(self.sort_to_sql(sort.expr.clone())?);
self.select_to_sql_recursively(
sort.input.as_ref(),
query,
select,
relation,
)
}

Maybe the code would be clearer if we made the conversion code return an Error if it encountered a SortExpr in an unexpected location 🤔

Yeah, so if we encountered a SortExpr in expr_to_sql function, does that mean unexpected location ? I think for a entire query case, the code will probably never reach that.
I know a use case would be just convert a Expr to sql. But col("a").sort(true, true) will be only "a", which doesn't match the sql semantic? 🤔

If I understand correctly, I think another case is Expr::AggregateFunction, where we can't really convert it back to [ast::Function]
(https://docs.rs/sqlparser/latest/sqlparser/ast/struct.Function.html#) as it's also not an enum item in ast::Expr.

ast::Function is an enum item in ast::Expr, see
https://github.com/sqlparser-rs/sqlparser-rs/blob/0b5722afbfb60c3e3a354bc7c7dc02cb7803b807/src/ast/mod.rs#L683-L684
So maybe the return type of expr_to_sql being ast::Expr is enough, given the Sort will be addressed in a special way?

@alamb
Copy link
Contributor

alamb commented Apr 30, 2024

Yeah, so if we encountered a SortExpr in expr_to_sql function, does that mean unexpected location ? I think for a entire query case, the code will probably never reach that.

I think so

But col("a").sort(true, true) will be only "a", which doesn't match the sql semantic? 🤔

Creating SQL from programatically created Exprs is an interesting usecase.

it seems like the issue is that Expr in datafusion can come from both ast::Expr and ast::OrderByExpr

Currently we have

pub fn expr_to_sql(expr: &Expr) -> Result<ast::Expr> {
let unparser = Unparser::default();
unparser.expr_to_sql(expr)
}

Maybe we could change the signature to something like

/// DataFusion's Exprs can represent either an `Expr` or an `OrderByExpr`
enum Unparsed {
  // SQL Expression
  Expr(ast::Expr),
  // SQL ORDER BY expression (e.g. `col ASC NULLS FIRST`)
  OrderByExpr(ast::OrderByExpr),
}
 
pub fn expr_to_sql(expr: &Expr) -> Result<Unparsed> { 
...
 } 

@yyy1000
Copy link
Contributor Author

yyy1000 commented May 3, 2024

Good, I can try to implement it. :)
Another concern for me is changing the function signature to pub fn expr_to_sql(expr: &Expr) -> Result<Unparsed> would lead every call of expr_to_sql a convert to ast::Expr in the original code, would this be a good choice? 🤔

For example, There would be something like

let l = self.expr_to_sql(left.as_ref())?;
let expr = match l {
    Unparsed::Expr(expr) => expr,
    Unparsed::OrderByExpr(_) => internal_err(),
};

@alamb
Copy link
Contributor

alamb commented May 3, 2024

Maybe we could make the API easier to use like:

/// DataFusion's Exprs can represent either an `Expr` or an `OrderByExpr`
enum Unparsed {
  // SQL Expression
  Expr(ast::Expr),
  // SQL ORDER BY expression (e.g. `col ASC NULLS FIRST`)
  OrderByExpr(ast::OrderByExpr),
}
 
/// Throws an error if Expr can not be represented by an `ast::Expr` 
pub fn expr_to_sql(expr: &Expr) -> Result<sql::ast::Expr> { 
...
 } 

/// Returns an `Unparsed` based on the expression
pub fn expr_to_unparsed(expr: &Expr) -> Result<Unparsed> { 
...
 } 

And that way you example would be

let expr = self.expr_to_sql(left.as_ref())?;

In places where order by may be possible then the user could call the self.expr_to_unparsed 🤔

@yyy1000
Copy link
Contributor Author

yyy1000 commented May 3, 2024

That looks better! Will try to implement it soon.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants