Skip to content

Conversation

@alamb
Copy link
Contributor

@alamb alamb commented Mar 16, 2021

The sort_limit kernel was added by @sundy-li in #9602

While writing some doc examples in #9721, it occured to me we could potentially simplify the API so I figured I would offer a proposed PR for comment

Rationale

Since we already have a SortOptions structure that controls sorting options, we could also add the limit to that structure rather than adding a new sort_limit function and still avoid changing the API

Changes

Move the limit option to SortOptions

/// If Some(limit), only first `limit` elements in the sort order
/// in the output. Any data data after the limit will be
/// discarded.
pub limit: Option<usize>,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is the proposal

assert_eq!(output, expected)
}

// TODO remove this function
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will rewrite the tests properly (with a single SortOptions parameter) if people like this API change, but I didn't bother until I know it is worthwhile

@alamb
Copy link
Contributor Author

alamb commented Mar 16, 2021

@Dandandan I wonder if you have any comments on this approach

@jorgecarleitao
Copy link
Member

jorgecarleitao commented Mar 16, 2021

I think that SortOptions is used on a per-column basis; it corresponds to how each column should be ordered (nulls first, descending). limit reflects how many entries should be outputted by sort, which IMO is column-independent.

This is probably more evident in lex_sort, where a SortOption per column is passed.

@alamb
Copy link
Contributor Author

alamb commented Mar 16, 2021

I think that SortOptions is used on a per-column basis; it corresponds to how each column should be ordered (nulls first, descending). limit reflects how many entries should be outputted by sort, which IMO is column-independent.

@jorgecarleitao that is a good point. Since the Arrow sort kernel only operates on a single column, I think the distinction is not relevant.

However, since lexsort and DataFusion sort use the same SortOptions struct for each coulmn, that is a good reason not to add a limit option to them (as then it could be inconsistent across columns, potentially)

@sundy-li
Copy link
Contributor

sundy-li commented Mar 16, 2021

I think that SortOptions is used on a per-column basis; it corresponds to how each column should be ordered (nulls first, descending). limit reflects how many entries should be outputted by sort, which IMO is column-independent.

Totally agree with that.

select * from table order by a ASC, b DESC limit 10

There are 2 sort_options, yet the limit is the only one for the global results.

@github-actions
Copy link

@alamb
Copy link
Contributor Author

alamb commented Mar 16, 2021

I am convinced that this PR is not a good idea at this time. Closing. Thanks for the feedback @sundy-li and @jorgecarleitao 👍

@alamb alamb closed this Mar 16, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants