Skip to content

Conversation

@beliefer
Copy link
Contributor

@beliefer beliefer commented Apr 28, 2022

What changes were proposed in this pull request?

Currently, Offset must work with Limit. The behavior not allow to use offset alone and add offset API into DataFrame.

If we use Offset alone, there are two situations:

  1. If Offset is the last operator, collect the result to the driver and then drop/skip the first n (offset value) rows. Users can test or debug Offset in the way.
  2. If Offset is the intermediate operator, shuffle all the result to one task and drop/skip the first n (offset value) rows and the the result will be passed to the downstream operator. If offset value is bigger, the overhead looks not good, so we not supports the situation.

For example, SELECT * FROM a offset 10; parsed to the logic plan as below:

Offset (offset = 10) // Only offset clause
|--Relation

and then the physical plan as below:

CollectOffsetExec (offset = 10) // Collect the result to the driver and skip the first 10 rows
|--JDBCRelation

After this PR merged, users could input the SQL show below:

SELECT '' AS ten, unique1, unique2, stringu1
 		FROM onek
 		ORDER BY unique1 OFFSET 990;

Why are the changes needed?

Let Offset could work without Limit.

Does this PR introduce any user-facing change?

'No'.
New feature.

How was this patch tested?

New tests.

@github-actions github-actions bot added the SQL label Apr 28, 2022
@HyukjinKwon
Copy link
Member

cc @dtenedor FYI

@beliefer
Copy link
Contributor Author

ping @dtenedor cc @cloud-fan

@cloud-fan
Copy link
Contributor

how about we only support Offset as the last operator? I think that's the most useful case and can unblock the dataframe API.

@beliefer
Copy link
Contributor Author

how about we only support Offset as the last operator? I think that's the most useful case and can unblock the dataframe API.

OK. It seems we just let users use Offset alone in test or debug, so we avoid the overhead on product environment.

@dtenedor
Copy link
Contributor

@beliefer thanks for adding this! Please ping us when Wenchen's suggestion is implemented and we can take another look. Otherwise this PR LGTM.

@beliefer
Copy link
Contributor Author

Because #36417 merged

@beliefer beliefer closed this May 10, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants