Skip to content

Conversation

@mkleen
Copy link
Contributor

@mkleen mkleen commented Oct 27, 2025

Which issue does this PR close?

Rationale for this change

The queries from the original ticket fail, because an unprojected generate_series function would produce in a join the wrong number of columns which leads to a runtime error.

What changes are included in this PR?

This adds a missing projection to generate_series to ensure values are only emitted when projected.

Are these changes tested?

I added a sql-logic test. I also compared the results against Postgres and DuckDB:

Postgres:

mkleen=# SELECT v1 FROM (select generate_series as v1 from generate_series(1, 3)) g1, (select generate_series as v2 from generate_series(1, 3)) g2;
 v1
----
  1
  1
  1
  2
  2
  2
  3
  3
  3
(9 rows)

DuckDB:

D SELECT v1 FROM (select generate_series as v1 from generate_series(1, 3)) g1, (select generate_series as v2 from generate_series(1, 3)) g2;
┌───────┐
│  v1   │
│ int64 │
├───────┤
│     1 │
│     2 │
│     3 │
│     1 │
│     2 │
│     3 │
│     1 │
│     2 │
│     3 │
└───────┘

Are there any user-facing changes?

No

@github-actions github-actions bot added sqllogictest SQL Logic Tests (.slt) proto Related to proto crate functions Changes to functions implementation labels Oct 27, 2025
@mkleen mkleen changed the title Fix: Add projection to generate series Fix: Add projection to generate_series Oct 27, 2025
@mkleen mkleen marked this pull request as ready for review October 27, 2025 12:42
@mkleen mkleen force-pushed the fix-projection-generate-series branch from e605743 to 589bd3d Compare October 27, 2025 13:18
@mkleen mkleen force-pushed the fix-projection-generate-series branch from 589bd3d to 3cbb00e Compare October 27, 2025 13:21
let array = self.current.create_array(buf)?;
let batch = RecordBatch::try_new(Arc::clone(&self.schema), vec![array])?;
Ok(Some(batch))
let projected = match self.projection.as_ref() {
Copy link
Contributor Author

@mkleen mkleen Oct 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed that the MemoryStream projects in https://github.com/apache/datafusion/blob/main/datafusion/physical-plan/src/memory.rs#L103 while the LazyMemoryStream used in generate_series doesn't. An alternative approach could be to project generically in the LazyMemoryStream instead inside of the generator.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds like a great idea. Perhaps you could do so as a follow on PR

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @mkleen

@alamb alamb added this pull request to the merge queue Oct 28, 2025
@alamb
Copy link
Contributor

alamb commented Oct 28, 2025

Thanks @mkleen and @xudong963

Merged via the queue into apache:main with commit 38f86c8 Oct 28, 2025
28 checks passed
tobixdev pushed a commit to tobixdev/datafusion that referenced this pull request Nov 2, 2025
## Which issue does this PR close?

- Closes apache#17830

## Rationale for this change

The queries from the original ticket fail, because an unprojected
`generate_series` function would produce in a join the wrong number of
columns which leads to a runtime error.

## What changes are included in this PR?

This adds a missing projection to `generate_series` to ensure values are
only emitted when projected.

## Are these changes tested?

I added a sql-logic test. I also compared the results against Postgres
and DuckDB:

Postgres:
```sql
mkleen=# SELECT v1 FROM (select generate_series as v1 from generate_series(1, 3)) g1, (select generate_series as v2 from generate_series(1, 3)) g2;
 v1
----
  1
  1
  1
  2
  2
  2
  3
  3
  3
(9 rows)
```
DuckDB:
```sql
D SELECT v1 FROM (select generate_series as v1 from generate_series(1, 3)) g1, (select generate_series as v2 from generate_series(1, 3)) g2;
┌───────┐
│  v1   │
│ int64 │
├───────┤
│     1 │
│     2 │
│     3 │
│     1 │
│     2 │
│     3 │
│     1 │
│     2 │
│     3 │
└───────┘
```
 
## Are there any user-facing changes?

No
github-merge-queue bot pushed a commit that referenced this pull request Nov 4, 2025
## Which issue does this PR close?

- None, This is a follow-up for
#18298

## Rationale for this change

This moves the projection logic from `generate_series` out of the
generator into `LazyMemoryStream` as discussed in
#18298 (comment)
This makes the projection logic generic for all generators.

## What changes are included in this PR?

The projection logic is moved from `generate_series` into the
`LazyMemoryStream` and relevant tests, where `LazyMemoryStream` is used,
are adapted accordingly.

## Are these changes tested?

This is only a small refactoring; the changes are covered by the tests
from #18298

## Are there any user-facing changes?

There is a new parameter added to LazyMemoryExec::try_new method
jizezhang pushed a commit to jizezhang/datafusion that referenced this pull request Nov 5, 2025
…8373)

## Which issue does this PR close?

- None, This is a follow-up for
apache#18298

## Rationale for this change

This moves the projection logic from `generate_series` out of the
generator into `LazyMemoryStream` as discussed in
apache#18298 (comment)
This makes the projection logic generic for all generators.

## What changes are included in this PR?

The projection logic is moved from `generate_series` into the
`LazyMemoryStream` and relevant tests, where `LazyMemoryStream` is used,
are adapted accordingly.

## Are these changes tested?

This is only a small refactoring; the changes are covered by the tests
from apache#18298

## Are there any user-facing changes?

There is a new parameter added to LazyMemoryExec::try_new method
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

functions Changes to functions implementation proto Related to proto crate sqllogictest SQL Logic Tests (.slt)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Error with generate_series: Invalid argument error: number of columns must match number of fields in schema

3 participants