Skip to content

Conversation

@2010YOUY01
Copy link
Contributor

Which issue does this PR close?

  • Closes #.

Rationale for this change

Currently generate_series() does not track metrics, see the result in datafusion-cli

> explain analyze select * from generate_series(1,10000);
+-------------------+-------------------------------------------------------------------------------------------------------------------+
| plan_type         | plan                                                                                                              |
+-------------------+-------------------------------------------------------------------------------------------------------------------+
| Plan with Metrics | LazyMemoryExec: partitions=1, batch_generators=[generate_series: start=1, end=10000, batch_size=8192], metrics=[] |
|                   |                                                                                                                   |
+-------------------+-------------------------------------------------------------------------------------------------------------------+
1 row(s) fetched.
Elapsed 0.003 seconds.

This PR includes basic metrics tracking for generate_series() table function:

> explain analyze select * from generate_series(1,10000);
+-------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------+
| plan_type         | plan                                                                                                                                                         |
+-------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Plan with Metrics | LazyMemoryExec: partitions=1, batch_generators=[generate_series: start=1, end=10000, batch_size=8192], metrics=[output_rows=10000, elapsed_compute=69.333µs] |
|                   |                                                                                                                                                              |
+-------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row(s) fetched.
Elapsed 0.008 seconds.

What changes are included in this PR?

Update LazyMemoryExec for BaselineMetrics tracking: all executors implemented with it (like generate_series()) can track BaselineMetrics automatically.

Are these changes tested?

UT

Are there any user-facing changes?

No

@github-actions github-actions bot added the physical-plan Changes to the physical-plan crate label Jun 5, 2025
Copy link
Contributor

@zhuqi-lucas zhuqi-lucas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM Thank you @2010YOUY01 , i have minor question:

Do we test before, does those BaselineMetrics will affect performance when large data set?

@2010YOUY01
Copy link
Contributor Author

LGTM Thank you @2010YOUY01 , i have minor question:

Do we test before, does those BaselineMetrics will affect performance when large data set?

Thanks for the review. To the question: currently no IMO -- because the overhead is just adding a number to an atomic var once per batch. But it might cause slowdown if we want to implement #16244 (comment), more benchmarks will be needed for that.

@zhuqi-lucas
Copy link
Contributor

LGTM Thank you @2010YOUY01 , i have minor question:
Do we test before, does those BaselineMetrics will affect performance when large data set?

Thanks for the review. To the question: currently no IMO -- because the overhead is just adding a number to an atomic var once per batch. But it might cause slowdown if we want to implement #16244 (comment), more benchmarks will be needed for that.

Thanks for the info!

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @2010YOUY01 and @zhuqi-lucas

/// returning the same poll result
/// Process a poll result of a stream producing output for an operator.
///
/// Note: this method only updates `output_rows` and `end_time` metrics.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💯

@alamb alamb merged commit 2c8241a into apache:main Jun 5, 2025
27 checks passed
kosiew pushed a commit to kosiew/datafusion that referenced this pull request Jun 9, 2025
…ion (apache#16255)

* Add BaselineMetrics to LazyMemoryStream

* UT
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

physical-plan Changes to the physical-plan crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants