Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Evaluate ValuesExec's exprs during execution #11736

Open
lewiszlw opened this issue Jul 31, 2024 · 5 comments
Open

Evaluate ValuesExec's exprs during execution #11736

lewiszlw opened this issue Jul 31, 2024 · 5 comments
Labels
enhancement New feature or request

Comments

@lewiszlw
Copy link
Member

lewiszlw commented Jul 31, 2024

Is your feature request related to a problem or challenge?

Now ValuesExec's exprs will be evaluated during planning. This might make planning longer and ValuesExec can't be serialized in protobuf.

#[derive(Debug)]
pub struct ValuesExec {
    /// The schema
    schema: SchemaRef,
    /// The data
    data: Vec<RecordBatch>,
    /// Cache holding plan properties like equivalences, output partitioning etc.
    cache: PlanProperties,
}

What about making ValuesExec contains physical exprs and evaluating during execution?

Describe the solution you'd like

No response

Describe alternatives you've considered

No response

Additional context

No response

@alamb
Copy link
Contributor

alamb commented Jul 31, 2024

Now ValuesExec's exprs will be evaluated during planning. This might make planning longer and ValuesExec can't be serialized in protobuf.

Could we serialize the record batch using arrow IPC?

@lewiszlw
Copy link
Member Author

I think we could. But serializing exprs seems cheaper as the generated record batch might be large.

@alamb
Copy link
Contributor

alamb commented Aug 1, 2024

I think we could. But serializing exprs seems cheaper as the generated record batch might be large.

In the other hand, many Exprs may be expensive too -- I bet that for small/medium sized values Expr would be better as the overhead of a RecordBatch will be non trivial. But then as the number of rows increases at some point serializing the RecordBatch would be better

I bet it would be straightforward to test using https://docs.rs/arrow-ipc/52.2.0/arrow_ipc/writer/struct.StreamWriter.html to write to a vec (I was hoping for a nice example but seems like the docs aren't great yet)

@lewiszlw
Copy link
Member Author

lewiszlw commented Aug 2, 2024

For example, explain select * from values (generate_series(1, 100000)); will take 8s to complete in my machine.

@alamb
Copy link
Contributor

alamb commented Aug 2, 2024

For example, explain select * from values (generate_series(1, 100000)); will take 8s to complete in my machine.

It would be fascinating to run a profile / flamegraph and see where the time was going...

We should probably also change the EXPLAIN code to truncate showing the contents of VALUES after a while...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants